<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Scraping Google Groups</title>
	<atom:link href="http://saturnboy.com/2010/03/scraping-google-groups/feed/" rel="self" type="application/rss+xml" />
	<link>http://saturnboy.com/2010/03/scraping-google-groups/</link>
	<description>Code, Work, and Life</description>
	<lastBuildDate>Thu, 10 May 2012 17:13:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: justin</title>
		<link>http://saturnboy.com/2010/03/scraping-google-groups/comment-page-1/#comment-338</link>
		<dc:creator>justin</dc:creator>
		<pubDate>Fri, 21 Oct 2011 17:35:28 +0000</pubDate>
		<guid isPermaLink="false">http://saturnboy.com/?p=1105#comment-338</guid>
		<description>@Neal: The best solution would be for Google to provide a Google Groups API for getting forum data back out. Until they do, everything is basically a big hack (like the scraping code I posted above).</description>
		<content:encoded><![CDATA[<p>@Neal: The best solution would be for Google to provide a Google Groups API for getting forum data back out. Until they do, everything is basically a big hack (like the scraping code I posted above).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neal McBurnett</title>
		<link>http://saturnboy.com/2010/03/scraping-google-groups/comment-page-1/#comment-337</link>
		<dc:creator>Neal McBurnett</dc:creator>
		<pubDate>Thu, 20 Oct 2011 17:14:47 +0000</pubDate>
		<guid isPermaLink="false">http://saturnboy.com/?p=1105#comment-337</guid>
		<description>Thanks - I love it when people help liberate data!

My idea is that it would be good to scrape the data into a standard format, e.g. the Atom Syndication Format standardized by the IETF:

http://tools.ietf.org/html/rfc4287

Then people who write forum and blog software could support imports from Atom, and this could become a really easy thing to do.

For some archives, I guess Atom might not be able to represent everything directly, so additions to the schema might be necessary, but I haven&#039;t looked into that.

Would that make sense? Do you know how your output differs from Atom?</description>
		<content:encoded><![CDATA[<p>Thanks &#8211; I love it when people help liberate data!</p>
<p>My idea is that it would be good to scrape the data into a standard format, e.g. the Atom Syndication Format standardized by the IETF:</p>
<p><a href="http://tools.ietf.org/html/rfc4287" rel="nofollow">http://tools.ietf.org/html/rfc4287</a></p>
<p>Then people who write forum and blog software could support imports from Atom, and this could become a really easy thing to do.</p>
<p>For some archives, I guess Atom might not be able to represent everything directly, so additions to the schema might be necessary, but I haven&#8217;t looked into that.</p>
<p>Would that make sense? Do you know how your output differs from Atom?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

