<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Query7 &#187; Programming</title>
	<atom:link href="http://query7.com/tag/programming/feed" rel="self" type="application/rss+xml" />
	<link>http://query7.com</link>
	<description>PHP, Javascript, Python and Web Development</description>
	<lastBuildDate>Sat, 25 Jun 2011 21:29:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Scrape the First Paragraph &amp; Image from a Wikipedia Entry</title>
		<link>http://query7.com/scrape-the-first-paragraph-image-from-a-wikipedia-entry</link>
		<comments>http://query7.com/scrape-the-first-paragraph-image-from-a-wikipedia-entry#comments</comments>
		<pubDate>Mon, 26 Jul 2010 14:10:40 +0000</pubDate>
		<dc:creator>Cary F</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Wikipedia]]></category>

		<guid isPermaLink="false">http://www.webdevelopmentbits.com/?p=652</guid>
		<description><![CDATA[Automate fetching Wikipedia descriptions and images for webpage content. Render content dynamically based on specific keywords.]]></description>
			<content:encoded><![CDATA[<p>by Kannan Ramakrishnan</p>
<p>One day, while you&#8217;re organizing the content for your latest and greatest website, you may find yourself wishing for an automated way to fetch a description for some of that content.  Perhaps an image, too, to pull into a header or elsewhere on your page.  And since you’ve got 100 other things on your plate, you’d like this content to be implemented automatically, dynamically appearing on certain pages based on a few keywords.  For novice Web developers, this sounds impossible, I know.  But… it’s totally doable.</p>
<p>So off you go in search of good, usable, public content.  Before long you’ll probably realize it yourself, but I’ll save you an hour of digging with this quick pro tip: Wikipedia is your friend. The bottomless content at Wikipedia is a perfect match for what we want to do.  The caveat, though, is that it’s so vast we might scrape the wrong content.  And especially if you’re automating a task like this, constantly looking over your shoulder to confirm the content would defeat the entire purpose.  So relying <em>solely</em> on the Wiki gremlins is not the best way to go.</p>
<p><a href="http://query7.com/wp-content/uploads/250px-Red_Apple3.jpg"><img class="aligncenter size-full wp-image-663" title="250px-Red_Apple" src="http://query7.com/wp-content/uploads/250px-Red_Apple3.jpg" alt="" width="252" height="229" /></a></p>
<p>Let’s take for example a Wiki search for a popular technology company, say, <a href="http://www.apple.com">Apple</a>.  For a Web-savvy person like yourself, a Wiki search for ‘Apple’ turns up what you probably expect: the entry for <a href="http://en.wikipedia.org/wiki/Apple">a scrumptious kind of fruit</a>.  And as Web developers, we <em>should </em>be more astute in our search terms, but sometimes yes, well, there are those gremlins.</p>
<p>So&#8230; to cut the risk of unwanted fruit creeping onto your page, we’re going to combine a Wiki search with a Google search.</p>
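<p>In rough terms, that means asking Google for your keywords restricted to en.wikipedia.org and taking the first hit.  Here is a minimal sketch of building such a query URL.  The helper name <code>wiki_search_url</code> is made up for illustration and is not part of the full listing; it assumes only Ruby's standard <code>uri</code> library, and it form-encodes the keywords so that punctuation in them can't break the URL:</p>
<pre>require 'uri'

# Hypothetical helper (an assumption, not from the full listing):
# builds a Google search URL restricted to en.wikipedia.org.
def wiki_search_url(query_item)
  # Trim the keywords and collapse inner runs of whitespace
  keywords = query_item.strip.gsub(/\s+/, ' ')
  # encode_www_form escapes spaces, colons and other reserved characters
  params = URI.encode_www_form(q: "#{keywords} site:en.wikipedia.org")
  "http://www.google.com/search?#{params}"
end

puts wiki_search_url("Apple Inc.")
# prints: http://www.google.com/search?q=Apple+Inc.+site%3Aen.wikipedia.org</pre>
<p>Adding a word like "Inc." to the keywords is usually enough to steer the first hit toward the company rather than the fruit.</p>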
<p>Following is the code for scraping the first paragraph and image of the entry from a page in Wikipedia.  As long as your keywords aren’t really crazy, this should get the job done!</p>
<p>Finally, make sure you don&#8217;t forget to credit Wikipedia on your page&#8230; and if you have any questions, <a href="mailto:webdevelopmentbits@sourcebits.com">drop us a line here</a> any time!!</p>
<p>Here&#8217;s the code:</p>
<pre>require 'hpricot'
require 'open-uri'

def fetch_description(query_item)
    page_title, uri_title = get_wiki_name(query_item)
    return get_wiki_description(page_title, uri_title)
end

def upload_photo(wiki_photo)
    begin
      base_uri = URI.parse(wiki_photo)
      uploaded_data = open(base_uri)
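      #define_singleton_method takes a block, so base_uri stays in scope
      #(a plain def cannot see the enclosing local variables)
      uploaded_data.define_singleton_method(:original_filename) { base_uri.path.split('/').last }
      return uploaded_data.original_filename.to_s.empty? ? nil : uploaded_data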
    rescue
      return nil
    end
end

#Method to fetch wiki page and strip first two &lt;p&gt; Tags
def get_wiki_description(page_title, uri_title)
    url =  uri_title
    final_content = ""
    if url.size &gt; 10
      buffer = Hpricot(open(url, "User-Agent" =&gt; "reader"+rand(10000).to_s).read)
      #Capture first two paragraphs of text
      content = buffer.search("//div[@id='content']").search("//div[@id='bodyContent']").search("//p")[0..1]

      #Remove the extra spaces and strip html tags from the fetched content
      content.each do |c|
        final_content+=c.inner_html.gsub(/&lt;\/?[^&gt;]*&gt;/, '').gsub(/&amp;#\d+;/,'').gsub(/\([^\)]+\)/,'').gsub(/\[[^\]]+\]/,'').gsub(/ +/,' ')+"\n"
      end
    end
    return final_content
end

#Method to get the link for wikipedia from google search results
def get_wiki_name(query_item)
    search_keywords = query_item.strip.gsub(/\s+/,'+')
    url = "http://www.google.com/search?q=#{search_keywords}+site%3Aen.wikipedia.org"
    begin
      doc = Hpricot(open(url, "User-Agent" =&gt; "reader"+rand(10000).to_s).read)
      result = doc.search("//div[@id='ires']").search("//li[@class='g']").first.search("//a").first if doc
    rescue
      return '',''
    end
    if result
      #Strip html tags and literal dots from the link text
      return result.inner_html.gsub(/&lt;\/?[^&gt;]*&gt;/,"").gsub(/\./,""),result.attributes["href"]
    else
      return '',''
    end
end

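#fetch_description returns a single string, so wiki_photo is nil here and
#upload_photo(nil) returns nil; pass upload_photo an image URL to fetch a photo
wiki_description, wiki_photo = fetch_description("Apple")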
upload_photo(wiki_photo)</pre>
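<p>If the chain of <code>gsub</code> calls in <code>get_wiki_description</code> looks cryptic, this small standalone sketch may help.  It applies three of the same substitutions to a made-up sentence.  The helper name <code>tidy_paragraph</code> and the sample text are assumptions for illustration, and the tag and entity stripping steps are left out:</p>
<pre># Hypothetical helper (not part of the listing above): removes the kind
# of clutter a Wikipedia paragraph carries once its tags are gone.
def tidy_paragraph(text)
  text = text.gsub(/\([^\)]+\)/, '')  # drop parenthetical asides
  text = text.gsub(/\[[^\]]+\]/, '')  # drop bracketed footnote markers
  text.gsub(/ +/, ' ')                # collapse runs of spaces
end

sample = "The apple (Malus domestica) is a fruit.[1]  It is grown worldwide."
puts tidy_paragraph(sample)
# prints: The apple is a fruit. It is grown worldwide.</pre>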
<p>And this is how it looks implemented live, in context:</p>
<p><a href="http://query7.com/wp-content/uploads/smallerScreen-shot-2010-07-26-at-4.50.03-PM.png"><img class="aligncenter size-full wp-image-666" title="smallerScreen shot 2010-07-26 at 4.50.03 PM" src="http://query7.com/wp-content/uploads/smallerScreen-shot-2010-07-26-at-4.50.03-PM.png" alt="" width="500" height="245" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://query7.com/scrape-the-first-paragraph-image-from-a-wikipedia-entry/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FLOW3 is arriving</title>
		<link>http://query7.com/flow3-is-arriving</link>
		<comments>http://query7.com/flow3-is-arriving#comments</comments>
		<pubDate>Fri, 03 Apr 2009 12:01:45 +0000</pubDate>
		<dc:creator>Ramses Paiva</dc:creator>
				<category><![CDATA[Web Development]]></category>
		<category><![CDATA[Frameworks]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.webdevelopmentbits.com/?p=464</guid>
		<description><![CDATA[<p>Not yet released, FLOW3 is already making noise among developers: what&#8217;s TYPO3 up to?</p>
<p>Building on the already proven TYPO3 CMS, the upcoming version 5 of the system brings a solid PHP framework that can be used independently of the CMS to develop applications of any kind.</p>
<p>The FLOW3 subsite states that, contrary to what&#8217;s commonly seen out there, FLOW3 is not a pick&#8217;n&#8217;mix store of motley components. It&#8217;s a framework that helps you with the infrastructure of your application. Object Lifecycle Management, Package Management, Resource Management and Security are its home turf. Real business logic is left to third-party packages.</p>
<p>FLOW3 is going to provide the most common features found in other PHP frameworks, such as an MVC architecture, Validation, Filters, a Persistence Object Manager and much more.</p>
<p>Next week, I&#8217;ll post a getting-started guide with a simple application and give my personal review of this framework, which &#8211; IMHO &#8211; is going to rock!</p>
]]></description>
			<content:encoded><![CDATA[<p>Not yet released, FLOW3 is already making noise among developers: what&#8217;s TYPO3 up to?</p>
<p>Building on the already proven TYPO3 CMS, the upcoming version 5 of the system brings a solid PHP framework that can be used independently of the CMS to develop applications of any kind.</p>
<p>The FLOW3 subsite states that, contrary to what&#8217;s commonly seen out there, FLOW3 is not a pick&#8217;n&#8217;mix store of motley components. It&#8217;s a framework that helps you with the infrastructure of your application. Object Lifecycle Management, Package Management, Resource Management and Security are its home turf. Real business logic is left to third-party packages.</p>
<p>FLOW3 is going to provide the most common features found in other PHP frameworks, such as an MVC architecture, Validation, Filters, a Persistence Object Manager and much more.</p>
<p>Next week, I&#8217;ll post a getting-started guide with a simple application and give my personal review of this framework, which &#8211; IMHO &#8211; is going to rock!</p>
]]></content:encoded>
			<wfw:commentRss>http://query7.com/flow3-is-arriving/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to hold a more effective code review</title>
		<link>http://query7.com/how-to-hold-a-more-effective-code-review</link>
		<comments>http://query7.com/how-to-hold-a-more-effective-code-review#comments</comments>
		<pubDate>Mon, 27 Oct 2008 05:55:01 +0000</pubDate>
		<dc:creator>Ramses Paiva</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[code review]]></category>

		<guid isPermaLink="false">http://www.webdevelopmentbits.com/?p=176</guid>
		<description><![CDATA[<p>I found a very interesting article about Code Review that I&#8217;d like to share.</p>
<p>The article was written by Andrew Stellman, and you can read it at Head First Labs: http://www.oreillynet.com/headfirst/blog/2008/09/how_to_hold_a_more_effective_c.html</p>
<p>Enjoy!</p>
]]></description>
			<content:encoded><![CDATA[<p>I found a very interesting article about Code Review that I&#8217;d like to share.</p>
<p>The article was written by Andrew Stellman, and you can read it at Head First Labs: http://www.oreillynet.com/headfirst/blog/2008/09/how_to_hold_a_more_effective_c.html</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://query7.com/how-to-hold-a-more-effective-code-review/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

