When I first looked at Nokogiri, it redefined (at least for me!) how to screen scrape. Recently I fell in love with Cucumber and capybara-webkit. For those new to capybara-webkit, it is a Capybara driver that simulates a WebKit browser for running tests. Perks? You get a simulated browser running in headless mode, it supports JavaScript, and it's bloody fast! For more info, please check out a previous article on how to get started. I was extremely bored this weekend, and all of a sudden an idea was born: I created a simple search spider using capybara-webkit that fetches search results from Google. Here is how I did it.
require 'rubygems'
require 'capybara'
require 'capybara/dsl'
require 'capybara-webkit'
Capybara.run_server = false
Capybara.current_driver = :webkit
Capybara.app_host = "http://www.google.com/"
module Spider
  class Google
    include Capybara::DSL

    def search
      visit('/')
      fill_in "q", :with => ARGV[0] || "I love Ruby!"
      click_button "Google Search"
      all("li.g h3").each do |h3|
        a = h3.find("a")
        puts "#{h3.text} => #{a[:href]}"
      end
    end
  end
end
spider = Spider::Google.new
spider.search
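
If you want to poke at the extraction loop without firing up a browser or hitting Google, here is a rough sketch that runs the same idea against a static snippet. Everything in it is an assumption for illustration: SAMPLE_HTML is a made-up stand-in for a results page (the real markup may well differ), extract_results is a hypothetical helper, and I use REXML with an XPath roughly equivalent to the "li.g h3" selector instead of Capybara.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a results page; real Google markup may differ.
SAMPLE_HTML = <<~HTML
  <ol>
    <li class="g"><h3><a href="http://www.ruby-lang.org/">Ruby Programming Language</a></h3></li>
    <li class="g"><h3><a href="http://rubyonrails.org/">Ruby on Rails</a></h3></li>
    <li class="ad"><h3><a href="http://example.com/">Not a result</a></h3></li>
  </ol>
HTML

# Hypothetical helper: returns [title, href] pairs for every link that sits
# in an h3 inside an li whose class is exactly "g".
def extract_results(html)
  doc = REXML::Document.new(html)
  results = []
  REXML::XPath.each(doc, "//li[@class='g']//h3/a") do |a|
    results << [a.text, a.attributes['href']]
  end
  results
end

# Print in the same "title => href" shape the spider uses.
extract_results(SAMPLE_HTML).each do |title, href|
  puts "#{title} => #{href}"
end
```

Returning the pairs instead of printing them inside the loop also makes it easy to reuse the results somewhere else, which is probably how I would restructure the spider if it grew any bigger.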