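Nokogiri is a parser for HTML and XML that offers DOM, SAX, and push-style interfaces. It can search documents with CSS3 selectors or XPath. Its features are: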
- XML/HTML DOM parser which handles broken HTML
- XML/HTML SAX parser
- XML/HTML Push parser
- XPath 1.0 support for document searching
- CSS3 selector support for document searching
- XML/HTML builder
- XSLT transformer
Install it with:
gem install nokogiri
Here is a code example taken from the official website:
require 'nokogiri'
require 'open-uri'
# Fetch and parse HTML document
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs)
doc = Nokogiri::HTML(URI.open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))
puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end
puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end
puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end
The output is as follows: