Ruby – Web Crawler with Spidr Gem

Author: Kunto Aji
Site: http://www.railsmine.net
Summary: A Ruby script that collects all URLs from a target site. You may need to install the Spidr gem first. This script was tested on Linux.
Usage: ruby filename.rb

#!/usr/bin/ruby

#  _________      .__       .____   _______
# /   _____/ ____ |__|_____ |    |  \   _  \    ____
# \_____  \ /    \|  \____ \|    |  /  /_\  \  / ___\
# /        \   |  \  |  |_> >    |__\  \_/   \/ /_/  >
#/_______  /___|  /__|   __/|_______ \_____  /\___  /
#        \/     \/   |__|           \/     \//_____/
# http://www.railsmine.net

require 'rubygems'
require 'spidr'

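i = 1
# The block form of File.open closes the file automatically, even if the crawl raises.
File.open('spider.txt', 'w') do |url_file|
  # Note: start_at does not restrict the crawl to this host;
  # Spidr.site is the variant meant for crawling a single site.
  Spidr.start_at('http://www.railsmine.net/') do |spider|
    # Print each URL the spider processes and write it to the file, numbered.
    spider.every_url do |url|
      puts "#{i}. #{url}"
      url_file.puts("#{i}. #{url}")
      i += 1
    end
  end
end
puts "Done. All URLs have been saved to spider.txt"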
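
The numbering-and-writing step of the script can be separated from the crawl, which makes it easy to try without network access or the Spidr gem. The following is only a minimal sketch under that assumption: `write_numbered_urls` is a hypothetical helper name that does not come from Spidr or from the original script.

```ruby
# Hypothetical helper (not part of Spidr): writes each URL in +urls+ to
# +path+ as a numbered line and returns how many lines were written.
def write_numbered_urls(urls, path)
  count = 0
  # The block form of File.open closes the file when the block exits.
  File.open(path, 'w') do |file|
    urls.each do |url|
      count += 1
      file.puts("#{count}. #{url}")
    end
  end
  count
end
```

With Spidr, the URLs yielded by `every_url` could be collected into an array and passed to this helper once the crawl finishes.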