I was recently playing around with Gimite's wonderful Google Drive client library for Ruby (link) and wanted to make use of some of the amazing things you can do with Google Sheets. The `importXml` formula pulls in search results from Google Search into a Google spreadsheet: `=importXml("https://www.google.co.uk/search?start=1&num=100&as_sitesearch=adamjamesbull.co.uk","//cite")` Which produces a set of results that looks something like this in Google Sheets: Unfortunately due to the search architecture, you can only retrieve 100 search results at a time, but with some programmatic goodness, you can loop over the result of the `importXml` formula. I originally started off this little pet project in C# but soon switched over to Ruby and it was so much easier to get the job done! Anyhoo, here is the source code:
require "rubygems"
require "google/api_client"
require "google_drive"

# Authorizes with OAuth and gets an access token.
client = Google::APIClient.new
auth = client.authorization
auth.client_id = "xxx" #client id here
auth.client_secret = "xxx" #client secret here
auth.scope =
    "https://www.googleapis.com/auth/drive " +
auth.redirect_uri = "urn:ietf:wg:oauth:2.0:oob"
print("1. Open this page:n%snn" % auth.authorization_uri)
print("2. Enter the authorization code shown in the page: ")
auth.code = $stdin.gets.chomp
access_token = auth.access_token

domain = "xxx" #domain here

# Creates a session.
session = GoogleDrive.login_with_oauth(access_token)

# Uploads a local file.
file = session.upload_from_file("spreadsheet-template.xlsx", domain, :convert => true)

key = file.key

print("3. The document has been created. The url is https://docs.google.com/spreadsheet/ccc?key=" + key + "n")

# first worksheet
ws = file.worksheets[0];

# establish result counter and other counters
resultcounter = 1
jumpcounter = 100

domainprefix = "https://www.google.co.uk/search"

ws[resultcounter,1] = %Q[=importXml("#{domainprefix}?start=#{resultcounter.to_s()}&num=100&as_sitesearch=#{domain}","//cite")]


while ws[resultcounter + jumpcounter - 1, 1].length > 0
	ws[resultcounter + jumpcounter, 1] = %Q[=importXml("#{domainprefix}?start=#{resultcounter.to_s()}&num=100&as_sitesearch=#{domain}","//cite")]
	resultcounter = resultcounter + jumpcounter
Steps to use:
  1. Register a new app at https://console.developers.google.com
  2. Fill in the client id and secret in the source code above
  3. Fill in the domain that you would like to scrape
  4. Create a blank xlsx file named `spreadsheet-template.xlsx` using Microsoft Excel and save it into the project folder.
  5. Open up a Terminal window, `cd` to the containing folder and run `ruby app.rb` and the application will begin.
Hopefully someone will find this useful! Adam