Posted on 24 Aug 2011 in Reversing, Software

While you are using Google Related, some JSON-formatted data about each page you visit is requested from Google; that data comes from a URL similar to the one below:

https://toolbarqueries.google.com/tbr
     ?client=navclient-auto
     &features=GR
     &ch=8e991fe19
     &q=info:http%3A%2F%2Fwww.bronco.co.uk%2F
     &oe=UTF-8
     &grv=0.6.9

(split over several lines for readability)
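As a rough sketch, the URL above could be assembled in Python like this. The names build_tbr_url and TBR_ENDPOINT are mine, not Google's, and the checksum is simply passed in as a string rather than computed here. The parameter values are copied from the example, and I am assuming only the page URL after "info:" needs percent-encoding, since that is how the example looks.

```python
from urllib.parse import quote

# Endpoint taken from the example URL above.
TBR_ENDPOINT = "https://toolbarqueries.google.com/tbr"

def build_tbr_url(page_url, checksum, features="GR"):
    """Assemble a toolbar query URL in the same parameter order as the example."""
    params = [
        ("client", "navclient-auto"),
        ("features", features),
        ("ch", checksum),
        # Only the page URL is percent-encoded; the "info:" prefix is left as-is.
        ("q", "info:" + quote(page_url, safe="")),
        ("oe", "UTF-8"),
        ("grv", "0.6.9"),
    ]
    return TBR_ENDPOINT + "?" + "&".join(f"{k}={v}" for k, v in params)

print(build_tbr_url("http://www.bronco.co.uk/", "8e991fe19"))
```

Joining the parameters by hand, rather than using urlencode, keeps the colon in "info:" unescaped so the output matches the example exactly.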

The response is much too wordy for me to paste here, but what jumped out at me was how similar that URL is to those used by Google Toolbar to request the PageRank of a particular URL. Indeed, the checksum calculation (8e991fe19 in the above example) is exactly the same, and simply replacing the “GR” in the features parameter with the word “Rank” turns it into a valid request for PageRank, which responds with something like:

Rank_1:1:5

This example URL has a PageRank value of 5 (the ‘1’ between the colons indicates the length of the returned value, in this case just one character). It is even possible to combine the two, for example:

https://toolbarqueries.google.com/tbr
     ?client=navclient-auto
     &features=Rank:GR
     &ch=8e991fe19
     &q=info:http%3A%2F%2Fwww.bronco.co.uk%2F
     &oe=UTF-8
     &grv=0.6.9

This responds with both PageRank and Google Related data, separated by newlines:

Rank_1:1:5
GR_1:24019:{"server_output_object_version":"","request_id": .....

(cut for brevity)
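Going by the sample lines above, each record appears to have the form name:length:value. Here is a minimal sketch of a parser under that assumption. The function name parse_tbr_response is my own, and treating the length as a count of characters (rather than bytes) is a guess based on the Rank example; I have not checked it against multi-byte responses.

```python
def parse_tbr_response(text):
    """Split a response into {name: value}, trusting each record's length field."""
    fields = {}
    pos = 0
    while pos < len(text):
        # Skip the newlines that separate records.
        if text[pos] in "\r\n":
            pos += 1
            continue
        name_end = text.index(":", pos)           # end of e.g. "Rank_1"
        len_end = text.index(":", name_end + 1)   # end of the length field
        length = int(text[name_end + 1:len_end])
        # Take exactly `length` characters, so colons or newlines
        # inside the value do not confuse the parser.
        fields[text[pos:name_end]] = text[len_end + 1:len_end + 1 + length]
        pos = len_end + 1 + length
    return fields

# Made-up response with a tiny stand-in for the real GR payload.
print(parse_tbr_response("Rank_1:1:5\nGR_1:2:{}\n"))
```

Reading the value by its stated length, instead of splitting on newlines, should cope with a JSON payload that happens to contain line breaks.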

Calculating the correct checksum for a given URL is fairly straightforward – my Python code to do so is shown below.

GPR_HASH_SEED = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer."

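def google_hash(value):
    """Return the 'ch' checksum for a URL: '8' followed by 8 hex digits."""
    magic = 0x1020345
    for i in range(len(value)):
        # Mix in one character of the URL and the matching seed character.
        magic ^= ord(GPR_HASH_SEED[i % len(GPR_HASH_SEED)]) ^ ord(value[i])
        # Rotate left by 9 bits, keeping the result within 32 bits.
        magic = (magic >> 23 | magic << 9) & 0xFFFFFFFF
    return "8%08x" % magic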
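The shift-and-mask line in the loop is a 32-bit rotate left by 9 bits: the top 9 bits wrap around to the bottom, and the mask discards anything that overflows past bit 31. A small sketch isolating that step, where rotl32 is a helper name of my own and not part of the code above:

```python
def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

# The initial magic value, rotated once by 9 bits as in the loop body.
print("%08x" % rotl32(0x1020345, 9))  # 04068a02
```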

I am not currently aware of any ‘features’ values other than “GR” and “Rank” that produce a reply.