Clearly, Yahoo knows the keys to better living.
What I find interesting about these lists from an IR perspective is what information they’re leaking about the ranking algorithms. Before I get into that, let me explain the following tables. For each search engine, the first column contains what the type ahead lists on the website (as evidenced by the screenshots). The second column contains what the type ahead drop down in Safari’s toolbar lists. For Yahoo, the third column lists what is given if you type a space after the “through”. For Google and Bing, the space did nothing.
Google

| Website | Safari toolbar |
| --- | --- |
| Dentistry | Dentistry |
| Design | Design |
| Chemistry | Sewing |
| Sewing | Chemistry |
| Chemistry Lyrics | Circuitry |
| Chemistry Movie | Catastrophe Lyrics |
| Chems | Economics |
Bing
| Website | Safari toolbar |
| --- | --- |
| Chemistry | Chemistry |
| Design | Design |
| Chemistry Lyrics | Chemistry Lyrics |
| Killing | Killing |
| Beowulf | Beowulf |
| TV | TV |
| Coffee | Coffee |
| Mathematics | Mathematics |
Yahoo
Chemistry Chemistry Chemistry Design Design Design Circuitry Circuitry Circuitry Killing Killing Killing Dentistry Dentistry Dentistry Chemistry Lyrics Chemicals Hypnosis Technology Chocolate Software Better Information Sarcasm Recreational Sims
The first thing I noticed was that Google ranks its suggestions differently in the toolbar and on the web page. Also, Google is really emphasizing local search. “Better Living Through Dentistry” is a dentist in San Francisco. Putting it first is really strange, since the most famous (and frequently parodied) phrase is “Better Living Through Chemistry.”
Surprisingly, Bing and Yahoo aren’t returning the same results, even though Bing powers both searches. I know Yahoo Search still exists, but apparently they’re still doing custom ranking. Also, Yahoo is using their own ranking to drive the type ahead results. Yahoo doesn’t seem to be tokenizing their type ahead searches, while Bing and Google appear to tokenize theirs on a word basis. Otherwise, typing a space wouldn’t generate all new results for Yahoo.
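To make the tokenization argument concrete, here is a toy Python sketch of the two behaviors. It is only a model of the inference above, not how any of these engines actually work. The function names and the `CANDIDATES` phrases are made up for illustration.

```python
# Hypothetical suggestion phrases, invented for this illustration.
CANDIDATES = [
    "better living through chemistry",
    "better living through design",
    "better living throughout the year",
]


def raw_prefix_matches(query, candidates):
    """Match the query exactly as typed, trailing space included."""
    q = query.lower()
    return [c for c in candidates if c.lower().startswith(q)]


def token_prefix_matches(query, candidates):
    """Split the query into words first, so a trailing space is discarded."""
    q_tokens = query.lower().split()
    if not q_tokens:
        return list(candidates)
    *head, last = q_tokens
    matches = []
    for c in candidates:
        c_tokens = c.lower().split()
        if len(c_tokens) < len(q_tokens):
            continue
        # All but the last query word must match exactly;
        # the last query word only needs to be a prefix.
        if c_tokens[:len(head)] == head and c_tokens[len(head)].startswith(last):
            matches.append(c)
    return matches


print(raw_prefix_matches("better living through", CANDIDATES))     # all three phrases
print(raw_prefix_matches("better living through ", CANDIDATES))    # drops "throughout"
print(token_prefix_matches("better living through ", CANDIDATES))  # all three phrases
```

In this model the trailing space changes the raw-prefix results but leaves the word-tokenized results alone, which is the pattern that separates Yahoo from Google and Bing here.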
Since I had a set of differing lists, I decided to combine them into a single ranked list. To do this, I ordered the terms by their average rank across the lists. When a term did not appear in a list, I used the rank (MaxRankForEngine + 1) + (MaxRankForEngine / NumTermsUnseen). I’m not sure that is the best way to combine federated search results, but since this is just a blog post, I’m not too worried about it.
Bold entries are unique.
- Chemistry
- Design
- Killing
- Circuitry
- Chemistry Lyrics
- Dentistry
- Sewing
- Beowulf
- TV
- Chemicals
- Coffee
- Chemistry Movie
- Technology
- Hypnosis
- Catastrophe Lyrics
- Software
- Chocolate
- Mathematics
- Chems
- Sarcasm
- Better Information
- Economics
- Sims
- Recreational
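For anyone who wants to play with the averaging scheme used for the combined list above, here is a minimal Python sketch. It rests on two assumptions about the formula: MaxRankForEngine is taken to be the length of the list, and NumTermsUnseen is taken to be the number of terms in the combined vocabulary that the list does not contain. The function name and the toy lists are hypothetical, so it won't necessarily reproduce the exact ordering above.

```python
def merge_ranked_lists(lists):
    """Order terms by their average rank across several ranked lists.

    A term missing from a list gets the penalty rank
    (max_rank + 1) + (max_rank / num_unseen). Here max_rank is assumed to be
    the list's length and num_unseen the number of vocabulary terms the
    list does not contain.
    """
    # Build the combined vocabulary, keeping first-seen order.
    vocabulary = []
    for ranked in lists:
        for term in ranked:
            if term not in vocabulary:
                vocabulary.append(term)

    totals = {term: 0.0 for term in vocabulary}
    for ranked in lists:
        max_rank = len(ranked)
        num_unseen = sum(1 for term in vocabulary if term not in ranked)
        for term in vocabulary:
            if term in ranked:
                totals[term] += ranked.index(term) + 1  # ranks are 1-based
            else:
                totals[term] += (max_rank + 1) + (max_rank / num_unseen)

    averages = {term: total / len(lists) for term, total in totals.items()}
    # sorted() is stable, so ties keep first-seen order.
    return sorted(vocabulary, key=lambda term: averages[term])


# Toy input, not the real suggestion lists.
toy_lists = [
    ["chemistry", "design", "killing"],
    ["design", "chemistry"],
    ["chemistry", "sewing"],
]
print(merge_ranked_lists(toy_lists))
# ['chemistry', 'design', 'killing', 'sewing']
```

Dividing by the number of unseen terms means a list that misses only a few terms penalizes them more heavily than a list that misses many, which is one of the reasons this may not be the best way to merge the results.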