
The New York Times published an inside look at Google today, titled “Google Keeps Tweaking Its Search Engine.”
My first thought was: duh! But after a couple of minutes of reading, I found that it gives a great look into the inner workings of search quality, the most influential (and, in my mind, most important) group of engineers employed by Google.
I recommend that you take a few minutes and read it for yourself, but here are a couple of highlights that I thought were unbelievable:
Why they keep changing their algorithms:
Yet however easy it is to wax poetic about the modern-day miracle of Google, the site is also among the world’s biggest teases. Millions of times a day, users click away from Google, disappointed that they couldn’t find the hotel, the recipe or the background of that hot guy. Google often finds what users want, but it doesn’t always.
Google values Mr. Singhal and his team so highly for the most basic of competitive reasons. It believes that its ability to decrease the number of times it leaves searchers disappointed is crucial to fending off ever fiercer attacks from the likes of Yahoo and Microsoft and preserving the tidy advertising gold mine that search represents.
“Search over the last few years has moved from ‘Give me what I typed’ to ‘Give me what I want,’ ” says Mr. Singhal, a 39-year-old native of India who joined Google in 2000 and is now a Google Fellow, the designation the company reserves for its elite engineers.
“Expectations are higher now,” said Udi Manber, who oversees Google’s entire search-quality group. “When search first started, if you searched for something and you found it, it was a miracle. Now, if you don’t get exactly what you want in the first three results, something is wrong.”
Why are SEO practices different for different verticals?
Freshness, which describes how many recently created or changed pages are included in a search result, is at the center of a constant debate in search: Is it better to provide new information or to display pages that have stood the test of time and are more likely to be of higher quality? Until now, Google has preferred pages old enough to attract others to link to them.
Mr. Singhal introduced the freshness problem, explaining that simply changing formulas to display more new pages results in lower-quality searches much of the time. He then unveiled his team’s solution: a mathematical model that tries to determine when users want new information and when they don’t. (And yes, like all Google initiatives, it had a name: QDF, for “query deserves freshness.”)
The QDF solution revolves around determining whether a topic is “hot.” If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries, which Mr. Singhal believes is an even better monitor of global enthusiasm about a particular subject.
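To make the "hot topic" idea concrete, here is a toy sketch of my own. The article does not describe how the real QDF model works, so everything here is an assumption for illustration: the function names, the idea of comparing recent activity to a long-run average, and the threshold of 3.0 are all hypothetical.

```python
def hotness(recent, baseline):
    """Ratio of recent activity to the long-run average.

    Add-one smoothing keeps the ratio defined when the baseline is zero.
    """
    return (recent + 1) / (baseline + 1)


def query_deserves_freshness(news_recent, news_baseline,
                             query_recent, query_baseline,
                             threshold=3.0):
    """Guess whether a topic is 'hot' from two hypothetical inputs:
    mentions in news/blog posts and volume in the search query stream.

    The threshold is an arbitrary illustrative value, not Google's.
    """
    news_score = hotness(news_recent, news_baseline)
    query_score = hotness(query_recent, query_baseline)
    # Treat the topic as hot if either source shows a sharp spike.
    return max(news_score, query_score) >= threshold


# A topic with a sudden burst of news coverage versus a steady one.
spiking = query_deserves_freshness(120, 10, 5000, 900)
steady = query_deserves_freshness(9, 10, 950, 900)
```

In this sketch a spike in either source is enough to prefer fresh pages; a real model would presumably weigh the two differently.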
Google combines “signals” with user history to rank pages differently for the same query (i.e., personalized search):
As Google compiles its index, it calculates a number it calls PageRank for each page it finds. This was the key invention of Google’s founders, Mr. Page and Sergey Brin. PageRank tallies how many times other sites link to a given page. Sites that are more popular, especially with sites that have high PageRanks themselves, are considered likely to be of higher quality.
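For the curious, the idea that links from high-PageRank pages count for more can be sketched with the textbook power-iteration version of PageRank. This is a minimal illustration on a made-up four-page web, not Google's production code; the damping factor of 0.85 and the `toy_web` graph are my assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

Here `c` ends up on top because three pages link to it, and `a` outranks `b` and `d` because its single inbound link comes from the highly ranked `c`.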
Mr. Singhal has developed a far more elaborate system for ranking pages, which involves more than 200 types of information, or what Google calls “signals.” PageRank is but one signal. Some signals are on Web pages — like words, links, images and so on. Some are drawn from the history of how pages have changed over time. Some signals are data patterns uncovered in the trillions of searches that Google has handled over the years.
“The data we have is pushing the state of the art,” Mr. Singhal says. “We see all the links going to a page, how the content is changing on the page over time.”
Increasingly, Google is using signals that come from its history of what individual users have searched for in the past, in order to offer results that reflect each person’s interests. For example, a search for “dolphins” will return different results for a user who is a Miami football fan than for a user who is a marine biologist. This works only for users who sign into one of Google’s services, like Gmail.
(Google says it goes out of its way to prevent access to its growing store of individual user preferences and patterns. But the vast breadth and detail of such records is prompting lust among the nosey and fears among privacy advocates.)
Once Google corrals its myriad signals, it feeds them into formulas it calls classifiers that try to infer useful information about the type of search, in order to send the user to the most helpful pages. Classifiers can tell, for example, whether someone is searching for a product to buy, or for information about a place, a company or a person. Google recently developed a new classifier to identify names of people who aren’t famous. Another identifies brand names.
These signals and classifiers calculate several key measures of a page’s relevance, including one it calls “topicality” — a measure of how the topic of a page relates to the broad category of the user’s query. A page about President Bush’s speech about Darfur last week at the White House, for example, would rank high in topicality for “Darfur,” less so for “George Bush” and even less for “White House.” Google combines all these measures into a final relevancy score.
The sites with the 10 highest scores win the coveted spots on the first search page, unless a final check shows that there is not enough “diversity” in the results. “If you have a lot of different perspectives on one page, often that is more helpful than if the page is dominated by one perspective,” Mr. Cutts says. “If someone types a product, for example, maybe you want a blog review of it, a manufacturer’s page, a place to buy it or a comparison shopping site.”
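The "top 10 unless there is not enough diversity" step can be sketched as a simple cap on how many results of one kind reach the first page. The article does not say how Google actually measures diversity, so the `kind` labels, the per-kind cap, and the `pick_first_page` function below are all hypothetical.

```python
def pick_first_page(results, page_size=10, max_per_kind=3):
    """Take the highest-scoring results, but cap how many share one kind.

    Each result is a dict with hypothetical "url", "score" and "kind" keys.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    chosen, skipped, per_kind = [], [], {}
    for result in ranked:
        if len(chosen) == page_size:
            break
        kind = result["kind"]
        if per_kind.get(kind, 0) < max_per_kind:
            chosen.append(result)
            per_kind[kind] = per_kind.get(kind, 0) + 1
        else:
            skipped.append(result)
    # If the cap left the page short, backfill with the best skipped results.
    chosen.extend(skipped[: page_size - len(chosen)])
    return chosen


candidates = [
    {"url": "review-1", "score": 0.9, "kind": "review"},
    {"url": "review-2", "score": 0.8, "kind": "review"},
    {"url": "review-3", "score": 0.7, "kind": "review"},
    {"url": "review-4", "score": 0.6, "kind": "review"},
    {"url": "store-1", "score": 0.5, "kind": "store"},
    {"url": "maker-1", "score": 0.4, "kind": "manufacturer"},
]
page = pick_first_page(candidates, page_size=4, max_per_kind=2)
page_urls = [r["url"] for r in page]
```

With a page of four and a cap of two per kind, the two lower-scoring reviews give way to the store and manufacturer pages, which matches the mix Mr. Cutts describes for a product search.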
If this wasn’t excruciating enough, Google’s engineers must compensate for users who are not only fickle, but are also vague about what they want; often, they type in ambiguous phrases or misspelled words.
Check out the full article.