Google on its transparency and not-so-secret formula

Recently the European Commission opened a preliminary inquiry into competition complaints. The complaints allege, in part, that Google operates without sufficient transparency into how and why web sites rank in our search results. The notion that Google isn’t transparent is tough for me to swallow. Google has set the standard in how we communicate with web site publishers. Let me tell you about some of the ways we explain to sites how we rank them and why.

One of the most widely discussed parts of Google’s scoring has always been PageRank. That “secret ingredient” is hardly a secret: the formula appeared in an early research paper. That paper not only gave the formula for PageRank but also mentioned many of the other signals in Google’s ranking, including anchor text, the location of words within documents, the relative proximity of query words in a document, the size and type of fonts used, the raw HTML of each page, and the capitalization of words. Google has continued to publish hundreds of research papers over the years. Those papers reveal many of the “secret formulas” for how Google works and document essential infrastructure that Google uses. Some of these papers have spurred not only open-source projects but entire companies in their own right.
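To make the PageRank formula concrete, here is a minimal sketch in Python. It assumes the form given in the early paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outgoing links on T, and d is a damping factor. The function name `pagerank`, the value d = 0.85, the fixed iteration count, and the three-page toy graph are illustrative assumptions, and this is in no way a picture of how ranking works in production.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the published PageRank formula on a small link graph.

    links maps each page to the pages it links to (hypothetical toy data).
    """
    # Normalize to sets so duplicate links on a page count once.
    graph = {page: set(targets) for page, targets in links.items()}

    # Collect every page that appears as a source or a target.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    # Start every page with the same rank.
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each page linking here passes on its rank, split evenly
            # across all of its outgoing links.
            incoming = sum(
                ranks[source] / len(targets)
                for source, targets in graph.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


# Toy graph: A links to B and C, B links to C, C links back to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
example_ranks = pagerank(example_links)
for page in sorted(example_ranks, key=example_ranks.get, reverse=True):
    print(page, round(example_ranks[page], 3))
```

In this toy graph, page C collects links from both A and B, so it ends up with the highest score, while B, which is linked only from A and receives half of A's rank, ends up lowest. The sketch ignores pages with no outgoing links, which a fuller treatment would need to handle.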

