What the Google Leak Tells Us About SEO

May 31, 2024

Over Memorial Day weekend, a trove of leaked Google Search documents sent shockwaves through the SEO community. On May 5th, Rand Fishkin, co-founder of Moz and co-founder and CEO of SparkToro, received an email from Erfan Azimi, CEO of EA Digital, containing over 2,500 pages of Google API documentation. After weeks of verifying that the documents were authentic and accurate, Fishkin and Mike King of iPullRank announced the leak. On May 29th, 2024, Google confirmed that the documents were real and that the leak had occurred on March 29th, 2024.

These documents offer unprecedented insight into the inner workings of Google’s ranking algorithm. For years, SEO professionals have speculated about and tested theories on which factors truly affect search rankings. Now, with these leaks, we have a clearer picture, one that reveals both expected and surprising elements that could reshape our approach to SEO.

This new information contradicts Google’s past statements, raising questions about the accuracy of the company’s public communications. For SEO practitioners, strategies that enhance user engagement, like improving page load times, crafting compelling meta descriptions, and optimizing content for readability, are more critical than ever.

Clicks and User Engagement Matter

One of the most significant revelations from the leak is the importance of clicks and user engagement signals, such as dwell time. Google has previously downplayed the role of clicks and user data in determining search rankings, but the leaked documents suggest that these factors are in fact crucial.

Google’s search team realized early on that to improve search results for a significant portion of the web population, they needed full clickstream data: every URL a browser visits. A mechanism known as “NavBoost” originally collected this data from Google’s Toolbar PageRank (as VP of Search Pandu Nayak stated in his DOJ antitrust testimony), and the need for more clickstream data was the primary driving force behind the development of the Chrome browser. Using NavBoost, Google tracks user activity before, during, and after a primary search query (also known as a “NavBoost query”).

Here is an example of the process: if many users search for “Lisa Stone,” don’t find the Graziani Multimedia website, immediately change their query to “Graziani Multimedia,” and click grazianimultimedia.com in the search results, then grazianimultimedia.com (along with websites mentioning “Graziani Multimedia”) receives a boost in the results for the “Lisa Stone” query.
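
To make this mechanism concrete, here is a minimal sketch of how a refinement-based boost could be tallied. The leak documents NavBoost’s existence and the signals it consumes, not its code, so every name, data structure, and weight below is an illustrative assumption.

```python
from collections import defaultdict

# Hypothetical NavBoost-style refinement signal. The leak names the
# system, not the code; this aggregation scheme is an assumption.

# (original_query, refined_query) -> number of users who rewrote the query
refinements = defaultdict(int)
# (query, url) -> number of clicks on that result for that query
clicks = defaultdict(int)

def record_session(original_query, refined_query, clicked_url):
    """Log a session where a user rewrote their query and then clicked."""
    refinements[(original_query, refined_query)] += 1
    clicks[(refined_query, clicked_url)] += 1

def refinement_boost(original_query, url, weight=0.001):
    """Boost a URL for the original query in proportion to how often
    users who rewrote that query clicked this URL afterwards."""
    boost = 0.0
    for orig, refined in refinements:
        if orig == original_query:
            boost += weight * clicks.get((refined, url), 0)
    return boost

# Many users search "lisa stone", rewrite to "graziani multimedia",
# and click grazianimultimedia.com:
for _ in range(500):
    record_session("lisa stone", "graziani multimedia",
                   "grazianimultimedia.com")
print(refinement_boost("lisa stone", "grazianimultimedia.com"))  # 0.5
```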

To gauge trending search demand, NavBoost looks at the number of searches for a given phrase, the number of clicks on a search result, and the proportion of long versus short clicks. NavBoost also geo-fences click data by country, state/province, and mobile versus desktop usage. However, if Google lacks data for a specific location or user agent, it may apply the method across all query results.
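
Here is a rough sketch of how long versus short clicks might be tallied per geographic and device bucket, with a global fallback for sparse buckets. The dwell-time cutoff, the minimum sample size, and all of the names are assumptions; the leak names the signals, not the formulas.

```python
from dataclasses import dataclass

LONG_CLICK_SECONDS = 30   # assumed dwell-time cutoff for a "long" click
MIN_BUCKET_CLICKS = 100   # assumed minimum sample size per bucket

@dataclass
class Click:
    url: str
    dwell_seconds: float
    country: str
    device: str  # "mobile" or "desktop"

def long_click_ratio(all_clicks, url, country=None, device=None):
    """Share of long clicks for a URL, optionally geo/device-fenced.
    Falls back to the global pool when the bucket is too sparse,
    mirroring the fallback behavior described above."""
    bucket = [c for c in all_clicks
              if c.url == url
              and (country is None or c.country == country)
              and (device is None or c.device == device)]
    if len(bucket) < MIN_BUCKET_CLICKS and (country or device):
        return long_click_ratio(all_clicks, url)  # global fallback
    if not bucket:
        return 0.0
    long_clicks = sum(c.dwell_seconds >= LONG_CLICK_SECONDS for c in bucket)
    return long_clicks / len(bucket)
```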

NavBoost also rates searches based on user intent. For example, certain levels of attention and clicks on videos or photos will attach video or image characteristics to that query and to similar NavBoost-associated queries. Google also uses cookie history, logged-in Chrome data, and pattern detection (referred to in the leak as “unsquashed” versus “squashed” clicks) to combat click spam.
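
The leak names “squashed” versus “unsquashed” clicks without defining the transformation, but a common reading is a saturating curve that keeps a flood of (possibly spammed) clicks from dominating the signal. A one-function sketch under that assumption:

```python
import math

def squash(raw_clicks):
    """Dampen raw click counts so bursts yield diminishing returns.
    The logarithmic curve is an assumption; the leak only names
    squashed vs. unsquashed clicks, not the function used."""
    return math.log1p(raw_clicks)

# Ten times the clicks buys far less than ten times the signal:
print(squash(100), squash(1000))  # ~4.6 vs. ~6.9
```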

Domain Authority is a Real Ranking Factor

Another major revelation from the documents is the confirmation of a “site authority” score used for ranking purposes. The SEO community has long speculated that Domain Authority (DA) is a ranking signal, despite Google’s denials. For years, SEO analysts have used Moz’s Domain Authority and Ahrefs’ Domain Rating to optimize websites, and those metrics have consistently appeared effective in boosting rankings. The leaked documents now provide concrete evidence that site authority is indeed a factor in Google’s algorithm: Google computes a metric called siteAuthority for every website, which shapes a site’s capacity to appear in Google searches based on its relevance to a specific subject area or industry.
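
As a toy illustration of why this matters, consider a ranking score that blends page-level relevance with a sitewide authority value. The existence of a siteAuthority metric comes from the leak; the linear blend and the weight below are purely hypothetical.

```python
def blended_score(page_relevance: float, site_authority: float,
                  alpha: float = 0.7) -> float:
    """Hypothetical blend of page-level relevance with a sitewide
    authority score (both assumed normalized to 0..1). This formula
    does not appear in the leak and is for intuition only."""
    return alpha * page_relevance + (1 - alpha) * site_authority

# A highly relevant page on a low-authority site can still trail a
# moderately relevant page on a high-authority site:
print(blended_score(0.9, 0.2))  # ~0.69
print(blended_score(0.7, 0.9))  # ~0.76
```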

The SEO community has long worried that the weight Google places on domain authority makes it difficult for smaller websites and authors to appear in search results, and now we understand why. It is increasingly evident that Google favors the major players in search rankings, which makes SEO all the more important: the authority of your brand and domain has a direct impact on your position in search results and your ability to climb through the ranks.

Additionally, it has come to light that Google whitelists certain websites to appear at the top of search results during major events, such as the COVID-19 pandemic and democratic elections. The presence of “isCovidLocalAuthority” and “isElectionAuthority” markers in multiple locations implies that Google whitelists specific domains it deems appropriate for highly controversial or potentially problematic queries. Similarly, a module on “Good Quality Travel Sites” suggests that Google maintains a travel-specific whitelist.

The confirmation of a site authority metric underscores the importance of building a strong, authoritative site. Strategies such as acquiring high-quality backlinks, producing valuable content, and establishing a robust online presence are essential for improving site authority and, consequently, search rankings.

Other Potential Ranking Signals

The leaked documents also shed light on several other potential ranking signals, including:

  • Title Match Score: titlematchScore is a sitewide score that signals how well page titles match user queries.
  • Average Weighted Font Size: The avgTermWeight metric tracks the average weighted font size of keywords within a document, and the same weighting applies to backlink anchor text.
  • Link Relevance: The sourceType attribute shows a loose relationship between where a page is stored in Google’s index and how valuable its links are considered. Both inbound and outbound links need to be high quality, relevant, and recent.
  • Exact Match Domains: The API demotes domains that exactly match search queries but do not provide quality content.
  • Change History Tracking: Google tracks every modification that has ever been made to a page but only considers the past 20 modifications to a URL for ranking.
  • Link Freshness: Google looks at dates in the byline (bylineDate), URL (syntacticDate), and on-page content (semanticDate).
  • Core Topics: To determine whether a document is or isn’t a core topic of the website, Google vectorizes pages and sites, then compares each page’s embedding to the site’s overall embedding, measuring how far pages stray from the site’s focus (siteRadius) and how tightly the site centers on a single topic (siteFocusScore); a rough sketch of this comparison follows below.
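
For the last item, here is a minimal sketch of how such an embedding comparison could work. The attribute names siteFocusScore and siteRadius come from the leak; the centroid-and-cosine math is an assumption for illustration.

```python
import numpy as np

def site_topicality(page_embeddings: np.ndarray):
    """Illustrative take on siteFocusScore / siteRadius.

    page_embeddings: (n_pages, dim) array of unit-normalized page
    vectors. The leak names the attributes; the math is assumed.
    """
    # Treat the normalized centroid of all pages as the site embedding.
    site_embedding = page_embeddings.mean(axis=0)
    site_embedding /= np.linalg.norm(site_embedding)
    # Cosine similarity of each page to the site's overall embedding.
    sims = page_embeddings @ site_embedding
    site_focus_score = float(sims.mean())    # how tightly pages cluster
    site_radius = float((1.0 - sims).max())  # farthest off-topic page
    return site_focus_score, site_radius

# A site of near-duplicate pages plus one off-topic outlier:
rng = np.random.default_rng(0)
pages = np.eye(64)[0] + rng.normal(0, 0.05, size=(10, 64))
pages[-1] = np.eye(64)[1]  # the outlier page
pages /= np.linalg.norm(pages, axis=1, keepdims=True)
print(site_topicality(pages))  # high focus; radius driven by the outlier
```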

Although these are only a handful of the more than 14,000 ranking attributes featured in the leak, these findings indicate that a comprehensive approach to SEO, one that addresses both technical and content considerations, is essential for achieving strong search rankings.

Google’s Lack of Transparency

The leak not only reveals critical ranking factors but also highlights Google’s lack of transparency regarding its algorithm. For years, SEO professionals have voiced frustration with Google’s obscure communications and its efforts to “discredit” SEO findings. Much of the information gleaned from the leaked documents directly contradicts public statements made by Google employees over the years, including the company’s repeated denials that click-centric user signals are used, that subdomains are considered independently when ranking websites, that a sandbox for newly launched websites exists, that the age of a domain is collected or taken into account, and much, much more.

Although it’s commonly accepted that Google obscures ranking criteria to prevent spam and rank manipulation, this lack of transparency makes it harder for marketers to fully understand and optimize for Google’s search algorithm. The leaked documents underscore Google’s need to provide clearer, more consistent information about what influences search rankings.

The Need for Testing and Skepticism

In the ever-changing field of search engine optimization, staying ahead will require constant testing and data analysis. With over 2,500 pages and more than 14,000 ranking attributes released in the leak, it may be some time before all of the information is fully digested. In the wake of these findings, SEO professionals must adopt a critical, testing-focused stance, prioritizing their own research and reliable SEO sources rather than depending exclusively on Google’s public remarks.


As the full impact of the leak becomes clear, SEO practitioners must remain alert, testing hypotheses and adapting techniques based on the most trustworthy data available. The insights revealed by the leaked Google Search documents may well change SEO strategies. Among the key takeaways are the significance of clicks and user engagement, the validation of site authority as a metric, and the uncovering of many other signals that influence search rankings. This newly gained information is an invaluable asset, but it also underscores the continued importance of transparency and honesty in the field of search engine optimization.

Lisa Stone
Graziani Multimedia's resident wordsmith. Her relentless dedication to research, mad coffee mastery, and word-slinging skills mean that all the content that she creates is enjoyable, engaging, and effective.
