
In 2012, Google officially unveiled the Knowledge Graph, famously announcing a fundamental shift in their mission: from matching “strings” (sequences of letters) to understanding “things” (real-world entities and their relationships).
For the public, this was the dawn of a new, intelligent search engine. For Google’s engineers, it was the culmination of a dream that began in 1999.
This is perhaps the most profound “déjà vu moment” in Koray Gübür’s presentation, “Semantic Search Engine & Query Parsing.”
The technology that allows Google to show you a rich panel about Leonardo da Vinci with his birth date, artworks, and students wasn’t invented in 2012.
Its DNA can be traced directly to Google’s very first Semantic Search Patent, filed by co-founder Sergey Brin himself in March 2000 (on a provisional application dating to 1999), just one year after Tim Berners-Lee published his vision for a “Semantic Web.”
This is the origin story of Google’s brain.
The Problem: A Web of Unstructured Chaos
In the late ’90s, the web was a vast, chaotic library with no card catalog. Information was scattered across millions of pages in countless different formats.
A list of books on one site used different HTML tags than a list on another.
As the 1999 patent filing states:
“The World Wide Web provides a vast source of information of almost all types… However, this information is often scattered among many Web servers and hosts, in many different formats. If these chunks of information could be extracted from the World Wide Web and integrated into a structured form, they would form an unprecedented source of information.”
This quote is the mission statement for the entire Knowledge Graph.
The goal was to extract facts from unstructured web pages and assemble them into a massive, organized database of “things.”
The Solution: Pattern Recognition and Open Information Extraction
Sergey Brin’s early patent outlined a surprisingly simple-sounding, yet revolutionary, method.
The system would:
- Identify Patterns: It would look for recurring HTML structures on pages. For example, it might notice that many pages list a book title in bold (<b>title</b>) followed by the word “by” and then the author’s name.
- Extract Tuples: It would extract these pairs of data (e.g., [Author: Isaac Asimov, Title: The Robots of Dawn]) as “tuples.”
- Recognize Entities and Relations: Even at this early stage, the system was designed to handle basic Named Entity Recognition and Relation Detection (Slide 43). It was learning to connect two “things” (an author and a book) with a specific relationship (“wrote”), as sketched below.
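To make this concrete, here is a minimal, illustrative sketch of the patent’s bootstrapping idea (not its actual code): start from one trusted seed fact, learn the HTML pattern surrounding it, then reuse that pattern to harvest new (author, title) tuples. The page snippet, function names, and regex details are assumptions for the example.

```python
import re

# Toy stand-in for a crawled page (an illustrative assumption).
PAGE = """
<li><b>The Robots of Dawn</b> by Isaac Asimov</li>
<li><b>Foundation</b> by Isaac Asimov</li>
<li><b>Dune</b> by Frank Herbert</li>
"""

# A seed fact we already trust.
SEED_TITLE, SEED_AUTHOR = "The Robots of Dawn", "Isaac Asimov"

def learn_pattern(page, title, author):
    """Find the seed pair in the page and generalize the markup
    around it into a reusable regex (an 'occurrence pattern')."""
    occurrence = f"<b>{title}</b> by {author}"
    assert occurrence in page, "seed fact not found on this page"
    return (re.escape(occurrence)
            .replace(re.escape(title), r"(?P<title>[^<]+)")
            .replace(re.escape(author), r"(?P<author>[^<]+)"))

def extract_tuples(page, pattern):
    """Apply the learned pattern to harvest new (author, title) tuples."""
    for match in re.finditer(pattern, page):
        yield match.group("author"), match.group("title")

pattern = learn_pattern(PAGE, SEED_TITLE, SEED_AUTHOR)
for author, title in extract_tuples(PAGE, pattern):
    print(f"[Author: {author}, Title: {title}]")
# [Author: Isaac Asimov, Title: The Robots of Dawn]
# [Author: Isaac Asimov, Title: Foundation]
# [Author: Frank Herbert, Title: Dune]
```

The crucial trick is the feedback loop: every newly harvested tuple can serve as a fresh seed on other pages, so a handful of known facts snowballs into millions.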
This process is the direct ancestor of Open Information Extraction (OIE): in essence, “fact extraction around nouns.”
Google continued to invest heavily in this area, acquiring Wavii, a news aggregator built on OIE technology, for a reported $30 million in 2013.
The goal of OIE is to find relational tuples in plain text, like [Noun 1] - [Verb/Adverb] - [Noun 2].
For example, in the sentence “Steve Jobs created Apple,” the system extracts the tuple [Steve Jobs] - [created] - [Apple]. By doing this millions of times across the web, it builds a graph of how entities are connected.
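To illustrate the idea, here is a toy subject-verb-object extractor built on spaCy’s dependency parse. It is only a rough stand-in; real OIE systems handle far more linguistic variation. The model name en_core_web_sm is spaCy’s standard small English model, and everything else is assumed for the example.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_triples(text):
    """Yield naive (subject, verb, object) tuples from the text."""
    doc = nlp(text)
    for token in doc:
        # Look for a verb that has both a nominal subject and a direct object.
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for subj in subjects:
                for obj in objects:
                    # Expand to the full noun-phrase spans, not just head words.
                    yield (
                        doc[subj.left_edge.i : subj.right_edge.i + 1].text,
                        token.text,
                        doc[obj.left_edge.i : obj.right_edge.i + 1].text,
                    )

print(list(extract_triples("Steve Jobs created Apple.")))
# [('Steve Jobs', 'created', 'Apple')]
```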
The Birth of the Entity-Oriented Search Engine
This ability to extract and connect facts laid the groundwork for a new kind of search engine, one based on entities.
The final piece of this puzzle is the Entity-seeking Query: a query that is not looking for a webpage, but for a specific “thing” or one of its attributes.
Examples from Koray’s presentation show this in action:
- country look like boot → Italy (The query seeks an entity based on a visual attribute).
- hotel look like sail → Burj Al Arab hotel (Another query seeking an entity by attribute).
- person discovered americas → Christopher Columbus (The query seeks an entity based on an action/relation).
A traditional search engine would struggle with these queries, since all it can do is hunt for pages that contain those exact keywords.
An entity-oriented search engine, powered by a structured database of facts (the Knowledge Graph), can directly answer the question by retrieving the entity that matches those attributes.
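As a toy illustration of the difference, the sketch below answers those queries from a small hand-built fact store rather than by keyword matching; every entity and attribute name in it is a hard-coded assumption standing in for the real Knowledge Graph.

```python
# A tiny hand-built "knowledge graph": entity -> attribute facts.
# (Hard-coded stand-ins for the real, web-scale fact store.)
KNOWLEDGE_GRAPH = {
    "Italy": {"type": "country", "looks like": "boot"},
    "Burj Al Arab": {"type": "hotel", "looks like": "sail"},
    "Christopher Columbus": {"type": "person", "discovered": "americas"},
}

def answer_entity_seeking_query(entity_type, attribute, value):
    """Return the entity whose attributes satisfy the query, rather than
    pages that merely contain the query's keywords."""
    for entity, facts in KNOWLEDGE_GRAPH.items():
        if facts.get("type") == entity_type and facts.get(attribute) == value:
            return entity
    return None

print(answer_entity_seeking_query("country", "looks like", "boot"))     # Italy
print(answer_entity_seeking_query("hotel", "looks like", "sail"))       # Burj Al Arab
print(answer_entity_seeking_query("person", "discovered", "americas"))  # Christopher Columbus
```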
What does this mean concretely?
Understanding the origins of the Knowledge Graph reveals the long-term vision of Google and provides a clear blueprint for success.
Structure Your Data:
Use structured data (Schema.org markup) whenever possible. It’s the modern equivalent of the clear HTML patterns Sergey Brin’s patent was designed to find. You are essentially pre-packaging your facts for easy inclusion in the Knowledge Graph.
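As a sketch, here is a minimal Schema.org Book object, built and serialized with Python’s json module (the title, author, date, and URL are placeholder values). It is essentially a machine-readable version of the “<b>title</b> by author” pattern that Brin’s system had to infer:

```python
import json

# Minimal Schema.org "Book" markup; embed the output in your page
# inside a <script type="application/ld+json"> tag.
# (Title, author, date, and URL are placeholder values.)
book_markup = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "The Robots of Dawn",
    "author": {"@type": "Person", "name": "Isaac Asimov"},
    "datePublished": "1983",
    "url": "https://example.com/books/the-robots-of-dawn",
}

print(json.dumps(book_markup, indent=2))
```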
Be Unambiguous:
In your content, be clear about who and what you are talking about. Don’t just mention “John Smith”; specify “John Smith, the CEO of ExampleCorp.”
This helps with entity reconciliation and ensures Google connects your content to the correct “thing.”
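In markup terms, that disambiguation might look like the sketch below; ExampleCorp, the job title, and the sameAs URLs are placeholder values:

```python
import json

# Schema.org "Person" markup that leaves no doubt which John Smith is meant.
# (All names and URLs are placeholders.)
person_markup = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "John Smith",
    "jobTitle": "CEO",
    "worksFor": {"@type": "Organization", "name": "ExampleCorp"},
    # "sameAs" links point to authoritative records of the same entity,
    # which is exactly the signal entity reconciliation needs.
    "sameAs": [
        "https://www.linkedin.com/in/example-john-smith",
        "https://en.wikipedia.org/wiki/John_Smith_(example)",
    ],
}

print(json.dumps(person_markup, indent=2))
```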
Build Your Content Around Entities, Not Just Keywords:
When writing about a topic, think about the primary entity and its key attributes.
If you’re writing about the Eiffel Tower, your content should naturally include attributes like its height, location (Paris, France), architect (Gustave Eiffel), and construction date.
This aligns your content with the structure of the Knowledge Graph.
Answer Attribute-Based Questions:
People increasingly search for entities based on their characteristics.
Think about the unique attributes of your products, services, or topics and create content that explicitly answers those questions.
The Knowledge Graph wasn’t a sudden innovation; it was the realization of a foundational goal set at Google’s inception. By creating content that is clear, structured, and focused on real-world entities and their relationships, you are aligning with the very core of how Google has worked to understand the world for over two decades.
In my next article, I will bring it all together with a detailed case study from Koray’s presentation: a step-by-step deconstruction of how Google parses the query “when was Martin Luther King Jr. born,” revealing the entire semantic engine in action.