Paper Cited: Web Search For A Planet: The Google Cluster Architecture
Google Search handles the largest volume of data in the world. Upon each query, the engine will manipulate the whole database to search for the suitable answer. Therefore, it must use efficient enough method to support faster data retrievement. How is the architecture of the system?
Most imported factors that influence its design:
- Energy Efficiency
- Price-performance Ratio
Google provides reliability in the environment at the software level
- Services replicated across many different machines
- Automatically detecting and handling failures
Serving a Google Query
- Browser first performs a domain name system(DNS) lookup to map the URL to a particular IP address
- A DNS-based load-balancing system selects a cluster by accounting for the user's geographic proximity to each physical cluster.
- Browser then sends a Hypertext Transport Protocol(HTTP) request to one of those clusters.
- Hardware-based load balancer in each cluster monitors the available set of Google Web Servers and performs local load balancing of requests across a set of them.
- GWS machine coordinates the query execution and formats the results into a Hypertext Markup Language(HTML) page response to the browser.
Query Execution can be divided into two phases
Phase 1
- Index server consult an inverted index that maps each query word to a matching list of documents(hit list).
- Index server then determine a set of relevant document by intersecting the hit lists of the individual query word, and then compute a relevance score for each document -> order of the results on the output page.
- Phase 1 produces an ordered list of "docids".
Phase 2
- Document server takes the list of docids and computes the actual title and uniform resource locator(URL) of these documents, along with a query-specific document summary. (As you see on the result page of Google Search)
- In other words, Document server fetches each document from disk to extract the title and the keyword in-context snippet.
Both phases can be load balanced
- Randomly distributing indices and documents into smaller shards.
- Having multiple server replicas responsible for handling each shard, and
- Routing requests through a load balancer
- Those divided smaller tasks can be done independently, and few communication are needed among different machines. Therefore the time is linearly distributed and the speed will be accelerated linearly.
Other Work
Other ancillary tasks upon receiving a query, such as sending the query to a spell-checking system and to an ad-serving system to generate relevant advertisements.
When all the phases are completed, a GWS generates the appropriate HTML page and returns it to the browser.
The CPU speed of the individual index servers does not directly influence the search's overall performance, because we can increase the number of shards to accommodate slower CPUs, and vice versa. Consequently, our hardware selection process focuses on machines that offer an excellent request throughput for our application, rather than machines that offer the highest single-thread performance.
Google cluster key design principles
- Software Reliability: focusing on tolerating failures in software instead of using fault-tolerant hardware
- Use replication for better request throughput and availability
- Price/performance beats peak performance
- Use commodity PCs to reduce the cost of computation
For Google
- They use cheap PCs to have large capacity computation handled.
- Any computation-intensive and server stateless cluster can use this architecture.