Google Indexing
 
 

Google runs on a distributed network of computers and can therefore carry out fast parallel processing. Googlebot is a web crawler that finds and fetches web pages and gives the indexer the full text of the pages it finds. These pages are then stored in Google's index database. Each index entry stores a list of the documents in which a search term appears and the location within each document where it appears. The index is sorted alphabetically by search term.
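
The index structure described above is commonly called an inverted index. The following is a minimal Python sketch of the idea, not Google's actual implementation: the function name build_index, the page IDs, and the sample texts are all made up for illustration, and word positions are assumed to be simple word offsets within each page.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [word positions]} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.split()):
            # Record where in this document the term occurs.
            index[term].setdefault(doc_id, []).append(position)
    # Sort entries alphabetically by term, as the text describes.
    return dict(sorted(index.items()))


# Made-up sample pages, standing in for text fetched by a crawler.
docs = {
    "page1": "google crawls the web",
    "page2": "the index maps terms to pages",
}
index = build_index(docs)
```

Here index["the"] records that the term appears at word offset 2 in page1 and at offset 0 in page2, so a lookup by term returns both the documents and the locations without scanning any page text.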

This data structure gives rapid access to the documents that contain a user's query terms. To improve Google's performance, the indexer ignores some punctuation and runs of multiple spaces, and it converts all letters to lowercase. The indexer sorts every word on every page and stores the resulting index of words in a huge database.
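
As a rough illustration of the normalization and lookup steps, the Python sketch below lowercases text, drops punctuation, collapses repeated spaces, and then finds the documents that contain every query term. The names normalize and search, the tiny sample index, and the exact punctuation rule are assumptions made for this example; the rules Google really applies are not given here.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    # split() with no arguments also collapses runs of multiple spaces.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def search(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = normalize(query)
    if not terms:
        return set()
    # One set of document IDs per term; a term not in the index gives an empty set.
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


# Made-up index in the form {term: {doc_id: [word positions]}}.
index = {
    "fast": {"page1": [0]},
    "parallel": {"page1": [1], "page2": [3]},
}

tokens = normalize("Fast,  PARALLEL   processing!")
hits = search(index, "Fast, PARALLEL!")
```

Because queries and pages pass through the same normalization, "Fast" and "fast" resolve to the same index entry, which keeps the index smaller and each lookup a direct dictionary access.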