Google Indexing
Google runs on a distributed
network and can therefore carry out fast parallel
processing. Googlebot is a web crawler that finds
and fetches web pages and gives the indexer the full
text of the pages it finds. These pages are then
stored in Google's index database. Each index
entry is stored as a list of the documents in which
the search term appears and the location where it
appears. The index is sorted alphabetically by
search term.
This data structure allows rapid access to the
documents that contain the user's query terms.
The indexer ignores some punctuation and multiple
spaces. It also converts all letters to lowercase,
which improves Google's performance.
The indexer sorts every word on every page and
stores the resulting index of words in a huge
database.
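
The structure described above is commonly called an inverted index. The
following is only a minimal sketch of the idea, not Google's actual
implementation: the function names (tokenize, build_index, search) and the
sample page IDs are hypothetical, and the tokenizer is a simplification that
drops all punctuation and non-ASCII characters rather than "some" of it.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text, then keep only runs of ASCII letters and digits.
    # Punctuation and repeated spaces are discarded (a simplification).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    # pages maps a document ID to the full text of that page.
    # Result: term -> {document ID -> list of word positions}.
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            postings[term][doc_id].append(position)
    # Store the entries sorted alphabetically by search term.
    return {term: dict(postings[term]) for term in sorted(postings)}


def search(index, query):
    # Return the IDs of documents that contain every term in the query.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], {}))
    for term in terms[1:]:
        results &= set(index.get(term, {}))
    return results


if __name__ == "__main__":
    pages = {
        "page-a": "Fast, parallel   processing!",
        "page-b": "Parallel crawlers fetch pages.",
    }
    index = build_index(pages)
    print(index["parallel"])
    print(search(index, "PARALLEL pages"))
```

Because each term maps directly to the documents that contain it, a query
only has to look up its own terms instead of scanning every page.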