Search Engine Crawling and Indexing
You really cannot begin any SEO effort until you have at least a basic understanding of how search engines crawl, index, and rank the millions of websites on the internet. The big search engines operate in much the same way, so the information in this article applies equally to Google, Yahoo, and Microsoft's Bing.
WWW is an acronym for “World Wide Web”
The easiest way to understand how search engines work is to think of the internet as a web, not too dissimilar to a spider's web. In this web, many documents are interconnected, both within a site and with other sites on the internet.
A page on the internet might have internal links (links contained within a specific site) as well as external links (links to pages on other websites). These links need not just be to other HTML (web) pages. Links can be to images, to PDF and other documents, to MP3 music and podcasts, and even to online video.
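The link-following process described above can be sketched as a simple breadth-first crawl. The sketch below is purely illustrative, not any search engine's actual crawler: it uses a hypothetical in-memory `site` dictionary in place of real HTTP fetches, and only Python's standard library to extract links from HTML.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start, fetch):
    """Breadth-first crawl: fetch(url) returns a page's HTML (or "" for
    non-HTML resources). Returns the set of URLs discovered by
    repeatedly following links from the start page."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Hypothetical in-memory "site" standing in for real network fetches.
# Note the PDF: linked resources need not be HTML pages, but they
# yield no further links of their own.
site = {
    "/index.html": '<a href="/about.html">About</a> <a href="/report.pdf">Report</a>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/report.pdf": "",
}
print(sorted(crawl("/index.html", lambda u: site.get(u, ""))))
```

A real crawler adds many layers on top of this skeleton (politeness delays, robots.txt handling, duplicate detection, URL normalisation), but the core idea of discovering new pages by following links from known ones is the same.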
Once the pages have been found, the search engines use proprietary algorithms to extract the useful content and store it in the huge arrays of servers and hard drives in their data centres. All of the search engines have data centres distributed across the globe in which copies of these indexes are stored. This allows them to provide near-instantaneous search across billions of pages, irrespective of your location on the planet.
Additionally, the search engines make copies of all the content they find, allowing Google, for example, to provide a cached (or saved) copy of a page when the original site is not available.
Given the sheer size of the internet, the various search engines have algorithms to determine how often they should re-index your website. While search engines want to be as up to date as possible, they must balance this against the cost of continually retrieving copies of your website.
Generally, the more often your website is updated, the more frequently the search engines will crawl it.
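One simple way to picture this balancing act is an adaptive revisit policy. The toy function below is an assumption for illustration only, not any engine's real scheduling algorithm: it revisits a page sooner when the page was found to have changed, backs off when it was not, and clamps the interval between hypothetical minimum and maximum bounds.

```python
def next_interval(current_hours, page_changed, lo=1, hi=24 * 30):
    """Toy crawl scheduler (illustrative only): halve the revisit
    interval when the page changed since the last crawl, double it
    when it did not, and keep the result between lo and hi hours."""
    if page_changed:
        current_hours = current_hours / 2
    else:
        current_hours = current_hours * 2
    return max(lo, min(hi, current_hours))

# A frequently updated page converges toward short intervals;
# a static page drifts toward the maximum.
print(next_interval(8, page_changed=True))
print(next_interval(8, page_changed=False))
```

Under a policy like this, a site that publishes new content daily ends up crawled far more often than one that sits unchanged for months, which matches the behaviour described above.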