Search Engine Crawling and Indexing

You really cannot begin any SEO effort until you have at least a basic understanding of how search engines crawl, index and rank the millions of websites on the internet. The big search engines all operate in much the same way, so the information in this article applies equally to Google, Yahoo, and Microsoft’s Bing.


The easiest way to understand how search engines work is to think of the internet as a web, not unlike a spider’s web. In this web, many documents are interconnected, both within a site and with other sites on the internet.

A page on the internet might have internal links (links to other pages on the same site) as well as external links (links to pages on other websites). These links need not point only to other HTML (web) pages. Links can lead to images, to PDFs and other documents, to MP3 music and podcasts, and even to online video.

When a search engine crawls the web, it follows these links from one page to the next. Because of the links between pages, the search engines’ crawlers are able to find virtually every publicly available website and document on the internet. The automated programs search engines use for this task are therefore referred to variously as crawlers, spiders and ‘search bots’.
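The link-following process described above is essentially a breadth-first traversal of the web’s link graph. The sketch below illustrates the idea using a tiny in-memory “web”; the URLs are purely hypothetical, and a real crawler would of course fetch pages over HTTP and parse their HTML for links:

```python
from collections import deque

# A toy "web": each URL maps to the links found on that page.
# (Hypothetical URLs for illustration only.)
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://other.example/"],
    "https://example.com/about": ["https://example.com/"],
    "https://other.example/": ["https://other.example/pricing"],
    "https://other.example/pricing": [],
}

def crawl(seed):
    """Follow links breadth-first from a seed URL, visiting each page once."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in TOY_WEB.get(url, []):
            if link not in seen:  # skip pages already discovered
                seen.add(link)
                queue.append(link)
    return visited
```

Starting from the single seed page, the crawler discovers every page in the toy web, including those on the “other” site, simply by following links.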

Once the pages have been found, the search engines use proprietary algorithms to extract the useful content and store it in the huge arrays of servers and hard drives in their data centres. All the major search engines keep copies of these indexes in data centres distributed across the globe. This allows them to provide near-instantaneous search across billions of pages, irrespective of your location on the planet.

Additionally, search engines make copies of all the content they find, allowing Google, for example, to provide a cached (or saved) copy of a page when the original site is not available.

The entire process is often referred to as indexing. If you think of the entire internet as one giant book (like an encyclopaedia), finding anything would be almost impossible without an index. Using the entries in the index, one can easily find (often multiple) references to a particular topic. The term ‘indexing’ thus makes almost literal sense.
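The book-index analogy maps directly onto the data structure search engines actually use, commonly called an inverted index: a mapping from each word to the set of pages that contain it. A minimal sketch, with made-up page content purely for illustration:

```python
def build_index(pages):
    """Build an inverted index: each word maps to the set of
    page URLs whose text contains that word."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

def search(index, word):
    """Look up a word; returns the set of pages mentioning it."""
    return index.get(word.lower(), set())

# Hypothetical pages and content, for illustration only.
pages = {
    "site-a/page1": "search engines crawl the web",
    "site-a/page2": "engines rank pages",
    "site-b/home":  "the web is a web of links",
}
index = build_index(pages)
```

A query such as `search(index, "web")` then returns every page containing that word, just as an encyclopaedia’s index lists every page where a topic appears.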

Given the sheer size of the internet, each search engine uses algorithms to determine how often it should re-crawl your website. While search engines need to stay as up to date as possible, they must balance this against the cost of continually retrieving copies of your pages.

Generally, the more often your website is updated, the more frequently the search engines will crawl it.
