Important Information About Understanding Indexing

by pittfall on February 27, 2007

Both the Yahoo! Search Blog and the Official Google Blog recently posted about indexing websites and how to use some of the tools everyone has access to for keeping pages out of the search engines' indexes.

Google
Controlling how search engines access and index your website

I’m often asked about how Google and search engines work. One key question is: how does Google know what parts of a website the site owner wants to have show up in search results? Can publishers specify that some parts of the site should be private and non-searchable? The good news is that those who publish on the web have a lot of control over which pages should appear in search results.

The key is a simple file called robots.txt that has been an industry standard for many years. It lets a site owner control how search engines access their web site. With robots.txt you can control access at multiple levels: the entire site, individual directories, pages of a specific type, down to individual pages. Effective use of robots.txt gives you a lot of control over how your site is searched, but it's not always obvious how to achieve exactly what you want. This is the first of a series of posts on how to use robots.txt to control access to your content.
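
To make the "multiple levels" idea concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not taken from either blog post: the robots.txt contents, the bot names (ExampleBot, ExampleBlockedBot) and the example.com URLs are all hypothetical, chosen only to show a site-wide block for one crawler, a directory-level rule and a single-page rule. Note that this parser matches rules by simple path prefix, so wildcard patterns for "pages of a specific type" are deliberately left out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration.
# - ExampleBlockedBot is shut out of the entire site.
# - Every other crawler is kept out of one directory and one single page.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBlockedBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /drafts/secret.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"  # assumed host, not a real site reference
    for agent in ("ExampleBot", "ExampleBlockedBot"):
        for path in ("/index.html", "/private/report.html", "/drafts/secret.html"):
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, base + path))
```

A check like this only tells you how a well-behaved crawler should read the file; robots.txt is a request to crawlers, not an access control mechanism.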

This was followed up recently by:
The Robots Exclusion Protocol

This is the second in a short series of posts about the Robots Exclusion Protocol, the standard for controlling how web pages on your site are indexed. This post provides more details and examples of mechanisms to control access and indexing of your website by Google.

In the first post in this series, I introduced robots.txt and robots META tags, giving an overview of when to use them. In this post, I’ll look at some examples of the power of the protocol. These examples illustrate the detailed and fine-grain control online publishers have over how their websites are indexed.
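
The excerpt mentions robots META tags without showing one. As a rough illustration only, the sketch below uses Python's standard html.parser to pull the directives out of a page's robots META tag, for example to audit which of your own pages ask not to be indexed. The class and function names (RobotsMetaParser, robots_directives) and the sample HTML are assumptions made for this example, not anything from the Google post.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, nofollow".
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots META directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page that asks crawlers not to index it or follow its links.
    page = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

Unlike robots.txt, which is read once per site, a META tag travels with the individual page, so a crawler has to fetch the page before it can see the directive.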

Yahoo!
Keeping Ad Tracking and Dead URLs out of Yahoo! Search

We’re often asked how Yahoo! Search determines which pages get indexed and which pages are left un-crawled. First and foremost, we honor the industry-standard robots.txt file format, which gives Webmasters several layers of control over which sites, pages and specific URLs should be indexed. Lately we’ve heard from a number of Webmasters asking how best to prevent ad tracking URLs and dead URLs from getting indexed, so we thought we’d respond via this post.

The robots.txt file and robots META tags control how search engine crawlers index your site, which makes them very important to SEO. Making sure the right pages are indexed is the first step toward being visible to users, and it should always be a priority in your optimization efforts.
