What Is a Software Spider?

A "software spider" is an automated program, operated by a search engine, that surfs the Web much as you would. As it visits each Web site, it records (saves to its hard drive) all the words on the site and notes each link to other sites. It then "clicks" on a link and goes off to read, index, and store another Web site.

The software spider often reads the entire text of each Web site it visits and indexes it into the main database of the search engine it works for. Recently, however, many engines, such as AltaVista, have begun indexing only a limited number of pages per site, often about 500, and then stopping. Apparently, the Web has become so large that indexing everything is infeasible. Because you cannot predict exactly how many pages the spider will index, it's a good idea to submit individually each important page that you want indexed, such as those that contain important keywords.
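The read-index-follow loop and the per-site page cap can be sketched in a few lines of Python. This is only an illustration, not how any particular engine works: the three-page SITE dictionary, the crawl function, and its max_pages parameter are all hypothetical, and the pages are held in memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the visible words and the outgoing links of one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # "Note each link": remember the target of every <a href="...">.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # "Record all the words": keep the text between tags.
        self.words.extend(data.lower().split())


def crawl(pages, start, max_pages=500):
    """Breadth-first crawl of `pages` (URL -> HTML), stopping at max_pages."""
    index = {}                 # URL -> list of words found on that page
    queue = deque([start])
    seen = {start}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue           # dead link: nothing to index
        parser = PageParser()
        parser.feed(html)
        parser.close()
        index[url] = parser.words
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical site, invented for this example.
SITE = {
    "/": '<p>Welcome to our shop</p><a href="/a">A</a><a href="/b">B</a>',
    "/a": '<p>Blue widgets</p><a href="/">home</a>',
    "/b": "<p>Red widgets</p>",
}

full = crawl(SITE, "/")                   # indexes all three pages
capped = crawl(SITE, "/", max_pages=2)    # stops before reaching "/b"
```

With the cap set to 2, the page "/b" is never indexed even though the home page links to it, which is why submitting important pages individually can matter.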

A software spider is like an electronic librarian who cuts out the table of contents of each book in every library in the world, sorts them into a gigantic master index, and then builds an electronic bibliography that stores information on which texts reference which other texts. Some software spiders can index more than a million documents a day! It is important to understand that search engines' spiders do just two things:

  • They index text.
  • They follow links.

At a recent Search Engine Strategies conference put on by SearchEngineWatch.com, one of the guest speakers, Shari Thurow of Grantastic Designs, made this point and repeated it several times to illustrate its significance: "Search engines index text and follow links. They index text, and they follow links. That's all they do."

Her point is central to understanding the nature of search engines' spiders. If the text of your Web site is contained within a graphic, the search engines cannot index it. If all of the important keywords for which you hope to attain rankings appear only in graphics, not in the HTML text, your site will not attain rankings. Remember, search engines do not read or index pictures; they index text and follow links. That's all. If you have no text on your viewable page, no number of keywords in your keywords meta tag will help you attain rankings.
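To see why a page made only of graphics gives a spider nothing to work with, consider this small Python sketch. It assumes a spider that indexes only the text between HTML tags; the two sample pages and the indexable_words helper are invented for the illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps only the text between tags; tags and their attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def indexable_words(html):
    """Return the words a text-only spider would find on the page."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return parser.words


# Hypothetical page whose headline lives inside an image file.
graphic_page = (
    '<html><head><meta name="keywords" content="hiking boots"></head>'
    '<body><img src="banner.gif"></body></html>'
)

# The same headline written as ordinary HTML text.
text_page = "<html><body><h1>Discount Hiking Boots</h1></body></html>"

graphic_words = indexable_words(graphic_page)
text_words = indexable_words(text_page)
```

Here graphic_words comes back empty, because the only words on that page are in an image and in a meta tag attribute, while text_words contains "discount", "hiking", and "boots".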

What the spider sees on your site will determine how your site is listed in its index. Search engines determine a site's relevancy based on a complex scoring system that the search engines try to keep secret. This system adds or subtracts points based on such things as how many times the keyword appeared on the page, where on the page it appeared, and how many total words were found. The pages that achieve the most points are returned at the top of the search results; the rest are buried at the bottom, never to be found.
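The real scoring formulas are secret, so the following Python sketch is purely a toy. It assumes a made-up score that rewards two of the factors mentioned above, how often the keyword appears relative to the total word count and how early on the page it first appears; the score and rank functions and the sample PAGES are all hypothetical.

```python
def score(page_text, keyword):
    """Toy relevancy score: keyword density plus a bonus for appearing early."""
    words = page_text.lower().split()
    keyword = keyword.lower()
    positions = [i for i, word in enumerate(words) if word == keyword]
    if not positions:
        return 0.0             # keyword absent (or page empty): no points
    density = len(positions) / len(words)          # how many times vs. total words
    earliness = 1.0 - positions[0] / len(words)    # where on the page it first appears
    return density + earliness


def rank(pages, keyword):
    """Return page URLs ordered from highest score to lowest."""
    return sorted(pages, key=lambda url: score(pages[url], keyword), reverse=True)


# Hypothetical pages, invented for this example.
PAGES = {
    "/boots": "boots for hiking and more boots",
    "/about": "our company history mentions boots",
    "/contact": "call or email us",
}

ordering = rank(PAGES, "boots")
```

The page that uses the keyword twice and leads with it lands first, the page that mentions it once at the end lands second, and the page without it scores zero.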

As a software spider visits your site, it notes any links on your page to other sites. Every search engine's vast database records all the links between sites, so the search engine knows which sites you link to and, more important, which ones link to you. Many engines even use the number of links to your site as an indication of popularity and boost your ranking accordingly.
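Counting who links to whom is simple once the links are recorded. The sketch below assumes the link database is a plain Python dictionary mapping each site to the sites it links to; the LINKS data, the site names, and the inbound_link_counts function are hypothetical, and real engines weigh links in ways they do not disclose.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct other sites link to each site."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):    # repeated links from one site count once
            if target != source:       # a site linking to itself is ignored
                counts[target] += 1
    return counts


# Hypothetical link database: site -> sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}

popularity = inbound_link_counts(LINKS)
most_linked = popularity.most_common(1)[0][0]
```

In this example c.example is linked from three other sites, so a popularity-based boost would favor it, while d.example has no inbound links at all.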

Copyright © 2000 iProspect.com



