Search Engines
Related Terms: Internet Domain Name; Web Site Design
What is almost certain is that this entry on search engines will soon be obsolete—so rapid and dynamic are the changes that affect this central technology and service on the Internet. Thus in the entry published in the last edition of this volume the name Google did not even appear, but just a few years later Google has become the leading search engine provider the world over. So what is a search engine?
GENERAL ASPECTS
Search engines are software systems that match search words entered by a user looking for information with Web sites on the World Wide Web that contain the words of the query. To accomplish this linking, search engines must be backed by databases that record the words used on Web sites and, for each word, a list of the sites on which it appears. Search words may produce just a handful or a very large number of Web sites. The search word "supercalifragilisticexpialidocious" produced around 294,000 hits in 2006 on Google; the somewhat obscure and specialized word "nunciature" (the office or period of office of a nuncio) produced 82,400 hits; the word "nuncio" itself (an ambassador for the papacy) yielded 1,050,000 hits. The name Chu Yuan-chang, the 14th century founder of the Ming Dynasty in China, produced 725 hits. It is difficult to find stand-alone search words with a low number of hits; even misspellings bring rich results, because words are often misspelled on Web pages too and dutifully indexed by the search engines. This very wealth of hits makes it necessary for search engines to store additional information about every Web site so that the engine can present results in a rationally ranked order. Complex algorithms are used to rank hits. The principal method is to present first those sites that have been clicked on most frequently in the past; sites to which more other sites link get preference, all things being equal.
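The word-to-site database and the ranking step described above can be illustrated with a minimal sketch. It is not how any particular engine works: the page texts, click counts, link counts, and the function names `build_index` and `search` are all assumptions made up for the example, and real ranking algorithms are far more complex.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of sites whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query, clicks, link_counts):
    """Return sites containing every query word, most-clicked first.

    Ties on clicks are broken by link count, then by URL for stable output.
    """
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(
        hits,
        key=lambda u: (-clicks.get(u, 0), -link_counts.get(u, 0), u),
    )

# Hypothetical toy data standing in for crawled Web pages.
pages = {
    "a.example": "the papal nuncio arrived",
    "b.example": "nuncio and nunciature",
    "c.example": "ming dynasty history",
}
clicks = {"a.example": 5, "b.example": 5}       # assumed past click counts
link_counts = {"a.example": 1, "b.example": 3}  # assumed link counts

index = build_index(pages)
results = search(index, "nuncio", clicks, link_counts)
```

Here both matching sites have the same click count, so the link count decides the order. A query with several words returns only the sites that contain all of them.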
A search engine, thus, requires its own internal logic and functionality, the software, and a database. But this database must first be built, maintained, updated, and grown as new sites are added to the Internet. Search engines, therefore, have a massive data acquisition function. In the early days the databases were built by people who scanned the Web, followed links on Web sites, and indexed new pages they found. This technique is still in use with specialized Web sites and, until October 2002, was used by the world's second-ranking search engine, Yahoo. In the mid-2000s the databases of almost all search engines are built and maintained by search robots that seek out sites and capture their contents for indexing, unless the site itself prohibits this activity. The robots are themselves software programs. They are known as "crawlers" because they "crawl the Web" acquiring information. Web site owners can also register their sites with search engines, a technique used by commercial sites eager to be found.
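The crawling process can be sketched in a few lines. The example below is only an illustration under stated assumptions: it walks a small in-memory "Web" (the hypothetical dictionary `web`) instead of fetching real pages, and it models a site's prohibition as membership in a `disallowed` set rather than any real exclusion mechanism. Registering a site with the engine corresponds here to adding it to `seeds`.

```python
from collections import deque

def crawl(web, seeds, disallowed=frozenset()):
    """Breadth-first crawl of a toy Web.

    web maps each URL to a (text, outgoing_links) pair.
    Returns {url: text} for every page that was captured.
    """
    captured = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        # Skip sites that prohibit crawling and links that lead nowhere.
        if url in disallowed or url not in web:
            continue
        text, links = web[url]
        captured[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return captured

# Hypothetical toy Web: URL -> (page text, links found on the page).
web = {
    "home.example": ("welcome", ["news.example", "private.example"]),
    "news.example": ("headlines", ["home.example", "shop.example"]),
    "private.example": ("secret", ["hidden.example"]),
    "shop.example": ("catalog", []),
    "hidden.example": ("unlinked elsewhere", []),
}

captured = crawl(web, ["home.example"], disallowed={"private.example"})
```

Because the prohibited site is never read, the page it alone links to is never discovered. This is also why a site with no links pointing to it would have to be registered to be found.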
Search engines are 1) technologies of searching, 2) databases in support of searching, and 3) services provided to users. Search engine owners can cover their costs by all three means. The technology they own can be licensed or deployed for others at a fee; the databases can be made available for money; and the services provided can be paid for through advertising. The most effective linking of the search function itself with advertising was pioneered by Google under the name "AdWords." Specific words are sold to advertisers. When users search on those words, the advertisers' small ads are displayed with the search results. Advertisers pay a fee when the engine users "click through" to the advertiser's own site. Other techniques make use of search words or phrases and display closely matching spot ads on the Web page.
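The keyword advertising arrangement described above (words sold to advertisers, ads shown with matching searches, a fee charged only on click-through) can be sketched as follows. This is a simplified illustration, not Google's actual system: the class `KeywordAds`, its method names, the advertiser name, and the flat per-click fee in cents are all assumptions made for the example.

```python
from collections import defaultdict

class KeywordAds:
    """Toy model of keyword ads billed per click-through."""

    def __init__(self):
        self.ads = defaultdict(list)     # keyword -> [(advertiser, fee_cents)]
        self.charges = defaultdict(int)  # advertiser -> total cents owed

    def sell(self, keyword, advertiser, fee_cents):
        """Sell a keyword to an advertiser at a fixed fee per click."""
        self.ads[keyword.lower()].append((advertiser, fee_cents))

    def ads_for(self, query):
        """Return the ads to display with a search; showing an ad is free."""
        shown = []
        for word in query.lower().split():
            shown.extend(self.ads.get(word, []))
        return shown

    def click_through(self, keyword, advertiser):
        """Charge the advertiser when a user clicks through to its site."""
        for name, fee_cents in self.ads.get(keyword.lower(), []):
            if name == advertiser:
                self.charges[advertiser] += fee_cents
                return fee_cents
        raise KeyError(f"{advertiser} has not bought {keyword!r}")

market = KeywordAds()
market.sell("shredder", "AcmeShred", 25)       # hypothetical advertiser and fee
shown = market.ads_for("paper shredder reviews")
owed_before_click = market.charges.get("AcmeShred", 0)
market.click_through("shredder", "AcmeShred")
market.click_through("shredder", "AcmeShred")
owed_after_clicks = market.charges["AcmeShred"]
```

Fees are kept as whole cents to avoid rounding problems with fractional currency amounts.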
ENGINES AND THE INTERNET
The Internet owes its dramatic growth to the development of search engines. The first such engine was Lycos, launched in mid-1994 with 54,000 documents. Using its crawler technology, it had expanded its database to 1.5 million documents by early 1995 and had 60 million by the end of 1996. Another claimant to the founding role was AltaVista, introduced in 1995 and still active on the Web. Until Lycos and AltaVista appeared, access to the Internet required advance knowledge of Web addresses, and roaming the Internet involved following links from site to site as these referred to each other.
The services provided by search engines become obvious with a few statistics. According to the Internet Systems Consortium (ISC), which conducts four surveys every year, in January 2006 around 395 million Internet hosts were in operation, each one hosting multiple sites, each site consisting of several Web pages on average. Extremely simple searches on leading engines provided up to 17 billion hits on Google in 2006 (for the word "the," for instance); AltaVista produced 7.4 billion, Ask.com 2.1 billion, and MSN 2.4 billion hits on the word. AltaVista uses Yahoo technology; Yahoo itself, asked to search for "the," simply shrugged off the labor and provided a single hit on a corporation with the THE acronym. Some estimates put the number of pages on the Internet at hundreds of billions, but as the ISC points out from a depth of survey experience, it is not possible to determine the actual size of the Internet. In any case, several hundred million hosts, never mind 17 billion pages, are already astronomically big numbers. The ability of search engines to provide access to such magnitudes in a matter of a second or so makes the Internet the useful phenomenon that it is. The ranking of hits, which actually reflects frequency of use by others, makes using very massive search results practical. Who, after all, can afford to review 60,000 hits, or even 700?
STRUCTURE OF THE INDUSTRY
Search Engine Watch, a Web journal concentrating on search engines and related matters, began operations in 1997, three years after the first search engine appeared. The company offers prizes, has public information as well as a membership service, and is an excellent source of information on developments in this field. Search Engine Watch (hereafter referred to as SEW) produces rankings and technical information about this industry. What follows has been gleaned largely from searchenginewatch.com.