
Saturday, March 5, 2011

Types of Search Engines

There are four major search engine types:

1. Crawler-Based (Traditional, Common) Search Engines: Crawler-based search engines use special software to automatically and regularly visit websites in order to create and supplement their giant repositories of Web pages. This software is referred to as a "bot", "robot", "spider", or "crawler".

These programs run on the search engines' servers. They browse pages that already exist in their repositories and find our site by following links from those pages. Alternatively, after we have submitted pages to a search engine, those pages are queued for scanning by a spider; it finds our page by looking through the list of pages pending review in this queue.

After a spider has found a page to scan, it retrieves this page via HTTP (like any ordinary Web surfer who types a URL into a browser's address field and presses "enter"). Just like any human visitor, the crawling software leaves a record on our server about its visit. Therefore, it's possible to tell from our server log when a search engine has dropped in on our online estate.
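To make this concrete, here is a minimal sketch in Python of what happens at this step: the crawler fetches the page over HTTP, announcing itself through a User-Agent header (which is what ends up in our server log), and then reads the returned HTML, collecting links to follow later. The bot name and URL are placeholders for illustration, not any real engine's.

```python
# Minimal sketch of a crawler's fetch step. The URL and the User-Agent
# string are placeholders for illustration only.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_and_parse(url):
    # The User-Agent header below is what shows up in the site's server log.
    request = Request(url, headers={"User-Agent": "ExampleBot/1.0"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    return html, parser.links


if __name__ == "__main__":
    page_html, outgoing_links = fetch_and_parse("http://example.com/")
    print(len(page_html), "bytes of HTML,", len(outgoing_links), "links found")
```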

Our Web server returns the HTML source code of our page to the spider. The spider then reads it (this process is referred to as "crawling" or "spidering") and this is where the difference begins between a human visitor and crawling software.
A human visitor can appreciate the quality graphics and impressive Flash animation we have loaded onto our page, but a spider won't. (Google has recently updated its crawler so that it can extract some text from Flash files, though such content is still much harder for spiders to read than plain HTML.)

A human visitor does not normally read the META tags; a spider does. Only seasoned users might be curious enough to read the source code of the page when seeking additional information about it.
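As a small illustration, a spider can lift the description and keywords META tags straight out of the page source. A rough sketch using Python's standard HTML parser (the sample HTML is made up):

```python
# Sketch: reading META tags from a page's HTML source, the way a spider can,
# even though a human visitor never sees them rendered.
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name", "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content", "")


reader = MetaReader()
reader.feed('<head>'
            '<meta name="description" content="A page about search engines">'
            '<meta name="keywords" content="search, crawler, index">'
            '</head>')
print(reader.meta)
```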

A human visitor will first notice the largest and most attractive text on the page. A spider, on the other hand, will give more value to text that's closest to the beginning and end of the page, and the text wrapped in links.
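A toy scoring sketch of that idea is below; the numeric weights are invented purely for illustration and do not reflect any engine's real formula.

```python
# Toy illustration only: weighting occurrences of a query term by where they
# appear. The numeric weights are invented for this example.
def toy_term_score(words, link_words, term, edge=20):
    term = term.lower()
    score = 0.0
    for position, word in enumerate(words):
        if word.lower() == term:
            # Text near the beginning or end of the page counts for more.
            near_edge = position < edge or position >= len(words) - edge
            score += 2.0 if near_edge else 1.0
    # Text wrapped in links gets extra weight.
    score += 3.0 * sum(1 for word in link_words if word.lower() == term)
    return score


body = "search engines crawl pages and build an index of pages".split()
anchor_text = "search engine basics".split()
print(toy_term_score(body, anchor_text, "search"))
```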

Perhaps we have spent a fortune creating a killer website designed to immediately captivate our visitors and earn their admiration. We have even embedded lots of quality Flash animation and JavaScript tricks. Yet a search engine spider is a robot that only sees that there are some images on the page and some code inside "script" tags that it is instructed to skip. These design elements are additional obstacles on its way to our content. What's the result? The spider ranks our page low, no one finds it on the search engine, and no one is able to appreciate the design.

After reading our pages, the spider compresses them into a form that is convenient to store in a giant repository of Web pages called a search engine index.

Search Engine Index: The data are stored in the search engine index in a way that makes it possible to quickly determine whether a page is relevant to a particular query and to pull it out for inclusion in the results page shown in response to that query. The process of placing our page in the index is referred to as "indexing".
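The usual structure behind such an index is an inverted index: each word points to the pages that contain it, so a query can be answered without re-reading every page. A very small sketch (the pages and their text are made up):

```python
# Minimal sketch of an inverted index: each word maps to the set of pages
# that contain it, so a query can be answered without re-reading the pages.
from collections import defaultdict

pages = {
    "page1.html": "spiders crawl web pages and index them",
    "page2.html": "directories are edited by humans",
    "page3.html": "a web index answers queries quickly",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def search(query):
    # Return pages containing every word of the query.
    results = set(pages)
    for word in query.lower().split():
        results &= index.get(word, set())
    return sorted(results)

print(search("web index"))  # ['page1.html', 'page3.html']
```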

2. Human-Edited Directories: The pages stored in a directory's repository are added solely through manual submission. Most directories use certain mechanisms (particularly CAPTCHA images) to prevent pages from being submitted automatically. After completing the submission procedure, our URL will be queued for review by an editor, who is, luckily, a human.

When directory editors visit and read our site, the only decision they make is to accept or reject the page. Most directories do not have their own ranking mechanism; they sort URLs by obvious factors such as alphabetical order or Google PageRank. It is therefore very important to submit a relevant and precise description to the directory editor, and to take the other parts of this manual submission seriously.

Spider-based engines often use directories as a source of new pages to crawl. While a crawler-based engine visits our site regularly and detects any change we make to our pages, in a directory the result listings are controlled by humans. We enter a short description of our website, and when someone searches, only these descriptions are scanned for matches, so changes to the website itself do not affect the result listing at all. The best-known and most important directories are Yahoo and DMOZ.
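A toy illustration of that difference follows: only the stored descriptions are matched against the query, never the live pages themselves (the directory entries below are made up).

```python
# Toy illustration: in a directory, only the submitted descriptions are
# searched, not the live pages themselves. Entries here are made up.
directory = {
    "http://example-shop.example": "Hand-made leather shoes and accessories",
    "http://example-news.example": "Daily technology news and reviews",
}

def directory_search(query):
    words = query.lower().split()
    return [url for url, description in directory.items()
            if all(word in description.lower() for word in words)]

print(directory_search("technology news"))  # ['http://example-news.example']
```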

3. Hybrid Engines: Some engines also have an integrated directory linking to them. They contain websites that have already been reviewed or evaluated. When sending a search query to a hybrid engine, the sites already evaluated are usually not scanned for matches; the user has to select them explicitly. Whether a site is added to an engine's directory generally depends on a mixture of luck and content quality. Sometimes we may "apply" for a review of our website, but there is no guarantee that it will be done.

Yahoo (www.yahoo.com) and Google (www.google.com), although mentioned here as examples of a directory and a crawler respectively, are in fact hybrid engines, as are most major search engines nowadays. As a rule, a hybrid search engine will favor one type of listing over the other.

For example: Yahoo is more likely to present human-powered listings, while Google prefers its crawled listings.

4. Meta Search Engines: Another approach to searching the vast Internet is the use of a multi-engine search, or meta-search engine, which combines results from a number of search engines at the same time and lays them out in a formatted result page. A plain or natural-language request is passed on to multiple search engines, each directed to find the information the searcher requested. The responses thus obtained are gathered into a single result list. This search type allows the user to cover a great deal of material in a very efficient way, while retaining some tolerance for imprecise search questions or keywords.

Examples: MetaCrawler (http://www.metacrawler.com) and DogPile (http://www.dogpile.com). MetaCrawler refers our search to seven of the most popular search engines (including AltaVista and Lycos), then compiles and ranks the results.
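A rough sketch of the gathering-and-merging step is shown below; the engine names and result lists are placeholders, and a real meta-engine would of course query each engine over HTTP before merging.

```python
# Sketch of how a meta search engine might merge ranked lists coming back
# from several engines. Engine names and results here are placeholders.
from collections import defaultdict

results_by_engine = {
    "engine_a": ["http://site1.example", "http://site2.example", "http://site3.example"],
    "engine_b": ["http://site2.example", "http://site4.example"],
    "engine_c": ["http://site2.example", "http://site1.example"],
}

scores = defaultdict(float)
for engine, urls in results_by_engine.items():
    for rank, url in enumerate(urls, start=1):
        # Higher-ranked results contribute more; a URL returned by several
        # engines accumulates score from each of them.
        scores[url] += 1.0 / rank

merged = sorted(scores, key=scores.get, reverse=True)
for url in merged:
    print(f"{scores[url]:.2f}  {url}")
```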

Paid Inclusion Engines: With these engines, the only way to keep our site listed, re-spidered, or top-ranked for keywords of our choice is to pay a recurring or one-time fee. Very few search engines focus solely on paid listings; most offer a paid listing option as part of their indexing and ranking system. Yahoo and Google are the largest paid listing providers, and Live Search (formerly MSN) also sells paid placement listings.
