The Evolution of Search Engine Technology
How A Search Engine Works
A search engine takes advantage of the hyperlinks that connect Web sites on the Internet. A software program called a Web crawler automatically browses the Web in a methodical way, sending out requests that “crawl” from site to site by following those links. Web crawlers are also called Web spiders or Web bots (from “robots”). Despite the name, a crawler does not physically move anywhere: it is a program that sends requests to Web addresses on other computers and compiles the information it receives in specific ways. Nor does a search engine rely on a single crawler; it may run as many as 10,000.
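The link-following process described above can be sketched as a breadth-first traversal. This is a minimal, single-threaded illustration under stated assumptions: the in-memory `TOY_WEB` dictionary and the `crawl` function are hypothetical, and stand in for a real crawler that would fetch each page over HTTP and extract its links. Breadth-first order is simply one common choice, not necessarily what any particular search engine uses.

```python
from collections import deque

# Hypothetical in-memory "Web": each URL maps to the URLs its page links to.
# A real crawler would download each page and parse its hyperlinks instead.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}

def crawl(seed, web, max_pages=100):
    """Breadth-first crawl: visit each reachable URL once, in discovery order."""
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, so no page is requested twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in web.get(url, []):  # "follow" each hyperlink on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl("a.example/", TOY_WEB))
# ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the crawler methodical: pages that link back to each other (here `a.example/` and `a.example/about`) are visited only once. Running thousands of crawlers in parallel, as the text describes, would mean sharing this frontier across many such workers.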
Because the crawler is a software program, its behavior depends on the instructions it is given, and those instructions differ from one search engine to another. Search engine crawlers operate under different sets of instructions or parameters: they may be set to search only titles and first paragraphs, for example, or entire documents, including the metadata. (Metadata, or data about data, includes tags that may not be visible when looking at the page and can include the document's title, keywords, publisher, and other information.) WebCrawler, a program launched in 1994, was the first to index the full text of Web pages rather than just their titles. To make Web content easier to describe in a consistent way, the Dublin Core Metadata Initiative (DCMI) in 1995 established standards for describing Web content in fifteen categories. These standards would eventually make keyword searches easier for search engines, and today they are often included in the HTML (HyperText Markup Language) code commonly used to create Web pages.
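The idea that a crawler's parameters decide what it records, and that metadata sits in tags that are not shown on the page, can be illustrated with Python's standard `html.parser` module. This is a hedged sketch: the `PageParser` class, the `extract` function, and its `mode` parameter are hypothetical, and the sample page assumes the common convention of writing Dublin Core elements as `<meta name="DC.title" ...>` tags.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the <title>, <meta name=... content=...> pairs, and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            # Metadata lives in attributes, so it never appears as page text.
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())

def extract(html, mode="full"):
    """mode="title" keeps only the title; mode="full" adds metadata and body text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    if mode == "title":
        return {"title": parser.title}
    return {"title": parser.title, "meta": parser.meta, "text": " ".join(parser.text)}

# Hypothetical sample page using DC.-prefixed meta tags.
PAGE = """<html><head>
<title>How Search Engines Work</title>
<meta name="DC.title" content="How Search Engines Work">
<meta name="DC.creator" content="Example Publisher">
<meta name="keywords" content="search engines, crawlers">
</head><body><p>Search engines index the Web.</p></body></html>"""

print(extract(PAGE, mode="title"))  # {'title': 'How Search Engines Work'}
```

With `mode="full"`, the same page also yields the `DC.creator` and `keywords` values, which a title-only crawler would never see.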
The information the crawler software collects is automatically stored in an index. When a user submits a query, the search engine looks it up in its own index rather than on the live Web. Each search engine has its own index, so the index searched by Google is not the same as the one searched by Yahoo! or MSN (Microsoft Network), though some companies have shared indexes. Search engines also vary in how much information they return and in what types of information they index. One of their main challenges is sifting through the data: Web sites are sorted by machines, which cannot read or interpret content the way humans can. Unlike a human indexer, a computer cannot understand exactly what a page means or judge how relevant it is.
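The text does not say how an index is organized, but a common structure for this job is an inverted index, which maps each word to the pages containing it. The sketch below assumes that structure; `tokenize`, `build_index`, `search`, and the sample pages are hypothetical. It answers a query from the index alone, requires every query word to match (AND semantics), and does no relevance ranking, which is exactly the kind of judgment the paragraph notes machines find hard.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return, in sorted order, the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Look up each word in the index; a page must match all words.
    matches = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*matches))

# Hypothetical pages gathered by a crawler.
PAGES = {
    "a.example/": "Web crawlers build the index",
    "b.example/": "The index answers each query",
    "c.example/": "Crawlers follow hyperlinks",
}

index = build_index(PAGES)
print(search(index, "index"))           # ['a.example/', 'b.example/']
print(search(index, "crawlers index"))  # ['a.example/']
```

Because results come only from what was indexed, two engines that crawled different pages return different answers to the same query: an index built from `a.example/` and `b.example/` alone would find `crawlers` only on `a.example/`, while one built from `b.example/` and `c.example/` would find it only on `c.example/`.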