|on 24 Sep Posted by Admin Category: Search Engines|
Submitted by pupshur
As search engines and directories have increased in number, sophistication, and complexity, so too has the art and science of search engine positioning. Existing search engines and directories are constantly changing their search engine algorithms and striking new alliances, and new search engines keep emerging. Search engines have evolved from academic and research tools with intriguing names like Archie, Gopher, Veronica, and Jughead to some of the most-visited sites on the Web (still with novel names like Yahoo! and Google). Originally designed by scientists to find information across small networks, these tools have evolved tremendously. Today, they are used by millions of consumers looking for information as well as vendors and products in the burgeoning world of e-commerce.
1990 - 1993: The Early Entrants
In 1990, before the World Wide Web as we know it even existed, Alan Emtage at McGill University created Archie, the first search tool. Back then, the primary method for sharing data was anonymous FTP (File Transfer Protocol). Files were frequently stored on FTP archive sites where individuals could deposit and retrieve them for sharing. However, there was no method for searching these FTP servers and their archives, and many of the servers had just a few files available. To locate a file, a researcher had to know its exact server address, which was very inconvenient.
Emtage altered the landscape permanently when he designed Archie to search the archive sites on FTP servers. He initially wanted to call the program "Archives," but the confines of Unix syntax forced him to shorten the name to Archie. Short, catchy names have typified search engines ever since.
Archie used a script-based data retriever to gather site listings of anonymous FTP files and gave users access to the database through an expression matcher. The development of a database with searchable results from multiple anonymous sources paved the way for the search tools and massive databases we use today.
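The two pieces described above, a gatherer that collects file listings and an expression matcher that searches them, can be sketched in a few lines of Python. This is only an illustration of the idea, not Archie's actual code: the host names, file paths, and function names below are invented for the example, and the listings are hard-coded rather than fetched over FTP.

```python
import re

# Hypothetical listings standing in for what a gatherer script might have
# collected from anonymous FTP sites. Hosts and paths are invented.
SITE_LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/README"],
    "archive.example.org": ["/mirrors/tex/latex.tar.Z", "/pub/gnu/gcc-2.3.3.tar.Z"],
}


def build_database(listings):
    """Flatten per-site listings into one list of (host, path) records."""
    return [(host, path) for host, paths in listings.items() for path in paths]


def search(database, pattern):
    """Return every (host, path) whose file name matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (host, path)
        for host, path in database
        if regex.search(path.rsplit("/", 1)[-1])  # match on the file name only
    ]


database = build_database(SITE_LISTINGS)
matches = search(database, r"^gcc|emacs")
for host, path in matches:
    print(f"{host}:{path}")
```

The key point is that the user queries one combined database instead of visiting each server, so the exact server address no longer has to be known in advance.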
Archie was so popular for finding files on FTP sites that other researchers began developing tools for searching other types of available electronic information. In the early 1990s, many plain text documents were available on Gopher servers, where they could be retrieved and read anonymously. In 1993, the University of Nevada System Computing Services group developed Veronica. This search tool was similar to Archie but searched Gopher servers for text files.
The problem with Veronica was that it didn't group the results it returned in any way that gave users an understanding of the possible content of the pages. The results returned for Java could just as easily be for code as for coffee.
Soon after the development of Archie, the comic-strip family was complete with the entry of Jughead, a search tool with functionality similar to Veronica. But the information tidal wave of the Web would soon sweep away all three comic characters.
In 1989, Tim Berners-Lee at the European Laboratory for Particle Physics (CERN) invented HTML (Hypertext Markup Language), which allowed users to structure pages that could include images, sound, and video along with text. With its hyperlink capability, HTML made it easier to link together documents from different servers. The Web tidal wave really hit with the development of Mosaic, a browser that could take advantage of this functionality. A team of developers at the National Center for Supercomputing Applications (NCSA) at the University of Illinois developed Mosaic and made the browser available for free across the Internet, in accordance with NCSA regulations.
This led to the rapid development of the Internet as we know it today. With the ability to include images, sound, and video clips in easily viewable hypertext, HTML and the Web rapidly replaced Gophers and FTP file repositories.
In June 1993, Matthew Gray of MIT launched the first "spider" robot on the Web. The World Wide Web Wanderer was designed as a research tool to track the size and growth of the Web. At first, it simply counted Web servers; it later captured URLs as well, building a searchable Web database, the Wandex.
The Wanderer created quite a stir because early versions of this spider (and similar research spiders) were able to quickly overwhelm networks and degrade network performance. Because the spiders could make multiple document requests in a short period of time, they acted as if a large volume of users had all logged in at once. In the limited-bandwidth environment of the early Web, this created huge problems. Unless the spidered server could handle this traffic, a rampant spider would quickly overwhelm it. Early spiders frequently visited the same site multiple times in a single day, creating serious havoc.
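The problem described above comes down to too many requests reaching one host in too short a time. A minimal sketch of the usual remedy, spacing out requests per host and skipping URLs already queued, is shown below in Python. It is not how the Wanderer or any early spider worked; the function name, the 5-second delay, and the example URLs are all assumptions made for illustration, and the code only computes a schedule rather than fetching anything.

```python
from urllib.parse import urlsplit


def schedule_requests(urls, min_delay=5.0, start=0.0):
    """Assign each URL a fetch time so that no single host receives
    more than one request every `min_delay` seconds."""
    next_allowed = {}  # host -> earliest time the next request may be sent
    seen = set()       # URLs already scheduled, to avoid repeat visits
    schedule = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        host = urlsplit(url).netloc
        when = max(start, next_allowed.get(host, start))
        schedule.append((when, url))
        next_allowed[host] = when + min_delay
    return sorted(schedule)  # ordered by fetch time, then URL


urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/1",  # duplicate, skipped
    "http://a.example/3",
]
plan = schedule_requests(urls)
for when, url in plan:
    print(f"t={when:>4.1f}s  {url}")
```

Requests to different hosts can still go out at the same moment, but each individual server sees a steady trickle instead of a burst.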
By the end of 1993, programmers were rapidly answering the call for newer and better search technology. The results included the robot-driven search engines JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider developed by NASA. JumpStation's robot gathered document titles and headings and indexed them; searches were answered by matching keywords against that database. This methodology worked for smaller databases, but not for the rapidly growing Internet. The WWW Worm indexed Title tags and URLs.
JumpStation and the World Wide Web Worm did not sift the results in any way. They would simply deliver a large number of matches. RBSE, however, spidered by content and included the first relevancy algorithms in its search results, based on keyword frequency in the document. Keyword frequency, Title tags, and URLs are all still leveraged by SEP professionals.
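To make the idea of ranking by keyword frequency concrete, here is a small Python sketch. It is not RBSE's algorithm, whose details are not given here; it simply assumes that relevance is the share of a document's words that equal the search keyword. The document names, texts, and function names are invented for the example.

```python
import re
from collections import Counter


def keyword_frequency(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def rank(documents, keyword):
    """Return names of documents containing the keyword, most relevant first."""
    scored = [(keyword_frequency(text, keyword), name)
              for name, text in documents.items()]
    # Sort by descending score; break ties alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


documents = {
    "coffee.html": "Java coffee beans: how to brew java coffee",
    "code.html": "Java code examples for the Java language and the Java runtime",
    "tea.html": "Green tea brewing guide",
}
ranking = rank(documents, "java")
print(ranking)
```

A search engine that returns every match unsorted would list all pages containing "java" with equal weight; even this crude score drops pages that never mention the keyword and orders the rest.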
1994 - 1999: Along Came a Spider...
By 1995, the original 200 Web servers had grown exponentially. This huge growth created a fertile field for search developers, who sought better ways for searchers to find information in the ever-growing database and created the popular search engines of today. This growth in capabilities and popularity also paralleled the rapid expansion of Web usage and of the number of commercial browsers available.
Between 1994 and 1995, new technology for browsing was introduced and popularized. The introduction of Netscape and Microsoft's Internet Explorer added a user-friendly point-and-click interface for browsing. This, coupled with the broad availability of computers with Web connectivity, democratized the Web. It left the halls of academia forever and moved into offices and homes.
During this same period, the sales of computers outstripped the sales of televisions in the United States for the first time in history. All that stood between these PCs and the Web were modems and connections. By 1995, the commercial backbones and infrastructure to support the Web were firmly in place, and NSFNET, the original Internet, was simply retired.