Just about everyone knows what a search engine is. Whenever you have a question, want to look up the address of your favorite restaurant or need to make an informed online purchase, chances are you visit a search engine on the Internet.
If you’ve ever used two different search engines to conduct the same search query, then you will have noticed that the results were not the same. So why will the same query on different search engines produce different results?
Part of the answer is that search engine indexes are not identical: each depends on what the robots find or what information humans have submitted to the database. More importantly, not every search engine uses the same algorithm to search through its database.
An algorithm is what the search engines use to determine the relevance of the information in the database to what the user is searching for.
A search algorithm is a procedure that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. A search engine algorithm takes keywords as the input problem and returns relevant search results as the solution, matching these keywords against the pages stored in its database. The keywords associated with each page are determined by search engine robots that analyze web page content and keyword relevancy, using a formula that varies from one search engine to the next.
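As a rough illustration of matching keywords against stored pages, here is a minimal sketch in Python. It assumes a toy "database" of three made-up pages and treats a page as a match only if it contains every keyword in the query. The names build_index and search are hypothetical, and real engines use far more elaborate formulas than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical pages standing in for a search engine database.
pages = {
    "a": "cheap flights to montreal",
    "b": "montreal restaurant guide",
    "c": "restaurant reviews and cheap eats",
}
index = build_index(pages)
print(search(index, "cheap restaurant"))
```

Building the word-to-pages mapping once, ahead of time, is what lets a query be answered without rereading every page.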
Types of Information that Factor into Algorithms
Some services collect information on the queries individual users submit to search services, the pages they look at subsequently, and the time spent on each page. This information is used to return results pages that most users visit after initiating the query. For this technique to succeed, large amounts of data need to be collected for each query. Unfortunately, the potential set of queries to which this technique applies is small, and this method is open to spamming.
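The usage-based technique can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the log is a made-up list of (query, visited page) pairs, time spent on each page is ignored, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_click_model(log):
    """Count, for each query, how often each page was visited afterwards."""
    model = defaultdict(Counter)
    for query, page in log:
        model[query.lower()][page] += 1
    return model


def most_visited(model, query, limit=3):
    """Return the pages most often visited after this query, most popular first."""
    counts = model.get(query.lower())
    if not counts:
        return []
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:limit]]


# Hypothetical usage log: (query, page visited after the query).
log = [("pizza", "p1"), ("pizza", "p2"), ("Pizza", "p1"), ("tacos", "t1")]
model = build_click_model(log)
print(most_visited(model, "pizza"))
```

The sketch also shows the weakness described above: a query that never appears in the log returns nothing, and anyone who can inflate the counts can push a page to the top.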
Another approach involves analyzing the links between pages on the web, on the assumption that pages on the same topic link to each other and that authoritative pages tend to point to other authoritative pages. By analyzing how pages link to each other, an engine can determine both what a page is about and whether that page is considered relevant. Similarly, some search engine algorithms figure internal link navigation into the picture. Search engine spiders follow internal links to weigh how each page relates to another, and consider the ease of navigation. If a spider runs into a dead-end page with no way out, this can be weighed into the algorithm as a penalty.
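To make the link analysis idea concrete, the following Python sketch counts inbound links and finds dead-end pages in a small, made-up site. The link map and function names are assumptions for illustration; how an actual engine penalizes a dead end is not public.

```python
def inbound_counts(links):
    """Count how many links point at each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def dead_ends(links):
    """Return pages with no outgoing links, i.e. no way out for a spider."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(page for page in pages if not links.get(page))


# Hypothetical internal link structure: page -> pages it links to.
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "order"],
    "order": [],
}
print(inbound_counts(links))
print(dead_ends(links))
```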
The original search engine databases were made up entirely of human-classified data. This is a fairly archaic approach, but there are still many directories that make up search engine databases, like the Open Directory (also known as DMOZ), that are entirely classified by people. Some search engine data is still managed by humans, but only after the algorithmic spiders have collected the information.
One of the elements that a search engine algorithm scans for is the frequency and location of keywords on a web page. Pages on which a keyword appears more frequently are typically considered more relevant. This is referred to as keyword density. Some search engine algorithms also take into account where on the page the keywords are located.
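Keyword density and keyword location are both easy to compute. The sketch below, in Python, takes density to be the keyword's share of all words on the page and location to be the position of its first occurrence. These definitions, and the function names, are assumptions for illustration, since each engine's exact formula is its own.

```python
import re


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, keyword):
    """Return the fraction of words on the page that equal the keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def first_position(text, keyword):
    """Return the zero-based word position of the first occurrence, or None."""
    words = tokenize(text)
    try:
        return words.index(keyword.lower())
    except ValueError:
        return None


text = "Fresh pizza daily. Our pizza oven bakes pizza fast."
print(keyword_density(text, "pizza"))
print(first_position(text, "pizza"))
```

Here "pizza" accounts for three of the nine words, so its density is one third, and it first appears as the second word on the page.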
Like keywords and usage information, meta tag information has been abused. Many search engines no longer factor in meta tags because of web spam. Some still do, and most look at the title and description. There are many other factors that search engine algorithms figure into the calculation of relevant results. Some use information such as how long the website has been on the Internet, and still others may weigh structural issues, errors encountered, and more.
The first technology to be defined as a 'search engine' goes back to 1990. It was named Archie and was created by Alan Emtage, a student at McGill University in Montreal. The commercial search engines that we are familiar with today started to appear in the mid-1990s: Yahoo Directory in 1994, LookSmart and Alta Vista in 1995, and Ask Jeeves in 1997. Google was launched in 1998.
It took website owners only a short time after the first commercial search engines appeared to recognize the value of having their sites ranked highly in search results. The more visible the site, the more people clicked on it. Clicks turned into cash as the Internet became money-driven through advertising revenues, e-commerce and other commercial opportunities. Webmasters and search marketers sought ways to get their sites to the top of search returns, and in so doing created what has since become the multimillion-dollar Search Engine Optimisation (SEO) industry. The search companies, in turn, worked hard to keep ahead of them.
Over the last decade, the search marketing techniques used to secure top positions in search results have changed repeatedly, responding to the search companies' evolving algorithms for determining which website is relevant to any given query. This game of cat and mouse reveals the essential tension between the search engines and web marketers. The search companies work hard to keep ahead of ever-resourceful webmasters and search engine marketers, some of them unscrupulous, who try to outwit the engines to achieve the highest possible rankings. In the early days, shrewd deployment of descriptive file names, page titles and meta descriptions proved to be an effective optimisation technique. As search returns choked up with irrelevant pages and spam, search algorithms advanced in an effort to combat this. As a result, on-page factors such as keyword density grew more important.
Off-page factors such as 'PageRank' (based upon the number and strength of inbound links) also became increasingly important. As each of these search criteria was used and abused, the algorithms evolved as the search companies worked to stay one step ahead. These days the search engines use sophisticated, secret and complex algorithms incorporating numerous criteria to determine site ranking and generate relevant results.
Google co-founder Larry Page defines the 'perfect search engine' as something that "understands exactly what you mean and gives you back exactly what you want", an ambition shared by all major search engines. Simply put, the job of a search engine is to index the web and provide the best quality and most relevant content possible. Returns that fulfil the user's search desires provide the ultimate query/return match. While each search engine is different, they all share the same goal: to be the best.
In order to preserve the ideal of relevance, the search engine algorithms are very closely guarded secrets. These codes are the search companies' primary differentiator: the means by which they claim their competitive advantage over each other, thriving or failing as organisations through the ability of the codes to deliver what users want, namely relevance.
If marketers knew the exact algorithms, they could manipulate rankings as they wished until the search results became so irrelevant that the search engines were worthless, devaluing the companies that provide them. To combat this threat, the search companies adjust their algorithms regularly.
The search marketing industry has evolved around trying to interpret the algorithms and provide web information in the way that the search engines dictate (some companies employing more legitimate methods than others). Any organisation claiming to know the search engine algorithms is talking nonsense. It's through experience, expertise and a holistic search marketing approach that quality returns are generated, not through trying to rig the system or play tricks. On the professional level, effective search marketing is not about quick fixes, smoke and mirrors. Any organisation serious about its search engine positioning should make a point of using a reputable search marketing consultancy, like Peter Woolf, for long-lasting and effective optimisation.
Some say that Google has won the hearts and minds of web users by understanding, much better than its competitors, that relevance is the critical fact of Internet life, while other companies have compromised the integrity of their search returns for short-term financial gain. Google's philosophy states that if you focus on the user then all else will follow. It claims that "while many companies claim to put their customers first, few are able to resist the temptation to make small sacrifices to increase shareholder value". It may have a point: Google drives the majority of web search traffic and handles approximately 70% of searches worldwide (75% in the UK), with over 7 billion having been conducted in the USA alone. WPP-owned research company Millward Brown reports that a combination of brand recognition and financial performance gave Google the top spot in a list of global brands. New research estimates its value to be $86bn (£43bn), a 30% year-on-year increase.
Two aspects of search marketing have become increasingly important across all the search engines. Each contains a number of signals that contribute to the search algorithms.
PageRank: considers millions of variables and billions of terms to determine the importance of pages. Important pages are more likely to appear at the top of the search results by receiving a higher PageRank. Inbound links from sites of authority carry great weight, casting positive votes and giving recipient pages greater value.
Hypertext-Matching Analysis: Search engines pay very close attention to page content. As well as scanning for page-based text (which can be manipulated by site publishers through meta-tags), their technology analyzes the full page content including fonts, subdivisions and the exact locations of each word. Neighbouring web pages are also scanned for content to ensure the results returned are contextual and the most relevant to a user's query.
The Anatomy of a Large-Scale Hypertextual Web Search Engine
In this rare paper, Google co-founders Sergey Brin and Lawrence Page explain the anatomy of a large-scale hypertextual web search engine and its main functions.
In order to develop an independent and objective ranking system that has integrity, is fair to everyone and is efficient for all end users searching on a specific keyword or keyphrase, Google has developed the PageRank (PR) algorithm.
The Google PageRank value relies on the uniquely democratic nature of the Internet by using its vast global link structure as a prime indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at more than the sheer volume of votes, or links, a page receives. It also analyzes the page that casts the vote. Votes cast by pages that are themselves important, or are favorably viewed as "established firms" in the Web community, weigh more heavily and help to make other pages look established too.
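The voting idea can be sketched with the simplified PageRank iteration described in Brin and Page's paper. The Python below is a minimal illustration rather than Google's production algorithm: the four-page link map is made up, the damping value of 0.85 is the figure suggested in that paper, and spreading the rank of a page with no outgoing links evenly across all pages is an assumption of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance by repeatedly passing rank along links."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its vote evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link structure: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this example page c collects the most votes and ends up with the highest rank, while page d, which nobody links to, keeps only the baseline share. Page a receives a single link, but because that link comes from the important page c, it outranks page b.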
Established, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.
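One very simple way to picture combining importance with text matching is shown in the Python sketch below. It assumes each page already has an importance value, keeps only pages that mention a query term, and multiplies the number of matches by the importance. Both the scoring rule and the sample values are assumptions for illustration; Google's actual combination is not public.

```python
def rank_results(pages, importance, query):
    """Order matching pages by (term matches x importance), best first."""
    terms = query.lower().split()
    scored = []
    for page_id, text in pages.items():
        words = text.lower().split()
        hits = sum(words.count(term) for term in terms)
        if hits:  # pages that do not match the query are dropped entirely
            scored.append((hits * importance.get(page_id, 0.0), page_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for _, page_id in scored]


# Hypothetical pages and importance values.
pages = {
    "a": "garden tools and garden seeds",
    "b": "garden furniture",
    "c": "city news",
}
importance = {"a": 0.2, "b": 0.5, "c": 0.3}
print(rank_results(pages, importance, "garden"))
```

Page a mentions "garden" twice, but page b still comes first because its higher importance outweighs the extra mention, and page c never appears because it does not match the query at all.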
Google's complex, automated algorithm makes human tampering with their results extremely difficult. And although Google does run relevant ads above and next to their results, Google does not sell placement within the results themselves (i.e., no one can buy a higher PageRank). A Google search is an easy, honest and objective way to find high-quality websites with information relevant to users searching specific products, services or information on a particular subject.