Latent Semantic Indexing

by Admin


20 Feb
Search Engines


by Jose Nuñez
http://www.hirank.com

In this article I introduce Latent Semantic Indexing (LSI) to readers who are not familiar with the term, explaining what it is and why it matters. As Google, Yahoo and MSN make increasing use of this evaluation method in their quest for more relevant, higher-quality results, LSI or related techniques will become active components of their ranking process.
An introduction to LSI.
In plain terms, 'latent' means hidden and 'semantic' means relating to meaning, so 'latent semantic indexing' means indexing by hidden meaning. This information retrieval technique, which is based on the vector space model of document representation, evaluates the content of pages across an entire site and determines the site's common theme. In the search engines' evaluation and ranking process, it is gradually gaining importance relative to on-page factors such as keyword density and off-page factors such as PageRank.
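To make the vector space model concrete, here is a minimal Python sketch. It assumes raw term counts as weights and cosine similarity as the match score (real systems often use weighting schemes such as tf-idf instead); the three documents and the query are hypothetical examples, and `vectorize` and `cosine` are illustrative helper names.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "car engine repair",
    "automobile engine repair",
    "flower garden care",
]


def vectorize(text, vocab):
    """Raw term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.split())
    return [counts[term] for term in vocab]


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each document becomes a point in a space with one axis per vocabulary term.
vocab = sorted({w for d in docs for w in d.split()})
doc_vectors = [vectorize(d, vocab) for d in docs]

query = vectorize("car", vocab)
for text, vec in zip(docs, doc_vectors):
    print(f"{cosine(query, vec):.3f}  {text}")
```

The 'automobile engine repair' document scores zero for the query 'car' even though it is about the same subject: in this plain model a document only matches through shared words, so synonyms are invisible. That is the gap LSI is meant to close.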

Quick facts about LSI.
LSI has been reported to be 30% more effective than popular word-matching methods, especially in cross-language retrieval. LSI can retrieve relevant documents that do not contain the query words, using a fully automatic statistical method called singular value decomposition. LSI also treats documents that have many words in common as semantically close, and documents with few words in common as semantically distant.
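The claim that LSI can find documents that lack the query words can be illustrated with a small Python sketch. It assumes a hypothetical three-document corpus, raw term counts, k = 2 retained dimensions, and a toy power-iteration routine standing in for a real SVD library such as `numpy.linalg.svd`; the helper names are illustrative only. Documents and the query are both projected with U_k^T, which is one common convention.

```python
import math
import random

# Hypothetical toy corpus: "car" and "automobile" never appear together,
# but both co-occur with "engine".
docs = ["car engine", "automobile engine", "flower garden"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix A (rows = terms, columns = documents), raw counts.
A = [[d.split().count(t) for d in docs] for t in vocab]
doc_columns = [[row[j] for row in A] for j in range(len(docs))]


def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def normalize(x):
    """Scale x to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n else x


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    n = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / n if n else 0.0


def truncated_svd(A, k, iters=200, seed=0):
    """Top-k singular triplets (sigma, u, v) of A, found by power iteration
    on A^T A with deflation. A teaching sketch, not a production SVD."""
    rng = random.Random(seed)
    cols = [list(c) for c in zip(*A)]  # columns of A = document vectors
    n = len(cols)
    # B = A^T A: inner products between every pair of documents.
    B = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
         for i in range(n)]
    triplets = []
    for _ in range(k):
        v = normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(iters):
            v = normalize(matvec(B, v))
        lam = sum(x * y for x, y in zip(v, matvec(B, v)))  # eigenvalue of A^T A
        sigma = math.sqrt(max(lam, 0.0))
        u = [x / sigma for x in matvec(A, v)] if sigma > 0 else [0.0] * len(A)
        triplets.append((sigma, u, v))
        # Deflate: remove the found component so the next pass finds the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return triplets


svd = truncated_svd(A, k=2)


def to_latent(x):
    """Project a term-space vector into the k-dimensional space: U_k^T x."""
    return [sum(ui * xi for ui, xi in zip(u, x)) for _, u, _ in svd]


doc_latent = [to_latent(col) for col in doc_columns]
query = [1 if t == "car" else 0 for t in vocab]  # the query "car"
q_latent = to_latent(query)

for text, col, lat in zip(docs, doc_columns, doc_latent):
    print(f"keyword={cosine(query, col):.2f}  lsi={cosine(q_latent, lat):.2f}  {text}")
```

Because 'car' and 'automobile' both co-occur with 'engine', the rank-2 decomposition places the two engine documents at the same point in the latent space. The query 'car' therefore matches 'automobile engine' with a cosine close to 1 while its keyword score is 0, and 'flower garden' stays near 0. Power iteration with deflation is used only to keep the sketch dependency-free; it is slow and numerically fragile on real collections.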

The LSI vector process.
LSI assumes that there is some underlying, or 'hidden', structure in word usage that is partially obscured by variability in word choice. A truncated singular value decomposition is therefore used to estimate this structure in word usage across documents. Retrieval is then performed using the database of singular values and vectors obtained from the truncated decomposition. Data show that these statistically derived vectors are more robust indicators of meaning than individual terms.
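The truncated decomposition and the retrieval step can be written out as follows, using the standard notation of the LSI literature (the symbols are ours, not the author's). Here A is the m × n term-document matrix, k is the number of retained dimensions, q is a query's term vector, and e_j is the j-th standard basis vector of length n.

```latex
\begin{aligned}
A &\approx A_k = U_k \Sigma_k V_k^{\top},
  \qquad U_k \in \mathbb{R}^{m \times k},\;
  \Sigma_k = \operatorname{diag}(\sigma_1, \ldots, \sigma_k),\;
  V_k \in \mathbb{R}^{n \times k},\\
\hat{q} &= \Sigma_k^{-1} U_k^{\top} q,\\
\operatorname{sim}(q, d_j) &= \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert},
  \qquad \hat{d}_j = V_k^{\top} e_j .
\end{aligned}
```

Folding a document column a_j in with the same formula as the query gives exactly the j-th row of V_k, which is why queries and documents can be compared in one space. By the Eckart-Young theorem, A_k is the best rank-k approximation of A in the Frobenius norm. Implementations differ on whether query and document coordinates are scaled by Σ_k before the cosine is taken.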

Why does LSI matter?
Since Google applies a 'Sandbox' or 'Trustbox' filter to new sites that have not yet earned trust in the form of quality links from neighboring sites, LSI becomes a useful resource. Search engines can use LSI in their databases to associate terms with concepts when ranking pages. If it is applied this way, sites with excellent content are not penalized for being new or for lacking enough trusted links.

How to benefit from LSI.
If you write your content with your theme in mind and focus on your visitors, you will have a much better chance of ranking well in LSI-driven SERPs. Develop your site around a theme, using relevant, related terms and synonyms. Doing so can help you rank well for terms that do not even appear on your page. Google's everyday use of stemming, which matches variants of a word such as 'optimize' and 'optimizing', is one clear sign that search engines already look beyond exact keyword matches.


Jose Nuñez is a Scientific SEO/SEM Consultant. He is the CTO of HiRank, an online resource focusing on search engines (SE) and artificial intelligence (AI).

Find out more about achieving high search engine rankings at http://www.hirank.com, or contact him at -email- or by phone at (801) 388-5961.

