Real Time Search Logs Expose Different Behaviors



19 Apr


by Ian Everdell


http://www.enquiro.com

That real time search is different from web search probably isn’t surprising to you. But now we have the proof: Jim Jansen, a professor at Penn State and a good friend of Enquiro, collected six months’ worth of real time search log data and is presenting the results at CHI 2010 this week.


The Methodology
Jim and his colleagues recorded real time search logs from Collecta from June to December 2009. This yielded just over 1 million search queries, which they analyzed at the term (individual word) and query (a string of two or more terms) level for frequency, length, and mutual information, a measure of how strongly two terms are associated with each other.
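To make the mutual information idea a bit more concrete, here is a minimal Python sketch of one common formulation, pointwise mutual information, computed over terms that appear together in the same query. This is an illustration only: the paper's exact formula may differ, the function names `term_stats` and `pmi` are my own, and the sample queries are made up rather than taken from the Collecta logs.

```python
import math
from collections import Counter
from itertools import combinations


def term_stats(queries):
    """Count how many queries contain each term and each pair of terms."""
    term_counts = Counter()
    pair_counts = Counter()
    for query in queries:
        # Use a sorted set of lowercase terms so a word repeated inside
        # one query is counted once, and pairs have a stable order.
        terms = sorted(set(query.lower().split()))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts, len(queries)


def pmi(term_a, term_b, term_counts, pair_counts, n_queries):
    """Pointwise mutual information (log base 2) of two terms sharing a query."""
    pair = tuple(sorted((term_a, term_b)))
    joint = pair_counts[pair] / n_queries
    if joint == 0:
        # The terms never appear in the same query.
        return float("-inf")
    p_a = term_counts[term_a] / n_queries
    p_b = term_counts[term_b] / n_queries
    return math.log2(joint / (p_a * p_b))


# Made-up sample queries, purely for illustration.
sample_queries = [
    "election results",
    "election results live",
    "phone update",
    "movie trailer",
]
term_counts, pair_counts, n_queries = term_stats(sample_queries)
score = pmi("election", "results", term_counts, pair_counts, n_queries)
print(score)
```

A positive score means the two terms show up together more often than you would expect if they were independent; a pair that never shares a query gets negative infinity in this sketch.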

Search Source
Jim and his colleagues found that 60% of the searches on Collecta actually came through the Collecta API, rather than the web interface. This is very different from traditional web search engines.

Unique Searches
Studies of typical web search behavior show that the share of unique search queries can be as high as 59%, but Jim and his colleagues found that in real time search it is much lower, at 30%. They suggest that this is because many users are searching for the same timely information, particularly about entertainment, technology, and politics. An interesting corollary is that they also found the same query being submitted repeatedly by the same user over a period of time.
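For a rough idea of how numbers like these could be pulled from a log, here is a small Python sketch. It assumes each log entry is a (source, query) pair and treats "unique" as the share of distinct query strings among all submissions, which is only one possible definition and may not be the one used in the paper. The function names and the sample log are hypothetical.

```python
from collections import Counter


def distinct_query_share(log):
    """Distinct query strings divided by total submissions (0.0 for an empty log)."""
    queries = [query.strip().lower() for _source, query in log]
    if not queries:
        return 0.0
    return len(set(queries)) / len(queries)


def repeated_by_same_source(log):
    """Map each (source, query) pair submitted more than once to its count."""
    counts = Counter((source, query.strip().lower()) for source, query in log)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up log entries, purely for illustration.
sample_log = [
    ("client-a", "election results"),
    ("client-a", "election results"),
    ("client-a", "election results"),
    ("client-b", "movie trailer"),
    ("client-c", "Election Results"),
]
share = distinct_query_share(sample_log)
repeats = repeated_by_same_source(sample_log)
print(share)
print(repeats)
```

In this toy log the same client resubmits one query three times, which is the kind of repetition the study describes.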

Query Length & Type
This is one measure where real time and normal web search show similar patterns: the average query length was 2.32 terms. Popular queries, as I mentioned above, included entertainment, tech, and political topics. Unlike web search, however, real time searchers are not often looking for things that are pornographic in nature.

Search Terms
The number of unique terms and unique term pairs was similar to web search, but unlike web search, the individual terms that were most strongly associated with each other (that is, were most often found in the same query) tended to be related to people who were in the news during the data collection period.

What does it mean?
Jim and his colleagues sum up that:

  • there is heavy use of real time search through APIs
  • many of the searches through the API are repetitive over a period of time
  • real time searchers are looking for different information than web searchers


They suggest that real time search engines could take advantage of this by letting users save searches and by employing faceted search interfaces.

It’s exciting to see research into real time search making its way to the public. If you want to read the whole paper, it is available on Jim’s website (PDF).


© 2009 Enquiro Search Solutions.



