Ad verba per numeros

Wednesday, August 5, 2009, 09:59 PM
I've just found this recent paper by some Googlers:

Gazpacho and summer rash: Lexical relationships from temporal patterns of web search queries. E. Alfonseca, M. Ciaramita and K. Hall. Empirical Methods in Natural Language Processing (EMNLP). 2009.

In this paper we investigate temporal patterns of web search queries. We carry out several evaluations to analyze the properties of temporal profiles of queries, revealing promising semantic and pragmatic relationships between words. We focus on two applications: query suggestion and query categorization. The former shows a potential for time-series similarity measures to identify specific semantic relatedness between words, which results in state-of-the-art performance in query suggestion while providing complementary information to more traditional distributional similarity measures. The query categorization evaluation suggests that the temporal profile alone is not a strong indicator of broad topical categories.
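Pearson correlation is one common way to compare two temporal profiles. The paper may use different or additional measures. Here is a minimal sketch, assuming each query's profile is a list of daily search counts over the same days. It ranks candidate queries for suggestion by how closely their profile tracks the input query's. The function names and the counts below are hypothetical, made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length frequency series."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no temporal signal
    return cov / (sx * sy)


def related_queries(target, profiles, k=5):
    """Rank the other queries by temporal similarity to `target` (hypothetical helper)."""
    ref = profiles[target]
    scored = [(pearson(ref, s), q) for q, s in profiles.items() if q != target]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by name
    return [q for _, q in scored[:k]]


# Made-up daily counts around a single event, for illustration only.
profiles = {
    "academy awards": [1, 1, 2, 9, 30, 4, 1],
    "oscars": [2, 1, 3, 12, 40, 5, 2],
    "box office": [5, 4, 6, 9, 15, 6, 5],
    "gazpacho": [3, 8, 3, 2, 3, 9, 3],
}
print(related_queries("academy awards", profiles, k=2))  # ['oscars', 'box office']
```

Correlation ignores absolute volume, so a rare query can still match a popular one whose peaks fall on the same days. That property suits event-driven pairs like the ones below.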

I found it really enjoyable, especially because one of my students (Manuel Tejeiro) recently finished his final-year project, which was nothing other than a framework for performing time series analysis. In fact, for most of his testing he used the 2006 AOL query log and obtained fairly interesting results (each input query is followed by the related queries the framework returned):

california lottery
lottery, ny lottery, georgia lottery, michigan lottery, mass lottery, calottery.com, ohio lottery, new jersey lottery, njlottery, ...
academy awards
oscars, crash, oscar winners, box office, walk the line, ...
disney channel
www.disneychannel.com, disneychannel, cartoonnetwork, disneychannel.com, ikea (?), nick.com, hilary duff, ...
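Before any comparison, the raw log has to be turned into per-query time series. Below is a minimal sketch of that step. It assumes the tab-separated layout the AOL 2006 log is commonly distributed in (AnonID, Query, QueryTime, ItemRank, ClickURL, with timestamps like `2006-03-05 20:01:12`). It also assumes that rows repeating the same user, query and timestamp are extra clicks for one search. The function name and the sample rows are hypothetical. This is not Manuel's framework.

```python
from collections import Counter, defaultdict
from datetime import date, datetime


def daily_profiles(lines, start, end):
    """Map each query to a list of daily search counts from `start` to `end`."""
    n_days = (end - start).days + 1
    counts = defaultdict(Counter)  # query -> Counter{day index: searches}
    seen = set()                   # (user, query, timestamp) already counted
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "AnonID":
            continue  # skip the header row and malformed lines
        user, query, stamp = fields[0], fields[1].strip().lower(), fields[2]
        if (user, query, stamp) in seen:
            continue  # assumed: an extra click row for an already-counted search
        seen.add((user, query, stamp))
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        if start <= day <= end:
            counts[query][(day - start).days] += 1
    return {q: [c[i] for i in range(n_days)] for q, c in counts.items()}


# Hypothetical rows in the assumed layout (not taken from the real log).
lines = [
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL",
    "142\tacademy awards\t2006-03-05 20:01:12\t1\thttp://www.example.com",
    "142\tacademy awards\t2006-03-05 20:01:12\t3\thttp://www.example.org",
    "217\toscars\t2006-03-05 21:10:00\t\t",
    "217\toscars\t2006-03-06 09:00:00\t\t",
]
print(daily_profiles(lines, date(2006, 3, 5), date(2006, 3, 6)))
# {'academy awards': [1, 0], 'oscars': [1, 1]}
```

Daily buckets are just one choice. Hourly or weekly buckets trade noise for resolution, and the right granularity likely depends on how bursty the queries of interest are.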
So, in addition to the paper mentioned above, I'd suggest the following readings (Manuel's project nicely integrated ideas and methods from all of them):

Enjoy the reading!

P.S. If you are still hungry, here is yet another paper on query log analysis with food in its title :) "From 'dango' to 'japanese cakes': Query Reformulation Models and Patterns" by Paolo Boldi, Francesco Bonchi, Carlos Castillo and Sebastiano Vigna.
