
    Towards a better understanding of language model information retrieval

    2nd BCS IRSG Symposium: Future Directions in Information Access 2008

    London, 22nd September 2008

    AUTHORS

    M. van der Heijden, I.G. Sprinkhuizen-Kuyper & Th.P. van der Weide

    ABSTRACT

    Language models form a class of successful probabilistic models in information retrieval. However, knowledge of why some methods perform better than others in a particular situation remains limited. In this study we analyze which language model factors influence information retrieval performance. Starting from popular smoothing methods, we review which data features they use. Document length and a measure of document word distribution turned out to be the important factors, in addition to a distinction between estimating the probability of seen and unseen words. We propose a class of parameter-free smoothing methods, of which multiple specific instances are possible. Instead of tuning parameters, however, a specific method should be chosen based on an analysis of data features. Finally, we discuss some initial experiments.
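
    The abstract refers to popular smoothing methods and to a distinction between estimating the probability of seen and unseen words. As an illustration only, below is a minimal Python sketch of one widely used method, Dirichlet-prior smoothing for query-likelihood ranking. It is not the parameter-free method the paper proposes, which the abstract does not specify. The function name, the toy documents and the prior weight mu=2000 are assumptions chosen for this example. Document length |d| enters through alpha_d, the probability mass reserved for unseen words.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Return log P(query | doc) under Dirichlet-prior smoothing (illustrative sketch).

    collection_counts is a Counter of term frequencies over the whole collection.
    Seen words (c(w, d) > 0):   p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu)
    Unseen words (c(w, d) = 0): p(w|d) = alpha_d * p(w|C), alpha_d = mu / (|d| + mu)
    """
    counts = Counter(doc)
    doc_len = len(doc)
    # Mass reserved for unseen words; it shrinks as the document gets longer.
    alpha_d = mu / (doc_len + mu)
    score = 0.0
    for w in query:
        p_coll = collection_counts[w] / collection_len  # maximum-likelihood p(w|C)
        if p_coll == 0.0:
            # Assumption: words absent from the whole collection are skipped,
            # since they would give log(0) for every document.
            continue
        if counts[w] > 0:
            p = (counts[w] + mu * p_coll) / (doc_len + mu)  # seen word
        else:
            p = alpha_d * p_coll  # unseen word: back off to the collection model
        score += math.log(p)
    return score


# Toy example with hypothetical documents.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "smoothing methods for language models in retrieval".split(),
    "d3": "cooking recipes for pasta".split(),
}
collection = Counter()
for words in docs.values():
    collection.update(words)
collection_len = sum(collection.values())

query = "smoothing language models".split()
scores = {name: dirichlet_log_likelihood(query, words, collection, collection_len)
          for name, words in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d2', 'd1', 'd3']
```

    The prior weight mu is the kind of parameter that normally has to be tuned per collection, which is what the parameter-free methods proposed in the abstract aim to avoid.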

    PAPER FORMATS

    PDF Version of this Paper (103kb)