    Normalization and Matching in the DORO System

    21st Annual BCS-IRSG Colloquium on IR

    Glasgow, 19th-20th April 1999

    AUTHORS

    C.H.A. Koster, C. Derksen, D. van de Ende & J. Potjer

    ABSTRACT

    This paper is concerned with the use of linguistically motivated phrases as indexing terms in Information Retrieval applications.

    Apart from the conventional noun phrases, we propose to use verb phrases as index terms for text classification. Techniques for phrase matching through syntactic normalization and semantical matching are described.

    We discuss the realization of the syntactic normalization of phrases by transduction to frames. Semantical normalization is based on lexico-semantical relations, taking into account certain properties of the classification algorithms used.

    The ideas described here are being implemented in the Document Routing system DORO, in which statistical learning algorithms are applied to document profiles consisting of phrases.

    This paper describes the rationale behind work in progress, rather than presenting final results.

    PAPER FORMATS

    PDF Version of this Paper (82kb)