
    Feature Selection: A Useful Preprocessing Step

    19th Annual BCS-IRSG Colloquium on IR

    Aberdeen, UK. 8th - 9th April 1997


    I. Moulinier


    Statistical classification techniques and machine learning methods have been applied to some Information Retrieval (IR) problems: routing, filtering and categorization.

    Most of these methods are awkward, and sometimes intractable, in high-dimensional feature spaces.

    In order to reduce dimensionality, feature selection has been introduced as a pre-processing step.

    In this paper, we assess to what extent feature selection can be used without causing a loss in effectiveness. This question can now be addressed because a few recent learners do not require a preprocessing step.

    On a text categorization task using the Reuters-22173 collection, we give empirical evidence that feature selection is useful. First, the size of the collection index can be drastically reduced without causing a significant loss in categorization effectiveness.

    Second, we show that feature selection reduces the time required to automatically build the categorization system.

