Research on Rare Events

My work on rarity follows a natural progression from fundamental work on rare cases and small disjuncts to work on class distribution and then on changing class distributions. Between those two research areas, however, I worked on the problem of predicting very rare events. This research was motivated by the problem of predicting telecommunication equipment failures from sequences of alarm messages, which I faced while working at AT&T Labs. The events to be predicted were extremely rare, and this affected the requirements for the learning system. The task also posed a sizeable challenge because the underlying data was time-series data, whereas virtually all classification programs require an example-based representation. Converting the time series into examples with a sliding window would most likely have obscured subtle patterns in the data, so I designed and implemented Timeweaver, a new genetic algorithm-based classification system that operates natively on time-series data. The fitness function that I used, combined with the search capabilities of the genetic algorithm, made it possible to find subtle patterns in the data that were useful for predicting rare events. This work was described in two main papers: one focused on the general problem of event prediction and how to formulate it as a classification problem (Weiss & Hirsh, 1998), and the other focused on the genetic algorithm-based prediction system (Weiss, 1999). The work was notable because it was one of the early efforts to use genetic algorithms for data mining and because the event prediction problem had not previously been formulated clearly and comprehensively as a classification problem. Both papers have been widely cited, and the KDD paper on event prediction (Weiss & Hirsh, 1998) has been cited by over 125 other papers.