Research on Rare Events
My work on rarity shows a natural progression from fundamental work on rare
cases and small disjuncts to work on class distribution and then changing
class distributions. However, in between those two research areas I worked
on the problem of predicting very rare events. This research was motivated
by the problem of predicting telecommunication equipment failures from
sequences of alarm messages, which I faced while working at AT&T Labs. In
this work the events to be predicted were extremely rare, and this shaped
the requirements for the learning system. The underlying data was also
time-series data, whereas virtually all classification programs require an
example-based representation of the data, which posed a sizeable challenge.
Because aggregating the time-series data into examples with a sliding
window would most likely obscure subtle patterns in the data, I designed
and implemented Timeweaver, a new
genetic algorithm-based classification system that could operate natively
on time-series data. The fitness function that I utilized, combined with the
search capabilities of the genetic algorithm, made it possible to find subtle
patterns in the data that were useful for predicting rare events. This work
was described in two main papers: one focused on the general problem of
event prediction and how to formulate it as a classification problem
(Weiss & Hirsh, 1998), and the other focused on the genetic algorithm-based
prediction system (Weiss, 1999).
This work was notable because it was one of the early
efforts in using genetic algorithms for data mining and because the event
prediction problem had not previously been formulated clearly and
comprehensively as a classification problem. Both papers have been widely
cited; the KDD paper on event prediction (Weiss & Hirsh, 1998) alone has
been cited by over 125 other papers.
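
To make the sliding-window aggregation mentioned above concrete, here is a minimal Python sketch of that baseline conversion from a time-stamped event sequence to labeled examples. It is not Timeweaver and is not taken from the cited papers; the function name, its parameters (window, horizon, step), and the toy alarm data are all assumptions chosen for illustration.

```python
from collections import Counter


def sliding_window_examples(events, failure_times, window, horizon, step):
    """Turn a time-stamped event sequence into (features, label) examples.

    Hypothetical helper for illustration only.
    events        : list of (timestamp, event_type), sorted by timestamp
    failure_times : timestamps of the rare target events
    window        : length of the history summarized in each example
    horizon       : how far ahead a target event counts as a positive
    step          : spacing between consecutive window end points
    """
    if not events:
        return []
    examples = []
    t = events[0][0] + window
    last = events[-1][0]
    while t <= last:
        # Features: counts of each event type seen in [t - window, t).
        # The order and timing of events inside the window are discarded.
        counts = Counter(kind for ts, kind in events if t - window <= ts < t)
        # Label: 1 if a target event falls in (t, t + horizon], else 0.
        label = int(any(t < f <= t + horizon for f in failure_times))
        examples.append((dict(counts), label))
        t += step
    return examples


# Toy data (made up): alarm messages and a single equipment failure.
alarms = [(0, "A"), (2, "B"), (3, "A"), (7, "C"), (9, "B"), (12, "A")]
failures = [10]
examples = sliding_window_examples(alarms, failures, window=4, horizon=3, step=4)
for features, label in examples:
    print(features, label)
```

Because each example keeps only per-type counts, the sequences A, B, A and A, A, B inside a window produce identical features. That loss of ordering is one way such aggregation can hide the subtle patterns discussed above.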