Gary Weiss's Research

I have conducted research in several areas:

  • Class Distribution: In machine learning and data mining, the data is typically assumed to come from some distribution D. However, the training data need not be drawn from this distribution. If the class distribution of the training data does not match D, how will this affect learning? Research I have conducted with Foster Provost shows that the naturally occurring class distribution (i.e., the class distribution associated with D) does not always yield the best results. In particular, we show that other class distributions often yield substantial improvements in classifier performance. We further show that when data is expensive to procure, one can minimize the impact of a reduced training set size by carefully selecting the class distribution of the training data. We also present a progressive, adaptive sampling algorithm that selects training data so that the resulting data set has a class distribution that performs well.

    This research is described in "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction", Journal of Artificial Intelligence Research, 19: 315-354, which is available from my publication page.
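    As a rough illustration of what "selecting the class distribution of the training data" means in practice, the sketch below draws a fixed-size training set with a chosen fraction of minority-class examples. It is not the progressive, adaptive sampling algorithm from the paper; the function name, its parameters, and the example pool are all hypothetical.

```python
import random

def sample_with_class_distribution(examples, n, minority_fraction,
                                   minority_label, seed=0):
    """Draw n (features, label) pairs so that the requested fraction of
    them carries minority_label. Sampling is without replacement."""
    rng = random.Random(seed)
    minority = [e for e in examples if e[1] == minority_label]
    majority = [e for e in examples if e[1] != minority_label]

    n_min = round(n * minority_fraction)
    n_maj = n - n_min
    if n_min > len(minority) or n_maj > len(majority):
        raise ValueError("not enough examples for the requested distribution")

    sample = rng.sample(minority, n_min) + rng.sample(majority, n_maj)
    rng.shuffle(sample)
    return sample

# Hypothetical pool whose natural distribution is 10% positive (label 1).
pool = [((i,), 1 if i % 10 == 0 else 0) for i in range(1000)]

# Build a 100-example training set with a balanced class distribution.
train = sample_with_class_distribution(
    pool, n=100, minority_fraction=0.5, minority_label=1)
```

    Comparing classifiers trained on sets drawn with different values of minority_fraction, at the same n, is one simple way to explore the question raised above.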

  • Small Disjuncts: When machine learning is used to learn from examples (induction), the learned concept is often expressed as a disjunction, or something similar to a disjunction, of subconcepts. Some of the disjuncts in the learned concept cover very few examples, and these are referred to as small disjuncts. In my research I have investigated how and why small disjuncts are formed, how they affect learning, how they interact with noise to make learning especially difficult, and how they relate to pruning. I have also performed the most comprehensive study of small disjuncts by analyzing 30 datasets with respect to small disjuncts.

    I also maintain a small disjuncts web page that provides a comprehensive bibliography and summary of research in this area.

    My main related publications, available from my publication page, are: "Learning with Rare Cases and Small Disjuncts" (ICML '95), "The Problem with Noise and Small Disjuncts" (ICML '98) and "A Quantitative Study of Small Disjuncts" (AAAI '00).
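    To make the notion of a small disjunct concrete, the sketch below counts how many examples each rule of a toy rule list covers and flags the rules that cover very few. The rules, the data, and the size threshold are hypothetical and chosen only for illustration; this is not the measurement procedure used in the publications above.

```python
def disjunct_sizes(rules, examples):
    """Count the examples covered by each disjunct (rule).
    Rules are tried in order and the first match claims the example."""
    sizes = {name: 0 for name, _ in rules}
    for x in examples:
        for name, covers in rules:
            if covers(x):
                sizes[name] += 1
                break
    return sizes

def small_disjuncts(sizes, max_size):
    """Return the names of disjuncts covering at most max_size examples."""
    return sorted(name for name, size in sizes.items() if size <= max_size)

# Hypothetical learned concept: a disjunction of three interval rules.
rules = [
    ("x < 3", lambda x: x < 3),
    ("3 <= x < 50", lambda x: 3 <= x < 50),
    ("x >= 50", lambda x: x >= 50),
]
examples = list(range(100))

sizes = disjunct_sizes(rules, examples)
small = small_disjuncts(sizes, max_size=5)  # threshold is an assumption
```

    Here only the rule "x < 3" counts as a small disjunct, since it covers 3 of the 100 examples.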

  • Event Prediction: This research involves learning to predict the occurrence of specific types of events, given an incoming, timestamped stream of events. For example, in the application I focused on, the task was to predict the failure of equipment within a telecommunication switch based on an incoming stream of alarm messages. Because existing machine learning tools and methods are not able to handle this type of problem, I developed Timeweaver, a genetic algorithm-based system capable of identifying predictive patterns of events. A more detailed description of Timeweaver is available from the Timeweaver web page.

    My main related publications, available from my publication page, are: "Learning to Predict Rare Events in Event Sequences" (KDD '98) and "Timeweaver: a Genetic Algorithm for Identifying Predictive Patterns in Sequences of Events" (GECCO '98). This work is also described in a chapter I wrote for the "Handbook of Knowledge Discovery and Data Mining", Oxford University Press (2002).
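    One way to picture the prediction task is to mark, for each event in a timestamped stream, whether a target event follows within some time window. The sketch below does only that labelling step; it is not Timeweaver and does not search for predictive patterns. The event names, the window length, and the function name are hypothetical.

```python
from bisect import bisect_right

def label_by_lookahead(events, target_type, window):
    """For each non-target event, report whether a target event occurs
    strictly after it and no more than `window` time units later."""
    events = sorted(events)  # order by timestamp
    target_times = [t for t, kind in events if kind == target_type]

    labelled = []
    for t, kind in events:
        if kind == target_type:
            continue
        # Index of the first target event strictly after time t.
        i = bisect_right(target_times, t)
        follows = i < len(target_times) and target_times[i] - t <= window
        labelled.append((t, kind, follows))
    return labelled

# Hypothetical alarm stream: (timestamp, event type).
stream = [(0, "link_alarm"), (5, "cpu_alarm"),
          (12, "failure"), (40, "link_alarm")]

labels = label_by_lookahead(stream, target_type="failure", window=10)
```

    With these assumed values, only the alarm at time 5 is followed by a failure within the window, which also shows how rare the positive cases can be in such data.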

  • Expert System and Object Technology: This research was primarily triggered by my work at AT&T from 1996 to 1998. During that time I worked on the design and development of an expert system to help maintain the 4ESS switches in the AT&T network. What makes this effort notable is the fact that we implemented the system using object-oriented rules, which are provided by R++, an extension to the C++ language.

    The main related publications, available from my publication page, are: "ANSWER: Network Monitoring using Object-Oriented Rules" (IAAI '98) and "Implementing Design Patterns with Object Oriented Rules" (Journal of Object Oriented Programming).

  • Telecommunications: I have worked extensively in the area of telecommunications. Besides the previously described expert system work in this field, I have experience in applying data mining methods to solve telecommunication problems. I have published several book chapters related to telecommunications (available from my publication page). These include:

    • "Data Mining in Telecommunication" and "Mining Rare Cases" in Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers from Kluwer Academic Publishers (2005).

    • "Predicting Telecommunication Equipment Failures from Sequences of Network Alarms", in the Handbook of Knowledge Discovery and Data Mining from Oxford University Press (2002).

    • "Intelligent Telecommunication Technologies", in Knowledge-based Intelligent Techniques from CRC Press (1998).
 