Gary Weiss's Research
I have conducted research in several areas:
-
Class Distribution:
In machine learning and data mining, the data is typically assumed to
come from some distribution D. However, the training data need not be
drawn from this distribution. If the class distribution of the training
data does not match D, how will this affect learning? Research I have
conducted with Foster Provost shows that the naturally occurring
class distribution (i.e., the class distribution associated with D) does not
always yield the best results. In particular, we show that other class
distributions often yield substantial improvements in classifier
performance. We further show that when data is expensive to procure,
one can minimize the impact of a reduced training set size by carefully
selecting the class distribution of the training data. We also present a
progressive, adaptive sampling algorithm that selects data so that the
resulting training set has a class distribution that performs well.
This research is described in "Learning When Training Data are Costly: The
Effect of Class Distribution on Tree Induction", Journal of Artificial
Intelligence Research, 19: 315-354, which is available from my
publication page.
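As a minimal sketch of what "selecting the class distribution of the training data" means in practice, the following Python helper draws a fixed-size training set containing a chosen fraction of positive examples. The function name, its parameters, and the example pool are hypothetical; the sketch does not implement the progressive, adaptive sampling algorithm from the paper, which chooses the target distribution incrementally as data is acquired.

```python
import random
from collections import Counter


def sample_with_class_distribution(pool, n, pos_fraction, pos_label=1, seed=0):
    """Draw n labeled examples from pool so that about pos_fraction are positive.

    Hypothetical helper. pool is a list of (features, label) pairs.
    Sampling is without replacement, so each class must have enough
    examples available in the pool.
    """
    rng = random.Random(seed)
    positives = [ex for ex in pool if ex[1] == pos_label]
    negatives = [ex for ex in pool if ex[1] != pos_label]
    n_pos = round(n * pos_fraction)  # number of positive examples to draw
    n_neg = n - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("pool is too small for the requested class distribution")
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)  # avoid a training set ordered by class
    return sample


# Hypothetical pool: 1,000 examples with a naturally occurring 5% positive rate.
pool = [((i,), 1) for i in range(50)] + [((i,), 0) for i in range(50, 1000)]
balanced = sample_with_class_distribution(pool, n=100, pos_fraction=0.5)
print(Counter(label for _, label in balanced))  # 50 examples of each class
```

Here the natural distribution (5% positive) is replaced by a balanced one while the training set size stays fixed at 100, which is the kind of trade-off the research examines when data is costly.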
-
Small Disjuncts: When machine
learning is used to learn from examples (induction), the learned concept
is often expressed as a disjunction of subconcepts, or as something
similar to one. Some of the disjuncts in the learned concept cover very
few examples; these are referred to as small disjuncts. In my research
I have investigated how and why small disjuncts are formed, how they
affect learning, how they interact with noise to make learning especially
difficult, and how they relate to pruning. I have also performed the most
comprehensive study of small disjuncts, analyzing 30 datasets with
respect to small disjuncts.
I also maintain a small disjuncts web page
that provides a comprehensive bibliography and summary of research in this
area.
My main related publications, available from my
publication page, are: "Learning with Rare
Cases and Small Disjuncts" (ICML '95), "The Problem with Noise and Small
Disjuncts" (ICML '98), and "A Quantitative Study of Small Disjuncts"
(AAAI '00).
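The central observation about small disjuncts, that they cover few examples yet can account for a disproportionate share of errors, can be illustrated with a short calculation. The sketch below assumes each disjunct is summarized by a hypothetical (size, errors) pair and reports what fraction of coverage and of errors falls in disjuncts at or below a size threshold; the numbers are made up, and this is not claimed to be the exact measure used in the AAAI '00 study.

```python
def small_disjunct_shares(disjuncts, max_size):
    """Return (coverage share, error share) for disjuncts of size <= max_size.

    Hypothetical summary format: disjuncts is a list of (size, errors)
    pairs, where size is the number of training examples a disjunct
    covers and errors is the number of test examples it misclassifies.
    """
    total_size = sum(size for size, _ in disjuncts)
    total_errors = sum(errors for _, errors in disjuncts)
    small = [(size, errors) for size, errors in disjuncts if size <= max_size]
    coverage_share = sum(s for s, _ in small) / total_size if total_size else 0.0
    error_share = sum(e for _, e in small) / total_errors if total_errors else 0.0
    return coverage_share, error_share


# Made-up learned concept with four disjuncts.
disjuncts = [(1, 3), (2, 2), (50, 1), (200, 4)]
cov, err = small_disjunct_shares(disjuncts, max_size=2)
print(f"{cov:.1%} of coverage, {err:.1%} of errors")  # 1.2% of coverage, 50.0% of errors
```

In this toy concept the two smallest disjuncts cover about 1% of the training examples but produce half of the errors.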
-
Event Prediction: This research involves learning to predict the
occurrence of specific types of events, given an incoming, timestamped,
stream of events. For example, in the application I focused on,
the task was to predict the failure of equipment within a telecommunication
switch based on an incoming stream of alarm messages. Because existing
machine learning tools and methods are not able to handle this type
of problem, I developed Timeweaver, a genetic algorithm-based system
capable of identifying predictive patterns of events. A more detailed
description of Timeweaver and more information is available from the
Timeweaver web page.
My main related publications, available from my
publication page, are:
"Learning to Predict Rare Events in Event Sequences" (KDD '98) and
"Timeweaver: a Genetic Algorithm for Identifying Predictive Patterns in
Sequences of Events" (GECCO '98). This work is also described
in a chapter I wrote for the "Handbook of Knowledge Discovery and Data Mining",
Oxford University Press (2002).
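Timeweaver uses a genetic algorithm to search for predictive patterns; each candidate pattern must then be evaluated against the event stream. The sketch below shows only that evaluation step, under a simplifying assumption: a pattern is an ordered list of event types that must all occur, possibly with other events interleaved, within a fixed time window. This representation and the function names are hypothetical illustrations, not Timeweaver's actual pattern language.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp, event type)


def pattern_matches(events: Sequence[Event], pattern: Sequence[str],
                    window: float, now: float) -> bool:
    """True if pattern's event types occur in order within [now - window, now].

    Events are assumed sorted by timestamp; other events may be
    interleaved between the events of the pattern.
    """
    i = 0  # index of the next pattern element to match
    for t, etype in events:
        if now - window <= t <= now and i < len(pattern) and etype == pattern[i]:
            i += 1
    return i == len(pattern)


def prediction_times(events: Sequence[Event], pattern: Sequence[str],
                     window: float) -> List[float]:
    """Timestamps at which the pattern is matched, i.e. when a warning would be issued."""
    return [t for t, _ in events if pattern_matches(events, pattern, window, now=t)]


# Hypothetical alarm stream.
alarms = [(0, "A"), (5, "B"), (7, "C"), (20, "B"), (30, "C")]
print(prediction_times(alarms, ["A", "B", "C"], window=10))  # [7]
```

A search procedure such as a genetic algorithm would score many candidate patterns this way, for example by comparing the warning times against the actual times of the target failures.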
-
Expert System and Object Technology: This research was primarily
triggered by my work at AT&T from 1996 to 1998. During that time I
worked on the design and development of an expert system to help maintain
the 4ESS switches in the AT&T network. What made this effort notable
is that we implemented the system using object-oriented rules, which
are supported by R++, an extension to the C++
language.
The main related publications, available from my
publication page, are:
"ANSWER: Network Monitoring Using Object-Oriented Rules" (IAAI '98)
and "Implementing Design Patterns with Object Oriented Rules" (Journal of
Object Oriented Programming).
-
Telecommunications: I have worked extensively in the area of
telecommunications. Besides the previously described expert system
work in this field, I have experience in applying data mining methods
to solve telecommunication problems. I have published several book
chapters related to telecommunications (available from my
publication page). These include:
-
"Data Mining in Telecommunication" and "Mining Rare Cases" in Data Mining
and Knowledge Discovery Handbook: A Complete Guide for Practitioners and
Researchers from Kluwer Academic Publishers (2005).
-
"Predicting Telecommunication Equipment Failures from Sequences of Network
Alarms", in the Handbook of Knowledge Discovery and Data Mining from
Oxford University Press (2002).
-
"Intelligent Telecommunication Technologies", in Knowledge-based Intelligent
Techniques from CRC Press (1998).