Workshop on
Utility-Based Data Mining
August 21, 2005 in Chicago, Illinois

Held in conjunction with
The 11th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD 2005)

The workshop proceedings are now available (2.5MB).
You can also view individual papers (see the program below).

New: Consider submitting an article to DMKD's special issue on Utility-Based Data Mining
A second UBDM workshop will be held in conjunction with KDD-06.


Workshop Description

Early work in predictive data mining did not address the complex circumstances in which models are built and applied. It was assumed that a fixed amount of training data was available, and only simple objectives, namely predictive accuracy, were considered. Over time, it became clear that these assumptions were unrealistic and that the economic utility of acquiring training data, building a model, and applying the model had to be considered. The machine learning and data mining communities responded with research on active learning, which focused on methods for cost-effective acquisition of information for the training data, and research on cost-sensitive learning, which considered the costs and benefits associated with using the learned knowledge and how these costs and benefits should be factored into the data mining process.

All the different stages of the data mining process are affected by economic utility. In the data acquisition phase we have to consider the costs of obtaining training data, such as the cost of labelling additional examples or acquiring new feature values. In applying the data mining algorithm, we have to consider the running time of the algorithm and the costs and benefits associated with cleaning the data, transforming the data and constructing new features. Economic utility also impacts the assessment of the decisions made based on the learned knowledge. Simple assessment measures like predictive accuracy have given way to more complex economic measures, including measures of profitability. These considerations can in turn impact policies for model induction. The latter topic has received more attention in the context of cost-sensitive learning.
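As a minimal illustration of how an economic measure can change the assessment of a model, the sketch below compares two hypothetical classifiers by total misclassification cost rather than by accuracy. The confusion counts, the cost values, and all names (Confusion, total_cost, COST_FP, COST_FN) are invented for illustration and are not taken from any workshop paper.

```python
# Illustrative sketch only: all counts and costs are made-up assumptions.
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Confusion) -> float:
    """Fraction of examples classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total


def total_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost; correct predictions are assumed free."""
    return c.fp * cost_fp + c.fn * cost_fn


# Hypothetical results on 1000 test examples, 50 of which are positive.
model_a = Confusion(tp=10, fp=5, fn=40, tn=945)   # rarely predicts positive
model_b = Confusion(tp=40, fp=60, fn=10, tn=890)  # raises more false alarms

# Assumed costs: missing a positive is 20 times worse than a false alarm.
COST_FP = 1.0
COST_FN = 20.0

for name, model in (("A", model_a), ("B", model_b)):
    print(name, accuracy(model), total_cost(model, COST_FP, COST_FN))
```

Under these assumed costs, model A has the higher accuracy but model B incurs the lower total cost, so the two measures rank the models differently.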

Almost all work that considers the impact of economic utility on data mining focuses exclusively on one of the stages in the data mining process. Thus, economic factors have been studied in isolation, without much attention to how they interact. This workshop will begin to remedy this deficiency by bringing together researchers who currently consider different economic aspects in data mining, and by promoting an examination of the impact of economic utility throughout the entire data mining process. This workshop will attempt to encourage the field to go beyond what has been accomplished individually in the areas of active learning and cost-sensitive learning (although both of these areas are within the scope of this workshop). In addition, existing research which has addressed the role of economic utility in data mining has focused on predictive data mining tasks. This workshop will begin to explore methods for incorporating economic utility considerations into both predictive and descriptive data mining tasks.

This workshop will be geared toward researchers with an interest in how economic factors affect data mining (e.g., researchers in cost-sensitive learning and evaluation and active learning) and practitioners who have real-world experience with how these factors influence data mining. Attendance is not limited to the paper authors and we strongly encourage interested researchers from related areas to attend the workshop. This will be a full-day workshop and will include invited talks, paper presentations, short position statements and two panel discussions.

Workshop Topics
  • Types of economic factors in data mining
    • What economic factors arise in the context of data mining and to what stage of the data mining process do they apply?
    • What assessment metrics are used in response to these economic factors?
    • Can the use of economic utility help address previously studied problems in data mining, such as the problems of learning rare classes and learning from skewed distributions?
  • Algorithms
    • Utility-based approaches for information acquisition, data preprocessing, mining and knowledge application. This includes work in active learning/sampling and cost-sensitive learning.
    • This workshop will also address how predictive and descriptive data mining tasks such as predictive modeling, clustering and link analysis can be adapted to incorporate economic utility.
  • Consideration of economic utility throughout the data mining process
    • Work towards a comprehensive framework for incorporating economic utility to benefit the entire data mining process. This work includes utility-based data mining techniques which take into account the dependencies between different phases of the data mining process to maximize the utility of more than a single phase. For example, methods for acquiring training data which take into account the costs of errors in addition to the cost of training data; or methods for the extraction of predictive patterns which take into account the cost of test features necessary at prediction time.
  • Applications
    • What existing data mining applications have taken economic utility into account?
    • What methods do these applications use to take economic utility into consideration?
    • How do economic utility and the methods for dealing with it vary according to the specific problem addressed (e.g., by industry)?

Workshop Program

8:30 - 8:45 Opening Remarks and Welcome
8:45 - 9:15 Invited Talk: Toward Economic Machine Learning and Utility-based Data Mining (Slides)
Foster Provost
9:15 - 9:35 Budgeted Learning of Bounded Active Classifiers
Aloak Kapoor and Russell Greiner
9:35 - 9:45 Reinforcement Learning for Active Model Selection
Aloak Kapoor and Russell Greiner
9:45 - 10:05 Economical Active Feature-value Acquisition through Expected Utility Estimation
Prem Melville, Maytal Saar-Tsechansky, Foster Provost and Raymond Mooney
10:05 - 10:30 Break
10:30 - 11:00 Invited Talk: Cost-Sensitive Classifier Evaluation (Powerpoint slides)
Robert Holte (work in conjunction with Chris Drummond)
11:00 - 11:10 Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets
Nitesh Chawla, Lawrence Hall and Ajay Joshi
11:10 - 11:20 Noisy Information Value in Utility-Based Decision Making
Clayton Morrison and Paul Cohen
11:20 - 11:40 Learning Policies for Sequential Time and Cost Sensitive Classification
Andrew Arnt and Shlomo Zilberstein
11:40 - 12:00 Improving Classifier Utility by Altering the Misclassification Cost Ratio
Michelle Ciraco, Michael Rogalewski and Gary Weiss
12:00 - 1:30 Lunch
1:30 - 1:50 One-Benefit Learning: Cost-Sensitive Learning with Restricted Cost Information
Bianca Zadrozny
1:50 - 2:00 Utility based Data Mining for Time Series Analysis - Cost Sensitive Learning for Neural Network Predictors
Sven Crone, Stefan Lessmann and Robert Stahlbock
2:00 - 2:10 Does Cost-Sensitive Learning Beat Sampling for Classifying Rare Classes?
Kate McCarthy, Bibi Zabar and Gary Weiss
2:10 - 2:40 Invited Talk: Machine Learning Paradigms for Utility-Based Data Mining (Slides)
Naoki Abe
2:40 - 2:50 Interruptible Anytime Algorithms for Iterative Improvement of Decision Trees
Saher Esmeir and Shaul Markovitch
2:50 - 3:00 Position Paper: Contextual Recommender Problems
Omid Madani and Dennis DeCoste
3:00 - 3:30 Break
3:30 - 3:50 A Fast High Utility Itemsets Mining Algorithm
Ying Liu, Wei-keng Liao and Alok Choudhary
3:50 - 4:30 Panel Discussion: Utility-Based Data Mining: Research Challenges and Issues in Industry
4:30 - 4:40 Concluding Remarks

Submission Guidelines
All submissions should be sent electronically, by the submission deadline of June 24, 2005, to the workshop contact, Gary Weiss, at both of the following email addresses: gweiss@cis.fordham.edu and gaweiss@fordham.edu. Submissions should be in PDF or PostScript format, should be at most 10 pages long, and should use the ACM SIG Proceedings Templates. In addition to technical papers, we encourage the submission of position papers (which generally should be at most 6 pages).

Submitted papers will be reviewed by members of the program committee. Accepted papers will be presented at the workshop and published in the workshop proceedings and in the ACM digital library. For a paper to appear in the ACM digital library, its authors must fill out and send in the ACM copyright form. Authors will be notified of the acceptance or rejection of their paper by July 6, 2005. Camera-ready versions of the papers are due July 15, 2005.

Please do not hesitate to email the workshop contact if you have any questions.

Important Dates

June 24, 2005 Deadline for electronic submission of full papers
July 6, 2005 Notification of accepted papers
July 15, 2005 Camera-ready copies due
August 21, 2005 UBDM Workshop
The above dates reflect the extended deadlines. Because of the tight reviewing schedule, no exceptions to the dates above can be made without prior approval.

Workshop Co-Chairs

Note: for inquiries, please send email to gweiss@cis.fordham.edu
Gary M. Weiss, Fordham University, Bronx, New York
Maytal Saar-Tsechansky, University of Texas at Austin
Bianca Zadrozny, IBM T.J. Watson Research Center, Yorktown Heights, NY

Program Committee

Naoki Abe, IBM Research
Valentina Bayer-Zubek, Aureon Biosciences
Nitesh Chawla, University of Notre Dame
Ian Davidson, State University of New York at Albany
Chris Drummond, National Research Council (Ottawa)
Charles Elkan, UC San Diego
Wei Fan, IBM Research
Tom Fawcett, Stanford Computational Learning Laboratory
Russ Greiner, University of Alberta
Rob Holte, University of Alberta
Nathalie Japkowicz, University of Ottawa
Aleksander Kolcz, AOL Inc.
Charles Ling, University of Western Ontario
Dragos Margineantu, Boeing Company
Prem Melville, University of Texas at Austin
Ion Muslea, Language Weaver, Inc.
Claudia Perlich, IBM Research
Prasad Tadepalli, Oregon State University
Kai Ming Ting, Monash University
Xingquan Zhu, University of Vermont