Fordham University Data Mining Research Group
Our research group is interested in all aspects of machine learning and
data mining. Most of our research group members are introduced to data mining
via coursework at either the undergraduate or graduate level.
We own state-of-the-art data mining tools, including SAS Enterprise Miner and
C5.0. If you are interested in joining our research group, please contact
Dr. Weiss.
What is Data Mining?
The textbook definition says that data mining is
"the nontrivial extraction of implicit, previously unknown, and potentially
useful information from data". What that really means is that
data mining involves the extraction of knowledge from data
using intelligent, at least partly automated, techniques. Data mining
is interdisciplinary in nature and borrows heavily from
machine learning, artificial intelligence, statistics, and databases as well
as a few other disciplines. Data mining is concerned with the issues
that arise when analyzing very large data sets.
Members
Faculty:
Gary Weiss (director)
Undergraduate Students:
Michelle Ciraco (graduated),
Michael Rogalewski (graduated),
Kate McCarthy (graduated),
Bibi Zabar (graduated)
Graduate Students:
Jizhou Ai,
Ye Tian
Research
While we are interested in all research related to data mining, our
current research focus is on Utility-Based Data Mining (UBDM), which
deals with how economic factors impact the data mining process.
For more information on UBDM, see the web pages for the
first UBDM
workshop or the
second UBDM workshop
organized by Dr. Weiss.
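As a rough illustration of how economic factors can enter a classification task, the sketch below assigns a cost to each type of misclassification and chooses the prediction with the lower expected cost. It is a minimal example under stated assumptions, not a method taken from the group's publications: the function names, the two-class setting, and the assumption that correct predictions cost nothing are all hypothetical.

```python
def total_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total misclassification cost, assuming correct predictions cost nothing."""
    return false_positives * cost_fp + false_negatives * cost_fn


def cost_threshold(cost_fp, cost_fn):
    """Probability of the positive class at which both predictions cost the same.

    Predicting positive has expected cost (1 - p) * cost_fp and predicting
    negative has expected cost p * cost_fn; the two are equal at
    p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_positive(p_positive, cost_fp, cost_fn):
    """Return True when predicting positive has the lower expected cost."""
    return p_positive >= cost_threshold(cost_fp, cost_fn)
```

With equal costs the threshold is 0.5. When a false negative is assumed to cost nine times as much as a false positive, the threshold drops to 0.1, so the rare class is predicted far more readily.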
Below is a list of recent research publications by members of our group:
-
Maximizing Classifier Utility when Training
Data is Costly
Gary M. Weiss and Ye Tian
Proceedings of the ACM SIGKDD Second International Workshop on
Utility-Based Data Mining,
3-11, ACM Press, 2006.
-
Improving Classifier Utility by Altering the Misclassification Cost Ratio
Michelle Ciraco, Michael Rogalewski and Gary Weiss
Proceedings of the ACM SIGKDD First International Workshop on Utility-Based Data Mining,
46-52, ACM Press, 2005.
-
Does Cost-Sensitive Learning Beat Sampling for Classifying Rare Classes?
Kate McCarthy, Bibi Zabar and Gary Weiss
Proceedings of the ACM SIGKDD First International Workshop on Utility-Based Data Mining,
69-77, ACM Press, 2005.
Related Courses
If you are interested in data mining, the following courses offered by the
Fordham Computer and Information Science department will be of interest to
you:
-
Applied Algorithms for Data Analysis, CSGA 6950 (graduate)
-
Data Mining, CSGA 6930 (graduate)
-
Data Mining, CSRU 4631 (undergraduate)
-
Machine Learning, CSRU 4621 (undergraduate)
Invited Talks
There will occasionally be talks by invited speakers on data mining. The past
and upcoming talks are listed below.
Resources
-
Data Mining Tools: We have access to the following two
data mining tools:
-
SAS Enterprise Miner is a state-of-the-art data mining package
that includes a host of data mining algorithms (decision trees, neural
networks, instance-based learning, etc.) that are all usable via a
graphical user interface. We have this package running on about 20 PCs
in the Distributed Computing Lab (JMH 331 on the Rose Hill campus). Some
additional licensed copies are available, including for use on home PCs.
See Dr. Weiss
if you are interested in obtaining a copy.
-
C5.0 is a decision tree tool from
RuleQuest Research that has been installed on storm. Anyone with an account
on storm can use this software. For information on using the tool, see
the provided
on-line
documentation. The software is installed on storm under
~gweiss/shared/c5 (the executable is in the bin subdirectory and is called
c5.0).
-
Data Sets
-
A number of data sets are available for analysis. Many of these data sets are
from the UCI repository, and have been downloaded to storm, while others are
from AT&T (and have been used in previous research). These data sets are
available on storm under ~gweiss/shared/datasets.
-
Some Data Mining Papers
-
Storm and C5.0/C4.5 usage tutorials
-
Here are simple tutorials on using
UNIX on storm, along with an example of how to run C5.0 or C4.5 on storm.