Fordham University Data Mining Research Group

Our research group is interested in all aspects of machine learning and data mining. Most of our members are introduced to the field through data mining coursework at either the undergraduate or graduate level. We own state-of-the-art data mining tools, including SAS Enterprise Miner and C5.0. If you are interested in joining our research group, please contact Dr. Weiss.

What is Data Mining?

The textbook definition says that data mining is "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data". What that really means is that data mining involves the extraction of knowledge from data using intelligent, at least partly automated, techniques. Data mining is interdisciplinary in nature and borrows heavily from machine learning, artificial intelligence, statistics, and databases as well as a few other disciplines. Data mining is concerned with the issues that arise when analyzing very large data sets.

Members

Faculty: Gary Weiss (director)
Undergraduate Students: Michele Ciraco (graduated), Michael Rogalewski (graduated), Kate McCarthy (graduated), Bibi Zabar (graduated)
Graduate Students: Jizhou Ai, Ye Tian

Research

While we are interested in all research related to data mining, our current research focus is on Utility-Based Data Mining (UBDM), which deals with how economic factors impact the data mining process. For more information on UBDM, see the web pages for the first UBDM workshop or the second UBDM workshop, both organized by Dr. Weiss.

Below is a list of recent research publications by members of our group:

Related Courses

If you are interested in data mining, the following courses offered by the Fordham Computer and Information Science department will be of interest to you:

  • Applied Algorithms for Data Analysis, CSGA 6950 (graduate)
  • Data Mining, CSGA 6930 (graduate)
  • Data Mining, CSRU 4631 (undergraduate)
  • Machine Learning, CSRU 4621 (undergraduate)

Invited Talks

There will occasionally be talks by invited speakers on data mining. The past and upcoming talks are listed below.

Resources

  • Data Mining Tools: We have access to the following two data mining tools:

    • SAS Enterprise Miner is a state-of-the-art data mining package that includes a host of data mining algorithms (decision trees, neural networks, instance-based learning, etc.), all usable via a graphical user interface. We have this package running on about 20 PCs in the Distributed Computing Lab (JMH 331 on the Rose Hill campus). Some additional licensed copies are available, including for use on home PCs. See Dr. Weiss if you are interested in obtaining a copy.

    • C5.0 is a decision tree tool from RuleQuest Research that has been installed on storm. Anyone with an account on storm can use this software. For information on using the tool, see the provided on-line documentation. The software is installed on storm under ~gweiss/shared/c5 (the executable is under the bin subdirectory and is called c5.0).

  • Data Sets
    • A number of data sets are available for analysis. Many of these data sets are from the UCI repository and have been downloaded to storm, while others are from AT&T and have been used in previous research. These data sets are available on storm under ~gweiss/shared/datasets.

  • Some Data Mining Papers

  • Storm and C5.0/C4.5 usage tutorials
    • Here are simple tutorials on using UNIX on storm, along with an example of how to run C5.0 or C4.5 on storm.