CISC 4631 L01 Data Mining
Department of Computer and Information Science
Dr. Weiss, Fall 2023

CLASS SCHEDULE

Class Times and Location: Monday, Thursday 2:30 - 3:45pm in LL-307

Instructor: Dr. Gary Weiss (gaweiss@fordham.edu)

Office: LL-610d

Teaching Assistant: Brooke Warner (bwarner4@fordham.edu)

Lectures: This class is an in-person class and lectures will only occur in-person. Please try to actively participate in class since this will make the class more interesting for everybody. Feel free to ask questions.

Office Hours (see the Blackboard info tab for Zoom links)
  Dr. Weiss: Tuesday 4-5pm (via Zoom), Thursday 4-5pm, and by appointment
  TA: Wednesday 6-7pm (via Zoom)

Required Text: "Introduction to Data Mining, 2nd Edition", Tan, Steinbach, Karpatne, and Kumar. Pearson. 2019. ISBN-13: 978-0-13-312890-1.

Course Website: http://storm.cis.fordham.edu/~gweiss/classes/cisc4631

Course Description: This course will cover data mining algorithms for analyzing large data sets as well as the practical issues that arise when applying these algorithms to real-world problems. It will balance theory and practice--the principles of data mining methods will be discussed but students will also acquire hands-on experience. Each student will select and complete an application-oriented or research-oriented course project.

Topics

  • Introduction to Data Mining
  • Data: Types of Data, Preprocessing (e.g., feature selection), Quality Issues (e.g., missing values), Similarity Metrics
  • Classification and Prediction Fundamentals: Evaluation Metrics, Class Imbalance, etc.
  • Classification Algorithms: Decision Trees, Rule Learning, Naive Bayes, Nearest Neighbor, Neural Networks, Ensembles
  • Clustering: K-Means, Hierarchical, DBSCAN, Evaluation
  • Association Analysis: Apriori algorithm

Prerequisites: There are no formal course prerequisites. This course is open to non-CS majors, and any necessary concepts will be introduced within the course. Knowledge of programming is useful, and students with programming experience are free to use Python's data mining modules for their project and for any assignments for which Weka is not specifically required; other students may use Weka for all data mining exercises and for the project (Weka does not require any programming). Because Python requires much more effort than Weka, students using Python may receive a small amount of extra credit.

Learning Objectives:

  • To develop a basic understanding of data mining so that you can recognize what problems can be addressed by data mining and which data mining methods are most appropriate for a given task.
  • To gain a basic understanding of how classification, prediction, clustering, and association analysis techniques operate at the algorithmic level.
  • To gain experience using data mining toolkits and software suites, and to apply data mining to a significant real-world dataset.
  • To improve technical writing skills and be able to document a data mining project in a format suitable for conference publication.

    Academic Honesty and AI/ChatGPT Policy: All work produced in this course should be your own unless it is specifically stated that you may work with others. You may discuss the homework problems with other students generally, but may not provide complete solutions to one another; copying of homework solutions is always unacceptable and will be considered a violation of Fordham's academic integrity policy. An academic integrity violation on the final exam will result in an "F" for the course. All violations of this policy will be reported to the university and will be handled in accordance with existing policies. Based on recommendations from the university in response to the recent wide-scale academic integrity violation, unless I notify you otherwise, you will be required to keep your webcam on during the final exam.

    This course adopts Fordham's Limited-AI approach toward the use of AI tools for completing the course project paper. Limited usage of generative AI tools is allowed for the course project, and no AI may be used for the text of the homework assignments. These tools are allowed for enabling exploration of ideas, complex data analysis, and creative solution development. When using these tools, it is mandatory to clearly indicate the sections of your work that were generated using them for proper attribution and transparency, and to indicate the prompts and software versions that were used. It is critical to adhere to ethical standards. Please consult with the instructor for more specific advice.

    Course Project: The course will include a course project. You may work individually or in teams of 2 (special permission is needed to work in larger teams). You may address a research question or analyze a real-world data set. Consider working on a project that relates to a hobby or interest of yours. A good start is to try to find high-quality data; once you have a data set you can often find a data mining problem related to it. You are responsible for coming up with your project topic, but a list of sample projects will be provided.

    Grading: The percentages given below are guidelines and minor changes may be made during the course (students will be informed promptly of any such changes).

    Homeworks/Labs    21%
    Project           25%
    Project Proposal   2%
    Midterm Exam      20%
    Final Exam        28%
    Participation      4%

    The following mapping will be used to convert a numerical grade to a letter grade. A curve may be applied, so the mapping below represents the minimum grade you would receive given your weighted numerical average.
    A:  93-100    C+: 77-80
    A-: 90-93     C:  74-77
    B+: 87-90     C-: 70-74
    B:  84-87     D:  65-70
    B-: 80-84     F:  <65
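
For students who want to estimate where they stand, the weighted average and the letter-grade mapping above can be sketched in Python. This is only an illustration, not an official calculator: the function names and the example scores are hypothetical, each component is assumed to be scored from 0 to 100, and a score that falls exactly on a boundary (e.g., 90) is assumed to earn the higher grade, which the syllabus does not state. Because a curve may be applied, the result is the minimum grade, and the weights may change slightly during the course.

```python
# Weights taken from the Grading section (percent of the final grade).
WEIGHTS = {
    "homeworks_labs": 21,
    "project": 25,
    "project_proposal": 2,
    "midterm_exam": 20,
    "final_exam": 28,
    "participation": 4,
}

# Lower bound of each letter grade, highest first.
# Assumption: a score exactly on a boundary earns the higher grade.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D"),
]


def weighted_average(scores):
    """Return the weighted numerical average of 0-100 component scores."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def minimum_letter_grade(average):
    """Map a weighted average to the minimum letter grade (before any curve)."""
    for lower_bound, letter in CUTOFFS:
        if average >= lower_bound:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example_scores = {
    "homeworks_labs": 95,
    "project": 88,
    "project_proposal": 100,
    "midterm_exam": 82,
    "final_exam": 85,
    "participation": 100,
}
example_average = weighted_average(example_scores)
example_grade = minimum_letter_grade(example_average)  # "B+"
```

With these example scores the weighted average is 88.15, which maps to a minimum grade of B+.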
