CISC 4631 R02 Data Mining
Department of Computer and Information Science
Dr. Weiss, Spring 2020

CLASS SCHEDULE

Class Times: Monday, Thursday 2:30-3:45pm, JMH 405

Instructor: Dr. Gary Weiss
Office: JMH 308a
Email: gaweiss@fordham.edu
Phone: 718-817-0785
Office Hours: Monday 1:00pm, Wednesday 12:00pm (noon) via Google Meet

Google Meet office hours: meet.google.com/kzd-goyn-eyu (meeting id kzd-goyn-eyu)
Please connect as close to the start time as possible. If no one has joined virtual office hours within 30 minutes, I will leave. If you need to meet later or at another time, email me. If you would like a calendar invitation so that office hours appear on your Google Calendar, let me know.

Required Text: "Introduction to Data Mining, 2nd Edition", Tan, Steinbach, Karpatne, and Kumar.
Pearson. 2019. ISBN-13: 978-0-13-312890-1.

Course Website: http://storm.cis.fordham.edu/~gweiss/classes/cisc4631

Course Description: This course will cover data mining algorithms for analyzing large data sets, as well as the practical issues that arise when applying these algorithms to real-world problems. It will balance theory and practice: the principles of data mining methods will be discussed, but students will also acquire hands-on experience. Each student will select and complete an application-oriented or research-oriented course project.

Topics

  • Introduction to Data Mining
  • Data: Types of Data, Preprocessing (e.g., feature selection), Quality Issues (e.g., missing values), Similarity Metrics
  • Classification and Prediction Fundamentals: Evaluation Metrics, Class Imbalance, etc.
  • Classification Algorithms: Decision Trees, Rule Learning, Naive Bayes, Nearest Neighbor, Neural Networks, Ensembles
  • Clustering: K-Means, Hierarchical, DBSCAN, Evaluation
  • Association Analysis: Apriori algorithm

Prerequisites: There are no formal course prerequisites. This course is open to non-CS majors, and any necessary concepts will be introduced within the course. Programming knowledge is useful: students with programming experience are free to use Python's data mining modules, while other students may use WEKA, which does not require any programming skills.

Learning Objectives:

  • To develop a basic understanding of data mining so that you can recognize what problems can be addressed by data mining and which data mining methods are most appropriate for a given task.
  • To gain a basic understanding of how classification, prediction, clustering, and association analysis techniques operate at the algorithmic level.
  • To gain experience using data mining toolkits and software suites, and to apply data mining to a significant real-world dataset.
  • To improve technical writing skills and learn to document a data mining project in a format suitable for conference publication.

    Attendance and Class Participation: It is important to attend every class and to be prepared for every class. Being prepared means completing the assigned readings and homework on time and being ready to discuss the material. Please participate actively in class, since this will make the course more interesting for everyone. If you are going to miss class or will not have a homework assignment completed on time, let me know beforehand whenever possible; I tend to be more lenient in such cases (at least if you have a reasonable excuse). Your class participation grade will be based on both your attendance and the degree to which you participate in class.

    Academic Honesty: All work produced in this course should be your own unless it is specifically stated that you may work with others. You may discuss homework problems with other students in general terms, but you may not provide complete solutions to one another; copying homework solutions is always unacceptable and will be considered a violation of Fordham's academic integrity policy. Violations of this policy will be handled in accordance with university policy, which can include automatic failure of the assignment and/or failure of the course.

    Grading: The percentages given below are guidelines for both the student and the instructor, and minor changes may be made during the course (students will be informed promptly of any such changes).

    Homework & Labs         12%
    Midterm Exam            23%
    Cumulative Final Exam   35%
    Course Project          25%  (includes the proposal, worth 2%)
    Participation            5%

    To map a numerical grade to a letter grade, I use the following mapping (which is the default built into Blackboard). However, in some cases I may curve grades upward.
    A:94-100   C+:77-80
    A-:90-94   C:74-77
    B+:87-90   C-:70-74
    B:84-87   D:65-70
    B-:80-84   F:<65
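    The grading percentages and letter-grade table above amount to a weighted average followed by a threshold lookup. The Python sketch below illustrates that arithmetic under stated assumptions: every component is scored on a 0-100 scale, the project score already includes the 2% proposal, a score that falls exactly on a shared boundary (such as 94) receives the higher letter, and no curve is applied. The names WEIGHTS, CUTOFFS, weighted_grade and letter_grade are hypothetical and are not part of Blackboard or any course software.

```python
# Hypothetical sketch: combine component scores (each assumed to be 0-100)
# using the syllabus weights, then map the result to a letter grade.

WEIGHTS = {
    "homework_labs": 0.12,
    "midterm": 0.23,
    "final": 0.35,
    "project": 0.25,        # ASSUMPTION: already includes the 2% proposal
    "participation": 0.05,
}

# (lower bound, letter), checked from highest to lowest.
# ASSUMPTION: a score exactly on a shared boundary (e.g., 94) earns the
# higher letter, since the table lists each boundary in two rows.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D"),
]


def weighted_grade(scores: dict) -> float:
    """Return the weighted numerical grade on a 0-100 scale."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(numeric: float) -> str:
    """Map a numerical grade to a letter using CUTOFFS (no curve applied)."""
    for lower_bound, letter in CUTOFFS:
        if numeric >= lower_bound:
            return letter
    return "F"  # anything below 65


if __name__ == "__main__":
    example = {
        "homework_labs": 90,
        "midterm": 80,
        "final": 85,
        "project": 92,
        "participation": 100,
    }
    total = weighted_grade(example)
    print(f"{total:.2f} -> {letter_grade(total)}")
```

    With the sample scores the weighted total is 86.95, which falls in the 84-87 band and maps to B. Because the weights sum to 100%, a student with the same score in every component simply receives that score as the course grade.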

    Course Project: The course includes a project. You may work individually or in teams of two (special permission is needed to work in larger teams). You may address a research question or analyze a real-world data set. Consider working on a project that relates to a hobby or interest of yours. A good first step is to find high-quality data; once you have a data set, you can often find a data mining problem related to it. You are responsible for coming up with your project topic, but I can help if you are having trouble.

    I run the Wireless Sensor Data Mining (WISDM) Lab, and much of the work that we do in the lab could form the basis of a course project. If you are interested in joining the lab or in using data from the lab for a course project, let me know as soon as possible.
