CISC 4631 L01 Data Mining
Department of Computer and Information Science
Dr. Weiss, Fall 2024
CLASS SCHEDULE
Class Times and Location: Monday, Thursday 2:30 - 3:45pm in LL-311
Instructor: Dr. Gary Weiss (gaweiss@fordham.edu)
Office: LL-610d
Teaching Assistant: Kayla Laufer (klaufer1@fordham.edu)
Lectures:
This is an in-person class, and lectures will be held in person only.
Please try to actively participate in class since this will make the class more interesting
for everybody. Feel free to ask questions.
Office Hours (see the Blackboard info tab for Zoom links)
Dr. Weiss: Wednesday noon-1pm (via Zoom, starting 9/11) and Thursday 4-5pm in person; also by appointment
Kayla Laufer (TA): Tuesday 4-5pm (via Zoom)
Required Text: "Introduction to Data Mining, 2nd Edition",
Tan, Steinbach, Karpatne, and Kumar.
Pearson. 2019. ISBN-13: 978-0-13-312890-1.
Course Website: http://storm.cis.fordham.edu/~gweiss/classes/cisc4631
Course Description:
This course will cover data mining algorithms for analyzing large data sets as
well as the practical issues that arise when applying these algorithms to
real-world problems. It will balance theory and practice: the principles
of data mining methods will be discussed, but students will also acquire
hands-on experience. Each student will select and complete an
application-oriented or research-oriented course project.
Topics
- Introduction to Data Mining
- Data: Types of Data, Preprocessing (e.g., feature selection), Quality Issues (e.g., missing values), Similarity Metrics
- Classification and Prediction Fundamentals: Evaluation Metrics, Class Imbalance, etc.
- Classification Algorithms: Decision Trees, Rule Learning, Naive Bayes, Nearest Neighbor, Neural Networks, Ensembles
- Clustering: K-Means, Hierarchical, DBSCAN, Evaluation
- Association Analysis: Apriori algorithm
Prerequisites: There are no formal course prerequisites. This course
is open to non-CS majors, and any necessary concepts will be introduced within
the course. Knowledge of programming is useful, and students with programming
experience are free to use Python's data mining modules for their project and any assignments for which
Weka is not specifically required; other students may use Weka for all data mining exercises and for the project
(Weka does not require any programming). Because Python requires much more effort than Weka, students using
Python may receive a small amount of extra credit.
Learning Objectives:
To develop a basic understanding of data mining so that you can recognize what
problems can be addressed by data mining and which data mining methods
are most appropriate for a given task.
To gain a basic understanding of how classification, prediction, clustering, and
association analysis techniques operate at the algorithmic level.
To gain experience using data mining toolkits and software suites, and to apply
data mining to a significant real-world dataset.
To improve technical writing skills and to be able to document a data mining project
in a format suitable for conference publication.
Academic Honesty and AI/ChatGPT Policy: All work produced in this course should be your
own unless it is specifically stated that you may work with others. You
may discuss the homework problems with other students generally, but
may not provide complete solutions to one another; copying of homework
solutions is always unacceptable and will be considered a violation
of Fordham's academic integrity policy. An academic integrity violation
on the final exam will result in an "F" for the course. All violations of
this policy will be reported to the university and will be handled in accordance
with existing policies.
This course adopts Fordham's Limited-AI approach toward the use of AI tools for
completing the course project paper. Limited usage of generative AI tools is allowed for
the course project and no AI may be used for the text of the homework assignments. These tools
are allowed for enabling exploration of ideas, complex data analysis,
and creative solution development. When using
these tools, it is mandatory to clearly indicate the sections of your work that were
generated using them for proper attribution and transparency, and indicate the prompts
and software versions that were used. It is critical to adhere to ethical standards.
Please consult with the instructor for more specific advice.
Course Project: The course will include a course project. You may work individually or in
teams of 2 (special permission is needed to work in larger teams). You may
address a research question or analyze a real-world data set. Consider working on a
project that relates to a hobby or interest of yours. A good start is to try to
find high-quality data; once you have a data set, you can often find a data mining
problem related to it. You are responsible for coming up with your project topic,
but a list of sample projects will be provided.
Grading:
The percentages given below are guidelines, and minor changes may be made during the course (students
will be informed promptly of any such changes).
Homeworks/Labs | 21% |
Project | 25% |
Project Proposal | 2% |
Midterm Exam | 20% |
Final Exam | 28% |
Participation | 4% |
The following mapping will be used to convert a numerical grade to a letter grade. A curve may be applied, so the mapping below represents the minimum grade you would receive given your weighted numerical average.
A: | 93-100 | | C+: | 77-80 |
A-: | 90-93 | | C: | 74-77 |
B+: | 87-90 | | C-: | 70-74 |
B: | 84-87 | | D: | 65-70 |
B-: | 80-84 | | F: | <65 |
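For students who want to estimate where they stand, the grading arithmetic above can be sketched in Python. This is an illustration only, not an official grading tool: the names `WEIGHTS`, `CUTOFFS`, `weighted_average`, and `minimum_letter` are hypothetical, the scores in the example are made up, and the sketch assumes each category is scored on a 0-100 scale and that each lower bound in the letter-grade table is inclusive (so exactly 93 maps to A). The weights may change slightly during the course, and a curve can only raise the result.

```python
# Category weights copied from the grading table (percent of the final grade).
WEIGHTS = {
    "homeworks_labs": 21,
    "project": 25,
    "project_proposal": 2,
    "midterm_exam": 20,
    "final_exam": 28,
    "participation": 4,
}

# (minimum average, letter) pairs, highest first.
# ASSUMPTION: lower bounds are inclusive; the syllabus does not say so explicitly.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D"),
]


def weighted_average(scores):
    """Return the weighted numerical average of per-category scores (0-100)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def minimum_letter(average):
    """Return the minimum letter grade for a weighted average, before any curve."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"


# Made-up scores for a hypothetical student.
example_scores = {
    "homeworks_labs": 95,
    "project": 88,
    "project_proposal": 100,
    "midterm_exam": 82,
    "final_exam": 85,
    "participation": 100,
}
average = weighted_average(example_scores)
print(f"{average:.2f} -> {minimum_letter(average)}")  # prints "88.15 -> B+"
```

Because the final exam and the project together account for more than half of the grade, changing those two entries in `example_scores` moves the result the most.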