CISC 4631 R02 Data Mining
Department of Computer and Information Science
Dr. Weiss, Spring 2020
Class Times: Monday, Thursday 2:30-3:45pm, JMH 405
Instructor: Dr. Gary Weiss
Office: JMH 308a
Office Hours: Monday 1:00pm, Wednesday 12pm (noon) via Google Meet
Google Meet office hours: meet.google.com/kzd-goyn-eyu (meeting id kzd-goyn-eyu)
Please connect as close to the start time as possible. If no one has joined virtual office hours within
30 minutes, I will leave. If you need to meet later or at another time, email me. If you would like
a calendar invite so that office hours are added to your Google Calendar, let me know.
Required Text: "Introduction to Data Mining, 2nd Edition",
Tan, Steinbach, Karpatne, and Kumar.
Pearson. 2019. ISBN-13: 978-0-13-312890-1.
Course Website: http://storm.cis.fordham.edu/~gweiss/classes/cisc4631
Course Description:
This course will cover data mining algorithms for analyzing large data sets as
well as the practical issues that arise when applying these algorithms to
real-world problems. It will balance theory and practice: the principles
of data mining methods will be discussed, but students will also acquire
hands-on experience. Each student will select and complete an
application-oriented or research-oriented course project.
Course Topics:
- Introduction to Data Mining
- Data: Types of Data, Preprocessing (e.g., feature selection), Quality Issues (e.g., missing values), Similarity Metrics
- Classification and Prediction Fundamentals: Evaluation Metrics, Class Imbalance, etc.
- Classification Algorithms: Decision Trees, Rule Learning, Naive Bayes, Nearest Neighbor, Neural Networks
- Clustering: K-Means, Hierarchical, DBSCAN, Evaluation
- Association Analysis: Apriori Algorithm
Prerequisites: There are no formal course prerequisites. This course
is open to non-CS majors and any necessary concepts will be introduced within
the course. Knowledge of programming is useful and students with programming
experience are free to use Python's data mining modules; other students
may use WEKA, which does not require any programming skills.
Course Objectives:
- To develop a basic understanding of data mining so that you can recognize which
  problems can be addressed by data mining and which data mining methods
  are most appropriate for a given task.
- To gain a basic understanding of how classification, prediction, clustering, and
  association analysis techniques operate at the algorithmic level.
- To gain experience using data mining toolkits and software suites, and to apply
  data mining to a significant real-world dataset.
- To improve technical writing skills and to be able to document a data mining project
  in a format suitable for conference publication.
Attendance and Class Participation:
It is important to attend every class and to be prepared for every class.
Being prepared means completing the assigned readings and homework on time and
being ready to discuss the material. Please actively participate in
class, since this will make the course more interesting for everyone.
If you are going to miss class or will not have a homework assignment completed on time,
let me know beforehand whenever possible; I tend to be more lenient in
such cases (at least if you have a reasonable excuse). Your class participation
grade will be based on both your attendance and the degree to which you
participated in class.
Academic Honesty: All work produced in this course should be your
own unless it is specifically stated that you may work with others. You
may discuss the homework problems with other students generally, but
may not provide complete solutions to one another; copying of homework
solutions is always unacceptable and will be considered a violation
of Fordham's academic integrity policy. Violations of this policy
will be handled in accordance with university policy, which can
include automatic failure of the assignment and/or failure of the course.
Grading:
The percentages given below are guidelines for both the student and
instructor, and minor changes may be made during the course (students
will be informed promptly of any such changes).
| Component | Weight |
|---|---|
| Homeworks & Labs | 12% |
| Cumulative Final Exam | 35% |
| Course Project (proposal worth 2%) | 25% |
To map a numerical grade to a letter grade, I use the following mapping (which
is the default built into Blackboard). However, in some cases I may curve the grades.
| Letter Grade | Numerical Range |
|---|---|
| A | 94-100 |
| A- | 90-94 |
| B+ | 87-90 |
| B | 84-87 |
| B- | 80-84 |
| C+ | 77-80 |
| C | 74-77 |
| C- | 70-74 |
| D | 65-70 |
| F | <65 |
Course Project:
The course will include a course project. You may work individually or in
teams of 2 (special permission is needed to work in larger teams). You may
address a research question or analyze a real-world data set. Consider
working on a project that relates to a hobby or interest of yours. A good
first step is to find high-quality data; once you have a data set, you
can often find a data mining problem related to it. You are responsible for
coming up with your project topic, but I can help you if you are having
trouble. I run the Sensor Data Mining Lab, and much of the work that we do in the lab could
form the basis of a course project. If you are interested in joining the
lab or learning about data from the lab that could be used for a course
project, let me know as soon as possible.