CISC 5800: Machine Learning



Class times: Wednesday, 6:00 – 7:50pm, Room LL 519
Instructor: Prof. Daniel D. Leeds (my homepage)
Office: LL 819D for Office Hours; normally JMH 332 (Rose Hill)
E-mail:
Office hours: 5-6pm

Full syllabus is available here.

Course text: No book will be required, but the following will be useful as references.


Programming: We will have programming assignments throughout the semester. I require you to complete your programming assignments in Matlab. There are several ways to use Matlab or Matlab-equivalent software:

Sections below:
  1. Resources
  2. Announcements
  3. Slides
  4. Assignments

Resources:
Computing guides
Linux Commands - important Linux commands for working on storm
vi Commands - important commands for the vi text editor; you are welcome to use emacs instead of vi
A Guide to Putty - Information for Windows users on accessing storm
Matlab
Extra background on Matlab
Matlab programming practice — download the accompanying data file sampleData.mat and function file newFunc.m
Calculus
Calculus practice - some optional problems with derivatives
Midterm practice
Practice questions
Practice answers -- I fixed the AIC answers (question 8); the fix was posted March 4, 8pm. Additional corrections to the answers were written in blue in the following days.
Final practice
Practice questions -- additional questions expected to be placed online over the weekend (e.g., including Neural Nets!)
Practice answers
Additional practice, coded in colors red, green, blue
Additional practice answers BLUE, blue 1b and 6a corrected May 2
Additional practice answers GREEN, green 2c corrected May 2.
Additional practice answers RED


Announcements:
Our final exam will be in class May 3!!!
April 10, 11:50am: Office hours Wednesday are cancelled. I will be available by e-mail over break. Keep working on your final projects!
February 28, 12:10pm: Check the resources section for practice midterm questions and answers!!!
February 23, 5:45pm: We will have a review session 7:50-8:50pm March 1.
February 21, 9:15am: As announced in class last week, our midterm exam will be on March 8, covering material through lecture on February 27.
January 18, 10:30pm: We will have an OPTIONAL Matlab overview January 25, 8-8:30pm in LL 812. PLEASE NOTE you are required to have one semester's worth of programming experience in some language (C++, Python, Java, etc.) prior to starting my class. Our homeworks will be too difficult for students without this background.

Slides:
I have not found any one textbook or online resource to be an optimal match for the material we cover at the mathematical level we cover it. However, Andrew Ng's lecture notes, available on Stanford's Machine Learning course web site, often can be a helpful online read. I recommend these notes as well as looking through one of our course textbooks.
Supplementary reading
Lecture 1, Course logistics, background math, intro to classifiers.
Lecture 1.5, Matlab intro.
Lecture 2, Bayes classifier. Ng notes 2 particularly pages 8-11
Lecture 3, Logistic classifier. Ng Notes 1 Elements of Part II (starting on page 16)
Lecture 4, Support vector machines. Ng notes 3 parts of pages 1-20
Lecture 5, Dimensionality reduction. Ng notes 10 on PCA and Ng notes 11 on ICA
Lecture 6, Neural networks. (updated Mar 26 evening) Chapter 1 and Chapter 2 of "Neural Networks and Deep Learning"
Lecture 7, Hidden Markov Models; you are not responsible for forward and backward probability slides in pages 5 and 6. Stanford notes
Guest Lecture, Text Mining.
Lecture 8, Bayes Networks. Murphy (UBC) notes, first few pages are most relevant


Assignments:
Homework 0 - due January 25. I recommend you do it by January 22! This is largely to test your background knowledge for the course.

Homework 0 answers. "A range" 87-96; "B range" 77-87; "C range" 67-77


Homework 1 - due February 8 (Part A), and February 10 (Part B)
February 6: I modified questions 1 and 2 to make them somewhat less tricky. If you already completed questions 1 and 2, you can submit your answers for the earlier/trickier versions, or adjust to the simpler versions.
For Part B, I highly recommend you use a modified version of the MAT file originally mentioned in the homework. I recommend you use HW1Data_NEW.mat, as the results will look slightly nicer. This file is also available on erdos.
Homework 1 Answers — Part B grades will be released by the end of the weekend.
"A range" 61-68; "B range" 51-61; "C range" 41-51; "D range" 31-41

Homework 2 - due February 22 (Parts A and B), and February 24 (Part C); data now available as hw2data.mat
February 20, 2:30pm: Part C question 4 has been slightly adjusted. Please note the changes and the addition of question 5 (which was originally just part of question 4)
February 21, 7:10pm: Part A questions 4 and 5 have had a typo corrected. You must determine how many parameters (not features) are required to learn a separating hyper-plane. I apologize for the confusion! Some slight (unimportant) typos have been corrected in Part C as well.
February 22, 1:30pm: To reiterate the clarification on Question A9 (which I clarified in last night's e-mail), I mean for you to find d and f such that the optimal w will make S(...) as large as possible. I apologize for the confusion here, and I will grade leniently!
Homework 2 answers -- reposted Sunday morning for your studying convenience. Corrections made to questions 9 and 10 on February 28 11am, and to question 1 at 4pm on March 1.
Grade breakdown, for Parts A-C:
67-77 points A range
57-67 points B range
47-57 points C range

Final project - due May 8

Midterm answers
64-73 points A range
50-64 points B range
36-50 points C range

Homework 3 - due March 29 (Part A and B) and March 31 (Part C); letter-recognition.mat
March 26, 2pm: Part A Questions 3b and 4a have been corrected in red; you should estimate dot products to the nearest 0.1.
Part C deadline has been moved to 11:59pm Sunday, April 2; note there is no example data set for the neural network.
Homework 3 answers.
Grade breakdown, for Parts A-C:
137-155 points A range
119-137 points B range
100-119 points C range

Homework 4 - due April 26; OPTIONAL; disregard question 4 --- there is an error that prevents it from making sense!
Homework 4 answers
I also took pictures of the Viterbi answer for the most likely states producing the sequence of observations "woof", "woof". The details of the answer ARE visible, but you have to enlarge the images. Board 1, Board 2, Board 3