CISC 5800: Machine Learning

Final Project
Frequently Asked Questions

Question: How do we handle non-numeric features?
Answer: It's up to you. You may choose to ignore them, or you may convert them to integers on a continuum (if that's relevant). Certain classifiers can handle non-numeric features; others expect features on a number line. It's up to you to decide what to do. But you must explain what you chose to do and your reasoning.
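As one illustration of the "integers on a continuum" option, here is a hedged Python sketch (the feature values "low"/"medium"/"high" and the function name are assumptions for illustration, not from the project data):

```python
# Hypothetical sketch: map an ordered non-numeric feature to integers.
# Only sensible when the values really do lie on a continuum.
def encode_ordinal(values, order):
    """Map each string value to its index in a chosen ordering."""
    index = {label: i for i, label in enumerate(order)}
    return [index[v] for v in values]

feature = ["low", "high", "medium", "low"]
encoded = encode_ordinal(feature, ["low", "medium", "high"])
# encoded is [0, 2, 1, 0]
```

For unordered categories (e.g. color names), an arbitrary integer assignment can mislead distance-based classifiers, which is one reason the choice needs justification in your write-up.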

Question: What is the training set and what is the testing set?
Answer: It's up to you. Just make sure the sets are separate, and the number of elements in your testing set stays constant as you evaluate your classifiers. But you must explain what you chose to do and your reasoning.
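One simple way to keep the sets separate with a fixed-size test set is a single shuffled split; this is a sketch under assumed names (split_data, test_size are illustrative, not required):

```python
import random

def split_data(points, test_size, seed=0):
    """Shuffle once with a fixed seed, then hold out a fixed-size test set."""
    rng = random.Random(seed)
    shuffled = points[:]          # copy so the original data is untouched
    rng.shuffle(shuffled)
    test = shuffled[:test_size]   # same size for every classifier you evaluate
    train = shuffled[test_size:]
    return train, test

data = list(range(100))
train, test = split_data(data, test_size=20)
```

Fixing the seed means every classifier sees the same train/test partition, which keeps the comparison fair.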

Question: What do we do with data points where some of the feature values are unknown?
Answer: It's up to you. You may replace the value with some "default" value, or train using only the known features for that data point, or ignore the data point entirely. But you must explain what you chose to do and your reasoning.
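The "default value" option is often implemented as mean imputation. A minimal sketch, assuming missing entries are represented as None (the function name is illustrative):

```python
def fill_missing(column, default=None):
    """Replace None entries in one feature column with a default value
    (here, the mean of the known entries if no default is given)."""
    known = [v for v in column if v is not None]
    fill = default if default is not None else sum(known) / len(known)
    return [fill if v is None else v for v in column]

col = [2.0, None, 4.0]
filled = fill_missing(col)
# filled is [2.0, 3.0, 4.0] -- the None became the mean of the known values
```

Whatever you choose, apply the same rule to training and testing data, and compute any statistics (like the mean) from the training set only.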

Question: How can I use SVM, PCA, ICA?
Answer: One of the classification methods must be coded by you. However, you are allowed to experiment with additional methods implemented by other people. The Statistics toolbox for Matlab implements SVM and PCA -- though it costs extra money to purchase (if you don't have it already). You can search Google for free code: SVMlight and fastica are prominent free software for SVM and ICA, respectively. Principal components and weights can also be found through the svd command, which is available by default in any Matlab version.
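For readers working outside Matlab, the same svd route to principal components looks like this in Python with numpy (np.linalg.svd is numpy's analogue of Matlab's svd; the function name pca is an assumption for illustration):

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal components and the projected data (weights)."""
    Xc = X - X.mean(axis=0)                 # center each feature column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                     # rows are principal directions
    weights = Xc @ components.T             # projection of each point
    return components, weights

X = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]])
comps, W = pca(X, 1)                        # data is nearly 1-D, so k=1 fits well
```

Keeping only the top k rows of Vt is what reduces the dimensionality; reconstructing with W @ comps recovers the data up to the discarded directions.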

Question: How long should the paper be?
Answer: I recommend the paper be 6 to 8 pages.

Added December 4

Question: How do I deal with numeric features spanning different magnitudes (0-10 vs 0-100)?
Answer: If you wish, you can divide each column of features by the variance of that column (dividing by the standard deviation is a common variant) -- this puts all the numeric columns on a comparable scale, so no one feature dominates simply because of its units.
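A short sketch of that column-wise scaling, following the variance version from the answer (the function name is illustrative):

```python
def scale_by_variance(columns):
    """Divide each feature column by its (population) variance,
    as suggested above; dividing by the standard deviation instead
    is a common alternative that yields unit-variance columns."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        scaled.append([v / var for v in col])
    return scaled

scaled = scale_by_variance([[0.0, 10.0], [0.0, 100.0]])
# scaled is [[0.0, 0.4], [0.0, 0.04]]
```

As with imputation, compute the per-column statistics from the training data and reuse them when scaling the test data.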

Question: How can I add the +b term into my logistic regression classification, i.e. evaluate wTx + b > 0?
Answer: You can append a new feature that always has the value 1 for every data point. Then the entry of w corresponding to that constant feature plays the role of b in wTx + b > 0.
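The bias trick above can be sketched in a few lines of Python (the data points and weight values here are made up for illustration):

```python
# Append a constant-1 feature so that wTx + b becomes w'Tx' with b
# folded into the last entry of w'.
def add_bias_feature(points):
    return [x + [1.0] for x in points]

def decision(w, x):
    """Evaluate wTx > 0; with the appended 1, the last weight acts as b."""
    return sum(wi * xi for wi, xi in zip(w, x)) > 0

X = [[2.0, 1.0], [0.1, 0.1]]
Xb = add_bias_feature(X)    # [[2.0, 1.0, 1.0], [0.1, 0.1, 1.0]]
w = [1.0, 1.0, -1.5]        # last entry plays the role of b = -1.5
# decision(w, Xb[0]) is True  (2.0 + 1.0 - 1.5 =  1.5 > 0)
# decision(w, Xb[1]) is False (0.1 + 0.1 - 1.5 = -1.3 < 0)
```

The same learning rule you already have for w then updates the bias automatically, since b is just one more weight.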

Added December 5

Question: How do I come up with three different "learning/classification parameters" to explore using Bayes/Naive Bayes?
Answer: Technically, if you are learning entries in a probability table, each prior probability (for each feature value or combination of feature values) counts as a different parameter, so I will accept that interpretation when you test different "learning/classification parameters". I will also accept varying the training set size and/or varying the number of features used in classification.