Python Expectations and Resources
The expectation is that prior to starting the class you will know how to:
-
Set up the Python Anaconda environment on your computer (<1 hour)
-
Write simple Python programs and run them (2-3 hours)
-
Use Jupyter Notebook (30 minutes)
Even if you have no knowledge of Python, you can learn the minimum in as little as 4 hours
using the tutorials below (if necessary, you can learn it during the first two weeks of the
course). You will need other Python skills, which you can acquire during the course, but
if you have time you can learn them before the class starts, when you may be less rushed. Much
of what you will need is best acquired by looking at examples and adapting them, but if you
have this background knowledge that process will be smoother.
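As a rough gauge of what "simple Python programs" means here, the sketch below uses a function, a loop, a dictionary, and formatted output. The function name and sample sentence are made up for illustration; they are not taken from any of the tutorials.

```python
def word_counts(text):
    """Return a dict mapping each lowercase word in text to its count."""
    counts = {}
    for word in text.lower().split():
        # Drop punctuation attached to the start or end of the word.
        word = word.strip(".,!?")
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample input, for illustration only.
sample = "The cat sat. The dog sat too!"
counts = word_counts(sample)

# Print the words in alphabetical order with their counts.
for word, n in sorted(counts.items()):
    print(f"{word}: {n}")
```

If you can read this and predict roughly what it prints, you are probably at the expected starting level.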
Here are the additional skills that you will need:
-
Basic knowledge of Pandas, a Python data analysis library for reading and processing
data (1-2 hours).
-
Ability to use Matplotlib to generate charts (1 hour).
-
Ability to use scikit-learn, the main Python library you will use for the labs and project,
which includes all of the data mining/machine learning algorithms. You will learn this as
the course progresses, but you should start with a basic overview/tutorial, which I will
assign in class (2 hours).
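Once your environment is set up, one way to confirm that the three libraries above are available is a quick import check like the sketch below. The helper name `missing_modules` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for Pandas, Matplotlib, and scikit-learn (imported as "sklearn").
required = ["pandas", "matplotlib", "sklearn"]
missing = missing_modules(required)

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Run it in a Jupyter Notebook cell or Qt Console. If anything is reported missing, it can usually be installed from an Anaconda prompt with `conda install` (the package name for `sklearn` is `scikit-learn`).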
PYTHON RESOURCES
Feel free to suggest others for me to add, to benefit your classmates and future students.
Key references
Tutorials
The tutorial topics (1-6) are organized in a logical sequence. There are also two tutorials
that follow (labelled i, ii), which cut across several of the topics and can get you started more quickly,
but provide only selective coverage. In my opinion you are better off with the main
sequence (1-6), but you can use the other two for review or as supplemental resources.
-
Download Anaconda.
This will include most of the libraries you will need (e.g., Pandas), as well as Qt Console for
running IPython code and Jupyter Notebook. If you start Anaconda Navigator, it will show
you the various interfaces. Any missing libraries can be installed later.
-
Go through parts of the
Python tutorial.
You should complete Sections 1-3, most of Section 4 (control flow), most of Section
5 (data structures), a bit of
Section 6 (Modules), and most of Section 7 (Input and Output). You can skip the rest.
Start with Qt Console but at some point shift to Jupyter Notebook (next item).
-
Learn to use Jupyter Notebook via a 20-minute YouTube video or a short tutorial.
There is also a "User Interface Tour" option under the "Help" menu in Jupyter Notebook, but
that may not be sufficient on its own. You should also check out "Keyboard Shortcuts" under the
"Help" menu as you become more familiar with Notebook. You can submit your notebooks
for your labs, most likely as an exported PDF file.
-
Learn the basics of Pandas using this 1-hour YouTube video.
While I recommend the video, you can use web-based tutorials instead: 1) go to Jupyter Notebook,
select the "Help" menu, select "pandas reference", and then go to the "getting started" guides
(the user guide and references may be useful too), or 2) work through the
Kaggle 4-hour tutorial.
-
Learn the basics of Matplotlib by going to the Matplotlib website and viewing the
quickstart guides and some of the examples, and/or the 30-minute YouTube video.
-
Learn the basics of scikit-learn with this 1 hour 40 minute YouTube video.
Visit the official scikit-learn website
and browse it a bit. Click on "Classification", then select "Decision Trees" (currently 1.10),
and then do 1.10.1 in Jupyter Notebook, which will have you build a decision tree classifier
for the iris data set. This is probably how you will construct most of your labs: by
finding similar examples on this site and then adapting them. In fact, if you search
for a specific item, the search will likely land on this site, with reference
information and, much more importantly, specific examples that you can adapt. It is okay if you
do not understand the content until we cover the topic in class.
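For orientation, a decision tree classifier for the iris data set can be sketched in a few lines, as below. This is not a copy of the scikit-learn site's example. The train/test split, the fixed `random_state`, and the accuracy check are additions assumed here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris data: 150 flowers, 4 numeric features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the rows for testing (an assumption made for this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit a decision tree on the training rows only.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Measure how often the tree predicts the right class on held-out rows.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Adapting an example like this usually means swapping in a different data set or a different classifier while keeping the same fit/predict pattern.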
Supplemental Tutorials
-
An Introduction to Python for Data Science Applications: Covers Python data
structures and a very quick look at NumPy/SciPy, Matplotlib, and Pandas.
-
A set of Jupyter notebook tutorial examples from our textbook authors that covers an intro to Python,
NumPy and Pandas, Data Exploration and Preprocessing, Regression, Classification, etc.
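As a small taste of the Pandas and Matplotlib basics that these tutorials cover, the sketch below builds a tiny table, computes a per-group mean, and draws a bar chart. The data, column names, and output file name are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b", "c"],
        "length": [1.0, 3.0, 2.0, 4.0, 5.0],
    }
)

# Group the rows by species and average the length within each group.
mean_length = df.groupby("species")["length"].mean()
print(mean_length)

# Draw the group means as a bar chart and save it to a file.
fig, ax = plt.subplots()
mean_length.plot(kind="bar", ax=ax)
ax.set_xlabel("species")
ax.set_ylabel("mean length")
fig.savefig("mean_length.png")
```

In Jupyter Notebook the chart normally appears below the cell, so saving it to a file is optional there.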