Step-0: Machine Learning in Python
Machine learning has pervaded our everyday lives, driving a growing number of data-driven decisions. Applying it typically requires good math skills in linear algebra, probability theory, and signal processing. These, coupled with good programming skills, go a long way toward testing new and existing algorithms on large amounts of data. I am a big fan of Matlab for testing machine learning algorithms, as it provides a very easy interface for matrix operations. Recently, I learned that working with matrices in Python is also very easy, so I decided to write a series of blog posts about how to use Python for machine learning. The fundamental ideas remain the same whether we use R, Matlab, or Python.
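To give a flavor of how compact matrix operations can be in Python, here is a small sketch. It assumes the NumPy library, which this post has not introduced yet, and the matrix `A` and vector `b` are made-up values used only for illustration.

```python
import numpy as np

# Made-up 2x2 matrix and right-hand-side vector, for illustration only.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

A_transpose = A.T            # transpose, like A' in Matlab
product = A @ A_transpose    # matrix-matrix product, like A * A' in Matlab
x = np.linalg.solve(A, b)    # solve the linear system A x = b, like A \ b in Matlab

print(product)
print(x)
```

The `@` operator and `np.linalg.solve` cover most of what I would otherwise reach for Matlab to do.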
As an example, I will mainly focus on supervised learning, as it is easier to make sense of.
The main components are:
1. Read raw data from a csv file (or some other file format).
2. Identify the input features ($\mathbf{H} = [ \mathbf{h}_1, \mathbf{h}_2, ..., \mathbf{h}_p]$) and the output $\mathbf{y} = [y_1, y_2, ..., y_n]^T$. This notation means that we have $p$ features and $n$ data points, so each $\mathbf{h}_j$ is a column of length $n$. A bold capital letter denotes a matrix, a bold lowercase letter denotes a vector, and $(\cdot)^T$ is the transpose operator.
3. Divide the data into a training set and a test set.
4. Apply a supervised machine learning technique, such as linear regression or logistic regression, to the training set to determine a mapping function $\mathbf{H}_i \rightarrow y_i,~~\forall~i \in [1,n]$, where $\mathbf{H}_i$ is the $i$-th row of $\mathbf{H}$ (the features of data point $i$).
5. Apply the mapping function to the test set to obtain the predictions $\hat{\mathbf{y}}_{test}$.
6. Plot the predictions together with the known actual values to check the accuracy of the model.
My goal is to write Python code for each of these steps, so that a supervised learning algorithm can be applied from start to finish. Applying any other algorithm will then only require modifying steps 1, 4, and 5.
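As a rough preview, the steps above can be sketched in a few short functions. This is only a minimal sketch under several assumptions: it uses NumPy and the standard `csv` module, the data is a made-up inline CSV string (`RAW_CSV`) rather than a real file, the model is ordinary least-squares linear regression, and the function names are hypothetical. The plot in the last step is replaced by a single error number (RMSE) so the example runs without a plotting library.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for a csv file on disk: two features and one output.
# The rows follow y = 1 + 2*h1 + 3*h2 exactly, so a linear fit should recover it.
RAW_CSV = """h1,h2,y
0,0,1
1,0,3
0,1,4
1,1,6
2,1,8
1,2,9
2,2,11
3,1,10
"""


def load_data(text):
    """Steps 1-2: read csv text, return features H (n x p) and output y (n,)."""
    rows = list(csv.reader(io.StringIO(text)))
    data = np.array(rows[1:], dtype=float)  # skip the header row
    return data[:, :-1], data[:, -1]        # last column is the output


def train_test_split(H, y, test_fraction=0.25, seed=0):
    """Step 3: shuffle the rows and hold out a fraction as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return H[train_idx], y[train_idx], H[test_idx], y[test_idx]


def fit_linear_regression(H, y):
    """Step 4: least-squares fit of y ~ w0 + H @ w."""
    A = np.column_stack([np.ones(len(y)), H])  # prepend an intercept column
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)
    return weights


def predict(H, weights):
    """Step 5: apply the fitted mapping to feature rows."""
    return weights[0] + H @ weights[1:]


H, y = load_data(RAW_CSV)
H_train, y_train, H_test, y_test = train_test_split(H, y)
weights = fit_linear_regression(H_train, y_train)
y_hat_test = predict(H_test, weights)

# Step 6, numerically: root-mean-square error of predictions vs. known values.
rmse = np.sqrt(np.mean((y_hat_test - y_test) ** 2))
print("weights:", weights)
print("test RMSE:", rmse)
```

Swapping in a different algorithm would mean replacing `fit_linear_regression` and `predict`, and reading a different file would mean changing `load_data`, which matches the steps singled out above.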
I will start with Anaconda to set up the Python environment.










