In this tutorial I will show you how to perform various machine learning tasks using Python. We will use scikit-learn, a popular machine learning library for Python.
Preliminaries
Checking the installation
You can run the following code to check the versions of the packages on your system:
import numpy
print('numpy:', numpy.__version__)
import scipy
print('scipy:', scipy.__version__)
import matplotlib
print('matplotlib:', matplotlib.__version__)
import sklearn
print('scikit-learn:', sklearn.__version__)
What is Machine Learning?
Machine Learning is about building programs with tunable parameters that are adjusted automatically so as to improve their behavior by adapting to previously seen data.
Machine Learning can be considered a subfield of Artificial Intelligence, since these algorithms can be seen as building blocks that make computers behave more intelligently by generalizing from data rather than just storing and retrieving data items the way a database system would.
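To make "tunable parameters adjusted automatically" concrete, here is a minimal sketch using plain gradient descent on a one-parameter model y_hat = w * x. The toy data and the learning rate are made up for illustration; this is not how scikit-learn fits every model, just a small instance of the idea.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2 * x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + rng.normal(scale=0.05, size=50)

# The single tunable parameter of the model y_hat = w * x
w = 0.0
learning_rate = 0.5

for step in range(200):
    y_hat = w * x
    # Gradient of the mean squared error with respect to w
    grad = 2 * np.mean((y_hat - y) * x)
    # Move w a small step against the gradient to reduce the error
    w -= learning_rate * grad

# Closed-form least-squares answer for the same model, for comparison
w_closed = np.sum(x * y) / np.sum(x * x)
print(w, w_closed)
```

The parameter starts at 0 and is pulled toward the value that best explains the data it has seen, which here lands close to the true slope of 2.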
Representation of Data in Scikit-learn
Machine learning is about creating models from data: for that reason, we'll start by discussing how data can be represented so that the computer can understand it.
Most machine learning algorithms implemented in scikit-learn expect data to be stored in a two-dimensional array or matrix. The arrays can be either numpy arrays or, in some cases, scipy.sparse matrices. The shape of the array is expected to be [n_samples, n_features].
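A small sketch of the [n_samples, n_features] layout, using made-up values: each row is one sample and each column is one feature. The second matrix shows the same layout stored as a scipy.sparse CSR matrix, which keeps only the non-zero entries.

```python
import numpy as np
from scipy import sparse

# Hypothetical dataset with made-up values: 4 samples, 3 features each
X_dense = np.array([[5.1, 3.5, 1.4],
                    [4.9, 3.0, 1.4],
                    [6.2, 2.9, 4.3],
                    [5.9, 3.0, 5.1]])
n_samples, n_features = X_dense.shape
print(n_samples, n_features)  # one row per sample, one column per feature

# Mostly-zero data (e.g. word counts) is often kept as a sparse matrix instead
X_sparse = sparse.csr_matrix(np.array([[0, 0, 3],
                                       [1, 0, 0],
                                       [0, 0, 0],
                                       [0, 2, 0]]))
print(X_sparse.shape, X_sparse.nnz)  # same [n_samples, n_features] layout, only 3 stored values
```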
Loading the Data with Scikit-Learn
Scikit-learn comes with a straightforward set of data-loading functions. We will look at examples of loading the Iris and Digits datasets.
Features in the Iris dataset:
sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
Target classes to predict:
Iris Setosa
Iris Versicolour
Iris Virginica
Code:
# Loading the Iris data
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.keys())
n_samples, n_features = iris.data.shape
print(n_samples, n_features)
print(iris.data[0])
print(iris.data.shape)
print(iris.target.shape)
print(iris.target_names)
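The loaded Iris object also carries the column names of iris.data (feature_names), which line up with the four features listed above. This sketch prints them and counts how many samples fall in each target class using numpy's bincount.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Column names of iris.data, in the same order as the features listed above
print(iris.feature_names)

# How many samples belong to each target class (0, 1, 2)?
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```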
# Loading the Digits data
from sklearn.datasets import load_digits
digits = load_digits()
print(digits.keys())
n_samples, n_features = digits.data.shape
print(n_samples, n_features)
print(digits.data.shape)
print(digits.images.shape)
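The two shapes printed above describe the same pixels: digits.images keeps each sample as an 8x8 grid, while digits.data flattens each grid into one row of 64 features, which is the [n_samples, n_features] layout scikit-learn estimators expect. A quick check:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Flatten each 8x8 image into a single row of 64 pixel values
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)
print(np.array_equal(flattened, digits.data))
```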
# Visualize the Digits data points (the %matplotlib line is a Jupyter/IPython magic)
import matplotlib.pyplot as plt
%matplotlib inline
# set up the figure
fig = plt.figure(figsize=(6, 6))  # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# plot the digits: each image is 8x8 pixels
for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(digits.images[i], cmap=plt.cm.binary, interpolation='nearest')
    # label the image with the target value
    ax.text(0, 7, str(digits.target[i]))
Supervised Learning
In Supervised Learning, we have a dataset consisting of both features and labels. The task is to construct an estimator that can predict the label of an object given its set of features. A relatively simple example is predicting the species of an iris from a set of measurements of its flower.
Supervised learning is further broken down into two categories, classification and regression. In classification, the label is discrete, while in regression, the label is continuous.
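A quick way to see the difference is to look at the labels themselves. This sketch compares the Iris targets, which take only three discrete values, with the targets of the diabetes dataset bundled with scikit-learn (load_diabetes, swapped in here purely as a regression example; it is not used elsewhere in this tutorial), which are continuous measurements.

```python
import numpy as np
from sklearn.datasets import load_iris, load_diabetes

# Classification: iris labels come from a small, fixed set of classes
y_class = load_iris().target
print(np.unique(y_class))

# Regression: diabetes labels are continuous measurements of disease progression
y_reg = load_diabetes().target
print(y_reg.dtype, len(np.unique(y_reg)), y_reg.min(), y_reg.max())
```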
Classification - K Nearest Neighbors
K nearest neighbors (kNN) is one of the simplest learning strategies: given a new, unknown observation, find the k samples in your reference (training) data whose features are closest to it and assign the class that is most common among them.
Let's try it out on our iris classification problem:
from sklearn import neighbors, datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target
knn = neighbors.KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
# What kind of iris has 3cm x 5cm sepal and 4cm x 2cm petal?
print(iris.target_names[knn.predict([[3, 5, 4, 2]])])
Output: ['virginica']
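To show what KNeighborsClassifier is doing under the hood, here is a minimal brute-force sketch of the "closest features, predominant class" rule. The helper knn_predict is hypothetical and written only for illustration; scikit-learn's own implementation can use faster search structures and may break ties differently.

```python
import numpy as np
from collections import Counter
from sklearn import datasets

def knn_predict(X_train, y_train, X_query, k=1):
    """Hypothetical brute-force k-nearest-neighbors classifier (illustrative sketch)."""
    predictions = []
    for x in np.atleast_2d(X_query):
        # Euclidean distance from the query to every training sample
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training samples
        nearest = np.argsort(distances)[:k]
        # Majority vote among their labels (ties go to the closest label seen first)
        label = Counter(y_train[nearest]).most_common(1)[0][0]
        predictions.append(label)
    return np.array(predictions)

iris = datasets.load_iris()
X, y = iris.data, iris.target

query_pred = knn_predict(X, y, [[3, 5, 4, 2]], k=1)
print(iris.target_names[query_pred])

# Sanity check: with k=1, each training point's nearest neighbor is itself (distance 0)
train_pred = knn_predict(X, y, X, k=1)
print((train_pred == y).mean())
```

Predicting on the training data with k=1 is a useful check that the distance and voting code work, but it says nothing about how well the model generalizes to unseen flowers.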
Classification - Support Vector Machines
Support Vector Machines (SVMs) are a powerful family of supervised learning algorithms that can be used for classification or regression. An SVM is a discriminative classifier: it learns a boundary that separates the classes of data.
Let's try it out on our iris classification problem:
from sklearn import svm, datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target
unknown_iris = [[3, 5, 4, 2]]
# kernel can be 'linear', 'poly', 'rbf' or 'sigmoid', among others
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
print(iris.target_names[clf.predict(unknown_iris)])
Output: ['versicolor']
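To see the "boundary" explicitly, this sketch restricts Iris to two classes (setosa and versicolor, a choice made here so there is a single boundary) and fits a linear SVC. With a linear kernel the boundary is the hyperplane w . x + b = 0, where w and b are exposed as coef_ and intercept_; the sign of w . x + b tells which side of the boundary a sample falls on.

```python
import numpy as np
from sklearn import svm, datasets

iris = datasets.load_iris()
# Keep only two classes (setosa = 0, versicolor = 1) so there is a single boundary
mask = iris.target < 2
X, y = iris.data[mask], iris.target[mask]

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

# For a linear kernel the boundary is the hyperplane w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
scores = X @ w + b
print(np.allclose(scores, clf.decision_function(X)))

# Points with a positive score fall on the class 1 (versicolor) side
print(np.array_equal((scores > 0).astype(int), clf.predict(X)))

# Only the samples nearest the boundary (the support vectors) define it
print(clf.n_support_)
```

Only a handful of training samples end up as support vectors; the rest could be removed without moving the boundary.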
Regression - Linear
The simplest possible regression setting is linear regression:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Create some simple data
np.random.seed(0)
X = np.random.random(size=(20, 1))
y = 3 * X.squeeze() + 2 + np.random.normal(size=20)
# Fit a linear regression to it
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
# coef_ is an array with one entry per feature, so take its single element
print("Model coefficient: %.5f, and intercept: %.5f" % (model.coef_[0], model.intercept_))
# Plot the data and the model prediction
X_test = np.linspace(0, 1, 100)[:, np.newaxis]
y_test = model.predict(X_test)
plt.plot(X.squeeze(), y, 'o')
plt.plot(X_test.squeeze(), y_test);
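Ordinary least-squares linear regression has a closed-form solution, so the fitted slope and intercept can be checked directly with numpy. This sketch regenerates the same synthetic data (y = 3x + 2 plus noise), appends a column of ones so the intercept becomes an extra coefficient, and solves the least-squares problem with np.linalg.lstsq.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as above: y = 3x + 2 plus Gaussian noise
np.random.seed(0)
X = np.random.random(size=(20, 1))
y = 3 * X.squeeze() + 2 + np.random.normal(size=20)

# Design matrix [x, 1]: minimize ||A @ [slope, intercept] - y||^2
A = np.hstack([X, np.ones((len(X), 1))])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

model = LinearRegression(fit_intercept=True).fit(X, y)
print(slope, intercept)
print(np.isclose(slope, model.coef_[0]), np.isclose(intercept, model.intercept_))
```

Because of the noise, the recovered slope and intercept are close to, but not exactly, 3 and 2.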