Monday 16 February 2015

OpenCV with Java and Python

Last week I got very excited about computer vision and the ongoing activity in this field of science. So I started looking at OpenCV, and I was really amazed by what it offers. I have only just started playing with OpenCV, so in this tutorial I will show how to get started with it in both Java and Python.

This is just a beginner's tutorial; I have no problem admitting that I am currently a novice in the computer vision field. Next time the examples will be more realistic :)

Step 1: Setting up OpenCV

I have installed it on my laptop running Ubuntu 14.04. Please follow the steps given here for installation.

After you have installed OpenCV, set up your IDE. I have used the Eclipse IDE.


  • Create a Java Project
  • Add an external library and include the following jar: opencv-2410.jar
  • Also add the native library to the opencv jar in your build path; it is located at: /usr/local/share/OpenCV/java
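
If you want to run outside Eclipse, the equivalent classpath and native-library setup from a terminal looks roughly like this (a sketch, assuming the jar and the native library both live in the default install location above):

java -cp .:/usr/local/share/OpenCV/java/opencv-2410.jar -Djava.library.path=/usr/local/share/OpenCV/java HelloWorld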



You can download the source code from here. Hope you enjoy it!

Step 2: Coding Time with Java

HelloWorld -> Let's start with our first HelloWorld example with OpenCV, to make sure it's working fine.

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class HelloWorld {
    public static void main(String[] args) {
        // Load the OpenCV native library before calling any OpenCV code
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        System.out.println("Hello OpenCV..!!");
        // 3x3 identity matrix of unsigned 8-bit values
        Mat mat = Mat.eye(3, 3, CvType.CV_8UC1);
        System.out.println("mat = " + mat.dump());
    }
}

Output:
Hello OpenCV..!!
mat = [1, 0, 0;
  0, 1, 0;
  0, 0, 1]

OpenCV_Drawing -> Now that our HelloWorld is working, it's time to get a little fancier. Let's draw a green-filled circle over a picture.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Point;
import org.opencv.core.Scalar;
import org.opencv.highgui.Highgui;

public class OpenCV_Drawing {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat mat = Highgui.imread("/home/kuntal/Pictures/rock.jpeg");
        PictureFrame.bufferedImageShow(mat, "Original");
        // Draw a filled green circle (BGR: 0,255,0) at the center of the image
        Core.circle(mat, new Point(mat.width() * 0.5, mat.height() * 0.5), 40,
                new Scalar(0, 255, 0), Core.FILLED);
        // Put black text near the top-left corner
        Core.putText(mat, "Hello World!", new Point(30, 30), 100, 1,
                new Scalar(0, 0, 0));
        PictureFrame.bufferedImageShow(mat, "Drawing");
    }
}

Output:



OpenCV_EdgeDetect -> We will use the same picture and detect its edges.

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.highgui.Highgui;
import org.opencv.imgproc.Imgproc;

public class OpenCV_EdgeDetect {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat mat = Highgui.imread("/home/kuntal/Pictures/rock.jpeg");
        PictureFrame.bufferedImageShow(mat, "Original");

        // 3x3 Laplacian kernel: responds to rapid intensity changes (edges)
        int kernelSize = 3;
        Mat kernel = new Mat(kernelSize, kernelSize, CvType.CV_32F) {
            {
                put(0, 0, 0);  put(0, 1, -1); put(0, 2, 0);
                put(1, 0, -1); put(1, 1, 4);  put(1, 2, -1);
                put(2, 0, 0);  put(2, 1, -1); put(2, 2, 0);
            }
        };
        // Convolve the image with the kernel (depth -1 keeps the source depth)
        Imgproc.filter2D(mat, mat, -1, kernel);
        PictureFrame.bufferedImageShow(mat, "Laplacian");
    }
}

Output:


OpenCV_FaceDetect -> Face or object detection is a very widely used technique in modern-day applications. You will see how easy it is to do with OpenCV, in just a few lines of code. Obviously you can tune the algorithm, but the defaults work well.

import javax.swing.JOptionPane;

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.highgui.Highgui;
import org.opencv.objdetect.CascadeClassifier;

public class OpenCV_FaceDetect {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        // LBP cascade shipped with OpenCV for frontal face detection
        CascadeClassifier faceDetector = new CascadeClassifier(
                "/home/kuntal/knowledge/software/opencv-2.4.10/data/lbpcascades/lbpcascade_frontalface.xml");
        Mat mat = Highgui.imread("/home/kuntal/Pictures/rock.jpeg");
        MatOfRect faceDetections = new MatOfRect();
        faceDetector.detectMultiScale(mat, faceDetections);

        // Draw a green rectangle around each detected face
        for (Rect rect : faceDetections.toArray()) {
            Core.rectangle(mat, new Point(rect.x, rect.y),
                    new Point(rect.x + rect.width, rect.y + rect.height),
                    new Scalar(0, 255, 0));
        }
        PictureFrame.bufferedImageShow(mat, "faceDetection");
        JOptionPane.showMessageDialog(null,
                "Detected " + faceDetections.toArray().length + " faces");
    }
}

Output:


Note: I have used a Swing JFrame in the PictureFrame class for showing pictures. You can use any other approach.
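
Since PictureFrame is not shown above, here is a minimal sketch of what such a helper can look like. This is my own stripped-down version, not the exact class from the source download: it copies the OpenCV Mat's pixels into a BufferedImage and shows it in a JFrame.

import java.awt.image.BufferedImage;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import org.opencv.core.Mat;

public class PictureFrame {
    public static void bufferedImageShow(Mat mat, String title) {
        // Pick the BufferedImage type from the channel count (gray vs. BGR)
        int type = (mat.channels() == 1) ? BufferedImage.TYPE_BYTE_GRAY
                : BufferedImage.TYPE_3BYTE_BGR;
        byte[] data = new byte[mat.channels() * mat.cols() * mat.rows()];
        mat.get(0, 0, data); // copy the pixel data out of the Mat
        BufferedImage image = new BufferedImage(mat.cols(), mat.rows(), type);
        image.getRaster().setDataElements(0, 0, mat.cols(), mat.rows(), data);

        JFrame frame = new JFrame(title);
        frame.getContentPane().add(new JLabel(new ImageIcon(image)));
        frame.pack();
        frame.setVisible(true);
    }
}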


Step 3: Coding Time with Python

Testing OpenCV with Python

  • Open a terminal, then launch the Python interpreter:

            python
Then, import OpenCV:

import cv2
cv2.__version__

Output:
'2.4.10'



Reading and Writing an Image (to Grayscale)

import cv2
# Load the image directly as grayscale, then write it back out
grayImage = cv2.imread('/home/kuntal/Pictures/rock.jpeg', cv2.CV_LOAD_IMAGE_GRAYSCALE)
cv2.imwrite('/home/kuntal/Pictures/rock_modified.jpeg', grayImage)

Output: (Original and Grayscale)




Tracking Faces with a Haar Cascade Classifier

For this example you need NumPy installed on your system (OpenCV's Python bindings represent images as NumPy arrays).

import numpy as np
import cv2

# Load the pre-trained Haar cascades shipped with OpenCV
face_cascade = cv2.CascadeClassifier('/home/kuntal/knowledge/software/opencv-2.4.10/data/haarcascades/haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('/home/kuntal/knowledge/software/opencv-2.4.10/data/haarcascades/haarcascade_eye.xml')

img = cv2.imread('/home/kuntal/Pictures/rock.jpeg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces in the grayscale image (scaleFactor=1.3, minNeighbors=5)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

for (x,y,w,h) in faces:
    # Blue rectangle around each face, then look for eyes inside that region
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex,ey,ew,eh) in eyes:
        cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

cv2.imshow('img',img)
cv2.waitKey(0)

cv2.destroyAllWindows()

Output:



Hope you enjoy it!!

Sunday 15 February 2015

Building a Recommender system with Apache Mahout

Recently I was playing with Apache Mahout for building a recommender-based system. I wanted to first test state-of-the-art collaborative filtering algorithms before building a customized solution (potentially on top of those algorithms). Here's the basic idea behind a recommendation system using Apache Mahout:

Collaborative Filtering

It is a technique for producing recommendations based solely on the users' preferences for products (instead of including product features and/or user properties). Collaborative filtering can be user- or item-based.


  • User-based recommendation: promotes products to the user that are bought by users who are similar to him/her.



User-based Recommendation: recommend products to a user based on what similar users have bought



  • Item-based recommendation: proposes products that are similar to the ones the user already buys.


Item-based Recommendation: recommend products to a user that are similar to the ones he/she already bought



User-Item Preferences and Similarity

So what does similar mean in this context? In collaborative filtering, the similarity between users (for user-based recommendations) or items (for item-based recommendations) is computed from user-item preferences only. We use how often a user bought a product as a proxy for the user's preference.

Based on these user-item preferences, we can use the Euclidean distance or the Pearson correlation to determine the similarity between users or items (products), respectively.

  1. Based on the Euclidean distance, two users are similar if the distance between their preference vectors, projected into a Cartesian coordinate system, is small.
  2. The Pearson correlation (based on demeaned user-item preferences) coincides with the cosine of the angle between the preference vectors. That is, two users are similar if the angle between their preference vectors is small or, formulated in terms of correlation, if they rate the same products high and other products low.
  3. The Tanimoto similarity between two users is computed as the number of products the two users have in common divided by the total number of products they bought (or clicked, or viewed) overall; see the small sketch below.
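
To make that last measure concrete, here is a tiny self-contained sketch (plain Java, not Mahout code) computing Tanimoto similarity over the sets of product ids two users have interacted with:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Tanimoto similarity: |A ∩ B| / |A ∪ B| over two users' product sets
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<Long>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Set<Long> user1 = new HashSet<Long>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> user2 = new HashSet<Long>(Arrays.asList(3L, 4L, 5L));
        // 2 products in common out of 5 overall -> prints 0.4
        System.out.println(tanimoto(user1, user2));
    }
}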

Now let's implement the above ideas. Coding time:

Let's start playing by building a simple recommendation engine based on the MovieLens data.

To see a recommender engine in action, you can download one of the MovieLens ratings data sets (I will show it with one million ratings). Unzip the archive somewhere. The file that will interest you is u.data. Its format (tab-separated) is as follows:

userId | movieId | rating | timestamp

I have modified the file for the Mahout Taste FileDataModel into the following simple format:

userId,movieId,rating

Sample data:

196,242,3
186,302,3
22,377,1
244,51,2
166,346,1
298,474,4
115,265,2
253,465,5

305,451,3
.....
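
If you want to do that conversion programmatically, a throwaway sketch like the following works (the file names are placeholders for wherever you unzipped the data):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class ConvertRatings {
    public static void main(String[] args) throws Exception {
        // Input: userId \t movieId \t rating \t timestamp, one rating per line
        BufferedReader in = new BufferedReader(new FileReader("u.data"));
        PrintWriter out = new PrintWriter("rating.csv");
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t");
            // Keep userId, movieId and rating; drop the timestamp
            out.println(fields[0] + "," + fields[1] + "," + fields[2]);
        }
        out.close();
        in.close();
    }
}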

Let's build a classic user-based recommender using the Pearson correlation similarity with a nearest-10-users neighborhood, with the code below:

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserRecommenderPlaying {

    public static void main(String[] args) throws TasteException, IOException {

        // The user id for which recommendations will be generated
        int userId = 6;

        // The number of recommendations to be generated
        int noOfRecommendations = 5;

        // Load the dataset using FileDataModel
        DataModel model = new FileDataModel(new File("/home/kuntal/knowledge/IDE/workspace/MahoutTest/data/rating.csv"));

        // Use the Pearson correlation similarity
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

        // NearestNUserNeighborhood is preferred when we need control over the exact number of neighbors
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);

        // Initialize the recommender
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Call the recommend method to generate recommendations
        List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);

        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println("Recommended Movie Id: " + recommendedItem.getItemID()
                    + "  .Strength of Preference: " + recommendedItem.getValue());
        }
    }
}

Output:
Recommended Movie Id: 878  .Strength of Preference: 4.464102
Recommended Movie Id: 300  .Strength of Preference: 4.2047677
Recommended Movie Id: 322  .Strength of Preference: 4.0203676
Recommended Movie Id: 313  .Strength of Preference: 4.008741
Recommended Movie Id: 689  .Strength of Preference: 4.0




Let's build a classic item-based recommender using the Pearson correlation similarity with the code below:

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemRecommenderPlaying {

    public static void main(String[] args) throws TasteException, IOException {
        // The user id for which recommendations will be generated
        int userId = 308;

        // The number of recommendations to be generated
        int noOfRecommendations = 3;

        // Data model created to accept the input file
        FileDataModel dataModel = new FileDataModel(new File("/home/kuntal/knowledge/IDE/workspace/MahoutTest/data/rating.csv"));

        // Specify the similarity algorithm
        ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(dataModel);

        // Initialize the recommender
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);

        // Call the recommend method to generate recommendations
        List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);

        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println("Recommended Movie Id: " + recommendedItem.getItemID()
                    + "  .Strength of Preference: " + recommendedItem.getValue());
        }
    }
}


Output:
Recommended Movie Id: 245  .Strength of Preference: 5.0
Recommended Movie Id: 34  .Strength of Preference: 5.0
Recommended Movie Id: 35  .Strength of Preference: 5.0

Evaluation of the Algorithms:

In my opinion, the most valuable part of the whole process is evaluating your algorithm/model. To get an immediate feel for whether your intuition in choosing a particular algorithm was good, or to see the positive or negative impact of your own customized algorithm, you need a way to evaluate and compare them on the data.
You can easily do that with Mahout's RecommenderEvaluator interface. Two implementations of that interface are provided: AverageAbsoluteDifferenceRecommenderEvaluator and RMSRecommenderEvaluator. The first computes the average absolute difference between predicted and actual ratings for users; the second is the classic RMSE (a.k.a. RMSD).

One way to check whether the recommender returns good results is a hold-out test. We partition our dataset into two sets: a training set consisting of 90% of the data and a test set consisting of 10%. Then we train our recommender using the training set and see how well it predicts the unknown interactions in the test set.

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvaluationUserExample {

    public static void main(String[] args) throws IOException, TasteException {

        RecommenderBuilder builder = new RecommenderBuilder() {

            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // 0.1 is the similarity threshold: only users whose similarity exceeds it count as neighbors
                UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
                Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
                return new CachingRecommender(recommender);
            }
        };

        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        DataModel model = new FileDataModel(new File("/home/kuntal/knowledge/IDE/workspace/MahoutTest/data/rating.csv"));

        /* 0.9 is the percentage of each user's preferences used to produce recommendations;
           the rest are compared to estimated preference values for the evaluation.
           1.0 is the percentage of users included in the evaluation (here, all users). */
        double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);

        System.out.println("Result: " + score);
    }
}



Output:
Result: 0.8018675119933131

Note: if you run this test multiple times, you will get different results, because the splitting into training set and test set is done randomly.
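
If you would rather score with RMSE, swap in the other evaluator implementation mentioned above; everything else in the example stays the same:

// Requires: import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
double rmse = rmseEvaluator.evaluate(builder, null, model, 0.9, 1.0);
System.out.println("RMSE: " + rmse);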

All code is available on GitHub.

Saturday 14 February 2015

Delay or Scheduled Message Delivery with RabbitMQ

In this tutorial, I will show you the logic of delayed or scheduled message delivery with RabbitMQ, and how to implement it in Java. For a real-world example/use case of RabbitMQ, please go through this article.

Sometimes you don’t want messages in the queue to be read or delivered immediately. For example, if processing a fax message fails due to a network error, there is no point in an immediate retry, so a delay is useful in this kind of scenario. Fortunately, RabbitMQ 2.8+ introduced Dead Letter Exchanges (DLX), which allow us to simulate message scheduling.

In practice, if we wanted to enable retry on failure every 3 minutes, the flow would look like this:
  1. Create Work Queue and bind it to Work Exchange.
  2. Create Delay Queue and bind it to Delay Exchange.
  3. Set x-dead-letter-exchange on Delay Queue to Work Exchange.
  4. Set x-message-ttl on Delay Queue to 180000 ms (3 minutes).
  5. Publish a message to Work Queue.
  6. The client reads the message from Work Queue and attempts to process it.
  7. If processing fails, the client publishes the message to Delay Queue.
  8. The message stays in Delay Queue for 3 minutes.
  9. When the message TTL expires, it is re-queued to Work Queue via Work Exchange for another processing attempt.
  10. Repeat steps 6-9.


Create the Work Queue:

private String WORK_QUEUE = "WorkQueue";
private String WORK_EXCHANGE = "WorkExchange";

// Uses com.rabbitmq.client.ConnectionFactory, Connection and Channel
// Create your connection factory for getting a connection and channel
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();

// Declare Work Exchange and Work Queue, finally bind Work Queue to Work Exchange
channel.exchangeDeclare(WORK_EXCHANGE, "direct", true);
channel.queueDeclare(WORK_QUEUE, true, false, false, null);
channel.queueBind(WORK_QUEUE, WORK_EXCHANGE, "RK", null);


Create the Delay Queue:

private String DELAY_QUEUE = "DelayQueue";
private String DELAY_EXCHANGE = "DelayExchange";

// Set the Delay Queue's dead letter exchange to the Work Exchange, so that after
// the message TTL expires the messages are sent to the Work Queue via the Work Exchange
Map<String, Object> args = new HashMap<String, Object>();
args.put("x-dead-letter-exchange", WORK_EXCHANGE);
args.put("x-message-ttl", 180000);

// Declare Delay Exchange and Delay Queue, finally bind Delay Queue to Delay Exchange
channel.exchangeDeclare(DELAY_EXCHANGE, "direct", true);
channel.queueDeclare(DELAY_QUEUE, true, false, false, args);
channel.queueBind(DELAY_QUEUE, DELAY_EXCHANGE, "RK", null);


Read from Work Queue:

QueueingConsumer consumer = new QueueingConsumer(channel);
channel.basicConsume(WORK_QUEUE, true, consumer);

while (true) {
    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    String message = new String(delivery.getBody());
    // processSomething/processLater are application-specific; on failure,
    // hand the message to the Delay Queue for a later retry
    if (!processSomething(message)) {
        processLater(message);
    }
}


Publish to Delay Queue on message processing failure:

String message = new String(delivery.getBody());
// Publish with the "RK" routing key the Delay Queue was bound with,
// otherwise the direct Delay Exchange will drop the message
channel.basicPublish(DELAY_EXCHANGE, "RK", null, message.getBytes());


Notes:
It's worth mentioning that the Delay Queue mechanism guarantees that the message will be delivered at least after the delay time, but not exactly when the delay time is over.

Real World Example of RabbitMQ - Universal Message Queue

Last year I was developing a message-queue-based application for our company, to be used by various other products/applications. Since its purpose was to be very scalable and usable by many other applications, we named it Universal Message Queue (UMQ).

Why did UMQ come into the picture?

Earlier, the database was used as a queue in certain scenarios. But a database is not a good choice for heavy queuing functionality: a heavily loaded database with additional queuing duties will hurt overall application performance. A database should not be used for queuing for the following reasons:


  • A database inherently does not support queue functionality.
  • Using the database as a queue increases the load on the database, and hence affects overall application performance.
  • Implementing additional queue functionality (such as priority, delay, tracking, etc.) on top of a database is complex and not well proven.

To overcome the above issues, UMQ is a good choice. UMQ is a generalized message queue system based on REST services. Besides eliminating the issues of using a database as a queuing system, UMQ has important additional features that are useful in different scenarios for various applications, as stated below.


Some of the major components used for UMQ are RabbitMQ and Redis.

Truly speaking, we did a lot of POCs and R&D with various open-source message queues before developing this internal product, and RabbitMQ was able to fulfill our needs very well.

Some of RabbitMQ's features serve this very well, like routing logic based on the exchange-and-queue mechanism, priority (through a plugin), negative acknowledgement (NACK) with or without requeueing, and delayed or scheduled message delivery.
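
As a taste of that NACK feature, negative acknowledgement is a one-liner in the RabbitMQ Java client; a sketch, assuming a channel and a delivery are in scope:

// Reject the message; the last flag chooses between requeueing it (true)
// and discarding or dead-lettering it (false)
channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);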

So in the next tutorial, I will give you the idea of delayed/scheduled message delivery, along with how to implement the logic in Java.