Monday 29 September 2014

Application Log Analysis with Flume, Hive and Log4j

Overview
In this tutorial you will learn how to integrate your application logs with a Flume agent through Log4j, store them in an HDFS path, and then analyse them with Hive.

Sometimes the events from an application have to be analyzed to learn more about customer behavior, to drive recommendations, or to detect fraudulent use. As the volume of data grows, processing the events on a single machine can take a very long time or even become impossible. This is where distributed systems like Hadoop come into play.

Apache Flume can be used to move the data from a source to a sink. One option is to make the application use Log4j to send its log events to a Flume agent, which will store them in HDFS for further analysis. Flume ships with a Log4j appender that can send the log events from your application directly to a Flume agent. Let's get started.

Flume Configuration

Let's create a file named log4j-test.conf under the $FLUME_HOME/conf directory.

log4j-test.conf

# Name the components on agent1 that we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = hdfs-sink1

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define an HDFS sink that writes all events it receives to HDFS
# and connect it to the other end of the same channel.
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://localhost:9000/user/flume/logevents/

agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text

# Never roll the current file based on elapsed time (0 disables time-based rolling)
agent1.sinks.hdfs-sink1.hdfs.rollInterval=0

# File size to trigger roll, in bytes
agent1.sinks.hdfs-sink1.hdfs.rollSize = 500

# never roll based on number of events
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0

# Timeout after which inactive files get closed (in seconds)
agent1.sinks.hdfs-sink1.hdfs.idleTimeout = 3600


#chain the different components together
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sources.avro-source1.channels = ch1

Note: You should adjust this configuration to your own requirements. The configuration above is kept simple and general for the purpose of this tutorial.

Develop Application

Now let's create a simple application that logs its events directly to the Flume agent through the Log4j appender.

The following jars are required inside your application (all of them are available inside the Flume distribution).
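If your application is built with Maven, a dependency along these lines (coordinates assumed for Flume 1.5.0; verify them against your distribution) pulls in the appender and its transitive dependencies instead of copying the jars by hand:

<!-- assumed coordinates for the Flume 1.5.0 Log4j appender -->
<dependency>
<groupId>org.apache.flume.flume-ng-clients</groupId>
<artifactId>flume-ng-log4jappender</artifactId>
<version>1.5.0</version>
</dependency>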



Remember that the Flume Log4j appender only works with the INFO, WARN and ERROR levels, not with DEBUG (it may be a bug, but it was arguably designed with production usage in mind, where DEBUG logging is not applicable). So here's how it works.

Let's say you have a base logger class, through which you are logging in your application:

package com.test.base.logging;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class  FlumeLogger {
    public final static Logger l = LoggerFactory.getLogger(FlumeLogger.class.getName());

}

And say this is your application class, where the logger is used:


package  com.test.generate.log;

import com.test.base.logging.FlumeLogger;

public class ApplicationLog {
    
    public static void main(String[] args) {
        for(int i=0;i<10;i++){
            //DEBUG level won't send any event to Flume
            //FlumeLogger.l.debug("Test msg : "+i);
            FlumeLogger.l.info("Test msg : "+i);
        }
    }

}

In order to make your log events go straight to the Flume agent, you need the following entries in your application's log4j properties file:

# Remove the Flume appender from the rootLogger, otherwise the same log event will be written twice.
#log4j.rootLogger = INFO, flume
log4j.rootLogger = INFO

# Attach the Flume appender to the base logger class (fully qualified). DEBUG level won't work; the level used here is INFO, but ERROR and WARN will work too.
#log4j.logger.com.test.base.logging.FlumeLogger = DEBUG, flume
log4j.logger.com.test.base.logging.FlumeLogger = INFO, flume

# Define the flume appender
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 41414
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
# Add this conversion pattern so that Hive can later parse the fields written by Flume.
log4j.appender.flume.layout.ConversionPattern= %d | %p | %c | %m%n


Now that the application is ready, let's start our Flume agent. First start your HDFS cluster, then start the Flume agent from $FLUME_HOME/bin on the node where your application is running, using the following command:

./flume-ng agent --conf conf --conf-file /home/kuntal/practice/BigData/apache-flume-1.5.0/conf/log4j-test.conf --name agent1 -Dflume.root.logger=INFO,console


Start your application and check the HDFS directory /user/flume/logevents.
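For example, you can inspect what the sink has written with commands along these lines (the path comes from the Flume configuration above; FlumeData is the sink's default file prefix):

# list the files written by the HDFS sink
hdfs dfs -ls /user/flume/logevents
# view their contents
hdfs dfs -cat /user/flume/logevents/FlumeData.*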

The events now flow nicely from Application -> Flume Agent -> HDFS.


 We are all set to analyse this data using Hive.


Hive Table Creation


The application log entries will look like this:
2014-09-29 12:57:41,222 | INFO | com.test.generate.log.ApplicationLog | Test msg : 0

So now let's create a Hive external table to parse and analyse this log format.

Go to $HIVE_HOME/bin and start hive (./hive)

CREATE EXTERNAL TABLE logevents (timestamp STRING, level STRING, className STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/flume/logevents';

And now it's time to query some data.

select * from logevents; 
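For example, a simple aggregation like the following (a sketch; trim() is used because the conversion pattern leaves spaces around each '|') counts events per log level:

-- count events per log level
SELECT trim(level) AS loglevel, count(*) AS events
FROM logevents
GROUP BY trim(level);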




Future Considerations
As you can see, this configuration rolls files very frequently (every 500 bytes here), so HDFS will accumulate many small files. This works for short-term or transactional data that needs to be referenced often, but the problem is the sheer number of files. At some point it will be necessary to aggregate some of this data into fewer, larger files, which will make it just a little harder to reference. Just something to think about.

Once you have the data in Hive, you will need to consider the next level of detail: what to do with all this interesting data. Depending on who will view the data and how it will be used, you will need to set something up for your users.


Conclusion
I hope you have found this tutorial useful, and that it has helped to clear up some of the mystery of managing log files with Flume and Hive.

In my upcoming tutorials on real-world log analysis, I will show how to do real-time analysis of Apache server logs with Flume and Hive, along with some customization of Flume and Hive to suit your needs.
You will also learn how to analyse the data with Pig and represent it graphically using gnuplot.

So stay tuned for some big data adventure!

Sunday 28 September 2014

Maven Multiple Environment Build and Set Up

One of the fundamental requirements of a build process is that it must be possible to use different configurations for binaries that are used in different environments. If you are using Maven, you can fulfil this requirement by using build profiles.

This can also be done using the maven-antrun-plugin, but a more robust and Maven-specific way is to define a separate profile for the development, testing and production environments. Each profile has its own configuration file.


These requirements can be fulfilled by following these steps:

  • Creating profile directories and profile specific configuration files.
  • Adding build profile configuration to the POM file.

These steps are described in more detail in the following sections.

Creating Profile Specific Configuration Files

First, I created a profiles directory under the project's resources directory. After that, I created three config files for the dev, prod and test environments. Here's the project structure:
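The layout looks roughly like this (a sketch; spring-bean.xml is assumed to live under src/main/resources so that it gets filtered):

DemoWebApp
├── pom.xml
└── src
    └── main
        ├── resources
        │   ├── profiles
        │   │   ├── config-dev.properties
        │   │   ├── config-prod.properties
        │   │   └── config-test.properties
        │   └── spring-bean.xml
        └── webapp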







Adding the Build Profile Configuration To Pom.xml

Second, add the build profile configuration to the POM file. The profile configuration section of the POM file is given below, with explanatory comments:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 
 <modelVersion>4.0.0</modelVersion>
  <groupId>com.test</groupId>
  <artifactId>DemoWebApp</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>DemoWebApp Maven Webapp</name>
  <url>http://maven.apache.org</url>
<!-- Profiles configuration -->

<profiles>
    <profile>
        <id>dev</id>
       <!-- Dev profile is active by default -->
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
       <properties>
            <!--
                Specifies the build profile id, which is used to find out the correct properties file.
                This is not actually necessary for this example, but it can be used for other purposes.
            -->
            <env>dev</env>
        </properties>

        <build>
           <filters>
               <!--
                    Specifies path to the properties file, which contains profile specific
                    configuration.
               -->
                <filter>src/main/resources/profiles/config-${env}.properties</filter>
            </filters>
            <resources>
                <!--
                    Placeholders found from files located in the configured resource directories are replaced
                    with values found from the profile specific configuration files.
                -->
                <resource>
                    <filtering>true</filtering>
                    <directory>src/main/resources</directory>
                   <!--
                        You can also include only specific files found from the configured directory or
                       exclude files. 
                   -->
                    <!--
                    <includes>
                        <include></include>
                    </includes>
                    <excludes>
                        <exclude></exclude>
                    </excludes>
                    -->
               </resource>
            </resources>
        </build>
    </profile>
    <profile>

        <id>test</id>

        <properties>
            <!--
                Specifies the build profile id, which is used to find out the correct properties file.
                This is not actually necessary for this example, but it can be used for other purposes.
            -->
            <env>test</env>
        </properties>

        <build>
           <filters>
               <!--
                    Specifies path to the properties file, which contains profile specific
                    configuration.
               -->
                <filter>src/main/resources/profiles/config-${env}.properties</filter>
            </filters>
            <resources>
                <!--
                    Placeholders found from files located in the configured resource directories are replaced
                    with values found from the profile specific configuration files.
                -->
                <resource>
                    <filtering>true</filtering>
                    <directory>src/main/resources</directory>
                   <!--
                        You can also include only specific files found from the configured directory or
                       exclude files. 
                   -->
                    <!--
                    <includes>
                        <include></include>
                    </includes>
                    <excludes>
                        <exclude></exclude>
                    </excludes>
                    -->
               </resource>
            </resources>
        </build>
    </profile>
    <profile>

        <id>prod</id>

        <properties>
            <!--
                Specifies the build profile id, which is used to find out the correct properties file.
                This is not actually necessary for this example, but it can be used for other purposes.
            -->
            <env>prod</env>
        </properties>

        <build>
           <filters>
               <!--
                    Specifies path to the properties file, which contains profile specific
                    configuration.
               -->
                <filter>src/main/resources/profiles/config-${env}.properties</filter>
            </filters>
            <resources>
                <!--
                    Placeholders found from files located in the configured resource directories are replaced
                    with values found from the profile specific configuration files.
                -->
                <resource>
                    <filtering>true</filtering>
                    <directory>src/main/resources</directory>
                   <!--
                        You can also include only specific files found from the configured directory or
                       exclude files. 
                   -->
                    <!--
                    <includes>
                        <include></include>
                    </includes>
                    <excludes>
                        <exclude></exclude>
                    </excludes>
                    -->
               </resource>
            </resources>
        </build>
    </profile>

</profiles>
</project>


Now we have created the needed profile-specific configuration files and configured Maven to use them. Each configuration file contains only one property, jdbc.url, which specifies the database URL. This value will be substituted into the spring-bean.xml file during the Maven build, based on the profile you pass.

Note: The Spring dependencies are not included in the POM; this tutorial focuses only on demonstrating a multi-environment project setup with Maven.

Here's the content of different config files:

config-dev.properties
jdbc.url=jdbc:dev

config-prod.properties
jdbc.url=jdbc:prod

config-test.properties
jdbc.url=jdbc:test

Before Maven processing (spring-bean.xml):

<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
  <property name="url" value="${jdbc.url}"/>
</bean>

Now build your project for the required environment (for example prod):

mvn clean package -P prod

After Maven processing (spring-bean.xml):


<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
  <property name="url" value="jdbc:prod" />
</bean>

I hope this tutorial was informative. There does not seem to be much information available on how to do this sort of multi-environment build with Maven, perhaps because the process is not standardized and teams and organizations have evolved different strategies to deal with this problem.

Reference:
http://maven.apache.org/guides/mini/guide-building-for-different-environments.html
http://maven.apache.org/guides/introduction/introduction-to-profiles.html

Saturday 27 September 2014

Developing Identity Management System with CAS

Section 1. Before you start

In order to follow along with this tutorial, you should have a basic understanding of sessions, cookies, databases, LDAP, the J2EE Spring framework, and related concepts such as identity management systems and basic security.
About this tutorial

This tutorial explains how to develop a single sign-on based identity management system with the CAS server and CAS client. It also shows how you can access the resources of multiple applications with a single login.

Objectives
In this tutorial, learn how to:

  • Set up, configure and build the CAS application as an identity provider on Apache Tomcat.
  • Integrate an LDAP directory and Derby (an embedded DB) with CAS.
  • Create two client applications to test the SSO feature provided by CAS.


Prerequisites

This tutorial assumes familiarity with some basic concepts of identity management and single sign-on.

CAS is a Spring-framework-based J2EE web application that can run on any web container (e.g. Tomcat) or application server, so it does not require any OS-specific installation. This tutorial shows the prerequisite configuration on a Windows system.

You can also refer to Part 2 of this series and use the Linux-based configuration.

Downloads

Apache Tomcat 7.
http://tomcat.apache.org/download-70.cgi

Apache Derby.
http://db.apache.org/derby/derby_downloads.html

CAS
http://www.jasig.org/cas/download

Open DS v 2.3.0-build003
https://java.net/projects/opends/downloads


Set Up SSL.

Set the JAVA_HOME and PATH environment variables.
From a command prompt, run Java's keytool utility to generate a self-signed certificate.

keytool -genkey -alias cascert -keyalg RSA -keystore c:\partha\keystore\cascert.jks

You will be asked to provide some information, along with a keystore password, while creating the certificate.

To configure HTTPS in the Tomcat server, modify server.xml inside the tomcat\conf directory.

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="c:/partha/keystore/cascert.jks"
           keystoreType="JKS"
           keystorePass="your given keystore password"
           keyPass="your given password" />

You can also copy the keystore to the tomcat/conf directory and use conf/cascert.jks as the keystoreFile value.

Do not load the APR library; comment out the AprLifecycleListener in server.xml.



Set Up Derby Embedded DB
Apache Derby is a pure Java relational database engine using standard SQL and JDBC as its APIs. More information about Derby can be found on the Apache web site. Derby functionality includes:

  • Embedded engine with JDBC drivers
  • Network Server
  • Network client JDBC drivers
  • Command line tools: ij (SQL scripting), dblook (schema dump) and sysinfo (system info)


Unzip Apache Derby into a directory and set DERBY_HOME to that directory. Also append ${DERBY_HOME}/bin to the PATH variable.

Execute the startNetworkServer.bat file to start the Derby server listening on the default port 1527.



Set Up Open-Ds

Download OpenDS and unzip it into a folder. Run the setup.bat file to open the installation wizard.

Proceed by specifying the hostname, LDAP connector port, administration connector port, root user DN, and so on.

More installation information can be found at https://java.net/projects/opends/pages/2_4_OpenDSInstallationGuide


To monitor and browse the LDAP directory, you can use the Apache Directory Studio LDAP browser (update site: http://directory.apache.org/studio/update/2.x/), or phpLDAPadmin as shown in Part 1 of this series.

Section 2. Introduction

What is CAS?

The documentation describes CAS as "a multiprotocol Web single sign-on (SSO) product composed of a single logical server component that services authentication requests from multiple clients that communicate via one or more supported protocols", which sums up its purpose: a robust J2EE-based identity provider.

CAS is composed of two main components:

CAS Server: The CAS server is a web application containing servlets built on the Spring Framework whose primary responsibility is to authenticate users and grant access to CAS-enabled services, commonly called CAS clients. It creates an SSO session when a user authenticates and generates a ticket-granting ticket (TGT) upon successful login. A service ticket (ST) is then issued to a service at the user's request via browser redirects, using the TGT as a token.

CAS Client: A CAS client is any CAS-enabled ("CASified") application that can communicate with the server via a supported protocol. The term also refers to the software packages that can be integrated with various platforms (.NET, Java, Python, Ruby, etc.) and applications in order to communicate with the CAS server.


Why CAS?

The following points put CAS at the top of the identity provider list and make it a strong choice for an architect.

Modern & Robust: CAS's web tier is built on the robust Spring framework (MVC/Web Flow), which is flexible and easy to configure. It is quite transparent and can be customized as needed.

Multiple protocol support: By default CAS works with its own protocol (CAS 1.0, 2.0 & 3.0). Apart from these it supports SAML 1.1, OAuth 1.0/2.0, OpenID, etc. It can also integrate with Google App Engine.

Multiple Authentication Handlers: The CAS server delegates authentication decisions to any number of supported authentication mechanisms including LDAP/Active Directory, Kerberos, and RDBMS.

Cross-platform support for CAS clients: The CAS client is available on many platforms (.NET, PHP, Ruby, Python, Java, etc.), which greatly widens the range of applications CAS can support.
Section 3. Set up and Configuration

CAS Set Up

CAS installation is fundamentally a source-oriented process: you have to build the CAS source code to generate a WAR. It is a web application project, so it requires a web container to run.
Download the latest release of CAS.

Inside the CAS distribution's base directory you can find a parent pom.xml file linked to 22 child modules. Some important modules are described below.

cas-server-core: The core module implementing the login, logout and SSO functionality. It contains the basic authentication handlers and ticket-generating classes that implement the core SSO features.

cas-server-webapp: The web module containing the deployment descriptor and the servlets. The deployable WAR is built under this module.

cas-server-support-ldap: Contains the LDAP authentication handler, required for authenticating credentials against an LDAP directory.

cas-server-support-jdbc: Contains classes to authenticate users against an RDBMS store.

CAS Configuration

Ticket Generation
The CAS server has a memory-based default ticket registry, which is not suitable for production or high-availability environments. To address this, CAS can be configured to use a distributed caching framework such as memcached or Terracotta.
Memcached relies on hashing, a node-location strategy and object serialization. You can set up a distributed cache with memcached and use the org.jasig.cas.ticket.registry.MemCacheTicketRegistry class shipped with the CAS server.
This article shows a more reliable ticket registry, the JPA-based JpaTicketRegistry, and its configuration.

The CAS server uses JPA to persist tickets in an RDBMS store. Here Hibernate is used as the JPA implementation and Derby as the RDBMS store.

We are going to use Derby, an embedded DB, as the database storage. The Derby JDBC client driver can be found in derbyclient.jar, so add the following dependency to the pom.xml inside the cas-server-webapp module. It is required to connect to Derby and create the datasource.

<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derbyclient</artifactId>
<version>10.10.1.1</version>
</dependency>

Also add the following dependencies for the Hibernate implementation of JPA. Specify the version as 4.1.0 or a later one.

<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>${hibernate.core.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>${hibernate.core.version}</version>
<scope>runtime</scope>
</dependency>

Inside the cas-server-webapp module, navigate to the spring-configuration directory under WEB-INF. Edit ticketRegistry.xml to remove the DefaultTicketRegistry and add the JpaTicketRegistry.

<bean id="ticketRegistry" class="org.jasig.cas.ticket.registry.JpaTicketRegistry" />

<bean class="org.springframework.orm.jpa.support.PersistenceAnnotationBeanPostProcessor"/>

Create a datasource bean to connect to Derby.

<bean id="dataSource"    class="org.springframework.jdbc.datasource.DriverManagerDataSource"
p:driverClassName="org.apache.derby.jdbc.ClientDriver"
p:url="jdbc:derby://localhost:1527/CASDB;create=true"
p:username="partha"
p:password="123" 
/>

Spring ORM's LocalContainerEntityManagerFactoryBean is used to configure JPA. The packagesToScan property specifies the base packages of the mapped model classes that are to be persisted. When packagesToScan is set, the otherwise required persistence.xml can be omitted.

<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource"/>
<property name="packagesToScan">
<list>
<value>org.jasig.cas.services</value>
<value>org.jasig.cas.ticket</value>
</list>
</property>
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
<property name="generateDdl" value="true"/>
<property name="showSql" value="true" />
</bean>
</property>
<property name="jpaProperties">
<props>
<prop key="hibernate.dialect">org.hibernate.dialect.DerbyDialect</prop>
<prop key="hibernate.hbm2ddl.auto">update</prop>
</props>
</property>
</bean>

Enable transactions on the entity manager.

<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager"
p:entityManagerFactory-ref="entityManagerFactory" />

<tx:annotation-driven transaction-manager="transactionManager" />

The ticketRegistryCleaner is scheduled with Quartz and runs periodically to clear unused or invalidated tickets from the registry.

<bean id="ticketRegistryCleaner" class="org.jasig.cas.ticket.registry.support.DefaultTicketRegistryCleaner"
p:ticketRegistry-ref="ticketRegistry"
p:logoutManager-ref="logoutManager" 
p:lock-ref="cleanerLock" />

<bean id="cleanerLock" class="org.jasig.cas.ticket.registry.support.JpaLockingStrategy"
p:uniqueId="${host.name}"
p:applicationId="cas-ticket-registry-cleaner" />

Configuring Authentication Manager (LDAP).

The AuthenticationManager interface in the org.jasig.cas.authentication package is responsible for validating the provided credentials via its authenticate method. It accepts the credentials and delegates authentication to the configured AuthenticationHandler components.
PolicyBasedAuthenticationManager is an implementation of the AuthenticationManager interface that authenticates credentials by iterating over all configured authentication handlers. Multiple handlers can be configured together with a security policy; if the policy is violated, an AuthenticationException is thrown.

We are going to configure the LdapAuthenticationHandler from the cas-server-support-ldap module. Edit deployerConfigContext.xml under WEB-INF in the cas-server-webapp module.

<bean id="ldapAuthenticationHandler"
class="org.jasig.cas.authentication.LdapAuthenticationHandler"
p:principalIdAttribute="ou"
c:authenticator-ref="authenticator">
<property name="principalAttributeMap">
<map>
<entry key="member" value="member" />
<entry key="mail" value="mail" />
<entry key="displayName" value="displayName" />
</map>
</property>
</bean>

<bean id="dnResolver"
class="org.ldaptive.auth.FormatDnResolver"
c:format="ou=%s,${ldap.authn.baseDn}" />

<bean id="authHandler" class="org.ldaptive.auth.PooledBindAuthenticationHandler"
p:connectionFactory-ref="pooledLdapConnectionFactory" />
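
<!-- The ldapAuthenticationHandler above refers to an "authenticator" bean that is not
     shown in the original listing; a typical ldaptive wiring (assumed here, following
     the stock CAS LDAP configuration) ties the DN resolver and the bind handler together. -->
<bean id="authenticator" class="org.ldaptive.auth.Authenticator"
c:resolver-ref="dnResolver"
c:handler-ref="authHandler" />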

<bean id="pooledLdapConnectionFactory"
class="org.ldaptive.pool.PooledConnectionFactory"
p:connectionPool-ref="connectionPool" />

<bean id="connectionPool"
class="org.ldaptive.pool.BlockingConnectionPool"
init-method="initialize"
p:poolConfig-ref="ldapPoolConfig"
p:blockWaitTime="${ldap.pool.blockWaitTime}"
p:validator-ref="searchValidator"
p:pruneStrategy-ref="pruneStrategy"
p:connectionFactory-ref="connectionFactory" />

<bean id="ldapPoolConfig" class="org.ldaptive.pool.PoolConfig"
p:minPoolSize="${ldap.pool.minSize}"
p:maxPoolSize="${ldap.pool.maxSize}"
p:validateOnCheckOut="${ldap.pool.validateOnCheckout}"
p:validatePeriodically="${ldap.pool.validatePeriodically}"
p:validatePeriod="${ldap.pool.validatePeriod}" />

<bean id="connectionFactory" class="org.ldaptive.DefaultConnectionFactory"
p:connectionConfig-ref="connectionConfig" />

<bean id="connectionConfig" class="org.ldaptive.ConnectionConfig"
p:ldapUrl="${ldap.url}"
p:connectTimeout="${ldap.connectTimeout}"
p:useStartTLS="${ldap.useStartTLS}"/>

<bean id="pruneStrategy" class="org.ldaptive.pool.IdlePruneStrategy"
p:prunePeriod="${ldap.pool.prunePeriod}"
p:idleTime="${ldap.pool.idleTime}" />

<bean id="searchValidator" class="org.ldaptive.pool.SearchValidator" />

<!--  Registering the handler  -->

<bean id="authenticationManager" class="org.jasig.cas.authentication.PolicyBasedAuthenticationManager">
<constructor-arg>
<map>
<entry key-ref="proxyAuthenticationHandler" value-ref="proxyPrincipalResolver" />
<entry key-ref="ldapAuthenticationHandler" value-ref="primaryPrincipalResolver" />
</map>
</constructor-arg>
<property name="authenticationPolicy">
<bean class="org.jasig.cas.authentication.AnyAuthenticationPolicy" />
</property>
</bean>

You can set the values of the above variables in the cas.properties file, which is loaded by Spring.

#LDAP Location
ldap.url=ldap://localhost:389

# LDAP connection timeout in milliseconds
ldap.connectTimeout=3000

# Amount of time in milliseconds to block on pool exhausted condition
# before giving up.
ldap.pool.blockWaitTime=3000

# Whether to use StartTLS (probably needed if not SSL connection)
ldap.useStartTLS=false

# Frequency of connection validation in seconds
# Only applies if validatePeriodically=true
ldap.pool.validatePeriod=300

# Attempt to prune connections every N seconds
ldap.pool.prunePeriod=300
# Maximum amount of time an idle connection is allowed to be in
# pool before it is liable to be removed/destroyed
ldap.pool.idleTime=600

#========================================
# LDAP connection pool configuration
#========================================
ldap.pool.minSize=3
ldap.pool.maxSize=10
ldap.pool.validateOnCheckout=false
ldap.pool.validatePeriodically=true
#========================================
# Authentication
#========================================
# Base DN of users to be authenticated
ldap.authn.baseDn=dc=example,dc=com

Set up an LDAP directory as required and set the ldap.authn.baseDn property in the cas.properties file accordingly. Also set the principalIdAttribute in the ldapAuthenticationHandler bean to match your directory schema.
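For example, with the base DN above and ou used as the principal attribute and DN format, a test user entry in the directory might look like this (a hypothetical LDIF sketch; adapt the attributes to your own schema):

# hypothetical test user; the ou value matches the DN format ou=%s,dc=example,dc=com
dn: ou=jdoe,dc=example,dc=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
ou: jdoe
cn: John Doe
sn: Doe
displayName: John Doe
mail: jdoe@example.com
userPassword: secret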





Section 4. Build and Deploy CAS Server

After configuring, build the project once again: run the mvn clean install command on the root POM or on the cas-server-webapp POM.

Copy the cas.war file generated under the target directory of the cas-server-webapp module to Tomcat's webapps directory.

Start the server and visit the URL https://localhost:8443/cas





Section 5. Creating CAS Client Application

A CAS client is an application that uses the CAS service as an identity provider to authenticate its users.

To demonstrate the SSO functionality, we are going to create two web-based J2EE applications and authenticate their users using CAS.

To create a CAS client application, the cas-client jar is required. It provides an authentication filter and a ticket validation filter.

Add the following dependency to pom.xml to get the CAS client:
<dependency>
<groupId>org.jasig.cas</groupId>
<artifactId>cas-client-core</artifactId>
<version>3.1.10</version>
</dependency>

In web.xml, add the org.jasig.cas.client.authentication.AuthenticationFilter and the org.jasig.cas.client.validation.Cas10TicketValidationFilter.

<filter>
  <filter-name>CAS Filter</filter-name>
  <filter-class>org.jasig.cas.client.authentication.AuthenticationFilter</filter-class> 
<init-param>
   <param-name>casServerLoginUrl</param-name>
   <param-value>https://cas.partha.com:8443/cas/login</param-value> 
 </init-param>
 <init-param>
   <param-name>serverName</param-name>
   <param-value>https://cas.partha.com:8443</param-value> 
 </init-param>
 <init-param>
   <param-name>renew</param-name>
   <param-value>false</param-value>
 </init-param>
 </filter>

The serverName corresponds to the server location on which the client application is running. In this example both the CAS server and its client are in the same web container.

<filter>
  <filter-name>CAS Validation Filter</filter-name>
  <filter-class>org.jasig.cas.client.validation.Cas10TicketValidationFilter</filter-class>
 <init-param> 
    <param-name>casServerUrlPrefix</param-name>
    <param-value>https://cas.partha.com:8443/cas</param-value>
  </init-param> 
  <init-param>
        <param-name>serverName</param-name>
        <param-value>https://cas.partha.com:8443</param-value> 
    </init-param>
    <init-param>
        <param-name>redirectAfterValidation</param-name>
        <param-value>true</param-value>
    </init-param> 
    <init-param>
   <param-name>renew</param-name>
   <param-value>false</param-value>
  </init-param>
</filter>

Apart from these, the HttpServletRequestWrapperFilter and the AssertionThreadLocalFilter need to be configured. To enable single sign-out, register the SingleSignOutHttpSessionListener listener class in web.xml.
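
A minimal sketch of these remaining web.xml entries (the filter names and the /* URL patterns are assumptions; adjust them to your application's paths):

<filter>
  <filter-name>CAS HttpServletRequest Wrapper Filter</filter-name>
  <filter-class>org.jasig.cas.client.util.HttpServletRequestWrapperFilter</filter-class>
</filter>
<filter>
  <filter-name>CAS Assertion Thread Local Filter</filter-name>
  <filter-class>org.jasig.cas.client.util.AssertionThreadLocalFilter</filter-class>
</filter>

<filter-mapping>
  <filter-name>CAS Filter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>CAS Validation Filter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>CAS HttpServletRequest Wrapper Filter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>CAS Assertion Thread Local Filter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

<!-- single sign-out support -->
<listener>
  <listener-class>org.jasig.cas.client.session.SingleSignOutHttpSessionListener</listener-class>
</listener>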

Additionally, a simple Java servlet is required to check for the remote user and decide whether to proceed (on successful authentication) or to show an error message. What it does beyond that depends entirely on the application's purpose.
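A minimal sketch of such a servlet (the class name and messages are hypothetical; the HttpServletRequestWrapperFilter configured above exposes the authenticated CAS principal through getRemoteUser()):

package com.test.cas.client;

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WelcomeServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        // The CAS request wrapper filter populates getRemoteUser() after a
        // successful ticket validation.
        String user = req.getRemoteUser();
        if (user != null) {
            resp.getWriter().println("Welcome " + user + ", you are logged in via CAS.");
        } else {
            resp.getWriter().println("No authenticated user found.");
        }
    }
}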

We need to create two similar applications:

casclient-appA :  https://cas.partha.com:8443/cas-appA/










casclient-appB :  https://cas.partha.com:8443/cas-appB/



On hitting the first URL you are redirected to the CAS login page; on successful login the request reaches the servlet and the message is displayed. Now hit the second URL: you are logged in automatically. If you log out of one application, you are logged out of the other too.



Section 6. Tuning and Common Issues

1. Session Management: Disable sticky sessions and configure shared sessions when the application server running CAS is to be clustered across multiple JVMs. These high-availability options depend entirely on the application server and are subject to its limitations.
2. Logout: SLO (single logout) is enabled by default in the CAS server. To disable it, set slo.callbacks.disabled to true in the cas.properties file. SLO is handled by the LogoutManagerImpl class; make sure it is declared in applicationContext.xml.

3. High Availability: CAS can provide high availability for LDAP servers by accepting multiple URLs, but it is recommended to use a load balancer and provide a virtual IP for the LDAP servers. The same applies when creating the datasource for the RDBMS store while configuring the JPA ticket registry: it is advisable to use a SCAN address (in the case of a grid) or a load-balanced virtual IP to connect to the RDBMS.



Section 7. Conclusion and Resource
Conclusion

In this tutorial you have learned the basics of an identity management system and how to implement one using CAS. It gives a fair overview of the information required to configure CAS, though the actual production configuration depends on specific requirements and the complexity of the underlying systems.