Machine learning has been a buzzword in recent times. I often hear people talk about it and about their keen interest in learning more and gaining expertise in the field. As a consequence, many programming languages now have machine learning libraries available. For example, Java has WEKA, C++ has Encog (also available for Java), and Python has PyBrain. These libraries come in handy and obviate the need to write basic algorithms from scratch. One such tool is WEKA (http://www.cs.waikato.ac.nz/ml/weka/), a data mining package written in Java. You can use it as a standalone application or use its APIs in your own Java program.

 

In this article, I will show you how to use the WEKA library in ColdFusion. I am going to write a very simple Java program and add wrappers around it in ColdFusion. These wrappers can then be called from ColdFusion pages. In short, I am making the WEKA library available to ColdFusion.

 

I will be using the weather dataset located at data/weather.numeric.arff in the downloaded Weka package. Weka uses the ARFF format for datasets. Below is the dataset:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no


The above dataset contains five attributes: outlook, temperature, humidity, windy and play. outlook is a categorical variable (it can take only three values: sunny, overcast, rainy), temperature and humidity are numeric variables, windy is a categorical variable (TRUE or FALSE), and play is a categorical variable (yes or no). play is the class variable in our problem, which means we have to predict its value for a given test datapoint that contains the remaining four values.

 

So the problem at hand is a classification problem. We are given a training dataset, which we use to train a classifier, and we then use the trained model to predict the label for a test datapoint. Many classifiers can solve this problem, but I am going to use naïve Bayes. It is a very simple probabilistic classifier that assumes the features are independent of one another given the class. This means that, once the value of play is known, the value outlook takes is treated as independent of the values the other attributes take. I will not explain the mathematical background of naïve Bayes but rather provide an implementation for it. To understand the theory behind naïve Bayes, check this link.
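To make the independence assumption concrete, here is a minimal sketch of the naïve Bayes calculation done by hand on the weather data. It is not Weka's implementation: the class name NaiveBayesSketch and its methods are hypothetical, it uses only the two nominal attributes outlook and windy, and it applies no smoothing. Weka's NaiveBayes also models the numeric attributes and smooths its counts, so its numbers will differ.

```java
import java.util.Arrays;

public class NaiveBayesSketch {

    // {outlook, windy, play} for the 14 rows of the weather dataset
    static final String[][] ROWS = {
        {"sunny", "FALSE", "no"},     {"sunny", "TRUE", "no"},
        {"overcast", "FALSE", "yes"}, {"rainy", "FALSE", "yes"},
        {"rainy", "FALSE", "yes"},    {"rainy", "TRUE", "no"},
        {"overcast", "TRUE", "yes"},  {"sunny", "FALSE", "no"},
        {"sunny", "FALSE", "yes"},    {"rainy", "FALSE", "yes"},
        {"sunny", "TRUE", "yes"},     {"overcast", "TRUE", "yes"},
        {"overcast", "FALSE", "yes"}, {"rainy", "TRUE", "no"}
    };

    // P(label): fraction of all rows that carry the given class label
    static double prior(String label) {
        int count = 0;
        for (String[] row : ROWS) {
            if (row[2].equals(label)) count++;
        }
        return (double) count / ROWS.length;
    }

    // P(column = value | label): among rows with this label,
    // the fraction whose column `col` equals `value`
    static double conditional(int col, String value, String label) {
        int match = 0;
        int total = 0;
        for (String[] row : ROWS) {
            if (row[2].equals(label)) {
                total++;
                if (row[col].equals(value)) match++;
            }
        }
        return (double) match / total;
    }

    // Returns {P(yes | x), P(no | x)}. Each class score is the prior times
    // the per-attribute conditionals (the independence assumption),
    // and the two scores are then normalised to sum to 1.
    static double[] posterior(String outlook, String windy) {
        String[] labels = {"yes", "no"};
        double[] score = new double[labels.length];
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            score[i] = prior(labels[i])
                     * conditional(0, outlook, labels[i])
                     * conditional(1, windy, labels[i]);
            sum += score[i];
        }
        for (int i = 0; i < score.length; i++) {
            score[i] /= sum;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(posterior("sunny", "TRUE")));
    }
}
```

Because no smoothing is applied, a value never seen with a class (for example overcast with play=no) gives that class a probability of zero, which is one reason real implementations smooth their counts.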

 

Here is the Java program

 

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;
import java.util.*;


public class Demo {

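	// Runs numFolds-fold cross-validation with naive Bayes and returns the classifier.
	// Note: the returned model is the one built on the training split of the last fold,
	// and the Evaluation statistics are collected here but not returned.
	public static Classifier naiveBayesClassifierTrainedWithCV(Instances data, int numFolds) throws Exception
	{
		Classifier cls = new NaiveBayes();
		Evaluation eval = new Evaluation(data);
		
		// Stratify so that each fold has roughly the same class proportions
		if(data.classAttribute().isNominal())
			data.stratify(numFolds);
		
		for(int i=0;i<numFolds;++i)
		{
			// Split into the training and test portions for fold i
			Instances train = data.trainCV(numFolds,i);
			Instances test = data.testCV(numFolds, i);
			
			// Rebuild the classifier on this fold's training data and evaluate it
			cls.buildClassifier(train);
			eval.evaluateModel(cls, test);
		}
		
		return cls;
	}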
	
	public static Instance generateTestSample(String outlookValue, int temp, int hum, String wind,Instances data)	{
		
		//Create an instance with one slot per attribute; all values start out missing,
		//so the class attribute (play) stays unset
		Instance testSample = new SparseInstance(data.numAttributes());
		
		//Attach the dataset header so the instance knows its attributes
		testSample.setDataset(data);
		
		//Look up each attribute in the dataset header instead of creating new
		//Attribute objects: a freshly created Attribute has no valid index into
		//the instance, so values set through it would not reach the classifier
		testSample.setValue(data.attribute("outlook"), outlookValue);
		testSample.setValue(data.attribute("temperature"), temp);
		testSample.setValue(data.attribute("humidity"), hum);
		testSample.setValue(data.attribute("windy"), wind);
		
		return testSample;
	}
	
	public static double[] getDistributionInstanceForATestSample(Classifier cls, Instance data) throws Exception	{
		
		return cls.distributionForInstance(data);
	}
}


Download weka.jar from http://www.cs.waikato.ac.nz/ml/weka/downloading.html. Add this jar to the classpath and compile the above program.

 

In the above program, naiveBayesClassifierTrainedWithCV trains a naïve Bayes classifier on the dataset passed as the first argument. The second argument, numFolds, specifies the number of folds for cross-validation; for example, 10-fold cross-validation requires numFolds = 10. Note that the classifier it returns is the one built in the last fold, so it has been trained on that fold's training split rather than on the full dataset. generateTestSample generates a test sample for which the classifier will predict a class label. getDistributionInstanceForATestSample returns the probability distribution over the class labels that the classifier predicts for a test sample.
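To illustrate what numFolds does, here is a minimal sketch of how k-fold cross-validation can split a dataset into contiguous test blocks. This is not Weka's own code: the class name FoldSketch and the method testRange are hypothetical, and Weka's trainCV and testCV may assign rows differently in detail. It shows why, with 14 rows and 10 folds, the folds cannot all be the same size.

```java
public class FoldSketch {

    // Returns {start, endExclusive} of the test block for the given fold.
    // Rows outside this range form that fold's training set.
    static int[] testRange(int numInstances, int numFolds, int fold) {
        int base = numInstances / numFolds;   // minimum rows per fold
        int extra = numInstances % numFolds;  // the first `extra` folds get one more row
        int start = fold * base + Math.min(fold, extra);
        int size = base + (fold < extra ? 1 : 0);
        return new int[] {start, start + size};
    }

    public static void main(String[] args) {
        int numInstances = 14; // rows in the weather dataset
        int numFolds = 10;
        for (int fold = 0; fold < numFolds; fold++) {
            int[] range = testRange(numInstances, numFolds, fold);
            System.out.println("fold " + fold + ": test rows " + range[0]
                    + " to " + (range[1] - 1)
                    + ", training rows " + (numInstances - (range[1] - range[0])));
        }
    }
}
```

With so few rows, each test fold holds only one or two instances, so per-fold accuracy is very coarse on this dataset.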

 

The following is a ColdFusion wrapper for these methods. I have created a CFC for this:

component output=false
{
	property name="demo";
	property name="bufferedReference";
	property name="fileReference";
	property name="fileObj";
	property name="bufferedObj";
	property name="instanceReference";
	property name="instanceObj";
	property name="model";
	
	//This function loads ARFF file
	function Init()
	{
		demo = CreateObject("java","Demo");
		
		//Create a buffered reference
		bufferedReference = CreateObject("java","java.io.BufferedReader");
		
		//Create a file reference
		fileReference = CreateObject("java","java.io.FileReader");
		
		//Create a file object
		fileObj = fileReference.init("C:/ColdFusion11/cfusion/wwwroot/weka/weather.numeric.arff");
		
		//Create a buffered reader object
		bufferedObj = bufferedReference.init(fileObj);
			
		instanceReference = CreateObject("java","weka.core.Instances");
		instanceObj = instanceReference.init(bufferedObj);
		
		instanceObj.setClassIndex(instanceObj.numAttributes() - 1);
	
	}
	
	function naiveBayesClassifierTrainedWithCV()	{
		
		model = demo.naiveBayesClassifierTrainedWithCV(instanceObj,10);
		return model;
	}
	
	function getDistributionInstanceForATestSample(required string outlook, required numeric temp, required numeric humidity, required string windy)	{
		
		//Keep these variables local to the function instead of leaking them into the component's variables scope
		var sample = demo.generateTestSample(JavaCast("string",outlook),JavaCast("int",temp),JavaCast("int",humidity),JavaCast("string",windy),instanceObj);
		var distribution = demo.getDistributionInstanceForATestSample(model,sample);
		return distribution;
	}
}

The above file is named MachineLearning.cfc. The Init() method loads the ARFF file. The rest of the methods are just wrappers around the corresponding Java methods. This CFC is used as follows:

<cfset mlObj = CreateObject("MachineLearning")>
<cfset mlObj.Init()>
<cfset NB = mlObj.naiveBayesClassifierTrainedWithCV()>
<cfset distributionForTestSample = mlObj.getDistributionInstanceForATestSample("sunny",76,95,"TRUE")>
<cfsavecontent variable="html">
	<cfoutput>
		<html>
			<head>
				<title>WEKA Demo</title>
			</head>
			<body>
				<h3>Naive Bayes</h3>
				<p><pre>#NB#</pre></p>
				<hr />
				<h3>Classifying a sample</h3>
				<h4>t = (Outlook = "sunny", Temperature = 76, Humidity = 95, Windy = "TRUE")</h4>
				<p>
					Yes = #distributionForTestSample[1]#, No = #distributionForTestSample[2]#
				</p>
			</body>
		</html>
	</cfoutput>
</cfsavecontent>
<cfoutput>#html#</cfoutput>

A MachineLearning object is created and the naïve Bayes classifier is trained through it. Once the model is trained, a test sample is created and we want to know which label the classifier predicts for it. Instead of getting the label directly, we calculate the probability distribution of the test sample over the class labels, i.e. the probability of the sample being yes and the probability of it being no. This is achieved by calling the getDistributionInstanceForATestSample method. Below is the output that we get:

As you can see, for the test sample ("sunny", 76, 95, "TRUE") the classifier predicts play=yes with 60% probability and play=no with 40%.
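If you want a single label rather than the distribution, you can pick the class with the highest probability. The following is a minimal sketch in plain Java; the class name PredictedLabel and the method mostLikely are hypothetical, and the label order is assumed to follow the order in which the play attribute declares its values ({yes, no}). Weka's Classifier interface also offers classifyInstance, which returns the index of the predicted class directly.

```java
public class PredictedLabel {

    // Returns the label whose probability is largest.
    // labels[i] must correspond to distribution[i].
    static String mostLikely(String[] labels, double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        // Same order as "@attribute play {yes, no}" in the ARFF file
        String[] labels = {"yes", "no"};
        // Example distribution matching the figures quoted in the article
        double[] distribution = {0.6, 0.4};
        System.out.println(mostLikely(labels, distribution));
    }
}
```

Note that Java arrays are zero-based, whereas the ColdFusion page above reads the same distribution with indices 1 and 2.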

 

So this was a simple example of how to use Weka in ColdFusion. Perhaps this idea can be extended to create an ML library for ColdFusion.

2 Comments to “Machine Learning in ColdFusion”

  1. Tom Chiverton
    I don't understand why you created the Demo Java class.
    You could just interact with Weka directly from ColdFusion and skip the 'compile this class' step completely, which would be neater as you end up writing a CFC wrapper around Java either way.
  2. Avinash Bukkittu
    Yes, you are right. Demo.java is not required. I have included it because otherwise MachineLearning.cfc would look lengthy if more and more classifiers are added. Thus to abstract all these from ColdFusion, I have written a java file and used it in a CFC.

    Thanks,
    Avinash
