Entries Tagged as “General”

Machine Learning in ColdFusion

August 25, 2015 / Avinash Bukkittu

  Adobe ColdFusion | General

Machine learning has been a buzzword in recent times. I often hear people talking about it and about their keen interest in learning more and gaining expertise in this field. As a consequence, many programming languages now have machine learning libraries available in their toolbox. For example, Java has ML libraries like WEKA, C++ has Encog (Encog is available for Java as well), and Python has PyBrain. These libraries come in handy and obviate the need to write basic algorithms from scratch. One such tool is WEKA (http://www.cs.waikato.ac.nz/ml/weka/), a data mining package written in Java. You can use it as a standalone application or use its APIs in your own Java program.

 

In this article, I will show you how to use the WEKA library in ColdFusion. I am going to write a very simple Java program and add wrappers to it in ColdFusion. These wrappers can then be called from ColdFusion pages. In short, I am making the WEKA library available to ColdFusion.

 

I will be using the weather dataset located at data/weather.numeric.arff in the downloaded Weka package. Weka uses the ARFF format for datasets. Below is the dataset:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no


The above dataset contains five attributes: outlook, temperature, humidity, windy and play. outlook is a categorical variable (it can take only three values: sunny, overcast, rainy), temperature and humidity are numeric variables, windy is a categorical variable (it can take two values: TRUE, FALSE) and play is a categorical variable (it can take two values: yes, no). play is the class variable in our problem, which means we have to predict its value for a given test datapoint that contains the remaining four values.

 

So the problem at hand is a classification problem. We are given a training dataset, which we use to train a classifier, and we then use the trained model to predict the label for a test datapoint. Many classifiers can solve this problem, but I am going to use naïve Bayes. It's a very simple classifier based on probability, and it assumes that the features are independent of one another given the class. This means that, for a given value of play, the value outlook takes is assumed to be independent of the values the other attributes take. I will not explain the mathematical background of naïve Bayes but rather provide an implementation for it. To understand the theory behind naïve Bayes, check this link.
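To make the independence assumption concrete, here is a minimal from-scratch sketch in plain Java. It is not WEKA's implementation: it uses only the two categorical attributes (outlook and windy) from the dataset above and applies no smoothing, so its numbers will differ from what WEKA's NaiveBayes reports. The class name NaiveBayesSketch and its method names are made up for this illustration.

```java
import java.util.Arrays;

class NaiveBayesSketch {

    // The 14 rows of the weather dataset, reduced to {outlook, windy, play}.
    static final String[][] ROWS = {
        {"sunny", "FALSE", "no"},     {"sunny", "TRUE", "no"},
        {"overcast", "FALSE", "yes"}, {"rainy", "FALSE", "yes"},
        {"rainy", "FALSE", "yes"},    {"rainy", "TRUE", "no"},
        {"overcast", "TRUE", "yes"},  {"sunny", "FALSE", "no"},
        {"sunny", "FALSE", "yes"},    {"rainy", "FALSE", "yes"},
        {"sunny", "TRUE", "yes"},     {"overcast", "TRUE", "yes"},
        {"overcast", "FALSE", "yes"}, {"rainy", "TRUE", "no"}
    };

    // Count rows with the given class label whose column `col` equals `value`
    // (pass col = -1 to count every row of that class).
    static int count(String label, int col, String value) {
        int n = 0;
        for (String[] row : ROWS) {
            if (row[2].equals(label) && (col < 0 || row[col].equals(value))) {
                n++;
            }
        }
        return n;
    }

    // Returns {P(yes | outlook, windy), P(no | outlook, windy)}.
    static double[] distribution(String outlook, String windy) {
        String[] labels = {"yes", "no"};
        double[] score = new double[labels.length];
        double total = 0;
        for (int i = 0; i < labels.length; i++) {
            double inClass = count(labels[i], -1, null);
            double prior = inClass / ROWS.length;
            // Independence assumption: multiply the per-attribute likelihoods.
            double pOutlook = count(labels[i], 0, outlook) / inClass;
            double pWindy = count(labels[i], 1, windy) / inClass;
            score[i] = prior * pOutlook * pWindy;
            total += score[i];
        }
        // Normalize so the two scores sum to 1.
        for (int i = 0; i < score.length; i++) {
            score[i] /= total;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribution("sunny", "TRUE")));
    }
}
```

For outlook = sunny and windy = TRUE, the counts give 10/37 for yes and 27/37 for no. A real implementation would also handle the numeric attributes and smooth the counts so that an unseen value does not zero out a class.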

 

Here is the Java program

 

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;
import java.util.*;


public class Demo {

	public static Classifier naiveBayesClassifierTrainedWithCV(Instances data, int numFolds) throws Exception
	{
		Classifier cls = new NaiveBayes();
		Evaluation eval = new Evaluation(data);
		
		if(data.classAttribute().isNominal())
			data.stratify(numFolds);
		
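		// NOTE: the loop body, the end of this method and the signature of
		// generateTestSample were lost from this listing. The lines down to the
		// outlookValues declaration are a reconstruction (an assumption) based on
		// the usual Weka cross-validation idiom and on the call in MachineLearning.cfc.
		for(int i=0;i<numFolds;i++) {
			Instances train = data.trainCV(numFolds, i);
			Instances test = data.testCV(numFolds, i);
			cls.buildClassifier(train);
			eval.evaluateModel(cls, test);
		}
		return cls;
	}

	public static Instance generateTestSample(String outlookValue, int temp, int hum, String wind, Instances data)
	{
		//Create nominal attribute
		List<String> outlookValues = new ArrayList<String>();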
		outlookValues.add("sunny");
		outlookValues.add("overcast");
		outlookValues.add("rainy");
		Attribute outlook = new Attribute("outlook", outlookValues); 
		
		//Create numeric attribute
		Attribute temperature = new Attribute("temperature");
		Attribute humidity = new Attribute("humidity");
		
		List windyValues = new ArrayList();
		windyValues.add("TRUE");
		windyValues.add("FALSE");
		
		Attribute windy = new Attribute("windy",windyValues); 
		
		
		Instance testSample = new SparseInstance(5);
		
		testSample.setDataset(data);
		testSample.setValue(outlook, outlookValue);
		testSample.setValue(temperature, temp);
		testSample.setValue(humidity, hum);
		testSample.setValue(windy, wind);
		
		return testSample;
	}
	
	public static double[] getDistributionInstanceForATestSample(Classifier cls, Instance data) throws Exception	{
		
		return cls.distributionForInstance(data);
	}
}


Download weka.jar from http://www.cs.waikato.ac.nz/ml/weka/downloading.html. Add this jar to the classpath and compile the above program.

 

In the above program, naiveBayesClassifierTrainedWithCV trains a naïve Bayes classifier on the dataset passed as its first argument. The second argument, numFolds, specifies the number of folds for cross-validation; for example, 10-fold cross-validation requires numFolds = 10. generateTestSample builds a test sample for which the classifier will predict a class label. getDistributionInstanceForATestSample returns the probability distribution that the classifier predicts for a test sample.

 

Following is a wrapper for these methods in ColdFusion. I have created a CFC for this:

component output=false
{
	property name="demo";
	property name="bufferedReference";
	property name="fileReference";
	property name="fileObj";
	property name="bufferedObj";
	property name="instanceReference";
	property name="instanceObj";
	property name="model";
	
	//This function loads ARFF file
	function Init()
	{
		demo = CreateObject("java","Demo");
		
		//Create a buffered reference
		bufferedReference = CreateObject("java","java.io.BufferedReader");
		
		//Create a file reference
		fileReference = CreateObject("java","java.io.FileReader");
		
		//Create a file object
		fileObj = fileReference.init("C:/ColdFusion11/cfusion/wwwroot/weka/weather.numeric.arff");
		
		//Create a buffered reader object
		bufferedObj = bufferedReference.init(fileObj);
			
		instanceReference = CreateObject("java","weka.core.Instances");
		instanceObj = instanceReference.init(bufferedObj);
		
		instanceObj.setClassIndex(instanceObj.numAttributes() - 1);
	
	}
	
	function naiveBayesClassifierTrainedWithCV()	{
		
		model = demo.naiveBayesClassifierTrainedWithCV(instanceObj,10);
		return model;
	}
	
	function  getDistributionInstanceForATestSample(required string outlook, required numeric temp, required numeric humidity, required string windy)	{
		
		sample = demo.generateTestSample(JavaCast("string",outlook),JavaCast("int",temp),JavaCast("int",humidity),JavaCast("string",windy),instanceObj);
		distribution = demo.getDistributionInstanceForATestSample(model,sample);
		return distribution;
	}
}

The above file is named MachineLearning.cfc. The Init() method loads the ARFF file. The rest of the methods are just wrappers around the corresponding Java methods. This CFC is used as follows:

<cfset mlObj = CreateObject("MachineLearning")>
<cfset mlObj.Init()>
<cfset NB = mlObj.naiveBayesClassifierTrainedWithCV()>
<cfset distributionForTestSample = mlObj.getDistributionInstanceForATestSample("sunny",76,95,"TRUE")>
<cfsavecontent variable="html">
	<cfoutput>
		<html>
			<head>
				<title>WEKA Demo</title>
			</head>
			<body>
				<h3>Naive Bayes</h3>
				<p><pre>#NB#</pre></p>
				<hr />
				<h3>Classifying a sample</h3>
				<h4>t = (Outlook = "sunny", Temperature = 76, Humidity = 95, Windy = "TRUE")</h4>
				<p>
					Yes = #distributionForTestSample[1]#, No = #distributionForTestSample[2]#
				</p>
			</body>
		</html>
	</cfoutput>
</cfsavecontent>
<cfoutput>#html#</cfoutput>

A MachineLearning object is created and the naïve Bayes classifier is trained through it. Once the model is trained, a test sample is created and we ask the classifier which label it predicts. Instead of getting just the label, we calculate the probability distribution of the test sample over the class labels, i.e. the probability of the test sample being yes and the probability of it being no. This is achieved by calling the getDistributionInstanceForATestSample method. Below is the output that we get:

As you can see, for the test sample ("sunny",76,95,"TRUE") the classifier predicts play=yes with 60% probability and play=no with 40% probability.

 

So this was a simple example of how to use Weka in ColdFusion. Perhaps this idea can be extended and an ML library for ColdFusion can be created.


getHeaders - a new attribute in the cfexchangemail tag

August 14, 2015 / Piyush Kumar Nayak

  Adobe ColdFusion | Adobe ColdFusion 11 | ColdFusion | ColdFusion 11 | exchange | General

With ColdFusion 11 Update 3, we have introduced a new attribute called "getHeaders" in the "cfExchangeMail" tag. It accepts a boolean value. When set to true, cfExchangeMail returns a query with an additional "InternetHeader" column. This column contains a struct of key-value pairs holding the email headers associated with each message.

Email message headers provide technical details about the message, such as who sent it, the software used to compose it, the version of the MIME protocol used by the sender etc. 

On Exchange 2010, the fields that are returned are: CC, Content-Transfer-Encoding, Content-Type, Date, From, MIME-Version, Message-ID, Received, Return-Path, Subject, To, X-MS-Exchange-Organization-AuthAs, X-MS-Exchange-Organization-AuthSource, X-Mailer.

You may reference this weblink for the detailed list of the fields and their description.

You can put this new feature to any use that suits your purpose. I will dwell on one such use case below.

In MS Exchange 2010 and later, the "ToId" column in the retrieved messages query contains the primary email address of the recipient. A primary email address can have multiple aliases. If you need to retrieve the email alias the message was sent to, you can make use of this new attribute.

Here's an example that demonstrates the usage of the tag in the context of the use case discussed above:

<cfmail from="#frm_usr_email#" to="#to_usr_alias#" cc="#cc_usr_alias#" subject="#msg_sub#" server="#exchangeServerIP#" port="25">
----------- testing mail to an alias address ------------
</cfmail>

<cfset sleep(5000)>

<cfexchangeConnection action="open" username="#to_usr#" password="#password#" server="#exchangeServerIP#" serverversion="#version#" protocol="#protocol#" connection="excon">

<cfexchangemail action="get" name="usr_msgs" connection="excon" getheaders=true folder="Inbox">
	<cfexchangefilter name="fromID" value='#frm_usr#'>
	<cfexchangefilter name="subject" value="#msg_sub#">
</cfexchangemail>

<cfif usr_msgs.recordcount GTE 1>
	info from cfexchangemail fields:<br>
	<cfloop query="usr_msgs">
		<cfoutput>
			#usr_msgs.subject#<br>
			#usr_msgs.CC#<br>
			#usr_msgs.fromId#<br>
		</cfoutput>
	</cfloop>

	info from cfexchangemail.internetHeaders fields:<br>
	<cfloop query="usr_msgs">
		<cfoutput>
			#ReplaceList(usr_msgs.internetHeaders["from"][1], ">,<", ",", ",", ",")#<br>
			#ReplaceList(usr_msgs.internetHeaders["to"][1], ">,<", ",", ",", ",")#<br>
			#ReplaceList(usr_msgs.internetHeaders["cc"][1], ">,<", ",", ",", ",")#<br>
		</cfoutput>
	</cfloop>
</cfif>

 

You can refer to the bugbase for the enhancement request originally logged for this feature.



Taking Thread Dumps from ColdFusion Server Programmatically

August 11, 2015 / Krishna Reddy

  Performance | Adobe ColdFusion | Adobe ColdFusion 10 | Adobe ColdFusion 11 | ColdFusion | ColdFusion 11 | General | productivity | Scheduled Tasks

Many times you will want to tune the performance of the ColdFusion server or debug the bottlenecks that make the server unresponsive.

To analyze this, ideally you want thread dumps and a server memory snapshot (Heap Space, Eden Space, Survivor Space, Old Gen, Perm Gen).

While you can use JDK tools like jstack to get the dumps, it is tedious to install them and schedule the thread dumps.

So, the utility below triggers thread dumps and memory snapshots programmatically; it can be configured as a scheduled task or triggered on demand.

Download the following zip file and move it to the server where you want to take thread dumps.

 

threaddump.zip

This zip file contains two files. One is the threaddump.jar file.

Place this file under C:\ColdFusion11\cfusion\wwwroot\WEB-INF\lib\ and restart the server for it to load.

The other is the cfm file takethreaddump.cfm, which makes the call to the ThreadDump class and sets the path where the dump content should be written.

By default it is dumped to the file #GetTempDirectory()#/threaddump<Day>-<Month>-<Year>.log

(Depending on the server location, this translates to something like C:\ColdFusion11\cfusion\runtime\work\Catalina\localhost\tmp\threaddump12-8-2015.log)

You can change this to any other convenient path in the cfm file.

You can call this cfm on demand at any point in time or configure a new scheduled task to run it at some interval.

A larger number of thread dumps is more helpful for problem analysis, so it is better to take them at a regular interval.

Each new day, the dump is rotated automatically to a new file name.
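The contents of threaddump.jar are not shown in this post, so the following is only a sketch of how a utility like it could be written with the standard java.lang.management API. The class name ThreadDumpSketch and its methods are hypothetical; only the file name pattern is taken from the post.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.time.LocalDateTime;

class ThreadDumpSketch {

    // One file per day: threaddump<Day>-<Month>-<Year>.log
    static String dumpFileName(LocalDate date) {
        return "threaddump" + date.getDayOfMonth() + "-" + date.getMonthValue()
                + "-" + date.getYear() + ".log";
    }

    // Name, state and stack trace of every live thread in this JVM.
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Current usage of each memory pool (Eden, Survivor, Old Gen, ...).
    static String memorySnapshot() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            out.append(pool.getName()).append(": used=")
               .append(usage.getUsed() / 1024).append(" KB, committed=")
               .append(usage.getCommitted() / 1024).append(" KB\n");
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Append to today's file, so repeated runs on the same day accumulate.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"),
                dumpFileName(LocalDate.now()));
        String entry = "==== " + LocalDateTime.now() + " ====\n"
                + threadDump() + memorySnapshot() + "\n";
        Files.write(file, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Because the file name is derived from the current date, appending to it gives the daily rotation described above without any extra bookkeeping.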

 


When should tools.jar be updated in ColdFusion Server

July 27, 2015 / Krishna Reddy

  Administrator | Adobe ColdFusion | Adobe ColdFusion 10 | Adobe ColdFusion 11 | Adobe ColdFusion Builder | General

tools.jar contains the utilities that compile Java source into class files.

When you use ColdFusion web services, stubs have to be generated.

So, this utility is required for that feature to work.

tools.jar is shipped by default with a full installation of ColdFusion, and it is the same version as the JRE that ColdFusion is shipped with.

If you are just using this default installation and the built-in JRE, your ColdFusion web services work fine.

However, due to security bugs or platform support, you may want to update the JRE version that ColdFusion runs on.

Once you do that, please make sure to copy the tools.jar file manually from {JDK_Home}/lib to {cf_install_home}/cfusion/lib/

Only the JDK contains the tools.jar file, not the JRE installation. You don't have to install the JDK on the machine where ColdFusion is installed. You can just have the JRE on this machine and get tools.jar from any other machine's JDK installation.

Also make sure that the earlier stubs are cleared from {cf_install_home}/cfusion/stubs/ to get the newly compiled classes.

It is necessary to update tools.jar ONLY if you are upgrading the JRE to a higher major version.

Say, if you are upgrading from 1.8 U5 to 1.8 U51, you don't have to update the tools.jar file.

But if you are upgrading from 1.7 U55 to 1.8 U51, you have to update the tools.jar file.

 

Another case where you would want to update tools.jar is JEE deployments.

Say, if you are using WebSphere application server with the IBM JDK, you have to place the corresponding JDK's tools.jar under {cf_install_home}/cfusion/lib/.

The same applies to any other application server as well.

The simple rule is that tools.jar has to be from the same major version of Java (minor version doesn’t matter) that ColdFusion runs on.
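The rule can be written down as a small check. This is only an illustrative sketch: the class ToolsJarRule and its methods are hypothetical helpers, not part of ColdFusion, and the version strings are assumed to use the "1.<major>.0_<update>" form that Java 7 and 8 report in the java.version system property.

```java
class ToolsJarRule {

    // Extract the major Java version from a string such as
    // "1.7.0_55" (major 7) or "1.8.0_51" (major 8).
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Versions up to Java 8 are written "1.<major>..."; later JVMs
        // report the major version as the first component.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // tools.jar has to be replaced only when the major version changes.
    static boolean needsNewToolsJar(String oldJre, String newJre) {
        return majorVersion(oldJre) != majorVersion(newJre);
    }

    public static void main(String[] args) {
        System.out.println(needsNewToolsJar("1.8.0_05", "1.8.0_51")); // false
        System.out.println(needsNewToolsJar("1.7.0_55", "1.8.0_51")); // true
        // Major version of the JVM that is running this code:
        System.out.println(majorVersion(System.getProperty("java.version")));
    }
}
```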

 

This applies to any version of ColdFusion and to all platforms.

You can even copy tools.jar from a Windows machine to a Solaris machine, as long as the version is correct.

 

 


Some of the factors that help in deciding the memory that application needs

July 27, 2015 / Krishna Reddy

  Administrator | Adobe ColdFusion | Adobe ColdFusion 10 | Adobe ColdFusion 11 | Adobe ColdFusion Builder | ColdFusion | ColdFusion 11 | General

I would like to explain some of the factors that decide the maximum memory an application needs.

Here are two examples. One is with the cffile upload action and the other is with the cfpdf thumbnail action.


Sometimes, while uploading large files using cffile with the upload action (<CFFILE ACTION="UPLOAD"), you may have run into the following error:

"500 - Internal server error.

There is a problem with the resource you are looking for, and it cannot be displayed."

 

File upload depends on the “Maximum size of post data” value that you can set in the ColdFusion Administrator.

However, an upload can’t exceed the total available free memory.

Say 512 MB is the Xmx memory value (jvm.config) and 350 MB is already occupied by the application. Even if “Maximum size of post data” is set to 200 MB, ColdFusion can only upload files of up to ~160 MB (512 MB - 350 MB).

It can’t upload files of size 200 MB.

To fix this, Xmx should be increased, which is in turn subject to the available memory on that machine.
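The arithmetic in this example can be sketched in a few lines of Java. The class UploadHeadroom and its method are hypothetical and simplify things by treating all free heap as available to the upload; the second half of main reads the same figures from the running JVM through the standard Runtime API.

```java
class UploadHeadroom {

    // Largest upload (in MB) that can succeed: the configured post limit,
    // capped by the heap that is still free.
    static long maxUploadMb(long xmxMb, long usedMb, long postLimitMb) {
        long freeMb = Math.max(0, xmxMb - usedMb);
        return Math.min(postLimitMb, freeMb);
    }

    public static void main(String[] args) {
        // Numbers from the example: Xmx 512 MB, 350 MB in use, 200 MB post limit.
        System.out.println(maxUploadMb(512, 350, 200)); // 162

        // The same headroom, measured on the JVM that runs this code.
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        long headroomMb = (rt.maxMemory() - usedBytes) / (1024 * 1024);
        System.out.println("Free heap right now: ~" + headroomMb + " MB");
    }
}
```

With a larger Xmx, say 1024 MB, the same call returns the full 200 MB post limit, which is why raising Xmx fixes the upload.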

 

Here is the other example:

Say you are using CFPDF's thumbnail action and the thumbnail's background is not generated properly.
Here the issue would be insufficient memory.

Just to hold the decoded image data, a Java application needs a large amount of memory (say 1.2 GB, depending on the image's pixel count).

This is because of the format and high pixel count of the image being converted. The number of objects created is equal to the total number of pixels in that image.

So, a high-pixel image consumes more memory.

To fix this, the Xmx value in jvm.config should be at least the application's memory size + 1.2 GB (from the above example), i.e. a minimum of 1.5 GB, and it can be more depending on your application.

Changing the value and restarting the server would fix this.
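As a rough worked example of where a figure like 1.2 GB can come from, the sketch below multiplies the pixel count by a fixed size per pixel. The image dimensions, the 4 bytes per pixel (ARGB) and the 300 MB application footprint are all assumptions picked to match the numbers above; the class ImageMemoryEstimate is hypothetical.

```java
class ImageMemoryEstimate {

    // Rough size of a decoded image: one fixed-size value per pixel.
    static long decodedBytes(long width, long height, int bytesPerPixel) {
        return width * height * bytesPerPixel;
    }

    // Suggested -Xmx in MB: what the application already needs plus the image.
    static long suggestedXmxMb(long appMb, long imageBytes) {
        long mb = 1024L * 1024L;
        long imageMb = (imageBytes + mb - 1) / mb; // round up to whole MB
        return appMb + imageMb;
    }

    public static void main(String[] args) {
        // Assumed 20000 x 15000 pixel page at 4 bytes per pixel.
        long bytes = decodedBytes(20000, 15000, 4);
        System.out.println(bytes);                      // 1200000000
        System.out.println(suggestedXmxMb(300, bytes)); // 1445
    }
}
```

The result lands close to the 1.5 GB minimum mentioned above; in practice the JVM needs extra room beyond the raw pixel data, so treat this as a lower bound.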

