Comparison between Machine Learning Evaluation Metrics

Posted in Data & Business Intelligence

You have just finished running a machine learning algorithm, after the hard work of choosing the right data and preparing it. Now you're looking at the results and wondering: are they good? And what if you want to compare them with the results of another algorithm, but the two don't speak the same language: one reports RMSE and the other reports Recall, Precision, Accuracy, AUC and so on?

Classification vs. Regression

In general, supervised machine learning algorithms are divided into two families: classification and regression.

  • Classification – to predict (classify) a categorical class, such as “Iris-setosa” or “Iris-versicolor”.
  • Regression – to predict real values, such as “price”, “weight”, “height” etc.


Classification evaluation metric

The typical starting point for evaluating a classifier is the confusion matrix. For a binary classifier, a confusion matrix is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives.







Using the confusion matrix, you can calculate the following evaluation metrics:

  • Recall = True Positive Rate = (True Positive) / (True Positive + False Negative).
  • False Positive Rate = (False Positive) / (False Positive + True Negative). This is the Type I error rate.
  • False Negative Rate = 1 - Recall = (False Negative) / (True Positive + False Negative). This is the Type II error rate.
  • Precision = (True Positive) / (True Positive + False Positive).
  • Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative).
  • F-Score (F-measure) = 2 * Precision * Recall / (Precision + Recall).

Example – a confusion matrix and the calculation of the measurements:






  • Recall = (55) / (55+28) = 0.663
  • False Positive Rate = (45) / (45+72) = 0.385
  • False Negative Rate = 1 – 0.663 = 0.337
  • Precision = (55) / (55+45) = 0.55
  • Accuracy = (55+72) / (55+72+28+45) = 0.635
  • F-Score = 2*0.55*0.663 / (0.55+0.663) = 0.601
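The same arithmetic can be sketched in a few lines of Python. The function name binary_metrics and the dictionary keys are just illustrative choices; the counts (TP = 55, FN = 28, FP = 45, TN = 72) are the ones used in the calculations above.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the standard metrics from the four confusion-matrix counts."""
    recall = tp / (tp + fn)            # true positive rate
    precision = tp / (tp + fp)         # share of predicted positives that are correct
    return {
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),   # equals 1 - recall
        "precision": precision,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the worked example above.
metrics = binary_metrics(tp=55, fn=28, fp=45, tn=72)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Working from the raw counts rather than from already-rounded intermediate values avoids small rounding drift in the F-Score.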

ROC/AUC – The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the performance of a binary classifier as its decision threshold changes. The curve is created by plotting the true positive rate against the false positive rate at various threshold settings. AUC stands for Area Under the Curve, meaning the area under the ROC curve: 1.0 is a perfect classifier, while 0.5 is no better than random guessing.
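To make the threshold idea concrete, here is a minimal sketch that builds the ROC points by hand and integrates them with the trapezoidal rule. The labels and scores are made up for illustration, and roc_points and auc are hypothetical helper names, not part of any library.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scored >= threshold is predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 1 = positive class, scores are the classifier's confidence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 3))
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, which is exactly what the area of 8/9 expresses.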

Regression evaluation metric

When using regression, the popular evaluation metric is the RMSE:

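RMSE – Root Mean Square Error – this metric is the square root of the average squared difference between the actual value and the predicted value. It is expressed in the same units as the predicted attribute, and lower is better.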












The sum of the squared errors is 24.

Dividing by the number of cases (10) gives 24/10 = 2.4.

The square root of 2.4 is 1.549, and that’s the RMSE.
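The three steps above (square the errors, average them, take the root) look like this in Python. The actual and predicted values below are made up; they are chosen only so that the squared errors sum to 24 over 10 cases, matching the numbers in the text.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: errors are 2, 2, 2, 2, 2, 1, 1, 1, 1, 0,
# so the squared errors sum to 5*4 + 4*1 + 0 = 24.
actual = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
predicted = [12, 10, 16, 14, 20, 21, 21, 25, 25, 28]

sum_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
print(sum_squared)                        # 24
print(round(rmse(actual, predicted), 3))  # 1.549
```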


The Iris dataset

The Iris dataset contains 150 instances of Iris plants. The predicted attribute is the class of the Iris plant.

The attributes are the length and width of the sepal and the length and width of the petal.

Let’s execute a simple decision tree (J48) on that dataset using Weka:


































If we run linear regression instead (after adding another column that represents the class of the plant as a number, since linear regression can only predict a numeric attribute), we get RMSE = 0.2273.


Don’t be confused by the RMSE in the J48 summary output. It’s a little misleading, since you can’t really compare the RMSE from the J48 run with the one from linear regression.

Here’s why: in J48 (or any other classification algorithm) we want to predict a categorical class, so Weka calculates the RMSE by giving each categorical class a number (starting with 1).

But what would happen if, in the Iris dataset, we added a numeric class_number column whose values were 1000, 2000, 3000 instead of 1, 2, 3?

In that case, the RMSE would be:
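A rough sketch of the effect, using made-up predictions rather than an actual Weka run: it assumes the model's predictions scale along with the labels, which is what ordinary least squares does when the target is multiplied by a constant. Under that assumption the RMSE grows by the same factor of 1000, even though the model is no better or worse.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical class numbers and regression outputs on the 1, 2, 3 scale.
actual = [1, 1, 2, 2, 3, 3]
predicted = [1.1, 0.9, 2.3, 1.8, 2.7, 3.2]

# The same data after relabelling the classes as 1000, 2000, 3000.
# Assumption: the predictions are scaled by the same factor.
actual_scaled = [1000 * a for a in actual]
predicted_scaled = [1000 * p for p in predicted]

small = rmse(actual, predicted)
large = rmse(actual_scaled, predicted_scaled)
print(small, large, large / small)
```

The ratio comes out at 1000 (up to floating-point noise), which is why an RMSE value only means something relative to the scale of the predicted attribute.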







How can we compare the results of the decision tree and the linear regression?

The question is much wider, of course. How can we compare classification and regression algorithms? Each has its pros and cons; why shouldn’t we enjoy both?

Comparing RMSE and the confusion matrix

It’s easy: we just need to transform the regression results behind the RMSE into a confusion matrix, and then we can calculate all the other evaluation metrics (Recall, Precision, Accuracy, etc.).

Wait, what?

Transform the RMSE into a confusion matrix.

An RMSE is a single number, so what we actually convert is the list of individual predictions it was calculated from. To do that, we need to look at each classification error. In Weka, you can right-click on the result list, choose “Visualize classifier errors” and save the results to a file.

The first rows in the file are metadata, describing the attributes.












In our case, the last column is the actual class and the column before it is the predicted class. Now we can calculate the confusion matrix.

If you want to double-check, you can calculate the RMSE from these two columns and verify that you get 0.2273, as in the result set.





Calculating the confusion matrix

The last two columns are the predicted class and the class number (which is the actual class from the training set). The predicted class is a real number, so you need to add another column that maps each predicted value to one of the valid classes (1, 2, 3 are the options for Iris), for example by rounding to the nearest class number. Once you have it, you can compare it with the class number, which is the actual class, and calculate the confusion matrix.

Once you have the confusion matrix, you can calculate all the evaluation metrics and find out which algorithm works best for you.
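Here is a minimal sketch of that conversion. It assumes the mapping rule is "nearest class number", which is one reasonable choice and not necessarily the only one; the values are made up rather than taken from the Weka output, and to_class, confusion_matrix and recall_and_precision are hypothetical helper names.

```python
def to_class(prediction, classes=(1, 2, 3)):
    """Map a numeric regression output to the nearest valid class number."""
    return min(classes, key=lambda c: abs(c - prediction))


def confusion_matrix(actual, predicted, classes=(1, 2, 3)):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def recall_and_precision(matrix):
    """Per-class (recall, precision): recall is row-wise, precision column-wise."""
    results = []
    for i in range(len(matrix)):
        row_total = sum(matrix[i])
        col_total = sum(row[i] for row in matrix)
        recall = matrix[i][i] / row_total if row_total else 0.0
        precision = matrix[i][i] / col_total if col_total else 0.0
        results.append((recall, precision))
    return results


# Hypothetical data: actual class numbers and raw linear regression outputs.
actual = [1, 1, 1, 2, 2, 2, 3, 3, 3]
raw_predictions = [0.9, 1.2, 1.4, 1.6, 2.1, 2.6, 2.4, 3.1, 3.3]

predicted = [to_class(p) for p in raw_predictions]
matrix = confusion_matrix(actual, predicted)
for row in matrix:
    print(row)
print(recall_and_precision(matrix))
```

With more than two classes, recall and precision are calculated per class, treating that class as "positive" and all the others as "negative".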
