The machine learning commands “Train”, “Predict” and “Cluster” explained based on a practical example
Part 1 – “Train” and “Predict” in “ACL™ Robotics”
In this blog post we show you two examples of how the analysis software “ACL™ Robotics” (previously known as “ACL™ Analytics”) from the software manufacturer Galvanize can be used to implement machine learning. For expert users: both supervised and unsupervised learning approaches are supported.
“ACL™ Robotics” is a software solution that has supported the manual and automated analysis of large amounts of data for many years. In addition to a variety of interfaces, e.g. to SAP (via “SAP Connector”), Salesforce, Google Hive, Amazon Redshift, Outlook, PDF imports or any ODBC data source, it offers a scripting language for automating sequences of analytic steps. Galvanize assigns this to the field of RPA (Robotic Process Automation). Individual analytic steps are implemented by commands such as sorting, summarizing, joining and relating, to name only a few. With Version 14, these analytic commands were extended by three machine learning commands: “Train”, “Predict” and “Cluster”.
In this blog post we familiarise you with these three commands using specific examples from the world of business: forecasting return values, and clustering customers in combination with payment due dates. Existing ACL users can also download ACL projects containing the examples and try out each method step by step.
Because of its length, this blog post is divided into two parts:
- Part 1 deals with the “Train” and “Predict” commands
- Part 2 deals with the “Cluster” command, which is based on the k-means algorithm
The example for Part 1 is taken from the Sales process: customers order goods of varying values and may return part of them. Let us take, for example, the following question:
What return value can be expected for an order value of € 2,000, € 4,200.50 or € 65,000?
We will answer this question using the “Train” and “Predict” commands in ACL (see the menu item “Machine Learning”). A fictitious dataset serves as the basis for the calculations. Figure 1 shows its 967 orders and their return values. Each point in the graph represents one order: the order value is plotted on the x-axis, the associated return value on the y-axis. The blue angle bisector marks the maximum possible return value, since the return value is always less than or equal to the order value. Orders plotted on the angle bisector have a return value equal to the order value.
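The constraint described above (no point may lie above the angle bisector) can be checked before any model is trained. The following is only a minimal sketch in Python, not ACL script. The field names `order_value` and `return_value` and the three sample rows are made up for illustration, and the additional check that a return value must not be negative is our own assumption.

```python
# Made-up sample rows; NOT taken from the 967 orders shown in Figure 1.
# The field names are assumptions chosen for this illustration.
orders = [
    {"order_value": 1500.00, "return_value": 0.00},
    {"order_value": 320.00, "return_value": 320.00},   # full return: lies on the angle bisector
    {"order_value": 8000.00, "return_value": 2750.00},
]


def violations(rows):
    """Return rows whose return value is negative or exceeds the order value."""
    return [r for r in rows if not 0 <= r["return_value"] <= r["order_value"]]


def full_returns(rows):
    """Return rows that lie exactly on the angle bisector (return == order value)."""
    return [r for r in rows if r["return_value"] == r["order_value"]]


bad_rows = violations(orders)
on_bisector = full_returns(orders)
print(f"{len(bad_rows)} implausible rows, {len(on_bisector)} full returns")
```

Rows reported by `violations` would plot above the blue line in a chart like Figure 1 and should be reviewed before training.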
Next, adjust the parameters and settings in the input dialog, and then start the search for the Winning Model by pressing “OK” (cf. Figure 3).
The dataset used to search for the Winning Model and the dataset for which you then estimate the values of the target field must have exactly the same key variables (features). The “estimated_returns” table in your sidebar now contains the forecasts for the order values € 2,000, € 4,200.50 and € 65,000, based on the Winning Model and the training data. The rounded forecasts are € 796, € 1,672 and € 25,873. Figure 5 shows the training data together with the three order values under review and their associated forecasts, in red.
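A quick plausibility check on the three rounded forecasts quoted above: dividing each forecast by its order value gives almost the same ratio every time. The short Python sketch below only does this arithmetic. The variable names are ours, and the observation is consistent with, but does not prove, a roughly proportional relationship in the Winning Model.

```python
# Order values and rounded forecasts as quoted in the text above.
forecasts = {2000.00: 796, 4200.50: 1672, 65000.00: 25873}

# Ratio of forecast return value to order value for each case.
ratios = {order: forecast / order for order, forecast in forecasts.items()}

for order, ratio in ratios.items():
    print(f"order value {order:>9.2f} -> forecast / order value = {ratio:.4f}")
```

In all three cases the forecast is about 39.8 % of the order value, comfortably below the maximum marked by the angle bisector.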
The procedure described above is a form of Supervised Learning. “Supervised” means, in this case, that the training data contains known values of the numerical target field, from which the model learns to estimate the target field with the aid of the key variables. The sequence presented also works with multiple key variables; the model then simply includes more variables. Unsupervised Learning, by contrast, has no “target field”. The second part of this blog post presents an example with an Unsupervised Learning approach.
In the first part of this blog post, we showed you how to calculate forecasts based on a dataset using a combination of the “Train” and “Predict” commands. If you would like to use your own data, the necessary steps can be summarised as follows:
- Create a training dataset. This always includes the key variables and the target field.
- Create a dataset for which forecasts are to be calculated. This contains the same key variables as the training dataset.
- Determine the Winning Model with the “Train” command and a suitable choice of parameters.
- Calculate forecasts with the “Predict” command, using the Winning Model on the dataset created in the second step.
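For readers who think in code, the four steps above can be sketched outside ACL as well. The Python sketch below is a stand-in under loud assumptions. ACL's “Train” command searches for a Winning Model, whereas this sketch fits only one ordinary least-squares line. The training rows are invented, and the names `train`, `predict`, `check_same_key_fields`, `order_value` and `return_value` are hypothetical, not ACL syntax.

```python
from statistics import mean

KEY_FIELDS = ["order_value"]      # assumed name of the single key variable
TARGET_FIELD = "return_value"     # assumed name of the target field

# Step 1: training dataset with key variable AND target field.
# Invented values, NOT the 967 orders used in the blog post.
training = [
    {"order_value": 1000.0, "return_value": 410.0},
    {"order_value": 2500.0, "return_value": 980.0},
    {"order_value": 5000.0, "return_value": 2010.0},
    {"order_value": 10000.0, "return_value": 3950.0},
]

# Step 2: dataset to forecast, with the same key variable but no target field.
to_predict = [
    {"order_value": 2000.0},
    {"order_value": 4200.50},
    {"order_value": 65000.0},
]


def check_same_key_fields(train_rows, predict_rows):
    """Raise ValueError unless both datasets use exactly the same key variables."""
    for row in train_rows:
        if set(row) != set(KEY_FIELDS) | {TARGET_FIELD}:
            raise ValueError(f"unexpected fields in training row: {sorted(row)}")
    for row in predict_rows:
        if set(row) != set(KEY_FIELDS):
            raise ValueError(f"unexpected fields in prediction row: {sorted(row)}")


def train(rows):
    """Step 3 stand-in: fit one least-squares line, return (slope, intercept)."""
    xs = [r["order_value"] for r in rows]
    ys = [r[TARGET_FIELD] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, rows):
    """Step 4: apply the fitted line to every row of the prediction dataset."""
    slope, intercept = model
    return [slope * r["order_value"] + intercept for r in rows]


check_same_key_fields(training, to_predict)
model = train(training)
estimates = predict(model, to_predict)
for row, estimate in zip(to_predict, estimates):
    print(f"order value {row['order_value']:>9.2f} -> estimated return {estimate:.2f}")
```

Because the training rows are invented, the printed estimates will not match the forecasts from the “estimated_returns” table; the point of the sketch is only the order of the steps and the field check between the two datasets.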
We hope that you enjoyed this interactive blog post. Have fun trying out the commands explained above. If you have any questions, please do not hesitate to contact us at any time.
In the second part, which will be published soon, we will be looking at the “Cluster” command with an example from customer master data and payment due dates.