What are Outliers exactly?
Outlier Detection with Artificial Intelligence and R
You have probably heard the term (Data) Outlier. In the following, I will use an example to explain what it means and where it is worth looking for Outliers.
How would you describe the actor Danny DeVito?
Most people would probably describe the American actor by his striking height of 1.47 meters. For comparison, the average American man is 1.77 meters tall. At least in terms of height, Danny DeVito stands out. Statistically, he is an Outlier. The term Anomaly is used interchangeably.
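The idea of a value standing out on a single attribute can be sketched with a simple z-score check. This is a minimal illustration in Python, not the method used by our software: the function name `zscore_outliers`, the threshold of 2.5 and all heights except the 1.47 meters mentioned above are assumptions made up for this example.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.5):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, needs at least 2 values
    if sigma == 0:
        return []  # all values identical, nothing can stand out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical heights in meters. Only the first value (1.47) comes from
# the text, the others are invented for illustration.
heights = [1.47, 1.75, 1.78, 1.80, 1.72, 1.77,
           1.83, 1.76, 1.79, 1.74, 1.81, 1.77]

flagged = zscore_outliers(heights)
print(flagged)
```

With these made-up numbers, only the first entry is flagged. Note that in small samples the outlier itself inflates the standard deviation, so the threshold has to be chosen with care.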
An Outlier is always determined on the basis of one or more attributes. In our example, the only attribute so far has been height. In practice, such a one-dimensional view is usually not sufficient. For this reason, we now also consider weight and gender. The following table lists 20 well-known people with three values for each.
When looking at the table, did you notice Brigitte Nielsen as another Outlier?
When reading a table, one tends to look at each column individually. If you compare the values of the Danish actress column by column, she does not stand out: neither a height of 1.85 meters, nor a weight of 82 kg, nor the gender female is unusual in the table. It is the combination of gender and height that makes Brigitte Nielsen noticeable. This is referred to as a combinatorial Outlier. Danny DeVito, in contrast, is a global Outlier: in his case, a single value is so extreme that it alone is enough to classify him as an Anomaly. The remaining people in the table are so-called inliers, i.e. rows that define the broad middle and are inconspicuous.
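The difference between a column-wise view and a combined view can be sketched by computing z-scores once over the whole height column and once within each gender. This is only an illustrative Python sketch and does not describe how our software works internally: the function names, the threshold of 2.0, the labels `f1`, `m1` and so on, and every value except Brigitte Nielsen's 1.85 meters are assumptions invented for this example.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscores(values):
    """Z-score of each value relative to the list it belongs to."""
    mu, sigma = mean(values), stdev(values)  # needs at least 2 values
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def flag_global(rows, threshold=2.0):
    """Flag rows whose height is extreme within the whole column."""
    zs = zscores([height for _, _, height in rows])
    return [name for (name, _, _), z in zip(rows, zs) if abs(z) > threshold]

def flag_within_group(rows, threshold=2.0):
    """Flag rows whose height is extreme within their own gender group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    flagged = []
    for members in groups.values():
        zs = zscores([height for _, _, height in members])
        flagged += [name for (name, _, _), z in zip(members, zs)
                    if abs(z) > threshold]
    return flagged

# (name, gender, height in meters). Hypothetical rows, except for the
# 1.85 m of Brigitte Nielsen, which is taken from the text.
rows = [
    ("Brigitte Nielsen", "f", 1.85), ("f1", "f", 1.62), ("f2", "f", 1.65),
    ("f3", "f", 1.68), ("f4", "f", 1.60), ("f5", "f", 1.66),
    ("f6", "f", 1.63), ("f7", "f", 1.67),
    ("m1", "m", 1.78), ("m2", "m", 1.80), ("m3", "m", 1.83),
    ("m4", "m", 1.76), ("m5", "m", 1.85), ("m6", "m", 1.79),
    ("m7", "m", 1.82), ("m8", "m", 1.77),
]

print(flag_global(rows))
print(flag_within_group(rows))
```

With these made-up rows, the column-wise check flags nobody, while the check within each gender flags exactly one row. Grouping by a single categorical column is the simplest possible case. With many columns it is no longer obvious which combinations to inspect, which is why checking them by hand does not scale.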
Even with only three columns, it is quite difficult to quickly identify noticeable rows manually. For this reason, we automated this task with our AI-Assistant DEAN. The name is composed of the first two letters of each word in "Detecting Anomalies". During development, it was very important to us that the user does not have to provide any hints, rules or domain-specific information. DEAN only requires a table as input and delivers the identified Outliers as output. The number of columns in the input table is not limited. The AI-Assistant is implemented in R and is called as usual from within our existing software products.
In the SAP and business environment, we can apply our AI-Assistant to various pre-defined tables. These contain information from incoming invoices, outgoing payments, purchase contracts, credit notes or material movements.
Would you like to identify Outliers in your data with AI, without defining any rules? Then feel free to get in touch with us.