26.05.2015
Anton Grening
Author: Anton Grening

ACL An­a­lyt­ics™ – Four prac­tical text field func­tions

Following on from our blog series on topics like terms of payment in SAP® or process mining this time I want to launch a new series of blog posts all about ACL™ Analytics.

 

To start we look at practical ACL™ functions on text fields, showing you how simple it is to work with and even combine these fields. Basically you will find all of these functions and commands explained in the very good and very detailed official ACL™ online help, which I used as base for this article. In addition to the technical explanation there I want to give you some background and advice based on my daily work, which hopefully will provide some added value for you. I will use the example of “Identifying master data duplicates” for explaining the following functions. If you examine customer or vendor master data for double entries for this purpose, it is advisable to search across different columns. The field with the phone number can also be a decisive indicator. But this very seldom uses a uniform convention, so first you have to prepare and make it comparable.

 

IDNameStreetZipCityPhone number
1005Beispiel AG Hauptstraße 8 94469 Deggendorf +49 (0) 991 1234 56
4236 ACME Corp. 963 Sample Drive 90028Los Angeles +1 818 123 4567
5459sample:company hauptstraße 8 10557berlin(+49) 0 30 123-456
7894 Example Inc. 12 Baker Street WC2E 9RZ London+44 20 1234 567-8
8001John Doe 123 Main St 10081 New York +1 718 123456789
8864 t.e.s.t GmbH Deggendorfer Straße 9 10557Berlin030 98765432
9354SAMPLE COMPANY HAUPTSTRAßE 8 10557BERLIN+49 (0)30123456

 

The duplicate in the above example cannot be discovered that easily by the phone number. First you have to clean the field of spaces and special characters before adjustment produces a result. For this cleanup we are supported by a number of functions from the ACL™ repository:

1. EX­CLUDE() func­tion

This function allows you to exclude predefined characters from a string of characters. Meaning you can edit a text field so that certain characters are removed and no longer contained in the result. To be able to use EXCLUDE you consequently need two parameters:

EXCLUDE(character string; exclude character)

  1. Character string:             text from which certain characters are to be excluded.
  2. Exclude character(s):     character(s) to be excluded and must not appear in the result.

If you want to exclude multiple characters, they can simply be entered together. If none of the exclude characters is contained in the character string, the latter will just not be altered. Note that the EXCLUDE() function distinguishes between upper-case and lower-case letters, so both have to be entered to exclude them. The following examples illustrate use of this function:

 

FunctionResult
EXCLUDE( “DEG-ZD-123”; “-“) “DEGZD123”
EXCLUDE(“abc:456”; ”ABCDEFG”) “abc:456”
EXCLUDE(“(+49) 123 456 789 - 0”; ” +()-“) “491234567890”

 

The last example shows a possible solution to the duplicate problem. The unwanted spaces and special characters were removed, making the number easier to compare. But there are so many special characters that can appear in a phone number. So all imaginable characters would have to be excluded to create a generally valid solution. It would be easier, more safe and sure, if there were a way of saying you only want all the numerals. And that is what the next ACL™ function in our list does.

2. IN­CLUDE() func­tion

If, as in the case of our phone number duplicates, you need to ensure that all characters are removed except for the numerals, the list of exclude characters can become extremely long. And there is always the risk of overlooking and forgetting characters. The INCLUDE() function can help here because of entering characters for EXclusion you can enter those characters you want INcluded and thus wish to see in the result. You summon it as follows:

INCLUDE(character string; include character)

  1. Character string:     text that is to be restricted to certain characters.
  2. Include character:   characters you want to restrict to and that may appear in the result.

Here too there is a distinction between upper case and lower case, and you can enter one or more include characters. If the character string contains none of the characters to be included, the result remains blank. If you want to ensure that only numerals or only letters appear in the result, you simply enter all these characters. All other characters not entered are left out of the result. The following examples illustrate this:

 

FunctionResult
INCLUDE( “DEG-ZD-123”; “D“) “DD”
INCLUDE(“abc:456”; ”ABCDEFG”) “”
INCLUDE(“(+49) 123 456 789 - 0”; ”0123456789“) “491234567890”

 

The last example with the phone number produces the same result as the EXCLUDE function before. But it is still more advisable to use the INCLUDE() function for a generally valid solution.

Fig. 1: Define new field c_TelNr_clean while using the INCLUDE function

Perhaps you want to ensure that only alphabetic characters are considered, and all numerals and special characters are left out of the result. For example when comparing by the company name in the example above (sample:company and SAMPLE COMPANY). All letters, upper-case and lower-case, can simply be put into the list of include characters, meaning that the result field, although it only contains letters, would not be recognized as duplicates because of the different upper-case/lower-case notation. ACL™, in commands like Classify or Summarize that are used to search for duplicates, ensures exactly the same notation, and so it would not treat "samplecompany" and "SAMPLECOMPANY" as the same. But we want to see these different entries as par, so we can make use of the following ACL™ functions

3. UP­PER/LOW­ER/PROP­ER() func­tion

When comparing names, you should keep the UPPER, LOWER and PROPER functions in mind. Because then alphabetic characters can be altered so that upper case and lower case are converted depending on the function.

 

 

UPPER(character string) – all letters of the character string become upper-case.

 

FunctionResult
UPPER(“Sample:Company”) “SAMPLE:COMPANY”
UPPER(“Deggendorfer Straße 9”) “DEGGENDORFER STRAßE 9”

LOWER(character string) – all letters of the character string become lower-case.

 

FunctionResult
LOWER(“Sample:Company”) “sample:company”
LOWER (“Deggendorfer Straße 9”) “deggendorfer straße 9”

PROPER(character string) – first letter and that following a space or special character is written upper-case, all others lower-case.

 

FunctionResult
PROPER(“sample:company”) “Sample:Company”
PROPER (“DEGGENDORFER STRAße 9”) “Deggendorfer Straße 9”

When working with manual text entries these should be prepared with the UPPER, LOWER or PROPER function. These can then be compared uniquely and without knowing the precise notation. In ACL™ it is also possible to combine for example UPPER and INCLUDE function to create a solution for tracing duplicates through the company name later on.

Fig. 2: Define new field c_Name_clean while using INCLUDE and UPPER function

4. COM­BINE text fields

Another lead to detecting duplicates may be the address. But this is often split into several fields and therefore first has to be combined. To concatenate text fields in ACL™ they only have to be joined by a plus sign. So in our example the street and city can be combined quite simply to a text field c_StreetCity.

Fig. 3: Define new field c_StreetCity by combining the fields Street and City

However if you are experienced ACL™ user, you might remember the fact that ACL™ is using fixed field lengths. This means, combining two text fields might look a bit unusual. If for example our fields “Street” and “City” each have a length of 20 and the content is “Merrit”, and City would be “Connecticut”, the combined field will look like “Merrit Connecticut”. ACL™ populates each field with blanks (spaces) up to its total length, so the new field will have a length of 20+20=40. We can avoid the “strange look” by using the ALLTRIM function which removes leading and trailing blanks. The new expression looks as follows: DEFINE FIELD c_StreetCity COMPUTED ALLTRIM(Street) + City. The adjusted result will now be: “MerrittConnecticut”. Still not perfect, because the two words are not separated anymore because of the ALLTRIM funtion. This can be fixed by adding a blank (space) in between, simply by adding it again with a plus: DEFINE FIELD c_StreetCity COMPUTED Alltrim(Street) + “ ”+City. Now the result looks as expected: “Merritt Connecticut”.

 

 

ExpressionResult
DEFINE FIELD c_StraßeOrt COMPUTED Straße + Ort Merrit         Connecticut
DEFINE FIELD c_StraßeOrt COMPUTED ALLTRIM(Straße) + Ort MerritConnecticut
DEFINE FIELD c_StraßeOrt COMPUTED ALLTRIM(Straße) + “ ” + Ort Merrit Connecticut

 

The field can then be cleaned using the familiar functions and adjusted for duplicates. With the right touch and the right approach these ACL™ functions can be a great help when searching for duplicates and in many other kinds of analysis.

I hope you found my article about text field functions helpful and interesting.

For any comments, feel free to write us at info@dab-gmbh.de.

 


Comments (0)
Be the first who comments this blog entry.
Blog login

You are not logged in. Please log in to comment this blog entry.

go to Login