17.03.2014

CCM (Part 4) - 5 things you should know - and 2 things you bet­ter not for­get - when build­ing a CCM da­ta an­a­ly­tic en­viron­ment

This post is the fourth of a multi-part article which is titled “5 things you should know - and 2 things you better not forget - when building a data analytic environment”. It is about data analytics in general, but with a focus on building bigger solutions like a CCM (Continuous Controls Monitoring Environment). The full article contains of several parts. The first five parts describe attention points that you should be aware of:

  1. Know what you want to know (aka list analytic questions)
  2. Know your systems (aka identify data sources)
  3. Know your data (aka examine data structures)
  4. Know your Analytic Tools (aka understand technical possibilities and limits)
  5. Know your customers (aka think about who receives the results and what they will do with it)

Part six and seven are about two things that you probably already know, but we emphasize their importance in the context of data analytics:

  1. There is a price tag to it (aka budget)
  2. You need someone who can do it (aka resources)

Part 1-3 have already been covered (they are linked above, so you can go back anytime there). This is now part 4, “Know your Analytic Tools”:

Five things you should know, continued (4/5)

We continue with the fourth aspect out of the five things that I see as “essential”. The first step, “Know what you want to know”, was about a getting proper business understanding and stating as-detailed-as-possible questions that can actually be answered. Only by knowing your business well you can make a good list of analytic questions. The area we were interested in was “Compliance”, the topic “Payments to critical countries” and the detail questions that were raised we stated as follows:

 

  • “Identify payment transactions that have been done in context of partners like vendors, or customers, or even where an expense account is directly debited.
  • Create a list of these transactions and identify the payee’s bank country for those payments.
  • Match these detail transactions versus a list of countries that are seen as critical because they are on the embargo list, known as tax haven or having a Corruption Perception Index > x.”
  • In step two we talked about the importance to “Know your systems”. We found out that our payment data is on an SAP® system “SY2”, running SAP® ERP ECC 6.0, and containing the financial transactions for all companies worldwide.

Picture 1 – Identifying the correct system

Based on that we found out about the data ownership and then followed the process of requesting a user with appropriate read-only rights to access the data.

 

What data actually is necessary, we discovered in chapter “Know your data”. We went from identifying the tables to finally identifying fields and understanding the logic and relations between the tables. Part of that was an explanation of different techniques how to find the data, and the top down approach. We ended up with a list of tables and fields, which for our example could look like this:

 

 

Element / Process steps
Table
Field(s)
Comment
Vendor master dataLFA1LIFNRVendor number
Vendor master dataLFA1 LAND1Vendor country of residence
Vendor bank accountsLFBKLIFNR Vendor number
Vendor bank accountsLFBKBANKS Vendor bank country
Customer master dataKNA1KUNNR Customer number
Customer master dataKNA1LAND1Customer country of residence
Customer bank accounts KNBKBANKSCustomer bank country
Customer bank accounts KNBKKUNNRCustomer number
Payment run transactionsREGUHLIFNR Vendor number
Payment run transactionsREGUHKUNNRCustomer number
Payment run transactionsREGUHLAND1 Vendor country of residence
Payment run transactionsREGUHZLANDPayees country of residence
Payment run transactionsREGUHZBNKSPayees bank country
... ... ... ...

Picture 2 - Examine data structure: Table and Field list

 

We know the question, we know where the data for the answer is stored, we are also aware what data is necessary. But how do we get it and what is the best way to analyze it? For this you need to “Know your Analytic Tools”.

Aspect 4: Know your Analytic tools

Now that we examined the data structures and defined the data, we are able to start analyzing it. For this we need to understand what the technical possibilities of the analytic tools are. On the other hand, we also need to know about the limitations. So this chapter first of all is about the tools you will need for your CCM project.

 

Of course this will depend on the setup and technical environment that you want to set up (or have to deal with). There is no doubt that you will need a data analytic tool. It will be the piece of software where the number crunching is done and where the analytic routines are implemented. What you actually need beforehand is a data extraction tool which accesses the data, extracts it and puts it onto a laptop or a server. The following picture shows the setup as described. The source for the picture is the ISACA article “Automated Audit Testing for SAP Data – Benefit or Just Another Black Box?”. (Kim-Reinartz, A.; Wenig, S.: „Automated Audit Testing for SAP Data – Benefit or Just Another Black Box?“, in ISACA Journal, Volume 3/2011, S. 25-30; Download Article).

Picture 3 – Basic architecture

Ob man ein Datenanalysetool benötigt, hängt grundsätzlich von der Frage ab “Innerhalb oder Außerhalb”?

 

Innerhalb bedeutet, dass Sie versuchen können, die Daten direkt im oder mittels des ERP Systems, etwa des im Beispiel beschriebenen SAP® Systems, zu analysieren. „Außerhalb“ würde heißen, dass die Daten wie in der Einleitung beschrieben extrahiert und auf einem separaten Computer analysiert werden. Für unser Beispiel werden wir uns mit letzterem beschäftigen, und die zu analysierenden Daten erst einmal extrahieren.

 

Dies bietet in der Tat einige wichtige Vorteile:

  1. Performance
  2. Independency
  3. Flexibility

First, you should keep in mind performance of data analytics and also performance impacts on the system. If you extract the data out of SAP® on a separate machine, then you cannot harm the SAP® system performance during the analytic runs. Imagine a heavy-duty algorithm which combines 50 tables with millions and sometimes even billions of records directly on the SAP® server or within its database – most administrators will tell you, that this might be a bad idea. Once the data is on your own computer or server however, you can basically do what you want – the productive system will not be harmed. In the ISACA article, we described this as “Download once, analyze often”.

Secondly, keep in mind the independency, and reliability. If you are able to access the data on your own, you can be sure that it is really “original” data, and not changed by accident or intentionally. Think about fraud cases, and you need to monitor or analyze the data: You want to make sure that as less persons as possible were able to touch it before you got it. Best case, you have a download tool and access it yourself with a reliable tool.

As third benefit, related to the independency, this makes you more flexible. Having the data extracted from the system it could, in addition to the CCM approach, be used for Ad-Hoc analytics, as you can take it with you (for Audits etc.) on an encrypted hard drive for example.

So for our project of analyzing “Payments to critical countries” in CCM environment, we will use data extraction software. If you need to select one, the following checkpoints should be considered when deciding for software in that area:

  • Read-only access
  • Mass data capability
  • Performance
  • Delta-Downloads
  • Automization
  • Usability
  • Correct target formats
  • Data Privacy

Of course we could seem to be a little biased here, as we offer a data extraction tool for getting the data out of SAP®, but for this article we skipped the marketing terms and kept the message neutral.

Read-only access means, that the data extraction tool should not be able to change the data on the source system, so everyone will be more relaxed if the data itself stays untouched. For the data extraction itself, we need to make sure, that it is capable of downloading mass data. Think about our payment transactions; there may be millions of records over the years.

In our projects we encountered tables with billions of records; the data extraction tool has to be able to handle this. But it also needs to make sure, that the source systems performance, in our example of our SAP® System SY2, will not be impacted too much, even when big data id extracted. Best case you can use the data extraction tool 24/7 without anyone noticing it.

Of course, especially for CCM approaches, it would be important if the tool can handle delta-downloads. This means, instead of downloading the full automated payment transactions table REGUH every month you only extract the payment transactions that got created between the last analytic run and the current one. This kind of goes together with the aspect of automization. There should not be a user necessary who triggers it every time a download is needed.

Especially for Continuous Controls Monitoring there should a feature which enables you to schedule downloads. Yet if a user has to do things manually, the usability needs to be good. It should be allow you to do technical things (data extraction) but being performed or set up by people who also have a solid business background, not for IT only. Of course it would be good if the table- and field catalog we defined before could be used without much hassle as input for the data extraction tool.

Last but not least, the data formats are very important. This needs to be tied together with the data analytic tool later on: The tool for downloading the data gets it out of the SAP® system and converts it to – well, ideally to the format of the data analytic software, so that it can be used without format discontinuity. Also keep in mind that, like stated in another blog post, data privacy aspects can be of importance here, for example encrypting sensible information during download.

Now that the data is extracted based on our table and field list we generated in chapter 2, we have it in our data analytic software. But what do you have to know about that software? Some points are similar to the data extraction:

  • Mass data capability
  • Performance
  • Client/Server capability
  • Automization
  • Usability
  • Transparency
  • Read-Only
  • Content
  • Output formats

Obviously, mass data capability is one feature you will need. If you do get millions and billions of records extracted with your data extraction tool, the data analytic tool has to be able to handle that big data as well. Most of the cases you will need data analytic tools for it, as most of the “standard software” has built in limitations when it comes to data volumes.

Usually, Microsoft Excel™ or even Access™ is not enough. If you look at Excel™ 2010 for example, the limitation of 1.048.576 records most of the times is not nearly enough and even Access™ 2010 with the limit of 2 Gigabyte per database usually is not capable of doing serious CCM projects.

Putting in best case unlimited amounts of data is one thing – having these analyzed in a reasonable manner when it comes to performance, is another. The data analytic software should be able to process data fast. This usually depends on the data format the tool is using, and the architecture of the solution. Of course you could try to speed up things by setting up very powerful hardware, on the other hand sometimes it just does not make sense to try to kill your performance problems with iron (by just putting more and more hardware in). Even if your tool seems absolutely ok performance wise, think about scalability. One analytic question may require only small pieces of data; others in the future might be real beasts to deal with.

As we are still talking about setting up a CCM environment, a Continuous Controls Environment, the analytic tests need to be performed according to a certain schedule. This requires the need for automization. For example we want to analyze our “Payments to critical countries” on a monthly base. This requires the analyses to be stored as program or some kind of macro that can be executed automatically by an internal clock (“scheduler”). A lot of analytic software tools do offer built in macro or programming languages where these analytics can be built and implemented. If you are not familiar to programming, think of it as something similar to record an Excel macro.

The need for setting up a centralized CCM solution, which is perform ant, integrated to the source systems with a data extraction tool and distributing results to legal entities on global level usually can only be done with a proper client/server solution of a data analytic tool. Let alone data volume, but when you need to go that full distance (and sometimes the extra mile) in such projects, you need a decent vehicle for your journey. Though a lot of the standalone software solutions are quite powerful, you only can do the full range of aspects required by a CCM project with a software which is made for this, and which can be put on a server, with user management in place, multi-processor usage maybe, working in the background for you even when your laptop is disconnected.

The programming language as well as the tools GUI is required to have a reasonable usability. There may be a lot of pretty technical tools on the market. However these analytic tools often need to be run by people who have none or only partial IT background. So the tools need to provide access to solving technical tasks in a non-technical manner.

So far so good, but what does the bullet point “transparency” mean? Think of it as extremely important point that makes your life easier when you are the data analyst. Imagine performing not one analytic test like in our example, but 20. Imagine sending the results out to 40 legal entities worldwide, getting C-level attention if your results indicate risks or even fraud. The results may trigger a course of action in these 40 legal entities, involving a lot of people.

A lot of them will ask “Woooha, did you do it even correctly? How did you make sure that there wasn’t an error? How did you do it? Was xyz considered in this analytics?” The last thing you might want to answer now is “I have no idea, I just pushed a button.” In the ISACA article mentioned above we talk about avoiding the black box effect. Tools and algorithms need to be as transparent as possible; wherever you feel that they are not possible enough, make sure to have a proper documentation in place. Also a kind of log file capacity which is built in the analytic tool might help big time here.

It helps if the analytic tool allows read-only access, as obviously you want to avoid that when doing data analytics, you change the original data by accident and produce errors accidentally, for example by overwriting data or making copy & paste errors. Don’t forget that you need to deal with an immense volume of data, so any accidental change would not be easy to spot in that huge pile of digits.

There are some tools on the market which fit those criteria, just to mention ACL™ (Audit Command Language) from ACL Services Limited or IDEA from Audicon. For our projects, we mainly use ACL™, because the features we need for solving the tasks are quite well-balanced there. With the new client/server products like ACL™ Analytic Exchange (AX) you can do a lot, and also the desktop solution ACL™ Analytics (AN) has improved a lot regarding features and usability. When deciding what to use in your CCM project, have a look at the key features listed here, and compare for yourself what solution fits your needs best, maybe test drive the software.

Also look out for existing content. You may not have to re-invent the wheel all over again. Hundreds of companies might have analyzed “Payments to critical countries” in an SAP® environment before. Instead of putting 50 days of implementation work in developing that analytic, you might get something out of the box for the analytic tool you bought or will buy. This may accelerate your projects a lot. The analytic program is only a shell usually – quality content is key!

If we take our 5 things as checklist, currently we managed to the first four steps:

Picture 4 – The progress we made so far

What we did not look at yet, are the output formats. These will become very important in our CCM project. You probably do not want to only generate ten thousands of records, moreover you need to graph data, track results or visualize processes. As you may get different requirements from different types of “result consumers”, certain flexibility is important. You have a manager who needs the KPI information; on the other hand you have an accountant who should look up the details.

Which types of result consumers you might encounter, and what this means for our example scenario of “Payments to critical countries”– this will be looked at next week in our chapter xyz – cu then!

I hope you enjoyed this forth part of our CCM articles! For any comments or questions, feel free to write us at info@dab-gmbh.de.

To contact the author you can also use LinkedIn or XING (you may have to login first before you can access these links).

LinkedIn: http://de.linkedin.com/pub/stefan-wenig/54/1b8/b30

XING: https://www.xing.com/profile/Stefan_Wenig2?sc_o=mxb_p


Comments (0)
Be the first who comments this blog entry.
Blog login

You are not logged in. Please log in to comment this blog entry.

go to Login