Whether you need a data analytics tool basically depends on the question: "inside or outside"?
"Inside" means that you try to analyze the data directly in, or by means of, the ERP system – for instance the SAP® system described in the example. "Outside" means that the data is extracted, as described in the introduction, and analyzed on a separate computer. For our example we will take the latter approach and first extract the data to be analyzed.
This indeed offers some important advantages:
First, keep in mind the performance of the data analytics and the performance impact on the source system. If you extract the data out of SAP® onto a separate machine, the analytic runs cannot harm the SAP® system's performance. Imagine a heavy-duty algorithm that joins 50 tables with millions, sometimes even billions, of records directly on the SAP® server or within its database – most administrators will tell you that this is a bad idea. Once the data is on your own computer or server, however, you can basically do what you want – the productive system will not be affected. In the ISACA article, we described this as "download once, analyze often".
Secondly, keep in mind independence and reliability. If you can access the data yourself, you can be sure that it is really "original" data, not changed by accident or intentionally. Think about fraud cases where you need to monitor or analyze the data: you want to make sure that as few people as possible were able to touch it before you got it. In the best case, you have a download tool and access the data yourself with a reliable tool.
The third benefit, related to this independence, is flexibility. Once the data is extracted from the system it can, in addition to the CCM approach, be used for ad-hoc analytics, since you can take it with you (for audits etc.), for example on an encrypted hard drive.
So for our project of analyzing "payments to critical countries" in a CCM environment, we will use data extraction software. If you need to select such a tool, the following checkpoints should be considered:
- Read-only access
- Mass data capability
- Correct target formats
- Data Privacy
Of course we could seem a little biased here, as we offer a data extraction tool for getting data out of SAP®, but for this article we have skipped the marketing terms and kept the message neutral.
Read-only access means that the data extraction tool should not be able to change the data on the source system – everyone will be more relaxed if the data itself stays untouched. For the extraction itself, we need to make sure the tool is capable of downloading mass data. Think about our payment transactions: there may be millions of records over the years.
In our projects we have encountered tables with billions of records; the data extraction tool has to be able to handle this. But it also needs to ensure that the source system's performance – in our example, that of our SAP® system SY2 – is not impacted too much, even when big data volumes are extracted. In the best case you can run the data extraction tool 24/7 without anyone noticing.
Of course, especially for CCM approaches, it is important that the tool can handle delta downloads. This means that instead of downloading the full automated payment transactions table REGUH every month, you only extract the payment transactions that were created between the last analytic run and the current one. This goes hand in hand with the aspect of automation: there should be no user necessary who triggers the download every time one is needed.
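The delta idea can be sketched in a few lines of Python. The record layout below is a simplified placeholder, not the actual REGUH structure: the extractor remembers the cutoff of the previous run and only keeps records created after it.

```python
from datetime import date

# Hypothetical, simplified payment records (real REGUH rows have many more fields).
payments = [
    {"doc_no": "100001", "created": date(2024, 1, 15), "amount": 500.0},
    {"doc_no": "100002", "created": date(2024, 2, 3),  "amount": 1200.0},
    {"doc_no": "100003", "created": date(2024, 2, 20), "amount": 80.0},
]

def delta_extract(records, last_run):
    """Return only the records created after the previous extraction run."""
    return [r for r in records if r["created"] > last_run]

# The first run downloaded everything up to 2024-01-31;
# the delta run only fetches what was created since then.
new_records = delta_extract(payments, last_run=date(2024, 1, 31))
print([r["doc_no"] for r in new_records])  # → ['100002', '100003']
```

In a real tool the cutoff would of course be persisted between runs instead of being passed in by hand.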
Especially for Continuous Controls Monitoring there should be a feature which enables you to schedule downloads. And where a user does have to do things manually, the usability needs to be good: the tool should allow technical tasks (data extraction) to be performed or set up by people with a solid business background, not only by IT. It would also be good if the table and field catalog we defined before could be used without much hassle as input for the data extraction tool.
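As an illustration, such a catalog could be fed into the extraction tool programmatically. The Python sketch below uses a handful of SAP field names purely as placeholders; a real extraction tool would have its own job format.

```python
# Hypothetical table/field catalog from chapter 2 (field names are illustrative).
catalog = {
    "REGUH": ["LAUFD", "LIFNR", "ZBNKS", "RWBTR"],  # payment run header fields
    "LFA1":  ["LIFNR", "NAME1", "LAND1"],           # vendor master fields
}

def build_extraction_jobs(catalog):
    """Turn the catalog into one download job per table."""
    return [{"table": table, "fields": fields} for table, fields in catalog.items()]

jobs = build_extraction_jobs(catalog)
print(len(jobs))  # → 2
```

The point is only that the catalog, once written down, can drive the downloads directly instead of being retyped into the tool.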
Last but not least, the target data formats are very important. They need to be tied to the data analytic tool used later on: the download tool gets the data out of the SAP® system and converts it – ideally into the format of the data analytic software, so that it can be used without format discontinuity. Also keep in mind that, as stated in another blog post, data privacy aspects can be important here, for example encrypting sensitive information during download.
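To illustrate the data privacy point, here is a minimal Python sketch that masks sensitive fields during download by replacing them with a salted hash. A real tool might use reversible encryption instead; the field names and the `pseudonymize` helper are purely illustrative.

```python
import hashlib

def pseudonymize(record, sensitive_fields, salt="example-salt"):
    """Replace sensitive field values with a (shortened) salted SHA-256 hash.
    Illustrative only -- a production tool might use reversible encryption."""
    out = dict(record)
    for field in sensitive_fields:
        if field in out:
            digest = hashlib.sha256((salt + str(out[field])).encode()).hexdigest()
            out[field] = digest[:16]  # shortened for readability
    return out

payment = {"doc_no": "100001", "payee_name": "John Doe",
           "iban": "DE89370400440532013000"}
masked = pseudonymize(payment, ["payee_name", "iban"])
print(masked["doc_no"])  # → 100001  (key fields stay usable for analytics)
```

Note that the same input always yields the same hash, so the masked field can still be used for joins and duplicate checks without exposing the clear text.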
Now that the data is extracted based on the table and field list we generated in chapter 2, we have it in our data analytic software. But what do you have to know about that software? Some points are similar to the data extraction:
- Mass data capability
- Client/Server capability
- Output formats
- Transparency
- Read-only access
Obviously, mass data capability is one feature you will need. If you extract millions or billions of records with your data extraction tool, the data analytic tool has to be able to handle that big data as well. In most cases you will need dedicated data analytic tools for this, as most "standard software" has built-in limitations when it comes to data volumes.
Usually, Microsoft Excel™ or even Access™ is not enough. Excel™ 2010, for example, is limited to 1,048,576 rows, which most of the time is not nearly enough, and even Access™ 2010 with its limit of 2 gigabytes per database is usually not capable of supporting serious CCM projects.
Loading in the best case unlimited amounts of data is one thing – having it analyzed with reasonable performance is another. The data analytic software should be able to process data fast. This usually depends on the data format the tool uses and on the architecture of the solution. Of course you could try to speed things up with very powerful hardware; on the other hand, it sometimes just does not make sense to try to kill your performance problems with iron (by putting in more and more hardware). Even if your tool seems absolutely fine performance-wise, think about scalability: one analytic question may require only small pieces of data; others in the future might be real beasts to deal with.
As we are still talking about setting up a CCM environment – a Continuous Controls Monitoring environment – the analytic tests need to be performed according to a certain schedule. This creates the need for automation. For example, we want to analyze our "payments to critical countries" on a monthly basis. This requires the analysis to be stored as a program or some kind of macro that can be executed automatically by an internal clock ("scheduler"). A lot of analytic software tools offer built-in macro or programming languages in which these analytics can be built and implemented. If you are not familiar with programming, think of it as something similar to recording an Excel macro.
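What such a stored analytic might look like, stripped down to a Python sketch: the test lives in a function that a scheduler (cron, or the tool's built-in clock) would invoke once a month. The country codes and record layout below are invented placeholders.

```python
from datetime import date

CRITICAL_COUNTRIES = {"XX", "YY"}  # placeholder codes, not a real watch list

def payments_to_critical_countries(payments):
    """The stored analytic: flag payments going to a critical country."""
    return [p for p in payments if p["country"] in CRITICAL_COUNTRIES]

def monthly_run(payments, run_date):
    """What the scheduler would call on the first of each month."""
    hits = payments_to_critical_countries(payments)
    return {"run_date": run_date.isoformat(), "hits": len(hits)}

sample = [
    {"doc_no": "1", "country": "DE"},
    {"doc_no": "2", "country": "XX"},
]
result = monthly_run(sample, date(2024, 3, 1))
print(result)  # → {'run_date': '2024-03-01', 'hits': 1}
```

The key property is that no user has to be present: the analytic is data in, result out, and the scheduler supplies the clock.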
The need for setting up a centralized CCM solution – performant, integrated with the source systems via a data extraction tool, and distributing results to legal entities on a global level – can usually only be met with a proper client/server version of a data analytic tool. Leaving data volume aside: when you need to go the full distance (and sometimes the extra mile) in such projects, you need a decent vehicle for your journey. Though a lot of standalone software solutions are quite powerful, you can only cover the full range of aspects required by a CCM project with software that is made for this and can be put on a server – with user management in place, perhaps multi-processor usage, working in the background for you even when your laptop is disconnected.
The programming language as well as the tool's GUI need to have reasonable usability. There may be a lot of pretty technical tools on the market; however, these analytic tools often need to be run by people who have little or no IT background. So the tools need to provide access to solving technical tasks in a non-technical manner.
So far so good, but what does the bullet point "transparency" mean? Think of it as an extremely important point that makes your life easier when you are the data analyst. Imagine performing not one analytic test as in our example, but 20. Imagine sending the results out to 40 legal entities worldwide, getting C-level attention if your results indicate risks or even fraud. The results may trigger a course of action in these 40 legal entities, involving a lot of people.
A lot of them will ask: "Whoa, did you even do this correctly? How did you make sure there wasn't an error? How did you do it? Was xyz considered in this analysis?" The last thing you want to answer now is "I have no idea, I just pushed a button." In the ISACA article mentioned above we talk about avoiding the black-box effect. Tools and algorithms need to be as transparent as possible; wherever you feel they are not transparent enough, make sure to have proper documentation in place. A log file capability built into the analytic tool can also help big time here.
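As a rough illustration of such a built-in log capability, the following Python sketch (standard library only; all names and filters are made up) writes one trace line per analytic step, so the chain of filters can be reconstructed after the fact instead of answering "I just pushed a button":

```python
import io
import logging

# Collect the analytic trace in memory; a real tool would write it to a log file.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log = logging.getLogger("ccm_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the trace in our own stream

payments = [{"doc_no": "1", "bank_country": "XX"},
            {"doc_no": "2", "bank_country": "DE"}]
log.info("Loaded %d payment records", len(payments))

critical = [p for p in payments if p["bank_country"] == "XX"]
log.info("Filter 'bank country in critical list' kept %d of %d records",
         len(critical), len(payments))

trace = log_stream.getvalue()
print(trace)
```

Every step announces what it did and how many records survived – exactly the information you need when someone asks "was xyz considered?".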
It helps if the analytic tool enforces read-only access: obviously, when doing data analytics, you want to avoid changing the original data by accident and producing errors, for example by overwriting data or making copy-and-paste mistakes. Don't forget that you are dealing with an immense volume of data, so any accidental change would not be easy to spot in that huge pile of digits.
There are some tools on the market which fit these criteria, to mention just ACL™ (Audit Command Language) from ACL Services Ltd. or IDEA from Audicon. For our projects we mainly use ACL™, because the features we need for solving our tasks are quite well balanced there. With the newer client/server products like ACL™ Analytics Exchange (AX) you can do a lot, and the desktop solution ACL™ Analytics (AN) has also improved a lot regarding features and usability. When deciding what to use in your CCM project, have a look at the key features listed here, compare for yourself which solution fits your needs best, and maybe test-drive the software.
Also look out for existing content – you may not have to reinvent the wheel. Hundreds of companies might have analyzed "payments to critical countries" in an SAP® environment before. Instead of putting 50 days of implementation work into developing that analytic, you might get something out of the box for the analytic tool you bought or will buy. This can accelerate your projects a lot. The analytic program is usually only a shell – quality content is key!
If we take our five points as a checklist, we have so far managed the first four steps: