Data analysis – the good, the bad and the ugly

This time I would like to look at two articles from "Wirtschaftswoche": "Big data in business: when the firm knows when you want to quit before you do" by Kerstin Daemon, which appeared in the April 15, 2015 issue [WiWo1], and "The great data chaos of German business" by Meike Lorenzen, published on April 12, 2013 [WiWo2].


The first article focuses on examples of big data in business, and one sentence in particular aroused my interest: "Let's be clear: Big data aren't necessarily intimate data like your personal medical record, the salary of John Smith sitting opposite you, or the content of private e-mails sent by a colleague." [WiWo1]

This statement is important. In public discussion you sometimes get the impression that data analysis automatically has something to do with illegal snooping: employees are spied on, ordinary citizens are eavesdropped upon, your movements are tracked, your chats are recorded and read. The article counters this impression with some positive examples that make welcome reading.

Today data analysis is no longer a nice-to-have; it must be taken for granted. As Kerstin Daemon cites in her article [WiWo1], “some 2.5 quintillion bytes of data are produced daily (according to IBM) but only 1% of this daily data volume is actually analyzed (says market researcher International Data Corporation IDC)”. That shows we still have a long way to go and a lot to catch up on, even though the buzzword "big data" is already starting to sound outmoded and stale.

Data analysis has not always had a good reputation, which is understandable given the many negative examples in the recent past. But it offers such a broad spectrum of questions and leverage points that it should be a simple matter for any data analyst to identify ethical fields of action and work in them.

Read the online article in "Wirtschaftswoche" [WiWo2] by Meike Lorenzen titled "The great data chaos of German business" and you are immediately struck by the subject of data quality. In daily business we are often confronted with a statement like the following: "It's no use analyzing data, they're not well-ordered in the first place." For me that is the wrong attitude, especially when you look at how data volumes have grown in recent years. If as a company you delay the question of master data quality too long, you are only putting it off for later, when the problem will have become all the greater.

Say the master record of a supplier has already been wrongly created in the system five times. When the next business transaction comes round, nobody can be sure which record is the right one, so a sixth master record is created instead. The problem just piles up the further you go. Mandatory fields not filled in (tax numbers), legal requirements (correct and complete addresses and tax numbers of business partners), analysis of duplicates in customer, supplier and material master records: just tackling these "simple" chores seriously and consistently is a challenge, but it would much improve process throughput and quality.

Doing nothing to enhance the quality of master data in your own company amounts to trying to drive from here to China using a road atlas from 1979 in which pages are missing, others are duplicated, and coffee stains and cigarette burns have made important details impossible to read anyway.
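The "simple" chores named above, duplicate master records and empty mandatory fields, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the field names (`supplier_id`, `name`, `postal_code`, `tax_number`), the list of legal-form suffixes and the sample records are all hypothetical, and a real check would need more robust matching than a normalized name plus postal code.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption: legal-form suffixes that should not distinguish two suppliers.
LEGAL_FORMS = {"gmbh", "ag", "kg", "co", "inc", "ltd", "llc"}


def normalize_name(name):
    """Reduce a supplier name to a comparison key (lowercase, no accents, no legal form)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^a-z0-9]+", " ", ascii_name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def find_duplicates(records):
    """Group supplier IDs that share a normalized name and postal code."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["postal_code"].strip())
        groups[key].append(rec["supplier_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def missing_mandatory(records, fields=("tax_number", "postal_code")):
    """Return, per supplier ID, the mandatory fields that are empty."""
    result = {}
    for rec in records:
        empty = [f for f in fields if not (rec.get(f) or "").strip()]
        if empty:
            result[rec["supplier_id"]] = empty
    return result


# Hypothetical sample data for illustration only.
suppliers = [
    {"supplier_id": "S001", "name": "Müller GmbH", "postal_code": "94469", "tax_number": "DE123"},
    {"supplier_id": "S002", "name": "MÜLLER GmbH & Co. KG", "postal_code": "94469", "tax_number": "DE123"},
    {"supplier_id": "S003", "name": "Schmidt AG", "postal_code": "80331", "tax_number": ""},
]

duplicates = find_duplicates(suppliers)
incomplete = missing_mandatory(suppliers)
print(duplicates)
print(incomplete)
```

The result is a list of candidates, not a verdict: two records with the same key may still be legitimately separate business partners, so each group should be reviewed before anything is merged or blocked.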

Luckily there are signs of progress, as Meike Lorenzen notes: "That's why engineering executives have polished up the subject of databases, and carried it into management levels." [WiWo2] There we are! Awareness must spread from engineering to management. Engineers may be better suited to solving the problem, but the catalog of measures is hardly likely to be adequate without support, i.e. budget, from management.

Of course it does not have to be the quality of master data. Analysis of potentially duplicated payments, identification of overdue open items, cash discount losses or price variances between orders and contracts, frequent increases of credit limits and credit limit overruns, consistency of payment terms: these are all further examples of business-motivated analyses that we conduct on a daily basis.
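Two of the analyses listed above, potentially duplicated payments and overdue open items, might look like this in a minimal Python sketch. The record layout (`payment_id`, `invoice_ref`, `amount_cents`, `due_date`, `cleared`) and the sample data are assumptions made for illustration; real ERP extracts will have different field names and many more edge cases.

```python
from collections import defaultdict
from datetime import date


def potential_duplicate_payments(payments):
    """Group payment IDs that share supplier, invoice reference and amount."""
    groups = defaultdict(list)
    for p in payments:
        # Invoice references are compared case-insensitively and without padding.
        key = (p["supplier_id"], p["invoice_ref"].strip().upper(), p["amount_cents"])
        groups[key].append(p["payment_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def overdue_open_items(items, as_of):
    """Return IDs of items that are not cleared and were due before `as_of`."""
    return [i["item_id"] for i in items if not i["cleared"] and i["due_date"] < as_of]


# Hypothetical sample data; amounts are kept in integer cents to avoid rounding issues.
payments = [
    {"payment_id": "P1", "supplier_id": "S001", "invoice_ref": "INV-1001", "amount_cents": 125000},
    {"payment_id": "P2", "supplier_id": "S001", "invoice_ref": "inv-1001 ", "amount_cents": 125000},
    {"payment_id": "P3", "supplier_id": "S002", "invoice_ref": "INV-2001", "amount_cents": 9900},
]
open_items = [
    {"item_id": "O1", "due_date": date(2015, 3, 31), "cleared": False},
    {"item_id": "O2", "due_date": date(2015, 5, 31), "cleared": False},
    {"item_id": "O3", "due_date": date(2015, 3, 1), "cleared": True},
]

duplicate_groups = potential_duplicate_payments(payments)
overdue = overdue_open_items(open_items, as_of=date(2015, 4, 15))
print(duplicate_groups)
print(overdue)
```

Neither check touches personal data: supplier IDs, invoice references, amounts and due dates are enough.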

But there are also areas that can present conflicts. Companies have rules, created to protect the company but often also motivated by national or international legislation, transparency and the need to counter corruption, and compliance with these rules must be monitored. Were there deliveries to or orders from parties that appear on embargo or sanction lists? Do supplier names and invoices stand for "genuine" companies and a "genuine" delivery or service? Or were false suppliers created and false invoices submitted for the purpose of personal enrichment or to form slush funds? Questions like these often involve personal data such as names, addresses or bank accounts, also in the context of data analytics.
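As an illustration of the first question, here is a minimal sketch of screening business partner names against a sanction list with fuzzy matching from Python's standard library (`difflib.SequenceMatcher`). The names, the list entries and the threshold of 0.9 are hypothetical; real screening works with official lists and considerably more elaborate matching, so this only shows the principle.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def screen(parties, sanction_list, threshold=0.9):
    """Return (party, list entry, similarity) for every pair at or above the threshold."""
    normalized_entries = [(entry, normalize(entry)) for entry in sanction_list]
    hits = []
    for party in parties:
        key = normalize(party)
        for entry, entry_key in normalized_entries:
            score = SequenceMatcher(None, key, entry_key).ratio()
            if score >= threshold:
                hits.append((party, entry, round(score, 2)))
    return hits


# Hypothetical names for illustration only.
sanction_list = ["Acme Trading Ltd", "Initech Holdings"]
parties = ["ACME Trading Ltd.", "Globex Corp", "Acme Tradng Ltd"]

hits = screen(parties, sanction_list)
for party, entry, score in hits:
    print(f"{party!r} resembles list entry {entry!r} (similarity {score})")
```

A hit is a reason to look closer, not proof. Because names are personal data as soon as individuals are involved, who may see such hit lists is exactly the kind of question to settle in advance.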

What analyses are right? What are not? How do you analyze data in gray zones like those above?

I see three points of orientation:

  1. Observation of legislation
  2. Transparent, trusting cooperation with data protection and works committee
  3. Ethics of data analysis as a subject of corporate ethics and guiding principle

1. Observation of legislation

There is not really much to add on this point; it speaks for itself. Of course it is essential when creating legislation, especially where such "new" aspects are concerned, that appropriate professional expertise be in place. Legislators must reconcile the necessity of data protection (for example for sensitive data in human resources management, or the security of personnel and customer data) with the need for data analysis in different areas (data quality, process improvement, internal auditing and compliance, detection of fraud and corruption, white-collar crime), so that the resulting legislation is as practical and relevant as possible.

2. Transparent, trusting cooperation with data protection and works committee

In a company that applies data analytics, transparency and communication with data protection and the works committee are decisive factors. Why do you apply data analysis and where do you do it? What is analyzed and, very important, what is NOT analyzed? What do you steer clear of? Basically everyone involved should have the same objectives: to protect the company against damage (whether through poor data quality or through false invoices), but also to safeguard employee data entrusted to the company and to define clearly which possibilities of data analysis you use with which motivation, and which you will not use. Pointing out technical possibilities (like anonymization or pseudonymization) and the principle of data economy is an important part of this discussion. Everyone involved must engage with the subject of data analysis, and the more open, transparent and positive this communication is, the greater the probability that the interests of all persons and parties involved can be safeguarded.
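To make the pseudonymization mentioned above a little more concrete, here is a minimal sketch using a keyed hash (HMAC-SHA256) from Python's standard library. The field names, the sample record and the key are hypothetical, and truncating the digest to 16 hex characters is an illustrative choice. The idea is that analysts see stable pseudonyms instead of names or bank accounts, while the secret key stays with a separate, authorized party.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Map a value to a stable pseudonym that cannot be recomputed without the key."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an assumption, not a requirement


def pseudonymize_records(records, fields, secret_key):
    """Return copies of the records with the given fields replaced by pseudonyms."""
    return [
        {k: pseudonymize(v, secret_key) if k in fields else v for k, v in rec.items()}
        for rec in records
    ]


# Hypothetical key and data; a real key would be generated randomly and stored securely.
key = b"example-key-held-by-data-protection"
records = [
    {"employee": "John Smith", "iban": "DE00 0000 0000 0000 0000 00", "amount_cents": 125000},
]

safe_records = pseudonymize_records(records, fields={"employee", "iban"}, secret_key=key)
print(safe_records)
```

Because the same input always yields the same pseudonym, analyses such as "does a supplier bank account match an employee bank account?" still work on the pseudonymized data. Only when a justified suspicion arises does someone holding the key need to resolve a pseudonym.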

3. Ethics of data analysis as a subject of corporate ethics and guiding principle

My third point is the ethics of data analysis. I see this as applying not only to the companies themselves (say the data and process owners) but also to the employees and often also to external service providers active in this context. Integrity of action and firmly anchored ethics of data analysis not only give everyone involved orientation, they are also a must in the land of unlimited analytic possibilities. Underscoring the positive opportunities of data analysis, the benefit for all involved, the necessity of taking it for granted, but also respect for personal and sensitive data: all of that should not be a hollow marketing phrase, it must be lived and practised daily. For the individual data analyst the result need not be an intricate set of rules. From one day to the next, the old truism "Do as you would be done by" is enough to go by.

In this blogpost I referred mainly to the following articles which have been published in the “Wirtschaftswoche” or “Wirtschaftswoche online”:

www.wiwo.de/technologie/digitale-welt/studie-zu-datenqualitaet-das-grosse-datenchaos-deutscher-unternehmen/8057598.htm [WiWo2]

For any comments on this article, feel free to write us at info@dab-gmbh.de.

To contact the author you can also use LinkedIn or XING (you may have to log in first before you can access these links).

LinkedIn: http://de.linkedin.com/pub/stefan-wenig/54/1b8/b30

XING: https://www.xing.com/profile/Stefan_Wenig2?sc_o=mxb_p
