It focuses on examples of big data in business, and one sentence in particular caught my interest: "Let's be clear: Big data aren't necessarily intimate data like your personal medical record, the salary of John Smith sitting opposite you, or the content of private e-mails sent by a colleague." [WiWo1]
This statement is important. In public discussion you sometimes get the impression that data analysis automatically involves illegal snooping: employees are spied on, ordinary citizens are eavesdropped on, your movements are tracked, your chats are recorded and read. The article counters this impression with some positive examples that make welcome reading.
Today data analysis is no longer a nice-to-have; it has to be taken for granted. Kerstin Daemon cites two figures in her article [WiWo1]: “some 2.5 quintillion bytes of data are produced daily (according to IBM) but only 1% of this daily data volume is actually analyzed (says market researcher International Data Corporation IDC)”. They show that there is still a long way ahead of us and that we have a lot of catching up to do, even though the buzzword "big data" is already starting to sound outmoded and stale.
Data analysis has not always had a good press, which is understandable given the many negative examples in the recent past. But it offers such a broad spectrum of questions and leverage points that any data analyst should find it easy to identify ethically sound fields of action and work in them.
Read the online article in "Wirtschaftswoche" [WiWo2] by Meike Lorenzen titled "The great data chaos of German business" and you are immediately struck by the subject of data quality. In daily business we often hear statements like this: "It's no use analyzing the data, they're not in good order in the first place." For me that is the wrong attitude, especially when you look at how data volume has grown in recent years. A company that postpones the question of master data quality is only deferring the problem to a time when it will be even bigger. Say the master record of a supplier has already been created incorrectly five times in the system. When the next business transaction comes round, nobody can be sure which record is the right one, so a sixth is created instead, and the problem keeps piling up.
Mandatory fields not filled in (tax numbers), legal requirements (correct and complete addresses and tax numbers of business partners), analysis of duplicates in customer, supplier and material master records: just tackling these "simple" chores seriously and consistently is a challenge, but it would do much to improve process throughput and quality. Doing nothing to enhance the quality of master data in your own company amounts to trying to drive from here to China using a road atlas from 1979 in which pages are missing, others are duplicated, and coffee stains and cigarette burns have made important details impossible to read anyway.
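To make the "simple" chores a little more tangible, here is a minimal sketch of a duplicate check and a mandatory-field check on supplier master data. Everything in it is an assumption for illustration: the records, the field names (`id`, `name`, `city`, `tax_number`) and the short list of legal forms are invented, and real master data would need far more careful normalization (umlauts, addresses, bank details).

```python
import re
from collections import defaultdict

# Hypothetical supplier master records; field names are assumptions.
SUPPLIERS = [
    {"id": "S001", "name": "Acme Tools GmbH", "city": "Berlin", "tax_number": "DE123456789"},
    {"id": "S002", "name": "ACME Tools", "city": "berlin", "tax_number": ""},
    {"id": "S003", "name": "Acme-Tools GmbH & Co. KG", "city": "Berlin ", "tax_number": "DE123456789"},
    {"id": "S004", "name": "Baker Logistics Ltd", "city": "Hamburg", "tax_number": "DE987654321"},
]

# Illustrative list of legal forms that are ignored when comparing names.
LEGAL_FORMS = {"gmbh", "co", "kg", "ag", "inc", "ltd"}


def normalize_name(name):
    """Lowercase, turn punctuation into spaces, drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def find_duplicates(records):
    """Group record ids that share the same normalized name and city."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["city"].strip().lower())
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def missing_mandatory(records, fields=("tax_number",)):
    """Return ids of records where any mandatory field is empty."""
    return [
        rec["id"]
        for rec in records
        if any(not (rec.get(f) or "").strip() for f in fields)
    ]


print(find_duplicates(SUPPLIERS))
print(missing_mandatory(SUPPLIERS))
```

A check like this only produces candidates. Whether two records really describe the same supplier is still a decision for someone who knows the business.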
Luckily we seem to see confirmation of this in what Meike Lorenzen writes: "That's why engineering executives have polished up the subject of databases, and carried it into management levels." [WiWo2] There we are! Awareness must spread from engineering to management. Engineers are basically better suited to solving the problem, but the catalog of measures is hardly likely to be adequate without support, i.e. budget, from management.
Of course it does not have to be the quality of master data. Analysis of potential duplicate payments, identification of overdue open items, cash discount losses or price variances between orders and contracts, frequent increases of credit limits and credit limit overruns, consistency of terms of payment: these are all further examples of business-driven analyses that we conduct on a daily basis.
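As an illustration of the first item in that list, a check for potential duplicate payments could be sketched roughly as follows. This is an assumption-laden toy, not a description of any particular tool: the payment records and field names are invented, and the rule used here (same supplier, same amount, same invoice number once case and separators are ignored) is only one of several plausible criteria.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical payment documents; field names are assumptions.
PAYMENTS = [
    {"doc": "P1", "supplier": "S001", "invoice_no": "INV-1001", "amount": Decimal("1190.00")},
    {"doc": "P2", "supplier": "S001", "invoice_no": "inv 1001", "amount": Decimal("1190.00")},
    {"doc": "P3", "supplier": "S004", "invoice_no": "7731", "amount": Decimal("250.00")},
    {"doc": "P4", "supplier": "S001", "invoice_no": "INV-1002", "amount": Decimal("1190.00")},
]


def invoice_key(invoice_no):
    """Ignore case, blanks and separators when comparing invoice numbers."""
    return "".join(ch for ch in invoice_no.upper() if ch.isalnum())


def potential_duplicate_payments(payments):
    """Group payment documents by supplier, normalized invoice number and amount."""
    groups = defaultdict(list)
    for p in payments:
        key = (p["supplier"], invoice_key(p["invoice_no"]), p["amount"])
        groups[key].append(p["doc"])
    return [docs for docs in groups.values() if len(docs) > 1]


print(potential_duplicate_payments(PAYMENTS))
```

Note that P4 is not flagged: same supplier and amount, but a different invoice number. Loosening or tightening the key changes how many candidates have to be reviewed by hand.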
But there are also areas that can present conflicts. Companies have rules, created to protect the company but also often motivated by national or international legislation, transparency and the need to counter corruption, and compliance with these rules must be monitored. Were there deliveries to or orders from parties that appear on embargo or sanction lists? Do supplier names and invoices stand for genuine companies and a genuine delivery or service? Or were fictitious suppliers created and false invoices submitted for the purpose of personal enrichment or to form slush funds? Such questions often involve personal data such as names, addresses or bank accounts, and so does the data analysis that addresses them.
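The embargo and sanction list question can be sketched in a few lines as well. The following is a simplified illustration using fuzzy name matching from Python's standard library. The list entries, the party names and the threshold of 0.85 are all invented for the example; real screening works with official lists and considerably more robust matching.

```python
from difflib import SequenceMatcher

# Hypothetical list entries, invented for this example.
SANCTION_LIST = ["Global Trading Company", "Northwind Shipping"]


def normalize(name):
    """Lowercase and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def screen_parties(parties, listed_names, threshold=0.85):
    """Return (party, listed name, score) for each party whose best match reaches the threshold."""
    hits = []
    for party in parties:
        best_name, best_score = None, 0.0
        for listed in listed_names:
            score = SequenceMatcher(None, normalize(party), normalize(listed)).ratio()
            if score > best_score:
                best_name, best_score = listed, score
        if best_score >= threshold:
            hits.append((party, best_name, round(best_score, 2)))
    return hits


business_partners = ["Global Trading Co", "Baker Logistics", "NORTHWIND SHIPPING"]
for hit in screen_parties(business_partners, SANCTION_LIST):
    print(hit)
```

Because names are personal data in many cases, a hit here is a reason for a documented manual review, not a verdict.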
Which analyses are right? Which are not? How do you analyze data in gray zones like those above?
I see three points of orientation:
- Observance of legislation
- Transparent, trusting cooperation with data protection and works committee
- Ethics of data analysis as a subject of corporate ethics and guiding principle
1. Observance of legislation
There is not really much to add on this point; it speaks for itself. Of course it is essential that appropriate professional expertise is in place when legislation is drafted, especially where such "new" aspects are concerned. Legislation must reconcile the necessity of data protection (for example for sensitive data in human resources management, or the security of personnel and customer data) with the need for data analysis in different areas (data quality, process improvement, internal auditing and compliance, detection of fraud, corruption and white-collar crime), and it should be as practical and relevant as possible.
2. Transparent, trusting cooperation with data protection and works committee
In a company that applies data analytics, transparency and communication with data protection and the works committee are decisive factors. Why do you apply data analysis, and where? What is analyzed and, very important, what is NOT analyzed? What do you steer clear of? Basically everyone involved should have the same objectives: to protect the company against damage (whether through poor data quality or through false invoices), but also to safeguard the employee data entrusted to the company and to define clearly which possibilities of data analysis you use, with which motivation, and which you will not use. Pointing out technical possibilities (like anonymization or pseudonymization) and the principle of data economy is an important part of this discussion. Everyone involved must engage with the subject of data analysis, and the more open, transparent and positive the footing of this communication, the greater the probability that the interests of all persons and parties involved can be safeguarded.
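To show what pseudonymization can look like in such a discussion, here is a minimal sketch using a keyed hash (HMAC-SHA256) from Python's standard library. The posting records, the field name `clerk` and the demo key are assumptions made up for the example. The idea is that the analyst sees a stable pseudonym instead of a name, while the key stays with a separate party, for instance one agreed with data protection and the works committee.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a value by a keyed hash; the same input and key always give the same pseudonym."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymize_records(records, fields, secret_key):
    """Return copies of the records with the given fields pseudonymized."""
    result = []
    for rec in records:
        copy = dict(rec)
        for field in fields:
            if field in copy:
                copy[field] = pseudonymize(str(copy[field]), secret_key)
        result.append(copy)
    return result


# Demo key only; in practice the key would be generated and stored separately.
KEY = b"demo-key-not-for-production"

# Hypothetical posting data; field names are assumptions.
POSTINGS = [
    {"clerk": "John Smith", "supplier": "S001", "amount": "1190.00"},
    {"clerk": "John Smith", "supplier": "S004", "amount": "250.00"},
    {"clerk": "Jane Doe", "supplier": "S001", "amount": "75.50"},
]

SAFE = pseudonymize_records(POSTINGS, ["clerk"], KEY)
for row in SAFE:
    print(row)
```

Patterns such as "the same clerk posted both invoices" remain visible, but the name does not. Whoever holds the key can re-link pseudonym and person, which is exactly why the question of who keeps it belongs in the open discussion described above.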