Cloud migration: if companies aren’t actively doing it, they’re definitely talking about it. Studies now show that the average enterprise engages at least 1200 services. 1200! Articles are everywhere detailing the pros and cons of the cloud, so our focus here is a challenge that not a lot of people discuss: the overwhelming amount of data created by these myriad services, the problems it can present, and the strategies needed to surmount them.
Dark Data – and lots of it
Perhaps ironically in the age of Big Data, one of the most widespread and costly challenges is dealing with the copious amount of data generated when a company has on average over 1,000 services in play (a note: many of these services are micro-services and practically imperceptible, but they still generate data).
Between marketing tools, collaboration software, and HR services – just to name the top three most used services – the average company is already totaling 300 applications. Then there’s DNS, hosting, email, security, microservices. These things generate “off-grid” data – Dark Data – that can make up nearly 90% of a company’s overall data collection. Surprise: it’s growing by over 50% year over year.

The costs? Missed opportunity, tons of manual work, knowledge silos, stale analysis. The fix depends on what type of Dark Data you’re dealing with.
Dark Data Type 1: Unstructured but accessible data
If your team is doing tons of manual work, like copy/pasting, reformatting reports, importing/exporting, then you’re working with a kind of Dark Data that’s called unstructured data. It’s data that requires work to model, reformat, and organize.
It may be easy enough to get to, like a company message board, for example, but it’s extremely difficult to glean insight from, and it doesn’t play well with the other data.
Unstructured data is a prime candidate for automations like Datafile Parsing using things like Python and Pandas or other methods of scripting. Basically anywhere data must be converted from one form to another, there’s an opportunity to reduce manual work with automations.
Dark Data Type 2: Inaccessible data
Here’s where all those hundreds (and thousands) of apps and services start causing big problems. They all generate data. Perhaps much of it is useless, but perhaps some of it is extremely valuable. When it’s stuck behind a platform, you can’t know. It’s lost opportunity.
Looking for applications and services with native integration is extremely helpful in mitigating this issue, but that’s not always possible (ahem…legacy apps). In those cases, APIs will be your friend, helping you pull out the data you need to drive decisions.
Similarly, applications with single logins cause bottlenecks where only one person can log in and get the data. Not only is that time-consuming for the individual, it’s a risk for the organization to have the keys to that knowledge tied to one person.
Something like Robotic Process Automation (RPA), which can do all the human interaction automatically, solves this problem. While there are concerns about RPA, it’s a great catalyst to fast automation.
Dark Data Type 3: Siloed data
Downloaded files. Files in your emails. Static spreadsheet files. There’s so much badness about data silos that it hurts just to talk about it. This stuff is a risk for compliance, it creates a ton of extra work, it makes your data’s accuracy questionable at best, and on and on. It’s just bad.
Data silos are a more general data problem that overlaps others and so there’s no one best solution. Demolishing silos requires a knowledge of where they exist and a data management strategy to bring them back into the fold.
Toward Data Management
Data problems, like most problems, need a strategic approach. You need to understand what data you have access to, then what questions you want to ask it.
Once these questions start getting answered, you’ll begin seeing which automations make sense with your goals – whether it’s some example listed here, or a different solution altogether. What you want to know from your data should drive every decision you make in getting the answer, from data storage, to automating ingestion, to presentation.
Without question, the landscape of business data is growing in both size and complexity. Virtually everything done in a business is a source of data. Knowing what’s valuable, where it is, how to get it, and what to ask of it will be the difference between leveraging data for success and simply chipping the tip of an iceberg of value.
Bill Erickson is Interject’s head of communications. He’s an avid strategist and a professional writer. His passion for story and value gives our team and our clients a voice.