Stop data pollution from turning your company’s data lake into a swamp

This article was contributed by Kevin Campbell, CEO of Syniti

Today, every organization is a data organization. Whether you work for a Silicon Valley tech company, an established manufacturer, a legacy financial services firm or a government agency, your company is collecting, storing and trying to make use of more data than ever before.

Globally, we are in the midst of a data explosion. The total global volume of enterprise data is projected to roughly double between 2020 and 2022, from 1,005 to 2,025 terabytes. It is no surprise that many organizations are stuck playing permanent catch-up, lacking the knowledge and tools to manage the data they collect well enough to make it truly useful.
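As a quick sanity check on those figures, the implied yearly growth rate can be computed directly. This is plain arithmetic on the two numbers quoted above, assuming smooth compound growth over the two years.

```python
# Figures quoted in the paragraph above, in terabytes.
volume_2020_tb = 1_005
volume_2022_tb = 2_025
years = 2022 - 2020

# Overall multiple, and the constant yearly rate that would produce it
# (compound annual growth rate: (end / start) ** (1 / years) - 1).
growth_multiple = volume_2022_tb / volume_2020_tb
annual_growth_rate = growth_multiple ** (1 / years) - 1

print(f"Growth multiple: {growth_multiple:.2f}x")           # Growth multiple: 2.01x
print(f"Implied annual growth: {annual_growth_rate:.1%}")   # Implied annual growth: 41.9%
```

In other words, doubling in two years means the data estate grows by roughly 42% every year.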

To control this flood of data, many enterprises turn to data lakes rather than standard data warehouses. In theory, data lakes give businesses the upper hand in scalability, flexibility and integration with technologies such as IoT. However, instead of a clear, healthy data lake, many organizations end up with something more like a stagnant data swamp, clogged with murky data pollution. So, what can you do to keep your lake from turning into a swamp and take full advantage of your data?

1. Choose your most important company data … and get (almost) everyone to agree

I have seven children, and as a father, of course, I love them all equally. The same is not true for data. Stop treating all of your company’s data as if it were equally important. Trust me, it’s not.

You need to decide, together with some key stakeholders, which data matters most to your organization and its goals. You probably won’t be able to cover all of your data, and dumping all of it into a data lake is the fastest way to create a swamp. Instead, identify the data that drives the company and delivers broad business value (driving efficiency, enhancing the customer experience, informing product development) and designate it as the basis of your KPIs and success metrics.

Once you’ve settled on your key success metrics and your most important data, socialize them with key stakeholders so you have buy-in. Here are some questions to ask:

  • What are our key KPIs?
  • What are the metrics we will measure?
  • Do we understand the formulas behind these calculations?
  • What rules govern how data is pulled into these metrics?
  • In what system does our data reside?

Consider creating a data charter that clearly states the above so that everyone can refer to it and support your overall data strategy.
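A data charter can be as simple as a shared document, but capturing it in a structured form makes unanswered questions easy to spot. The sketch below is one hypothetical way to do that in Python: the `MetricDefinition` and `DataCharter` classes, their field names (including the added `owner` field) and the sample metrics are all illustrative assumptions, not a standard format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    """One KPI entry in a data charter (hypothetical field names)."""
    name: str                  # What is the KPI?
    formula: str               # How is it calculated?
    rules: list[str]           # Which rules govern the data feeding it?
    source_systems: list[str]  # In what systems does the data reside?
    owner: str                 # Who is accountable for it? (assumed extra field)


@dataclass
class DataCharter:
    version: str
    metrics: list[MetricDefinition] = field(default_factory=list)

    def missing_answers(self) -> dict[str, list[str]]:
        """For each metric, list the charter questions that are still unanswered."""
        gaps = {}
        for metric in self.metrics:
            answers = {
                "formula": metric.formula,
                "rules": metric.rules,
                "source_systems": metric.source_systems,
                "owner": metric.owner,
            }
            unanswered = [question for question, answer in answers.items() if not answer]
            if unanswered:
                gaps[metric.name] = unanswered
        return gaps


# Illustrative usage with made-up metrics.
charter = DataCharter(
    version="1.0",
    metrics=[
        MetricDefinition(
            name="customer_retention_rate",
            formula="customers retained at period end / customers at period start",
            rules=["exclude test accounts", "count each customer ID once"],
            source_systems=["CRM"],
            owner="Head of Customer Success",
        ),
        MetricDefinition(name="net_revenue", formula="", rules=[],
                         source_systems=["ERP"], owner="CFO"),
    ],
)
print(charter.missing_answers())  # {'net_revenue': ['formula', 'rules']}
```

Keeping the charter versioned like this also makes it easier to adjust later without losing track of what was originally agreed.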

2. Know your data

So, you have chosen your most important, business-critical data, and you have secured agreement on it from the key people in your organization. What’s next? To paraphrase a wise Greek philosopher, you need to know your data. How is it created? Where is it entered? How is it approved?

Take stock of where your company’s important data comes from, and how and where it enters your systems. From there, make sure the data you’re storing is accurate: effective, regular cleansing will remove or correct incorrect, incomplete, irrelevant or improperly formatted data. Be sure to include procedures for eliminating duplicates and merging different datasets. Deduplication may not be the sexiest topic in data, but it is one of the most important, and done well, it can save you a lot of money and resources.
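As a minimal sketch of the cleansing and deduplication steps described above, the Python below normalizes records, drops incomplete or malformed ones, and merges two sources while keeping one record per customer. It assumes a hypothetical schema with `customer_id`, `email` and `phone` fields, uses the normalized email address as the matching key, and relies on a deliberately loose email format check; real matching rules would come from your own data charter.

```python
import re
from typing import Optional

REQUIRED_FIELDS = ("customer_id", "email")  # hypothetical schema
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose format check only


def clean_record(record: dict) -> Optional[dict]:
    """Normalize formatting; return None for incomplete or malformed records."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if any(not cleaned.get(name) for name in REQUIRED_FIELDS):
        return None  # incomplete
    cleaned["email"] = cleaned["email"].lower()
    if not EMAIL_PATTERN.match(cleaned["email"]):
        return None  # improperly formatted
    return cleaned


def merge_and_deduplicate(*datasets: list) -> list:
    """Clean every record, keep one record per email, and fill gaps from later duplicates."""
    merged = {}
    for dataset in datasets:
        for raw in dataset:
            record = clean_record(raw)
            if record is None:
                continue
            key = record["email"]
            if key not in merged:
                merged[key] = record
                continue
            # Duplicate: keep the values already stored, only fill fields that are empty.
            for name, value in record.items():
                if not merged[key].get(name):
                    merged[key][name] = value
    return list(merged.values())


# Illustrative usage with made-up records from two systems.
crm = [
    {"customer_id": "C1", "email": " Ana@Example.com ", "phone": ""},
    {"customer_id": "C2", "email": "not-an-email"},
]
billing = [
    {"customer_id": "B7", "email": "ana@example.com", "phone": "555-0100"},
    {"customer_id": "", "email": "bob@example.com"},
]
result = merge_and_deduplicate(crm, billing)
print(result)  # [{'customer_id': 'C1', 'email': 'ana@example.com', 'phone': '555-0100'}]
```

Letting the first source win and only filling gaps from later ones is just one possible survivorship rule; which system is authoritative for each field is a decision to record in the charter.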

Given the variety of databases, file formats and structures involved, this will take time and work, but do not skip this step. It is crucial for breaking down internal silos and creating truly valuable data. Ongoing maintenance and point-of-entry controls that block duplicate records and bad addresses are non-negotiable. Without them, your lake will turn back into a swamp before you know it. This is a mistake organizations make often.
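Batch cleansing only helps if bad records stop arriving. A point-of-entry control can be sketched as a small gate that every new record passes through before it is stored. In this hypothetical `EntryGate` class, "bad addresses" is interpreted as malformed email addresses, and duplicates are detected by normalized email; both choices are assumptions made for illustration.

```python
from __future__ import annotations

import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose format check only


class EntryGate:
    """Hypothetical point-of-entry check: rejects bad or duplicate records before storage."""

    def __init__(self) -> None:
        self._seen_emails: set[str] = set()
        self.accepted: list[dict] = []

    def submit(self, record: dict) -> tuple[bool, str]:
        email = str(record.get("email", "")).strip().lower()
        if not EMAIL_PATTERN.match(email):
            return False, "rejected: invalid email"
        if email in self._seen_emails:
            return False, "rejected: duplicate"
        self._seen_emails.add(email)
        self.accepted.append({**record, "email": email})  # store the normalized form
        return True, "accepted"


# Illustrative usage.
gate = EntryGate()
print(gate.submit({"name": "Ana", "email": "Ana@Example.com"}))     # (True, 'accepted')
print(gate.submit({"name": "Ana S.", "email": "ana@example.com "}))  # (False, 'rejected: duplicate')
print(gate.submit({"name": "Bob", "email": "bob@example"}))          # (False, 'rejected: invalid email')
```

Rejecting at entry is cheaper than repairing later, because the person entering the record can still correct it.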

3. Governance is important for company data

I know. Governance is often seen as controlling, slow and restrictive. In reality, though, it clarifies who holds authority and control over each data asset, so that data remains consistent and can be used throughout the organization.

For many businesses, customer success is one of the most important KPIs. Properly understanding the entire customer lifecycle means going all the way back to the first marketing contact. Who creates and maintains those customer records?

Without proper governance, a single customer can end up with multiple customer numbers, which dilutes the information we have, prevents us from making smart, data-driven decisions and can impair our ability to provide an excellent customer experience.
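To illustrate how multiple customer numbers for the same person might be reconciled, here is a simple identity-consolidation sketch. The hypothetical `consolidate_customer_ids` function treats any two records that share an email address or phone number as the same customer, groups them with a union-find structure, and picks the smallest ID in each group as the master ID. The matching rule and the choice of master ID are assumptions; in practice they are exactly the kind of decision governance should settle.

```python
def consolidate_customer_ids(records: list) -> dict:
    """Map each customer ID to one master ID when records share an email or phone.

    Union-find over IDs; the smallest ID in each group becomes the master ID.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The smaller root wins, so each root stays the minimum ID of its group.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    first_id_for_identifier = {}
    for rec in records:
        cid = rec["customer_id"]
        find(cid)  # register every ID, even one with no shared identifiers
        for key in ("email", "phone"):
            value = rec.get(key)
            if not value:
                continue
            identifier = (key, value.strip().lower())
            if identifier in first_id_for_identifier:
                union(cid, first_id_for_identifier[identifier])
            else:
                first_id_for_identifier[identifier] = cid
    return {cid: find(cid) for cid in parent}


# Illustrative usage: three system-specific IDs that belong to one person.
records = [
    {"customer_id": "CRM-001", "email": "ana@example.com"},
    {"customer_id": "MKT-042", "email": "ANA@example.com", "phone": "555-0100"},
    {"customer_id": "ERP-900", "phone": "555-0100"},
    {"customer_id": "CRM-002", "email": "bob@example.com"},
]
master_ids = consolidate_customer_ids(records)
print(master_ids)
# {'CRM-001': 'CRM-001', 'MKT-042': 'CRM-001', 'ERP-900': 'CRM-001', 'CRM-002': 'CRM-002'}
```

Note that `ERP-900` is linked to `CRM-001` only indirectly, through the phone number it shares with `MKT-042`; union-find handles these transitive matches naturally.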

Good governance should also support compliance with whatever regulations affect your organization, whether HIPAA, GDPR, CCPA, POPI, LGPD or others.

The data charter mentioned earlier can serve as the cornerstone of your governance strategy. As a data program progresses, it’s easy to lose sight of your initial goals, so refer to the charter regularly to keep it top of mind for all stakeholders. At the same time, don’t be too rigid: if your organization’s needs change, adjust your data charter accordingly.

Last but not least, transparency is crucial. Internally, this means clear communication among all stakeholders, so that different departments can share their knowledge while everyone stays accountable for maintaining data quality.

Externally, your company needs to be completely transparent about what customer and prospect data it collects. The most obvious reason is to avoid regulatory penalties: Google, WhatsApp and CaixaBank have all received multi-million-euro fines for violating GDPR transparency requirements. It’s just not worth it.

The more data, the better? Not necessarily

More data is not always better. Companies should be careful about collecting and storing data for which they have little tangible use. Such data carries not only security, privacy and compliance risks but also unnecessary costs for storing and managing it. Instead, focus on data that has real value and utility. You already have more than enough!
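One practical way to act on this is to compare your data inventory with the data your KPIs actually depend on. The sketch below is a hypothetical example: it assumes an inventory of `Dataset` entries and a set of dataset names referenced by the data charter, and lists everything else as a candidate for review, putting personal data first (because of its privacy and compliance risk) and then larger datasets (because of storage cost). It only flags datasets; deciding what to archive or delete remains a governance decision.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical inventory entry."""
    name: str
    size_gb: float
    contains_personal_data: bool


def review_candidates(inventory: list, charter_sources: set) -> list:
    """Return datasets no KPI relies on: personal data first, then largest first."""
    unused = [d for d in inventory if d.name not in charter_sources]
    # False sorts before True, so datasets with personal data come first.
    return sorted(unused, key=lambda d: (not d.contains_personal_data, -d.size_gb))


# Illustrative usage with made-up datasets.
inventory = [
    Dataset("crm_customers", 120.0, True),
    Dataset("legacy_web_logs_2015", 900.0, True),
    Dataset("sensor_raw_dump", 4000.0, False),
    Dataset("erp_invoices", 300.0, False),
]
kpi_sources = {"crm_customers", "erp_invoices"}
for d in review_candidates(inventory, kpi_sources):
    print(d.name)
# legacy_web_logs_2015
# sensor_raw_dump
```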

Clean, useful and valuable data has the potential for new business growth, streamlining operations, enhancing customer relationships and increasing agility. Who doesn’t want that?

For more than three decades, Kevin Campbell has been enthusiastically driving innovation and growth in global Fortune 500 and start-up organizations. Currently, he serves as the CEO of Syniti.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including tech people working on data, can share data-related insights and innovations.

If you want to read about cutting-edge ideas and up-to-date information, best practices and the future of data and data tech, join us at DataDecisionMakers.
