Last week I wrote a post giving an overview of Data Product Managers. You can read it here.
While researching that article, I came across so many different terms and names for Data Engineering teams that I was baffled. It was intimidating to imagine how folks working in data keep track of everything happening in their world.
Not to get philosophical, but when overcome with confusion, it's always worth looking inward and listening to the voice in your head.
I did that and found a recognizable voice. Me.
My Struggle with the World of Data Engineering
When I am not talking to myself or the voice in my head, I tend to work on real-world problems and product enhancements.
As a Product Manager, I have few sources I can rely on for the answers I need to build solutions. The most reliable one is data.
The real problem is not just acquiring the data but making sense of it so that it aligns with Product Strategy.
Is there such a thing as too much data? Can Organizations have too much data? Who should own the data? Who should manage the data?
There are a lot of questions that remain to be answered by the experts. I want to tap into my experience to try and make sense of the data world in this article.
80% of the Healthcare world is Data, The Rest is Unexplored
I spent a decade working in the Healthcare industry, thoroughly enjoying the thrilling journey of walking into hair-splittingly complex problems and walking out with solutions I was proud of.
Working in a team solving complex healthcare IT problems is a satisfying way to build solutions and relationships.
When in IT, the usual consensus is to follow the trawlers like a seagull, and there will be food to feed the flock. The same notion does not work in Healthcare IT.
Before we can build customer-facing solutions, the conundrum of data needs to be solved first. There are many different terms for it:
- Data Interoperability
- Data Security
- Data Quality
Each of these terms tries to solve a unique problem in Healthcare IT.
I spent my time working on interoperability and quality. I learned a thing or two from the experience.
What is Interoperability and How Does it Work in the Data World?
Think back to the early days of the internet. I am not talking about the days of ARPANET and the first word transmitted over the network, which was “lo” (there is an interesting story behind it; read it here). I am talking about the World Wide Web.
When the World Wide Web was launched, the team needed to figure out how to exchange data uniformly and consistently. Hence the web protocol, HTTP.
You can read about it here.
In the Healthcare world, interoperability is achieved by agreeing on information exchange standards, terminology standards, content standards, identifier standards, and standards for security and privacy.
Experts are needed to work on the standards & policies, while engineers develop the systems that implement them.
To give a short example of the challenges of interoperability in the healthcare world, consider the scenario of trying to share patient information between two providers. The format of data stored by one provider is different from that stored by the other, even though the data itself is similar. When the sharing is automated, an additional step is needed to reconcile the formats before the data can be consumed by either system. Multiply this issue by millions of records and thousands of facilities, then add data security & privacy issues across them.
As you can see, a world where a simple “lo” was transmitted across the wire can easily turn into a complex web of policies and standards where regulations drive the innovations.
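That extra format-management step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two providers, their field names, and their date formats are invented here, and real exchanges lean on agreed standards rather than hand-written mappers like these.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class PatientRecord:
    """Shared shape that both systems agree to exchange (hypothetical)."""
    patient_id: str
    full_name: str
    birth_date: date


def from_provider_a(raw: dict) -> PatientRecord:
    # Hypothetical Provider A: separate name fields, ISO 8601 dates.
    return PatientRecord(
        patient_id=str(raw["id"]),
        full_name=f'{raw["first_name"]} {raw["last_name"]}',
        birth_date=date.fromisoformat(raw["dob"]),
    )


def from_provider_b(raw: dict) -> PatientRecord:
    # Hypothetical Provider B: "LAST, FIRST" names, MM/DD/YYYY dates.
    last, first = [part.strip() for part in raw["patient_name"].split(",", 1)]
    return PatientRecord(
        patient_id=str(raw["mrn"]),
        full_name=f"{first.title()} {last.title()}",
        birth_date=datetime.strptime(raw["birth_date"], "%m/%d/%Y").date(),
    )


# The same patient as each provider might store it.
record_a = from_provider_a(
    {"id": 1042, "first_name": "Jane", "last_name": "Doe", "dob": "1984-03-09"}
)
record_b = from_provider_b(
    {"mrn": "1042", "patient_name": "DOE, JANE", "birth_date": "03/09/1984"}
)

# Once both are mapped to the shared shape, the records can be compared.
print(record_a == record_b)  # prints True
```

Every new provider with its own layout needs another mapper like these, which is why agreeing on a standard up front matters so much at scale.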
Who makes the policies and who engineers the solutions?
I am not trying to be political here, but the world of data is more democratic than anyone can imagine. All kinds of data are equally important. Everyone handling data is an engineer.
The question that arises when we are building systems that predominantly rely on an exchange of data is one of accountability & responsibility.
Who is accountable for creating and managing the policies to handle data, and who is responsible for building the solutions that follow them?
This gives rise to many verticals, but they can be summarized broadly as:
- Data Governance
- Data Engineering
There are actors in each of these verticals and they play a pivotal role in shaping the data in any organization.
Key Roles in Data Governance & Engineering
The term Data Governance sounds very broad and serious. I wondered if it was made up by people to make it seem more real but then I looked up the definition.
Data governance is a set of principles, standards, and practices that ensure your data is reliable and consistent. It also helps ensure that your data can be trusted to drive business initiatives, inform decisions, and power digital transformations.
The definition itself suggests the involvement of players who define the rules and make people play by them. A successful data-driven organization should value and understand its data while having the right people managing and driving its maintenance.
The Chief Data Officer is the ideal person to manage data in the organization and to leverage it to solve complex business problems. The CDO, in coordination with Data Owners and Data Stewards, drives the data strategy, and Data Strategy sits at the core of a data-driven organization.
Data Owners are the ones who own the data assets used in the organization, while Data Stewards are responsible for maintaining those assets.
When it comes to analyzing, cleaning, and maintaining the data, there is a slew of characters playing a pivotal role in the process.
The Data Analysts and Data Scientists are the people who get their hands dirty dealing with the complexities around data and the tooling used for extracting, analyzing, transforming, and loading the data into storage.
If you have seen enough superhero movies then you might want to think of this team of CDO and Data Engineers as the Avengers that protect the organizations from crumbling under the attack of bad actors.
The Data Analysts and Data Scientists are experts in dealing with the technical complications of tools & infrastructure. To give a small idea of the type of complications they deal with, consider creating a data ingestion plan to extract data from one of the organization's source systems dealing with employee finances. A typical task for Data Engineers is to extract, analyze, and clean the data using Python libraries or plain old SQL, alongside tools like Apache Airflow for orchestration or Azure Purview for cataloging and governance. The deliverable for such a task includes an automated pipeline that runs the above steps on a scheduled basis.
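To make that task a little more concrete, here is a minimal sketch of an extract, clean, and load step using only Python's standard library (`csv` and `sqlite3`). The sample export, the column names, the `employee_salary` table, and the function names are all invented for illustration. Scheduling is left out; in practice an orchestrator would call something like `run_pipeline` on a schedule.

```python
import csv
import io
import sqlite3

# Hypothetical export from a finance source system (invented for illustration).
RAW_EXPORT = """employee_id,name,salary
101, Asha Rao ,72000
102,Ben Ortiz,
101, Asha Rao ,72000
103,Chen Li,not available
104,Dana Smith,65500.50
"""


def extract(text: str) -> list[dict]:
    """Read the raw CSV export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows: list[dict]) -> list[tuple]:
    """Trim whitespace, drop rows without a numeric salary, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            record = (
                int(row["employee_id"]),
                row["name"].strip(),
                float(row["salary"]),
            )
        except ValueError:
            continue  # missing or non-numeric value: skip the row
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Write the cleaned records with plain old SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee_salary "
        "(employee_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO employee_salary VALUES (?, ?, ?)", records
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    """The unit of work a scheduler would trigger."""
    load(conn, clean(extract(RAW_EXPORT)))


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
rows = conn.execute(
    "SELECT employee_id, name, salary FROM employee_salary ORDER BY employee_id"
).fetchall()
print(rows)
```

Of the five raw rows, only two survive: the duplicate is collapsed and the two rows without a usable salary are dropped. Whether to drop, flag, or repair such rows is exactly the kind of decision that has to come from the business problem.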
Not to forget, all this starts with a business problem that the Data Engineers need to interpret and understand correctly, as the stakeholders intend it.
This is where one crucial player comes in to make sure all the parties work well together.
It is the job of the Data Product Managers to make sure complex problem statements from businesses are translated into actionable tasks for Data Engineers.
How Do I Get Familiar with the Data World?
As I mentioned earlier, I worked in healthcare IT for a decade and wore multiple hats but the most enjoyable part of the job was solving problems.
Starting as a typical Software Engineer with Development, Testing, and Automation to being an ETL developer and Domain Expert, I always wondered about what next and where next.
Data Product Management is a unique horizontal that makes sure data is treated as a product and, just like a product, goes through a lifecycle. Exploring it is my way to familiarize myself with the world of data and to understand the responsibility of Data Product Management in an organization.
What is your experience with Data Engineering? Maybe I missed something here; if I did, please share it in the comments section. Let’s share knowledge so we can uncomplicate this complex world of data.