Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can’t scale data science teams fast enough to keep up with the growing amounts of data to transform. What’s the answer? Self-service data. With this practical book, data engineers, data scientists, and team managers will learn how to build a self-service data science platform that helps anyone in your organization extract insights from data. Sandeep Uttamchandani provides a scorecard to track and address bottlenecks that slow down time to insight across data discovery, transformation, processing, and production. This book bridges the gap between data scientists bottlenecked by engineering realities and data engineers unclear about ways to make self-service work.
I read most of the book and skipped the chapters that were not as relevant to my current day-to-day. The book is centered on determining your time to insight, from data discovery all the way through operationalizing the data, which is then used to define your Time-to-Insight Scorecard. The author talks through 18 metrics and spends each chapter digging into them. Some of the chapters are more useful than others, and he really gets into the weeds in some of them. He also tends to focus on how to determine which tools will help you assess the metric you are investigating. Part 1, which focuses on self-service data discovery, was the most useful section for me.
The following image provides a view of the core concepts.
This book is probably most relevant to data engineers and data platform engineers. The intro provides a great high-level overview and is definitely worth reading even if you read nothing else in the book. Concepts worth digging into if you want to get the most out of this book are data architecture, data modeling, and data management.
The book is a good introduction for anyone in a company that wants to seriously start its data roadmap. It is especially useful for the people managing this journey, who need to be aware of how complex the topic is, how many pieces must be glued together, how many different technologies must be considered, and how much expertise is needed. Big Data, AI, and ML are talked about loudly by everyone, but are you really aware of what's behind the scenes? Software data products have peculiar engineering challenges and sit in a wide field of competences at all levels, so starting this journey is not easy for a company and, if not properly managed, it can easily end in failure. This book tries to make the journey less centralized, with the aim of minimizing time to insight through automation and proper structuring of all the pieces. No more than three stars, because if you want to go to a practical level you need more specialized information, potentially one book for each chapter.