Arne Lapõnin works as a Consultant Developer at Thoughtworks Spain. He is fascinated by using data to build tools that empower people to make better decisions. Over the last couple of years, he has been helping clients with data engineering and data platform projects. Arne loves climbing and skateboarding and is obsessed with good coffee.
What Can Go Wrong With Testing an ETL Pipeline?
With businesses wanting to make data-driven decisions, software engineers find themselves architecting and implementing increasingly complicated data-intensive applications. More engineers are making the jump from traditional application software development to data engineering. Yet, when it comes to testing data-intensive systems, the lessons that we have learned over many decades of writing application code seem to be forgotten. When developing ETL pipelines, one might worry about how to get enough production-like synthetic data to test the system, while in reality one should think about how to implement a high-quality system that responds to its users’ needs. This talk is about how a testing strategy evolved in one of the analytics projects Arne was involved in and how he learned these old lessons the hard way.