Great Expectations About Data Quality

Speaker: Ariella Rink

Track: Data Science

Type: Talk

Room: Lefthand Room (Seminar Room 3)

Time: Oct 04 (Fri): 12:15

Duration: 0:45

A story about the City of Cape Town Data Science Branch’s quest to develop a framework for measuring and monitoring data quality: what worked, what didn’t, and what we learnt along the way.

We tested popular data quality packages from the Python ecosystem (Great Expectations, Soda) alongside a roll-your-own approach, and contended with both generic and domain-specific measures of data quality.
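
To make the distinction between generic and domain-specific measures concrete, here is a minimal roll-your-own sketch in plain Python. It is not the Branch's implementation and does not use the Great Expectations or Soda APIs; the function names, the `litres` column, and the sample meter readings are all hypothetical, chosen only for illustration.

```python
from typing import Any, Callable

Row = dict[str, Any]
MISSING = (None, "")  # assumption: what counts as "missing" in this sketch


def completeness(rows: list[Row], column: str) -> float:
    """Generic measure: fraction of rows with a non-missing value in `column`."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(column) not in MISSING)
    return present / len(rows)


def validity(rows: list[Row], column: str, rule: Callable[[Any], bool]) -> float:
    """Domain-specific measure: fraction of non-missing values that satisfy `rule`."""
    values = [row[column] for row in rows if row.get(column) not in MISSING]
    if not values:
        return 0.0
    return sum(1 for value in values if rule(value)) / len(values)


# Hypothetical records, invented for this example.
readings = [
    {"meter_id": "A1", "litres": 120.0},
    {"meter_id": "A2", "litres": None},   # missing reading
    {"meter_id": "A3", "litres": -5.0},   # present, but fails the domain rule
    {"meter_id": "A4", "litres": 80.5},
]

report = {
    "litres_completeness": completeness(readings, "litres"),
    # Domain rule (an assumption): consumption cannot be negative.
    "litres_validity": validity(readings, "litres", lambda v: v >= 0),
}
print(report)
```

The completeness check applies to any column of any dataset, while the validity rule only makes sense once someone who knows the data states what a plausible value is.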

I'll begin the talk with a brief background on the Data Science Branch and some context for why we embarked on the data quality mission. I'll then introduce the study of data quality, before moving on to our implementation in Python using the packages described above and discussing the pros and cons of each approach. I'll wrap up by discussing our future plans and summarizing the main lessons we learnt along the way.

This talk is intended for anyone who has faced questions about data quality in their own work, and emerged more or less unscathed. I hope the presentation offers insight to both the audience and the speaker by prompting discussion of other data practitioners’ experiences in this area.

