Airflow - beyond the basics (and also some basics)
Room: Tutorial Room
Time: Oct 04 (Wed): 09:00
Apache Airflow is a platform that allows Pythonistas to programmatically author, schedule and monitor workflows. It was originally created by the nice folks at Airbnb to manage their increasingly complex workflows.
Airflow is now open source and pretty well documented, but it has a few gotchas and unexpected behaviors.
In this tutorial we'll cover:
- a setup tutorial and tour for those who are new to Airflow
- using Airflow command-line utilities for efficient DAG authoring
- how to create custom operators (Airflow tasks are instances of operators) that don't break Airflow's magic. We'll do this by building a Django operator
- strategies for passing non-trivial data between tasks
- how to generate DAGs and tasks based on data instead of authoring everything by hand (a DAG is like a workflow: a collection of tasks and configuration)
The examples covered in the tutorial are available from this GitHub repo.