
When Airflow 2.3 dropped just over four months ago, we called it one of the most important-ever releases of Apache Airflow, thanks mainly to its introduction of dynamic task mapping. The same can be said of today’s Airflow 2.4 release, which introduces a “datasets” feature that augments Airflow with powerful new data-driven scheduling capabilities.

- New data-driven scheduling logic in Airflow 2.4 automatically triggers downstream DAGs to run once their upstream dependencies successfully complete.
- You can break up monolithic pipelines and run multiple upstream and downstream DAGs in their place, so that adding a new data source or new transformation logic to one won’t adversely affect the performance of the others, which helps in prioritizing your business-critical pipelines.
- Data-driven scheduling is flexible enough to be adapted to many common use cases as well as cutting-edge ones like model maintenance in ML engineering.
- A new consolidated schedule parameter is available as an alternative to Airflow’s existing schedule_interval and timetable parameters.
- A new CronTriggerTimetable makes Airflow’s scheduling syntax behave more like cron’s.
- Support for additional input types for use with Airflow’s dynamic task mapping feature.
- Several significant UI improvements, including the ability to drill down into log files from Airflow’s grid view.

In Airflow 2.4, you can use data-driven scheduling to break up large, monolithic DAGs into multiple upstream and downstream DAGs, and explicitly define dependencies between them. Doing so makes it easier for you to ensure the timely delivery of data to consumers in different roles, as well as optimize the runtime performance of your business-critical DAGs. Basically, if you have any downstream use case that depends on one or more upstream data sources, data-driven scheduling is going to transform how you use Airflow. Typical examples include:

- Ensuring that data scientists, data analysts, and other self-service users always have access to the up-to-date data they need to do their work.
- Guaranteeing the timely delivery of the cleansed, conditioned data used to feed the business-critical KPIs, metrics, and measures that power operational dashboards.
- Enabling ML engineers and ops personnel to automate the process of retraining, testing, and redeploying production ML models, radically simplifying maintenance.

Data-driven scheduling is Airflow 2.4’s top-line feature, but it isn’t the only major change. The new release also introduces a schedule parameter that consolidates all of Airflow’s extant scheduling parameters.
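
To make the triggering behavior concrete, here is a small, self-contained toy model of data-driven scheduling in plain Python. It is not Airflow code and not how Airflow implements the feature. In Airflow 2.4 itself, a producer task declares the datasets it updates and a consumer DAG lists those datasets as its schedule. The sketch only mimics the resulting behavior: a downstream consumer runs once every dataset it depends on has been updated since its last run. The class name `ToyScheduler`, the consumer name `report_dag`, and the dataset URIs are all hypothetical, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model: a consumer runs once every dataset it waits on has been updated."""

    # consumer name -> dataset URIs it depends on (hypothetical names)
    consumers: dict[str, set[str]] = field(default_factory=dict)
    # consumer name -> datasets updated since that consumer last ran
    pending: dict[str, set[str]] = field(default_factory=dict)

    def register(self, consumer: str, datasets: set[str]) -> None:
        """Declare which datasets a downstream consumer depends on."""
        self.consumers[consumer] = set(datasets)
        self.pending[consumer] = set()

    def dataset_updated(self, uri: str) -> list[str]:
        """Record a successful upstream update; return the consumers it triggers."""
        triggered = []
        for consumer, needed in self.consumers.items():
            if uri in needed:
                self.pending[consumer].add(uri)
                if self.pending[consumer] == needed:
                    triggered.append(consumer)
                    # Start waiting for a fresh round of updates.
                    self.pending[consumer] = set()
        return triggered


scheduler = ToyScheduler()
scheduler.register(
    "report_dag",
    {"s3://example-bucket/orders.csv", "s3://example-bucket/customers.csv"},
)

# Only one of the two upstream datasets is ready, so nothing runs yet.
first = scheduler.dataset_updated("s3://example-bucket/orders.csv")

# The second dataset arrives, so the downstream consumer is triggered.
second = scheduler.dataset_updated("s3://example-bucket/customers.csv")
```

Because the consumer depends only on the datasets, and not on any particular upstream DAG, a new producer can be added or an existing one changed without touching the downstream definition.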
