Questions

  • On a scale of 0–10, how likely are you to recommend Apache Airflow? (0 being not at all)
  • How do you expect your use of Airflow to evolve in 2019? Options: Increase, Stay about the same, Not sure yet, Decrease
  • How many active DAGs do you have in your Airflow cluster(s)? Options: 1–5, 6–20, 21–50, 51+
  • Roughly how many Tasks do you have defined in your DAGs? Options: 1–10, 11–50, 51–200, 201+
  • What executor do you use? Options: Sequential, Local, Celery, Kubernetes, Dask, Mesos
  • What would you like to see added/changed in Airflow for version 2.0 and beyond? Free-text answer
  • Anything else you’d like to mention? Free-text answer

Results

1. Airflow’s Net Promoter Score

The average score was 8.3, which is pretty good (though the channels I used to find respondents probably introduced a fair amount of selection bias into the results). Still, it’s a pretty good figure, and there were some “detractors” too; I’m glad they responded as well!

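| Rating | Responses | Percent |
|--------|-----------|---------|
| 0      | 0         | 0%      |
| 1      | 0         | 0%      |
| 2      | 1         | 0.7%    |
| 3      | 2         | 1.3%    |
| 4      | 3         | 2%      |
| 5      | 2         | 1.3%    |
| 6      | 8         | 5.3%    |
| 7      | 14        | 9.2%    |
| 8      | 51        | 33.6%   |
| 9      | 28        | 18.4%   |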
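
The heading says Net Promoter Score, but the 8.3 above is a mean rating, and the two are calculated differently. By the usual NPS definition, a 9 or 10 counts as a promoter, 0 to 6 as a detractor, and the score is the percentage of promoters minus the percentage of detractors, giving a range of -100 to 100. A minimal Python sketch of both calculations over a rating-to-count mapping; the sample counts are made up for illustration and are not the survey data.

```python
from typing import Mapping


def mean_rating(counts: Mapping[int, int]) -> float:
    """Average 0-10 rating, weighting each score by how many people gave it."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


def net_promoter_score(counts: Mapping[int, int]) -> float:
    """Percentage of promoters (9 or 10) minus percentage of detractors (0 to 6)."""
    total = sum(counts.values())
    promoters = sum(n for score, n in counts.items() if score >= 9)
    detractors = sum(n for score, n in counts.items() if score <= 6)
    return 100 * (promoters - detractors) / total


# Made-up distribution of 10 responses, NOT the survey data above.
sample = {10: 4, 9: 2, 8: 2, 6: 1, 3: 1}
print(mean_rating(sample))         # 8.3
print(net_promoter_score(sample))  # 40.0
```

Passives (7 or 8) count towards the total but towards neither group, so the mean and the NPS can move independently of each other.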

2. Change of use in Airflow for 2019

| Answer             | Responses | Percent |
|--------------------|-----------|---------|
| Increase           | 118       | 77.6%   |
| Stay about the same| 28        | 18.4%   |
| Not sure yet       | 4         | 2.6%    |
| Decrease           | 2         | 1.3%    |

3. How many DAGs

| Number of DAGs | Responses | Percent |
|----------------|-----------|---------|
| 1–5            | 25        | 16.6%   |
| 6–20           | 44        | 29.1%   |
| 21–50          | 27        | 17.9%   |
| 51+            | 55        | 36.4%   |

4. How many Tasks

| Number of Tasks | Responses | Percent |
|-----------------|-----------|---------|
| 1–10            | 41        | 27.3%   |
| 11–50           | 44        | 29.3%   |
| 51–200          | 23        | 15.3%   |
| 201+            | 42        | 28%     |

5. What executor do you use?

| Executor   | Responses | Percent |
|------------|-----------|---------|
| Celery     | 95        | 63.8%   |
| Local      | 30        | 20.1%   |
| Kubernetes | 16        | 10.7%   |
| Sequential | 7         | 4.7%    |
| Dask       | 1         | 0.7%    |
| Mesos      | 0         | 0%      |

6. What would you like to see added/changed in Airflow for version 2.0 and beyond?

To summarize these answers in any useful form, I’ve had to classify the responses, assigning each one to the categories and sub-categories below. In total 70% of the responses had

Scheduler - 23 comments

High-availability or run multiple schedulers: 8 comments

Performance of scheduler: 8 comments

Ed: comments about the scheduler’s CPU use while running, or the time it takes the scheduler to queue tasks.

Reparsing of DAG files: 5 comments

The scheduler currently re-parses the DAG files in a fairly tight loop, which can put a noticeable load on external systems if you have a dynamic DAG that queries them while it is being parsed.
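
One way DAG authors can soften this, sketched here under assumptions, is to cache whatever the DAG file fetches from the external system so that repeated parses within a time window reuse the previous answer. Because each parse may happen in a fresh process, the sketch caches to a file rather than in memory. `fetch_table_names`, the cache path and the five-minute TTL are all hypothetical placeholders.

```python
import json
import os
import tempfile
import time
from typing import Callable, List

# Hypothetical cache location; parses may run in separate processes,
# so the cache lives on disk rather than in a module-level variable.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dynamic_dag_tables.json")


def cached_lookup(fetch: Callable[[], List[str]], path: str = CACHE_PATH,
                  ttl_seconds: float = 300.0) -> List[str]:
    """Return fetch()'s result, calling it again only if the cache file is older than the TTL."""
    try:
        if time.time() - os.path.getmtime(path) < ttl_seconds:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache file: fall through and refresh it
    result = fetch()
    with open(path, "w") as f:
        json.dump(result, f)
    return result


def fetch_table_names() -> List[str]:
    """Hypothetical stand-in for an expensive call to an external system."""
    return ["orders", "customers"]


# In a DAG file, one task per table would then be generated from this list.
tables = cached_lookup(fetch_table_names)
```

The trade-off is staleness: a newly added table only shows up as a task once the TTL has expired.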

Improvements to SubDAGs: 2 comments

General requests for “improve subdags”. Ed: I agree, and I’m surprised more people didn’t ask for this.

Webserver and WebUI - 39 comments

Accessibility: 3 comments

Requests for a colour-blind/high-contrast mode and general accessibility improvements. Ed: absolutely, we should be better about this.

User Experience: 11 comments

Lots of comments asking for a “Better UI” or a “Cleaner UI”.

Performance: 7 comments

Comments about the UI being slow, especially for large DAGs or a large number of DAGs.

The webserver shouldn’t have to parse the DAGs. Ed: agreed, and AIP-12 will go a long way towards that.

Auto-updating: 3 comments

Having to refresh the page to see tasks changing state is so 2001 ;)

Ed: this would make a huge difference to the feel of the UI, but sadly it might need larger architectural changes to make it happen.

Operational Visibility: 2 comments

Requests to make it easier to see the state of the whole Airflow system from within the UI, e.g. to help work out why tasks in a DAG might not be progressing.

Ed: people after my own heart!

Timezone handling: 5 comments

Better handling of timezones in the UI, specifically better support for the local timezone. Ed: it’s not clear whether “local” means the viewer’s timezone or just the configured timezone, i.e. do people access Airflow from multiple timezones?
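
Airflow stores datetimes in UTC internally and in its database, so “local” display is a conversion done at render time, and the choice of zone matters because the same instant can fall on different calendar days. A minimal sketch using fixed UTC offsets for simplicity (a real UI would want named IANA zones to handle daylight saving); `to_display_tz` is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta, timezone


def to_display_tz(utc_value: datetime, offset_hours: float) -> datetime:
    """Convert a timezone-aware UTC datetime to a fixed-offset zone for display."""
    if utc_value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return utc_value.astimezone(timezone(timedelta(hours=offset_hours)))


stored = datetime(2019, 1, 15, 23, 30, tzinfo=timezone.utc)  # as kept in the metadata DB
print(to_display_tz(stored, 0).isoformat())    # 2019-01-15T23:30:00+00:00
print(to_display_tz(stored, -5).isoformat())   # 2019-01-15T18:30:00-05:00
print(to_display_tz(stored, 5.5).isoformat())  # 2019-01-16T05:00:00+05:30
```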

Misc Feature Request: 8 comments

Comments that didn’t fit elsewhere: things like triggering a DAG with parameters from the UI, more control, keyboard shortcuts, and grouping/collapsing rows.

Core - 15 comments

The “core” of Airflow, excluding the scheduler or the webserver.

Plugins: 4 comments

Requests for a more clearly defined plugin architecture, splitting Airflow into core and plugins. Ed: the pieces may not need to be plugins to be split out; plain Python modules would work.

More Operators: 11 comments

Requests for more operators/sensors. One good request was to have “composable” operators to avoid an explosion of XtoY operators. Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this, that would be ace.
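
The attraction is combinatorial: N sources and M destinations need N×M dedicated transfer operators, but only N readers plus M writers if the pieces compose. A hypothetical sketch of the idea in plain Python; `Source`, `Sink`, `ListSource`, `ListSink` and `transfer` are illustrative names, not existing Airflow classes, and real sources and sinks would presumably wrap hooks rather than lists.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class Source(Protocol):
    def read(self) -> Iterator[Row]: ...


class Sink(Protocol):
    def write(self, rows: Iterable[Row]) -> int: ...


class ListSource:
    """Toy source; a real one might wrap a database or object-store hook."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Row]:
        return iter(self.rows)


class ListSink:
    """Toy sink that collects rows in memory and reports how many it wrote."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: Iterable[Row]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


def transfer(source: Source, sink: Sink) -> int:
    """One generic transfer body: any source can feed any sink."""
    return sink.write(source.read())


sink = ListSink()
print(transfer(ListSource([{"id": 1}, {"id": 2}]), sink))  # 2
```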

Pull Request review/merge time - 3 comments

Three people commented about how long it takes to get PRs reviewed or merged. Ed: Absolutely, and we’d love to get through them quicker, but there is only so much time the volunteer-based committers can spend on this in a day without getting fired ;)

DAGs - 16 comments

Inter-DAG dependencies: 3 comments

A better way of declaring cross-DAG dependencies. Ed: none of the comments specifically said what the current ExternalTaskSensor was lacking.
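
Part of what makes cross-DAG dependencies fiddly is that ExternalTaskSensor matches on execution date: it waits for the upstream task instance whose execution_date equals its own, optionally shifted by `execution_delta` or computed by `execution_date_fn`. A simplified plain-Python sketch of that matching rule (not Airflow’s code; `upstream_execution_date` is a hypothetical name) shows why two DAGs on different schedules need an explicit offset.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def upstream_execution_date(
    own_execution_date: datetime,
    execution_delta: Optional[timedelta] = None,
    execution_date_fn: Optional[Callable[[datetime], datetime]] = None,
) -> datetime:
    """Which upstream run a cross-DAG sensor would wait for (simplified)."""
    if execution_delta is not None:
        return own_execution_date - execution_delta
    if execution_date_fn is not None:
        return execution_date_fn(own_execution_date)
    return own_execution_date


# Downstream DAG is scheduled daily at 02:00; upstream DAG daily at 00:00.
downstream = datetime(2019, 3, 1, 2, 0)
print(upstream_execution_date(downstream))                      # 2019-03-01 02:00:00 (no such upstream run)
print(upstream_execution_date(downstream, timedelta(hours=2)))  # 2019-03-01 00:00:00
```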

Event-based Sensors: 4 comments

The ability for sensors to respond to external events without polling. Ed: the new mode="reschedule" on sensors goes a little way towards helping with this, but this could still be improved.
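
The difference between the default poke mode and mode="reschedule" is mainly about who holds a worker slot while waiting. A deliberately simplified model with made-up timings: in poke mode the sensor keeps its slot for the whole wait, while in reschedule mode it holds one only while each check runs. `slot_seconds` is a hypothetical helper, and both modes still poll, which is why this only goes part of the way towards event-driven sensors.

```python
import math


def slot_seconds(wait_seconds: float, poke_interval: float, check_seconds: float,
                 mode: str = "poke") -> float:
    """Worker-slot time a sensor holds before it sees its condition become true.

    Simplified model: checks start at t=0 and then every poke_interval seconds,
    each check takes check_seconds, and the condition becomes true at wait_seconds.
    """
    intervals = math.ceil(wait_seconds / poke_interval)  # gaps before the successful check
    if mode == "poke":
        # The task sleeps in place, so it holds its slot from the first check to the last.
        return intervals * poke_interval + check_seconds
    if mode == "reschedule":
        # The slot is released between checks; only the checks themselves occupy it.
        return (intervals + 1) * check_seconds
    raise ValueError(f"unknown mode: {mode!r}")


# Waiting an hour for a file, checking every 5 minutes, 5 seconds per check.
print(slot_seconds(3600, 300, 5, "poke"))        # 3605
print(slot_seconds(3600, 300, 5, "reschedule"))  # 65
```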

Versioned DAGs: 4 comments

Requests for better handling of DAGs as they change over time. Ed: again, AIP-12 will go a long way towards that.
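
One ingredient of versioning is being able to tell that a DAG’s structure has changed at all. A hypothetical sketch: reduce a DAG to a canonical JSON description of its tasks and their upstream dependencies, then hash it, so that each run could record the version it executed against. The representation is invented for illustration and is not taken from AIP-12.

```python
import hashlib
import json
from typing import Dict, List


def dag_fingerprint(dag_id: str, upstream: Dict[str, List[str]]) -> str:
    """Stable hash of a DAG's shape; `upstream` maps each task_id to its upstream task_ids."""
    canonical = {
        "dag_id": dag_id,
        "tasks": {task: sorted(deps) for task, deps in sorted(upstream.items())},
    }
    # sort_keys plus fixed separators make the JSON byte-identical for the same structure.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


v1 = dag_fingerprint("etl", {"extract": [], "load": ["extract"]})
v1_reordered = dag_fingerprint("etl", {"load": ["extract"], "extract": []})
v2 = dag_fingerprint("etl", {"extract": [], "transform": ["extract"], "load": ["transform"]})
print(v1 == v1_reordered)  # True: declaration order does not matter
print(v1 == v2)            # False: adding a task changes the version
```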

Misc: 4 comments

Various DAG API changes, such as more flexibility in retries, SLAs and timeouts, and better isolation between DAGs. Ed: the PythonVirtualenvOperator might help a little with the isolation.
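
BaseOperator already accepts `retries`, `retry_delay` and `retry_exponential_backoff`, so “more flexibility in retry” is presumably about the shape of the delay schedule. A generic sketch of constant versus capped exponential delays; `retry_delays` is a hypothetical helper, and its formula is the textbook doubling scheme, not necessarily the exact one Airflow uses.

```python
from datetime import timedelta
from typing import List, Optional


def retry_delays(retries: int, base: timedelta, exponential: bool = False,
                 cap: Optional[timedelta] = None) -> List[timedelta]:
    """Delay before each retry: constant, or doubling on every attempt, optionally capped."""
    delays = []
    for attempt in range(retries):
        delay = base * (2 ** attempt) if exponential else base
        if cap is not None:
            delay = min(delay, cap)
        delays.append(delay)
    return delays


# Constant: 5, 5, 5 minutes.
print(retry_delays(3, timedelta(minutes=5)))
# Doubling: 5, 10, 20, then capped at 30 minutes instead of 40.
print(retry_delays(4, timedelta(minutes=5), exponential=True, cap=timedelta(minutes=30)))
```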

Documentation - 21 comments

Lots of requests for better docs (Ed: yes please!), many mentioning “best practice” around deployment, the upgrade process, etc., plus requests for clearer write-ups of the new features each release brings.

Kubernetes - 10 comments

Better/tighter Kubernetes integration. Easier deployments of DAGs on Kube. Further customization of pods that are run.

Ed: some comments like “integration with Kubernetes” probably tie back to the previous point about docs: we already have a Kubernetes executor and a KubernetesPodOperator. Maybe people don’t know about them.

Alternative ways of authoring DAGs - 5 comments

Ed: these are, I’m afraid, low priority for the Airflow core team, since one of Airflow’s selling points is that DAGs are Python code. This could be added via a plugin, though.

Add a DSL (Domain Specific Language): 1 comment

A request to describe DAGs in YAML/JSON and then submit them via the API, which would help non-Python teams. Ed: JustEat described something similar (without the API) in their talk at London Airflow Meetup #1.
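
Any YAML/JSON format would need validating before a DAG is built from it: every dependency must name a declared task, and the graph must be acyclic. A minimal sketch using JSON (YAML would need a third-party parser such as PyYAML) and Kahn’s algorithm for topological ordering; the spec layout with `dag_id`, `tasks` and `upstream` is a hypothetical format, not one defined by Airflow or JustEat.

```python
import json
from collections import deque
from typing import Dict, List

SPEC = """
{
  "dag_id": "daily_report",
  "tasks": [
    {"task_id": "extract", "upstream": []},
    {"task_id": "transform", "upstream": ["extract"]},
    {"task_id": "load", "upstream": ["transform"]}
  ]
}
"""


def execution_order(spec_text: str) -> List[str]:
    """Parse a JSON DAG spec and return task_ids in a valid run order (Kahn's algorithm)."""
    spec = json.loads(spec_text)
    upstream: Dict[str, List[str]] = {t["task_id"]: t.get("upstream", []) for t in spec["tasks"]}
    downstream: Dict[str, List[str]] = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            if dep not in upstream:
                raise ValueError(f"{task} depends on undeclared task {dep}")
            downstream[dep].append(task)
    remaining = {task: len(deps) for task, deps in upstream.items()}  # unmet dependencies
    ready = deque(task for task, n in remaining.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in downstream[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(upstream):
        raise ValueError("dependency cycle detected")
    return order


print(execution_order(SPEC))  # ['extract', 'transform', 'load']
```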

GUI editor for DAGs: 4 comments

Various requests for a “UI to edit from Web”, “drag-and-drop”, etc.

Other - 20 comments

Improved HTTP API: 5 comments

Calls for a better, more fully featured HTTP API: anything you can do via the web UI or CLI should be possible via the HTTP API too. Ed: totally!
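
For reference, the experimental REST API in Airflow 1.10 can already trigger a DAG run via `POST /api/experimental/dags/<DAG_ID>/dag_runs`. A sketch that only builds the request with the standard library; the base URL and DAG id are placeholders, and the payload follows the docs’ curl example, where `conf` is itself a JSON-encoded string, so treat that shape as an assumption to check against your version.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, dag_id: str, conf: dict) -> urllib.request.Request:
    """Build (but do not send) a POST asking the experimental API to trigger a DAG run."""
    url = f"{base_url.rstrip('/')}/api/experimental/dags/{dag_id}/dag_runs"
    # Assumed payload shape: `conf` as a JSON-encoded string inside the JSON body.
    body = json.dumps({"conf": json.dumps(conf)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_trigger_request("http://localhost:8080", "example_dag", {"key": "value"})
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), assuming the API is enabled
# and your authentication backend allows the call.
```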

Test Framework for end users: 3 comments

Three people asked for “ways to test DAGs locally” or variations on that. Ed: Bas at GoDataDriven wrote https://blog.godatadriven.com/testing-and-debugging-apache-airflow which provides some useful tips.
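
Another pattern that helps, sketched here with assumptions, is keeping task logic in plain functions that an operator merely calls, so the logic can be unit-tested without a scheduler or metadata database. `dedupe_orders` is a hypothetical task function; in a DAG file it would be wired up through something like a PythonOperator.

```python
import unittest
from typing import Dict, List

Row = Dict[str, object]


def dedupe_orders(rows: List[Row]) -> List[Row]:
    """Hypothetical task logic: keep the last row seen per order_id, in first-seen order."""
    latest: Dict[object, Row] = {}
    for row in rows:
        latest[row["order_id"]] = row  # re-assigning a key keeps its original position
    return list(latest.values())


class DedupeOrdersTest(unittest.TestCase):
    def test_keeps_latest_row_per_order(self) -> None:
        rows = [
            {"order_id": 1, "status": "new"},
            {"order_id": 2, "status": "new"},
            {"order_id": 1, "status": "shipped"},
        ]
        self.assertEqual(
            dedupe_orders(rows),
            [{"order_id": 1, "status": "shipped"}, {"order_id": 2, "status": "new"}],
        )

    def test_empty_input(self) -> None:
        self.assertEqual(dedupe_orders([]), [])
```

These run with `python -m unittest` like any other test module, with no Airflow installation needed for the logic itself.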

Miscellaneous: 12 comments

Things that didn’t fit elsewhere or didn’t deserve their own category: “Better security” (Ed: yes, security could always be improved, but what specifically?), multi-tenant clusters (Ed: RBAC helps a tiny bit there), execution_date being confusing to newcomers, Airflow being on the Amazon Marketplace, etc.
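
The execution_date confusion usually comes down to one rule: a scheduled run is stamped with the start of the interval it covers and is only triggered once that interval has ended, so the daily run for 1 January starts just after midnight on 2 January. A sketch of that rule for fixed-length schedules; `latest_triggered_execution_date` is a hypothetical helper that ignores cron alignment, catchup settings and scheduler latency.

```python
from datetime import datetime, timedelta


def latest_triggered_execution_date(start_date: datetime, interval: timedelta,
                                    now: datetime) -> datetime:
    """Execution date of the most recent run that should have been triggered by `now`.

    A run stamped with execution_date covers [execution_date, execution_date + interval)
    and is only triggered once that whole interval has passed.
    """
    if now < start_date + interval:
        raise ValueError("no interval has completed yet")
    completed = (now - start_date) // interval  # number of fully elapsed intervals
    return start_date + (completed - 1) * interval


# At 10:00 on 2 January, the newest daily run is still the one stamped 1 January.
print(latest_triggered_execution_date(datetime(2019, 1, 1), timedelta(days=1),
                                      datetime(2019, 1, 2, 10, 0)))  # 2019-01-01 00:00:00
```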