The average score was 8.3, which is pretty good, though the channels I used to find respondents are probably going to impart a large chunk of selection bias to the results. Still, a pretty good figure, and there were some “detractors”, and I’m glad they responded too!
|Stay about the same||28 responses||18.4%|
|Not sure yet||4 responses||2.6%|
To be able to summarize these answers in any useful format I’ve had to try and classify the responses given. I classified each response against the following categories and sub-categories. In total 70% of the responses had
Ed: Comments about the CPU use of the scheduler when running, or the time it takes the scheduler to queue tasks
The scheduler currently re-parses the DAG files in a fairly tight loop, which can be a bit heavy on external systems if you have a dynamic DAG.
General requests for “improve subdags”. Ed: I agree, and I’m surprised more people didn’t ask for this.
Colour blind/high contrast mode. General accessibility improvements. Absolutely, we should be better about this.
Lots of comments around asking for a “Better UI” or a “Cleaner UI”
Comments about the UI being slow - especially for large DAGs or a large number of DAGs.
The Web server shouldn’t have to parse the DAGs. Ed: Agreed, and AIP-12 will go a large way towards that
Having to refresh the page to see tasks changing state is so 2001 ;)
Ed: this would make a huge difference to the feel of the UI, but might need larger architectural changes to make happen. Sadly
Requests to make it easier to see the state of the whole Airflow system from within the UI - i.e. helping work out why tasks in a DAG might not be progressing etc.
Ed: people after my own heart!
Better handling of timezones in the UI, specifically better support for local timezone. Ed: not clear if “local” means the viewer’s timezone, or just the configured timezone - i.e. do people access Airflow from multiple TZs?
Comments that didn’t fit elsewhere - things like parameterized DAG trigger from UI, more control, keyboard shortcuts, grouping/collapsing rows
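To make the ambiguity concrete, here is a minimal sketch of the same UTC timestamp rendered for a "configured" timezone and for a viewer's timezone. This is not Airflow code: `SERVER_TZ`, `VIEWER_TZ` and `render` are hypothetical names, and the fixed UTC-5 offset is just an assumed viewer location.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the deployment is configured for UTC, the viewer sits at UTC-5.
SERVER_TZ = timezone(timedelta(hours=0), "UTC")
VIEWER_TZ = timezone(timedelta(hours=-5), "EST")


def render(ts, tz):
    """Format an aware timestamp as it would appear to someone in tz."""
    return ts.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


# One instant in time, stored in UTC.
execution_time = datetime(2019, 3, 1, 2, 30, tzinfo=timezone.utc)

server_view = render(execution_time, SERVER_TZ)
viewer_view = render(execution_time, VIEWER_TZ)

print(server_view)
print(viewer_view)
```

The two renderings differ not just in the hour but in the calendar date, which is why it matters which meaning of “local” people have in mind.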
The “core” of Airflow, excluding the scheduler or the webserver.
Requests for a more clearly defined plugin architecture, splitting Airflow into core and plugins. Ed: they may not need to be plugins to split, just Python modules would work
Requests for more operators/sensors. One good request was to have “composable” operators to avoid an explosion of XtoY operators. Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this that would be ace.
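As a rough illustration of what “composable” could mean, here is a plain-Python sketch in which any source can be paired with any sink, so N sources and M sinks need N + M pieces rather than N × M XtoY classes. None of this is an existing Airflow API: `Transfer`, `list_source` and `list_sink` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Transfer:
    """Hypothetical composable operator: one reader plus one writer."""

    read: Callable[[], Iterable[Row]]
    write: Callable[[Iterable[Row]], int]

    def execute(self) -> int:
        # Stream rows from the source straight into the sink.
        return self.write(self.read())


def list_source(rows: List[Row]) -> Callable[[], Iterable[Row]]:
    """Stand-in for a real source such as a database query."""

    def read() -> Iterable[Row]:
        return iter(rows)

    return read


def list_sink(target: List[Row]) -> Callable[[Iterable[Row]], int]:
    """Stand-in for a real sink such as a bucket or a table."""

    def write(rows: Iterable[Row]) -> int:
        count = 0
        for row in rows:
            target.append(row)
            count += 1
        return count

    return write


warehouse: List[Row] = []
copied = Transfer(
    read=list_source([{"id": 1}, {"id": 2}]),
    write=list_sink(warehouse),
).execute()
print(copied, warehouse)
```

Swapping in a different source or sink only means writing one new function, not a new operator for every pairing.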
Three people commented about how long it takes to get PRs reviewed or merged. Ed: Absolutely, and we’d love to get through them quicker, but there is only so much time the volunteer-based committers can spend on this in a day without getting fired ;)
A better way of declaring cross-DAG dependencies. Ed: None of the comments specifically said what the current ExternalTaskSensor was lacking.
The ability for sensors to respond to external events without polling. Ed: the new mode="reschedule" on sensors goes a little way to helping with this, but this could still be improved.
Asking for better handling of DAGs as they change over time. Ed: Again AIP-12 will go a large way towards that
Various DAG API changes such as more flexibility in retry, SLA, timeout. Better isolation between DAGs. Ed: PythonVirtualenvOperator might help a little bit with this.
Lots of requests for better docs (Ed: yes please!), many mentioning “best practice” around deployment, upgrade process etc. Clearer write-ups of what new features each release brings.
Better/tighter Kubernetes integration. Easier deployments of DAGs on Kube. Further customization of pods that are run.
Ed: Some comments like “integration with Kubernetes” probably tie back to the previous point about docs - we have a Kubernetes executor and PodOperators too. Maybe people don’t know about them.
Ed: these are, I’m afraid, low-priority for the Airflow core team. One of the selling points of Airflow is that the DAGs are Python code. This could be added via a plugin though.
A request to describe DAGs in YAML/JSON and then submit via the API - helpful for non-Python teams. Ed: JustEat described something similar (without the API) in their talk at the London Airflow Meetup #1.
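For a feel of what a declarative DAG description might look like, here is a small stdlib-only sketch that parses a JSON spec and works out a valid run order. The spec format (`dag_id`, `tasks`, `depends_on`) and the `execution_order` helper are assumptions made up for this example; they are not an Airflow format, nor what JustEat presented.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical spec format: each task lists the tasks it depends on.
DAG_SPEC = """
{
  "dag_id": "example_etl",
  "tasks": {
    "extract": {"depends_on": []},
    "transform": {"depends_on": ["extract"]},
    "load": {"depends_on": ["transform"]}
  }
}
"""


def execution_order(spec_text):
    """Return task names in an order that respects every dependency."""
    spec = json.loads(spec_text)
    # Map each task to the set of tasks that must finish before it.
    graph = {
        name: set(cfg.get("depends_on", []))
        for name, cfg in spec["tasks"].items()
    }
    # Reject references to tasks that are never defined.
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown task(s) referenced: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the spec contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(DAG_SPEC)
print(order)
```

A plugin along these lines would still have to turn each task entry into a real operator; the sketch only covers the dependency side.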
Various “UI to edit from Web”, “drag-and-drop” etc.
Calls for better/more fully-featured HTTP API - anything you can do via Web UI or CLI should be possible via HTTP API too. Ed: Totally!
Three people asked for “ways to test DAGs locally” or variations of that. Ed: Bas at GoDataDriven wrote https://blog.godatadriven.com/testing-and-debugging-apache-airflow which provides some useful tips.
Things that didn’t fit elsewhere, or didn’t deserve their own category: “Better security” (Ed: yes, security could always be improved, but what specifically?), multi-tenant clusters (Ed: RBAC helps a tiny bit there), execution_date is confusing to newcomers, Airflow should be on the Amazon Marketplace, etc.