
Apache Airflow has become one of the most widely used tools for orchestrating complex data workflows. Created at Airbnb in 2014 and later donated to the Apache Software Foundation, Airflow provides a programmatic approach to authoring, scheduling, and monitoring workflows.
At its core, Airflow allows you to define your data pipelines as code using Python, making them maintainable, versionable, and testable. These workflows are represented as Directed Acyclic Graphs (DAGs), which consist of individual tasks and their dependencies. This structure ensures that tasks are executed in the correct order, with proper handling of dependencies, retries, and failures.
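You can illustrate the ordering guarantee without Airflow at all. The sketch below uses Python’s standard-library graphlib to group a hypothetical four-task pipeline into waves. Every task in a wave has all of its upstream tasks finished, so the tasks within one wave could run at the same time. It only illustrates the DAG idea and is not how Airflow’s scheduler is implemented.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves whose upstream dependencies are all complete."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unlock downstream tasks
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE)):
        print(number, wave)
```

For the pipeline above, this yields extract first, then transform and validate together, then load.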
Organizations across industries rely on Airflow for:
What sets Airflow apart is its flexibility and extensibility. With hundreds of pre-built operators, hooks, and sensors, it easily integrates with cloud services, databases, messaging systems, and more. Its web UI provides real-time monitoring and troubleshooting capabilities, while its scheduler ensures reliable execution based on defined schedules or external triggers.
When developing Airflow DAGs, having a consistent environment that mirrors production is crucial. This is where Dev Containers come in. Dev Containers allow developers to use a Docker container as a full-featured development environment, providing:
Visual Studio Code’s Dev Containers extension makes this process seamless. It builds and connects to the container automatically and provides a full development experience inside it.
Let’s explore how we can leverage Dev Containers for Airflow development, starting with a simple setup and then enhancing it for parallel task execution.
Before we dive into setting up Airflow with Dev Containers, make sure you have the following installed on your system:
You don’t need to install Python or Airflow directly on your machine, as we’ll be running everything inside containers. This is one of the key benefits of the Dev Container approach – it isolates dependencies from your local system.
Our first Dev Container provides a minimal setup to get started with Airflow. The configuration consists of four files in the .devcontainer directory: Dockerfile, devcontainer.json, .env, and requirements.txt.
Here’s how our project structure looks for this simple setup:
my-airflow-project/
├── .devcontainer/
│   ├── Dockerfile
│   ├── devcontainer.json
│   ├── .env
│   └── requirements.txt
└── dags/
    ├── hello_world.py
    └── parallel_tasks.py
Let’s look at how these files are configured:
FROM mcr.microsoft.com/devcontainers/python:3.10
RUN pip install "apache-airflow==2.10.5" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.5/constraints-3.10.txt"
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.5/constraints-3.10.txt"
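This Dockerfile starts from Microsoft’s Python 3.10 dev container image, installs Apache Airflow 2.10.5, and then installs any extra packages listed in requirements.txt. Both pip install steps use Airflow’s official constraints file for Airflow 2.10.5 on Python 3.10, which pins dependency versions known to work with that release.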
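Both pip install steps point at the same constraints file. Its URL encodes the Airflow version and the Python minor version, so if you bump either one (for example by switching to a different Python base image), the URL must change with it. The following is a small helper to derive it, as a sketch; the function name is ours and not part of Airflow:

```python
import sys
from typing import Optional

# URL pattern taken from the Dockerfile above.
CONSTRAINTS_TEMPLATE = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def constraints_url(airflow_version: str, python_version: Optional[str] = None) -> str:
    """Return the constraints file URL for an Airflow release and a Python minor version."""
    if python_version is None:
        # Default to the interpreter running this script, e.g. "3.10".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return CONSTRAINTS_TEMPLATE.format(airflow=airflow_version, python=python_version)


if __name__ == "__main__":
    url = constraints_url("2.10.5")
    print(f'pip install "apache-airflow==2.10.5" --constraint "{url}"')
```

Running the script prints the matching install command for whichever interpreter runs it.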
{
    "name": "Airflow devcontainer",
    "dockerFile": "Dockerfile",
    "runArgs": [
        "--env-file",
        ".devcontainer/.env"
    ],
    "postStartCommand": {
        "start airflow": "nohup bash -c 'airflow standalone &'",
        "create airflow user": "airflow users create --username airflow --password airflow --role Admin --email '' --firstname '' --lastname ''"
    },
    "forwardPorts": [
        8080
    ]
}
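As soon as the dev container starts, postStartCommand launches Airflow in standalone mode in the background and creates an Admin user with the username airflow and the password airflow. Port 8080 is forwarded so the Airflow UI is reachable from your host, and runArgs passes the variables from .devcontainer/.env into the container.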
AIRFLOW__CORE__LOAD_EXAMPLES=false
AIRFLOW__CORE__DAGS_FOLDER=dags
These environment variables configure Airflow to skip loading its bundled example DAGs and to look for DAG files in the project’s dags directory.
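Airflow reads any environment variable named AIRFLOW__{SECTION}__{KEY} as the value of option key in section [section] of its configuration, taking precedence over airflow.cfg. The hypothetical helper below mimics only that naming convention, to show which options a .env file sets. Airflow’s own handling covers more cases, such as the _CMD and _SECRET variants. Inside the container, `airflow config get-value core load_examples` prints the value Airflow actually uses.

```python
from typing import Dict, Tuple


def parse_airflow_env(text: str) -> Dict[Tuple[str, str], str]:
    """Map AIRFLOW__SECTION__KEY=value lines to {(section, key): value}."""
    options: Dict[Tuple[str, str], str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        parts = name.strip().split("__")
        # Only names of the form AIRFLOW__<SECTION>__<KEY> are treated as Airflow options.
        if len(parts) != 3 or parts[0] != "AIRFLOW":
            continue
        section, key = parts[1].lower(), parts[2].lower()
        options[(section, key)] = value.strip()
    return options


if __name__ == "__main__":
    sample = "AIRFLOW__CORE__LOAD_EXAMPLES=false\nAIRFLOW__CORE__DAGS_FOLDER=dags"
    for (section, key), value in parse_airflow_env(sample).items():
        print(f"[{section}] {key} = {value}")
```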
The requirements.txt file can list additional Python dependencies for your Airflow tasks. For now, we can leave it empty, but it must exist because the Dockerfile copies it into the image.
Let’s create a simple DAG in dags/hello_world.py to test our setup:
from airflow import DAG
from airflow.operators.python import PythonOperator


def my_task():
    print("Hello, Airflow!")


# Define the DAG
with DAG(
    dag_id="my_dag",
    schedule=None,  # Run manually
) as dag:
    task = PythonOperator(
        task_id="print_hello",
        python_callable=my_task,
    )
This DAG contains a single task that prints "Hello, Airflow!" when executed.
After starting our Dev Container (CTRL+SHIFT+P -> Dev Containers: Rebuild and Reopen in Container) and accessing the Airflow UI at http://localhost:8080 (username: airflow, password: airflow), we can trigger our DAG manually and see it run successfully.
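You can also script this check. In Airflow 2, the webserver serves a /health endpoint that returns JSON describing the metadata database and the scheduler, each with a status field. The following is a minimal sketch. It assumes that response shape and the forwarded port 8080, and the helper names are ours.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumes the port forwarded in devcontainer.json.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(payload: Dict[str, Any]) -> bool:
    """True if both the metadata database and the scheduler report "healthy"."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def check_airflow(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and treat connection errors as "not healthy yet"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(json.load(response))
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Once airflow standalone has finished starting, calling check_airflow() from a terminal inside the container should return True.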

While our simple setup works for basic DAGs, it has a significant limitation: it can’t execute tasks in parallel. By default, Airflow uses SQLite as its metadata database. SQLite locks the entire database file for writes and doesn’t cope with multiple concurrent writers, so Airflow only allows the SequentialExecutor with it. That executor runs one task at a time. To demonstrate this limitation, let’s create a DAG with multiple tasks in dags/parallel_tasks.py:
import time

from airflow import DAG
from airflow.operators.python import PythonOperator


def my_task():
    time.sleep(3)
    print("Hello, Airflow!")


with DAG(
    dag_id="my_parallel_dag",
    schedule=None,  # Run manually
) as dag:
    for i in range(10):
        task = PythonOperator(
            task_id=f"print_hello_{i}",
            python_callable=my_task,
        )
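When you run this DAG in our simple setup, the tasks execute sequentially, one after another, rather than in parallel. It might take a few minutes until Airflow picks up the new DAG file, because by default the scheduler scans the DAGs folder for new files only every 300 seconds ([scheduler] dag_dir_list_interval). For workflows with many tasks, sequential execution can significantly increase the total run time.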
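The cost is easy to estimate. The ten independent tasks above each sleep for three seconds, so running them one after another takes about 10 × 3 = 30 seconds of task time. Running all ten at once takes about 3 seconds plus scheduling overhead. The sketch below reproduces that arithmetic with a thread pool as a stand-in. It is an analogy, not Airflow’s executor, and the sleep is shortened so the script finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_TASKS = 10
TASK_SECONDS = 0.1  # shortened stand-in for time.sleep(3) in my_parallel_dag


def fake_task(_: int) -> None:
    """Simulate a task that only waits, like my_task in the DAG."""
    time.sleep(TASK_SECONDS)


def run_sequential() -> float:
    """Run all tasks one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(NUM_TASKS):
        fake_task(i)
    return time.perf_counter() - start


def run_parallel(workers: int = NUM_TASKS) -> float:
    """Run all tasks concurrently on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_task, range(NUM_TASKS)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential():.2f}s, parallel: {run_parallel():.2f}s")
```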

To overcome this limitation, we need to replace SQLite with a database that supports concurrent connections, like PostgreSQL. Our enhanced Dev Container adds PostgreSQL as a sidecar service using Docker Compose.
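Python’s built-in sqlite3 module makes the underlying locking visible. While one connection holds a write transaction, a second connection cannot write to the same database file. With the busy timeout set to zero, the second write fails immediately instead of waiting. The table name is made up, so this demonstrates SQLite itself rather than Airflow’s schema.

```python
import os
import sqlite3
import tempfile


def second_writer_result() -> str:
    """Hold a write transaction on one connection and try to write from another."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "metadata.db")
        # timeout=0 disables waiting on a busy database; isolation_level=None
        # lets us issue BEGIN/ROLLBACK ourselves.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE task_state (task_id TEXT, state TEXT)")
            first.execute("BEGIN IMMEDIATE")  # take the database write lock
            first.execute("INSERT INTO task_state VALUES ('task_1', 'running')")
            try:
                second.execute("INSERT INTO task_state VALUES ('task_2', 'running')")
                return "second write succeeded"
            except sqlite3.OperationalError as exc:
                return str(exc)
            finally:
                first.execute("ROLLBACK")
        finally:
            first.close()
            second.close()


if __name__ == "__main__":
    print(second_writer_result())
```

The function returns SQLite’s "database is locked" error message instead of performing the second write.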
Here’s the updated project structure for our enhanced setup:
my-airflow-project/
├── .devcontainer/
│   ├── Dockerfile
│   ├── devcontainer.json
│   ├── .env
│   ├── docker-compose.yaml
│   ├── init.sql
│   └── requirements.txt
└── dags/
    ├── hello_world.py
    └── parallel_tasks.py
name: airflow
services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=root
      - POSTGRES_USER=postgres
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
CREATE DATABASE airflow_db;
CREATE USER airflow WITH PASSWORD 'airflow';
ALTER DATABASE airflow_db OWNER TO airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;
GRANT ALL ON SCHEMA public TO airflow;
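This SQL script creates an Airflow-specific database and user with appropriate permissions. The postgres image only executes scripts mounted into /docker-entrypoint-initdb.d/ when it initializes an empty data directory, so the script runs on the container’s first start.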
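The database name, user, and password created by this script must match the SQLAlchemy connection string that Airflow uses. The sketch below assembles such a string using a hypothetical helper that is not an Airflow API. The port defaults to PostgreSQL’s standard 5432, which is also what applies when a URL omits the port. Percent-encoding the credentials matters once a password contains characters such as @ or /.

```python
from urllib.parse import quote


def postgres_sqlalchemy_url(
    user: str,
    password: str,
    host: str,
    database: str,
    port: int = 5432,
    driver: str = "psycopg2",
) -> str:
    """Build a postgresql+<driver> SQLAlchemy URL with percent-encoded credentials."""
    # safe="" also encodes "/", so characters like @ : / cannot break the URL.
    return (
        f"postgresql+{driver}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Values from init.sql and the Compose service name.
    print(postgres_sqlalchemy_url("airflow", "airflow", "postgres", "airflow_db"))
```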
AIRFLOW__CORE__LOAD_EXAMPLES=false
AIRFLOW__CORE__DAGS_FOLDER=dags
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow_db
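The key changes here are AIRFLOW__DATABASE__SQL_ALCHEMY_CONN, which points Airflow’s metadata database at airflow_db on the postgres service, and AIRFLOW__CORE__EXECUTOR=LocalExecutor, which enables parallel task execution. The updated devcontainer.json starts the database and joins its network:
{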
    "name": "Airflow devcontainer",
    "dockerFile": "Dockerfile",
    "runArgs": [
        "--env-file",
        ".devcontainer/.env",
        "--network=airflow_default"
    ],
    "initializeCommand": {
        "start db": "docker compose -f ${localWorkspaceFolder}/.devcontainer/docker-compose.yaml up -d --remove-orphans"
    },
    "postStartCommand": {
        "start airflow": "nohup bash -c 'airflow standalone &'",
        "create airflow user": "airflow users create --username airflow --password airflow --role Admin --email '' --firstname '' --lastname ''"
    },
    "forwardPorts": [
        8080
    ]
}
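Key enhancements: initializeCommand runs on the host before the container is created and starts the PostgreSQL service in the background with Docker Compose. The extra --network=airflow_default run argument attaches the dev container to the Compose project’s default network, which is named after the project (airflow), so Airflow can reach the database under the hostname postgres.
We need to add the PostgreSQL Python client to our requirements.txt: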
psycopg2-binary
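Because initializeCommand starts the Compose project in the background, PostgreSQL may still be initializing when Airflow starts (on its first start it also runs init.sql). If Airflow cannot reach the database on the first start, a readiness check can help. The sketch below only tests whether the TCP port accepts connections, which is weaker than PostgreSQL’s own pg_isready. The host name postgres and port 5432 are the Compose service name and PostgreSQL’s default port, and the helper itself is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or the name does not resolve yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("postgres", 5432) returns True as soon as the database accepts TCP connections, or False after 30 seconds.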
After rebuilding our dev container (CTRL+SHIFT+P -> Dev Containers: Rebuild Container), we can run the same multi-task DAG from earlier, and this time the tasks execute in parallel. When the DAG is triggered in the Airflow UI, multiple tasks run simultaneously, which significantly reduces the overall execution time.
You can verify this in the "Grid" view of the Airflow UI for your parallel DAG, where multiple tasks appear running concurrently rather than sequentially.
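If you prefer a number to eyeballing the Grid view, the start and end times of the task instances are enough to compute how many tasks overlapped. These times are shown in each task instance’s details in the UI. The following is a small sweep-line sketch; the function is hypothetical and you supply the timestamps.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def max_concurrency(intervals: Iterable[Tuple[datetime, datetime]]) -> int:
    """Largest number of task runs whose [start, end) intervals overlap at one moment."""
    events: List[Tuple[datetime, int]] = []
    for start, end in intervals:
        events.append((start, 1))  # a task starts
        events.append((end, -1))  # a task ends
    # Sort by time; at equal times process ends (-1) before starts (+1),
    # so back-to-back tasks are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

A result of 1 means the runs never overlapped, which indicates sequential execution. With the LocalExecutor setup, the ten tasks of my_parallel_dag should produce a value greater than 1.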


Dev Containers provide an excellent way to standardize development environments for Apache Airflow. We’ve seen how to:
This approach brings several benefits:
With this setup, you can develop and test even complex Airflow DAGs with parallel tasks in an environment that closely resembles production, all within the comfort of your favorite IDE.
For more advanced scenarios, you might consider extending this setup to include additional services like Redis for the Celery executor, or even integrating with Kubernetes for dynamic task allocation.