Apache Airflow is a workflow orchestrator for data pipelines. This guide shows a safe, minimal setup you can grow later.
1. Choose your installation path
If you want a quick local setup, use Docker Compose. If you are deploying to production, plan for a managed database and a proper executor.
Local (recommended for POC):
- Docker
- Postgres
- LocalExecutor
Production (baseline):
- Postgres or MySQL
- Redis (for Celery)
- CeleryExecutor or KubernetesExecutor
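Whichever path you pick, the executor and database are set through Airflow's configuration, which can be supplied as environment variables of the form AIRFLOW__&lt;SECTION&gt;__&lt;KEY&gt;. A minimal sketch for each path (the connection strings are placeholders, not working credentials):

```shell
# Local POC: LocalExecutor backed by Postgres
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost/airflow

# Production baseline: CeleryExecutor with Redis as the broker
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres/airflow
```

The official Docker Compose file already wires these up for you; you only need to set them yourself when building a custom deployment.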
2. Local setup with Docker Compose
- Create a new folder and copy the official Airflow docker-compose.yaml.
- Set the required environment variables in a .env file.
- Initialize the database and create an admin user.
Example commands:
mkdir airflow-poc
cd airflow-poc
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
mkdir -p ./dags ./logs ./plugins
echo "AIRFLOW_UID=50000" > .env
docker compose up airflow-init
docker compose up -d
Once it is running, open http://localhost:8080 and log in with the default credentials (airflow / airflow).
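Before opening the UI, you can confirm the stack is healthy from the command line:

```shell
# List the compose services; each should eventually report "healthy"
docker compose ps

# The webserver also exposes a health endpoint covering the
# metadatabase and the scheduler
curl http://localhost:8080/health
```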
3. Create your first DAG
Inside ./dags, create a simple DAG file:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    task = BashOperator(
        task_id="print_date",
        bash_command="date",
    )
You should see it in the UI and be able to trigger it.
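You can also exercise a single task from the command line without involving the scheduler, using Airflow's task-test command (the service name below matches the official Compose file):

```shell
# Run one task instance for a given logical date and print its
# logs to stdout; nothing is recorded in the metadata database
docker compose exec airflow-scheduler \
  airflow tasks test hello_airflow print_date 2024-01-01
```

This is the fastest feedback loop while iterating on a new DAG.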
4. Move toward production
Key upgrades:
- Use a managed Postgres database.
- Store logs in S3 or GCS.
- Move to Celery or Kubernetes executor.
- Add secrets management (Vault, AWS Secrets Manager).
- Add monitoring and alerting.
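As a sketch of the remote-logging upgrade, shipping task logs to S3 is again a configuration change (the bucket name is a placeholder, and "aws_default" must exist as an Airflow connection with valid AWS credentials):

```shell
# Store task logs in S3 instead of the local ./logs volume
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-airflow-logs/
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default
```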
5. Common pitfalls
- Forgetting to disable catchup for non-backfill workloads.
- Using SQLite in production.
- Running too many parallel tasks without tuning resources.
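To see why the catchup pitfall bites, consider that with catchup=True (the default), a @daily DAG schedules one run for every day between start_date and now. A rough sketch of that arithmetic in plain Python (daily_runs is an illustrative helper, not an Airflow API):

```python
from datetime import datetime

def daily_runs(start_date: datetime, now: datetime) -> int:
    """Approximate number of backfill runs a @daily DAG with
    catchup=True would create for the days it has missed."""
    return max(0, (now.date() - start_date.date()).days)

# A DAG with start_date 2024-01-01, first deployed on 2024-04-10,
# would immediately queue roughly 100 backfill runs:
print(daily_runs(datetime(2024, 1, 1), datetime(2024, 4, 10)))  # 100
```

Setting catchup=False, as in the DAG above, makes Airflow schedule only the most recent interval.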
If you want, I can provide a production-ready reference architecture next.