Apache Airflow
Certifications
ToDo: Link to certifications
Install Locally
The Astro CLI can be used to install and run Airflow locally.
Before you start, make sure Docker is installed and running on your machine and that you have Homebrew available.
Step 1: Install the Astro CLI
Install the latest version of the Astro CLI:
brew install astro
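To confirm that the CLI is available, it can print its version (the exact output varies by release):
astro version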
Step 2: Create an Astro project
astro dev init
This command will generate the following project structure:
.
├── .env                       # Local environment variables
├── dags                       # Where your DAGs go (a minimal DAG sketch follows the tree)
│   ├── example-dag-basic.py      # Example DAG that showcases a simple ETL data pipeline
│   └── example-dag-advanced.py   # Example DAG that showcases more advanced Airflow features, such as the TaskFlow API
├── Dockerfile                 # For the Astro Runtime Docker image, environment variables, and overrides
├── include                    # For any other files you'd like to include
├── plugins                    # For any custom or community Airflow plugins
│   └── example-plugin.py
├── tests                      # For any DAG unit test files to be run with pytest
│   └── test_dag_integrity.py  # Test that checks for basic errors in your DAGs
├── airflow_settings.yaml      # For your Airflow connections, variables, and pools (local only)
├── packages.txt               # For OS-level packages
└── requirements.txt           # For Python packages
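For reference, a DAG in the dags folder is just a Python file. Below is a minimal sketch of what one can look like; it is not the content of the generated example files, and it assumes a recent Airflow 2.x with the TaskFlow API (file name, task names, and values are made up):

# dags/example_etl.py -- illustrative file name
from airflow.decorators import dag, task
import pendulum


@dag(
    schedule="@daily",                                    # run once a day
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),  # first logical date
    catchup=False,                                        # do not backfill missed runs
    tags=["example"],
)
def example_etl():
    @task
    def extract():
        # In a real pipeline this would read from an API, database, or file.
        return [1, 2, 3]

    @task
    def transform(values):
        # Simple transformation step.
        return [v * 10 for v in values]

    @task
    def load(values):
        # In a real pipeline this would write to a warehouse or object store.
        print(values)

    load(transform(extract()))


example_etl()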
Step 3: Start Astro (Airflow) locally
astro dev start
After that, you can access the Airflow UI in your browser at http://localhost:8080.
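By default, the local environment uses admin as both the username and password (unless you have changed it). To verify that the containers are running, the CLI can list them (a quick check, assuming a recent Astro CLI):
astro dev ps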
WARNING
You must restart your environment to apply changes to any of the following files: packages.txt, Dockerfile, requirements.txt, or airflow_settings.yaml. To do so, run:
astro dev restart
Step 4: Stop or Kill Airflow
WARNING
astro dev stop
This command stops the containers but keeps the data (volumes).
astro dev kill
This command stops and removes all containers, including all data (volumes).
Install on Kubernetes
We can use the official Airflow Helm chart to install Airflow on Kubernetes.
Helm chart used to deploy Airflow on K8s:
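A typical installation with the official chart looks roughly like this (release name and namespace are illustrative; pin the chart version you actually want):
helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace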
Deploy on ArgoCD
The Helm chart needs a few changes to be deployed with ArgoCD. When deploying into a Kubernetes cluster, the following values need to be overridden; here they are set as Helm parameters in the ArgoCD Application manifest (they correspond to settings you would otherwise put in values.yaml):
WARNING
If you do not change these values, the deployment will fail.
Do not forget to change any other values you may need.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${REPO_URL}
    targetRevision: ${REPO_BRANCH}
    path: ${REPO_PATH}
    directory:
      recursive: true
    helm:
      passCredentials: true
      chart: airflow
      parameters:
        - name: executor
          value: KubernetesExecutor
        - name: createUserJob.useHelmHooks
          value: "false"
        - name: migrateDatabaseJob.useHelmHooks
          value: "false"
        - name: "migrateDatabaseJob.jobAnnotations.argocd.argoproj.io/hook"
          value: "Sync"
Install Managed Airflow on AWS, GCP, Azure
- AWS -> Amazon MWAA (Managed Workflows for Apache Airflow)
- GCP -> Cloud Composer
- Azure -> Azure Data Factory (Managed Airflow)
Important Concepts
Learn more about important Apache Airflow concepts: