23 changes: 9 additions & 14 deletions .envExample

AIRFLOW_ADMIN_MAIL=<fill here>
AIRFLOW_ADMIN_FIRSTNAME=<fill here>
AIRFLOW_ADMIN_NAME=<fill here>
AIRFLOW_ADMIN_PASSWORD=<fill here>

Those env vars are no longer used from what I see in docker-compose.yml, but I'd advise adding the new login vars to .envExample so we know the default login details, and where to update them, without going into the docker-compose file:

_AIRFLOW_WWW_USER_USERNAME: <username to fill> # by default "airflow"
_AIRFLOW_WWW_USER_PASSWORD: <password to fill> # by default "airflow"
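For context, in the official Airflow docker-compose template these variables are consumed by the `airflow-init` service, with `airflow` as the fallback for both. This is a sketch in the style of that template, assuming this repo's compose file follows it (not copied from the actual docker-compose.yml):

```yaml
# Assumed shape, based on the official Airflow docker-compose template:
# airflow-init reads the login vars from the environment and falls back
# to "airflow" for both username and password.
airflow-init:
  environment:
    _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
    _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
```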

Contributor Author


Hmm, weird, because I have not set those two in the docker-compose, but the ones in .env instead, and those are the ones that worked in the UI 🤔

@@ -1,24 +1,19 @@
AIRFLOW_UID=1000
POSTGRES_USER=<fill here>
POSTGRES_PASSWORD=<fill here>
POSTGRES_DB=<fill here>
AIRFLOW_ADMIN_MAIL=<fill here>
AIRFLOW_ADMIN_FIRSTNAME=<fill here>
AIRFLOW_ADMIN_NAME=<fill here>
AIRFLOW_ADMIN_PASSWORD=<fill here>
AIRFLOW__SMTP__SMTP_HOST=mail.data.gouv.fr
_AIRFLOW_WWW_USER_USERNAME=<fill here>
_AIRFLOW_WWW_USER_PASSWORD=<fill here>
AIRFLOW_POSTGRES_PORT=5432
AIRFLOW_WEBSERVER_PORT=8080
AIRFLOW_LOG_SERVER_PORT=5894
AIRFLOW__SMTP__SMTP_PORT=587
AIRFLOW__WEBSERVER__BASE_URL=http://localhost:$AIRFLOW_WEBSERVER_PORT
AIRFLOW__SMTP__SMTP_USER=<fill here>
AIRFLOW__SMTP__SMTP_PASSWORD=<fill here>
AIRFLOW__SMTP__SMTP_MAIL_FROM=<fill here>
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB
AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgres+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB
AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB
AIRFLOW_VAR__METADATA_DB_SCHEMA=$POSTGRES_DB
AIRFLOW_ENV_TYPE=demo
AIRFLOW_ENV_NAME=test

AIRFLOW_ENV=dev
AIRFLOW_DAG_HOME=/opt/airflow/dags
AIRFLOW_DAG_TMP=/tmp/
DATAGOUV_URL=https://demo.data.gouv.fr
DATAGOUV_SECRET_API_KEY=<fill here>
9 changes: 0 additions & 9 deletions 1_prepareDirs.sh

This file was deleted.

4 changes: 0 additions & 4 deletions 2_prepare_env.sh

This file was deleted.

36 changes: 4 additions & 32 deletions Dockerfile
@@ -1,17 +1,4 @@

FROM apache/airflow:2.10.5-python3.12

USER root

ARG AIRFLOW_HOME=/opt/airflow

ADD dags /opt/airflow/dags

ADD airflow.cfg /opt/airflow/airflow.cfg

USER airflow

RUN pip install --upgrade pip
FROM apache/airflow:3.1.7-python3.12

USER root

@@ -28,22 +15,7 @@ RUN apt-get install nano -y
RUN apt-get install jq -y
RUN apt-get install libmagic1 -y

RUN chown -R "airflow:root" /opt/airflow/

ADD ssh /home/airflow/.ssh/
RUN chown -R airflow:root /home/airflow/.ssh

USER airflow

RUN pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org boto3


# USER ${AIRFLOW_UID}
USER airflow

ADD requirements.txt /requirements.txt

RUN pip install -r /requirements.txt

RUN git config --global user.email "your email"
RUN git config --global user.name "your username"
RUN pip install --upgrade pip
ADD requirements.txt .
RUN pip install apache-airflow==${AIRFLOW_VERSION} -r requirements.txt
36 changes: 18 additions & 18 deletions README.md
@@ -1,39 +1,39 @@
# Infrastructure Airflow

This repository aims to quickly set up an Airflow infrastructure allowing everyone to test their DAG before moving to production.

The current infrastructure is based on the LocalExecutor (the scheduler, the webserver and the worker are hosted on the same container)
This repository aims to quickly set up an Airflow infrastructure allowing everyone to test their DAG before moving to production. It is based on the [setup guide](https://airflow.apache.org/docs/apache-airflow/3.1.7/howto/docker-compose/index.html) for an Airflow instance (refer to it for more details). Current version: 3.1.7

## Installation

```
```bash
git clone git@github.com:etalab/data-engineering-stack.git
cd data-engineering-stack

# Create directories necessary for Airflow to work
./1_prepareDirs.sh
./prepareDirs.sh

# Prepare .env file
./2_prepare_env.sh
nano .env
# Edit POSTGRES_USER ; POSTGRES_PASSWORD ; POSTGRES_DB ; AIRFLOW_ADMIN_MAIL ; AIRFLOW_ADMIN_FIRSTNAME ; AIRFLOW_ADMIN_NAME ; AIRFLOW_ADMIN_PASSWORD
# Prepare .env file:
# Create a .env file from the .envExample and fill in the required variables.
# You may also add more variables there for specific DAGs to run.

# For MacOS with ARM:
# export DOCKER_DEFAULT_PLATFORM="linux/amd64"
# Initialize
docker compose up airflow-init

# Launch services
docker-compose up --build -d
docker compose up -d

# After a few seconds, you can connect to http://localhost:<AIRFLOW_WEBSERVER_PORT> with login: AIRFLOW_ADMIN_MAIL and password: AIRFLOW_ADMIN_PASSWORD
# If you have kept the default values: http://localhost:8080 and airflow:airflow as user:pwd
```

# After a few seconds, you can connect to http://localhost:8080 with login: AIRFLOW_ADMIN_MAIL and password: AIRFLOW_ADMIN_PASSWORD
## Import our DAGs
```bash
cd dags
git clone git@github.com:datagouv/datagouvfr_data_pipelines.git
```

## Refresh dags

```
```bash
# Airflow can take a little while to pick up a newly created DAG. You can force a refresh with:
./refreshBagDags.sh
```

## Connections

Connections can be created manually or with the Python scripts `createConn.py` (using the Airflow API) inside each project. You also need to add your SSH key to the `ssh` folder of the repository so the container can see it in its `/home/airflow/.ssh/` folder.
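The `createConn.py` scripts are project-specific and not shown here. As a minimal illustration of the kind of connection URI Airflow expects, here is a sketch of a helper that assembles one; the function name and all values below are hypothetical, not taken from any of those scripts:

```python
from urllib.parse import quote


def build_conn_uri(conn_type: str, user: str, password: str,
                   host: str, port: int, schema: str) -> str:
    """Assemble an Airflow-style connection URI (type://user:pass@host:port/schema).

    Credentials are percent-encoded so special characters survive in the URI.
    """
    return (f"{conn_type}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{schema}")


# Hypothetical values, for illustration only:
print(build_conn_uri("postgres", "airflow", "p@ss/word", "postgres", 5432, "airflow"))
# → postgres://airflow:p%40ss%2Fword@postgres:5432/airflow
```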