Update local MWAA to 2.10 #25

Merged · 5 commits · Jan 7, 2025
README.md (20 changes: 13 additions & 7 deletions)
@@ -1,9 +1,15 @@
## Note
Starting from Airflow version 2.9, MWAA has open-sourced the original Docker image used in our production deployments. You can refer to our open-source image repository at https://github.com/aws/amazon-mwaa-docker-images to create a local environment identical to that of MWAA.
You can also continue to use the MWAA Local Runner for testing and packaging requirements for all Airflow versions supported on MWAA.

# About aws-mwaa-local-runner

This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.
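
For orientation, a typical local workflow looks roughly like the sketch below; the command names come from this repository's `mwaa-local-env` helper script, and the UI address assumes the default port mapping.

```bash
# Build the local MWAA-like Docker image (re-run after changing the docker/ folder):
./mwaa-local-env build-image

# Validate requirements/requirements.txt without starting Airflow:
./mwaa-local-env test-requirements

# Start the local Airflow environment; the UI is then served at http://localhost:8080
./mwaa-local-env start
```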

_Please note: MWAA/AWS/DAG/Plugin issues should be raised through AWS Support or the Airflow Slack #airflow-aws channel. Issues here should be focused on this local-runner repository._

_Please note: The dynamic configurations that depend on an environment's class are aligned with the Large environment class in this repository._

## About the CLI

@@ -14,7 +20,7 @@ The CLI builds a Docker container image locally that’s similar to a MWAA produ
```text
dags/
  example_lambda.py
  example_dag_with_taskflow_api.py
  example_redshift_data_execute_sql.py
docker/
  config/
@@ -34,7 +40,7 @@ docker/
  Dockerfile
plugins/
  README.md
requirements/
  requirements.txt
.gitignore
CODE_OF_CONDUCT.md
@@ -102,7 +108,7 @@ The following section describes where to add your DAG code and supporting files.

#### Requirements.txt

1. Add Python dependencies to `requirements/requirements.txt`.
2. To test a requirements.txt without running Apache Airflow, use the following script:

```bash
@@ -117,7 +123,7 @@ Collecting aws-batch (from -r /usr/local/airflow/dags/requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/5d/11/3aedc6e150d2df6f3d422d7107ac9eba5b50261cf57ab813bb00d8299a34/aws_batch-0.6.tar.gz
Collecting awscli (from aws-batch->-r /usr/local/airflow/dags/requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/07/4a/d054884c2ef4eb3c237e1f4007d3ece5c46e286e4258288f0116724af009/awscli-1.19.21-py2.py3-none-any.whl (3.6MB)
100% |████████████████████████████████| 3.6MB 365kB/s
...
...
...
@@ -136,7 +142,7 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement

#### Custom plugins

- There is a directory at the root of this repository called plugins.
- In this directory, create a file for your new custom plugin.
- Add any Python dependencies to `requirements/requirements.txt`.

@@ -165,7 +171,7 @@ The following section contains common questions and answers you may encounter wh
### Can I test execution role permissions using this repository?

- You can set up the local Airflow's boto credentials with the intended execution role to test your DAGs with AWS operators before uploading to your Amazon S3 bucket. To set up an AWS connection for Airflow locally, see [Airflow | AWS Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html).
To learn more, see [Amazon MWAA Execution Role](https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-create-role.html).
- You can set AWS credentials via environment variables in the `docker/config/.env.localrunner` file. To learn more about AWS environment variables, see [Environment variables to configure the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) and [Using temporary security credentials with the AWS CLI](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html#using-temp-creds-sdk-cli). Simply set the relevant environment variables in `.env.localrunner` and run `./mwaa-local-env start`.
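
As a sketch, the credential entries in `docker/config/.env.localrunner` can look like the following; the variable names are the standard AWS CLI ones from the linked documentation, and the values are placeholders.

```bash
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...placeholder
AWS_SECRET_ACCESS_KEY=...placeholder
AWS_SESSION_TOKEN=...placeholder   # only needed for temporary credentials
```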

### How do I add libraries to requirements.txt and test install?
VERSION (2 changes: 1 addition & 1 deletion)
@@ -1 +1 @@
2.8.1
2.10.1
docker/Dockerfile (6 changes: 3 additions & 3 deletions)
@@ -8,9 +8,9 @@ LABEL maintainer="amazon"

# Airflow
## Version specific ARGs
ARG AIRFLOW_VERSION=2.8.1
ARG WATCHTOWER_VERSION=3.0.1
ARG PROVIDER_AMAZON_VERSION=8.16.0
ARG AIRFLOW_VERSION=2.10.1
ARG WATCHTOWER_VERSION=3.3.1
ARG PROVIDER_AMAZON_VERSION=8.28.0

## General ARGs
ARG AIRFLOW_USER_HOME=/usr/local/airflow
docker/config/airflow.cfg (26 changes: 22 additions & 4 deletions)
@@ -157,7 +157,7 @@ sensitive_var_conn_names =
# Task Slot counts for ``default_pool``. This setting would not have any effect in an existing
# deployment where the ``default_pool`` is already created. For existing deployments, users can
# change the number of slots using Webserver, API or the CLI
default_pool_task_slot_count = 10000
default_pool_task_slot_count = 200
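
Because the value above only applies when `default_pool` is first created, an existing local environment keeps whatever slot count it already has; one way to change it afterwards is the standard `airflow pools set` CLI, sketched below (the container name is an assumption, check `docker ps` for the actual one).

```bash
# airflow pools set <name> <slots> <description>
docker exec -it <local-runner-container> airflow pools set default_pool 200 "Default pool"
```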

[database]
# Collation for ``dag_id``, ``task_id``, ``key`` columns in case they have different encoding.
@@ -342,7 +342,7 @@ backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBac
# See documentation for the secrets backend you are using. JSON is expected.
# Example for AWS Systems Manager ParameterStore:
# ``{{"connections_prefix": "/airflow/connections", "profile_name": "default"}}``
backend_kwargs = {"connections_prefix" : "airflow-prod/connection", "variables_prefix" : "airflow-prod/variable", "config_prefix": "airflow-prod/config"}
backend_kwargs = {"connections_prefix" : "airflow-prod/connection", "variables_prefix" : "airflow-prod/variable", "config_prefix": "airflow-prod/config", "connections_lookup_pattern":"^(?!aws_default$).*$"}
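
The `connections_lookup_pattern` added to `backend_kwargs` makes the Secrets Manager backend resolve every connection id except `aws_default`, so the default AWS connection never triggers a secrets lookup. A rough sanity check of the regex itself, assuming GNU grep with PCRE support (`-P`):

```bash
# aws_default is rejected by the negative lookahead, so Airflow falls back to its normal handling:
echo "aws_default"      | grep -qP '^(?!aws_default$).*$' || echo "aws_default: skipped"
# any other connection id matches and is fetched from Secrets Manager:
echo "redshift_default" | grep -qP '^(?!aws_default$).*$' && echo "redshift_default: looked up"
```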

[cli]
# In what way should the cli access the API. The LocalClient will use the
@@ -815,7 +815,7 @@ catchup_by_default = True
# complexity of query predicate, and/or excessive locking.
# Additionally, you may hit the maximum allowable query length for your db.
# Set this to 0 for no limit (not advised)
max_tis_per_query = 512
max_tis_per_query = 16

# Should the scheduler issue ``SELECT ... FOR UPDATE`` in relevant queries.
# If this is set to False then you should not run more than a single
@@ -832,7 +832,7 @@ max_dagruns_per_loop_to_schedule = 20
# Should the Task supervisor process perform a "mini scheduler" to attempt to schedule more tasks of the
# same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other
# dags in some circumstances
schedule_after_task_execution = True
schedule_after_task_execution = False

# The scheduler can run multiple processes in parallel to parse dags.
# This defines how many processes will run.
@@ -1031,3 +1031,21 @@ shards = 5

# comma separated sensor classes support in smart_sensor.
sensors_enabled = NamedHivePartitionSensor

[usage_data_collection]
# Airflow integrates `Scarf <https://about.scarf.sh/>`__ to collect basic platform and usage data
# during operation. This data assists Airflow maintainers in better understanding how Airflow is used.
# Insights gained from this telemetry are critical for prioritizing patches, minor releases, and
# security fixes. Additionally, this information supports key decisions related to the development road map.
# Check the FAQ doc for more information on what data is collected.
#
# Deployments can opt-out of analytics by setting the ``enabled`` option
# to ``False``, or the ``SCARF_ANALYTICS=false`` environment variable.
# Individual users can easily opt-out of analytics in various ways documented in the
# `Scarf Do Not Track docs <https://docs.scarf.sh/gateway/#do-not-track>`__.

# Enable or disable usage data collection and sending.
#
# Variable: AIRFLOW__USAGE_DATA_COLLECTION__ENABLED
#
enabled = False
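
The same opt-out can be expressed as environment variables rather than in `airflow.cfg`, using the names given in the comments above; placing them in `docker/config/.env.localrunner` is an assumption about how the local runner is configured.

```bash
# Either variable disables Scarf usage-data collection:
AIRFLOW__USAGE_DATA_COLLECTION__ENABLED=False
SCARF_ANALYTICS=false
```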