1. Installation
Installing Airflow is straightforward: just follow the official documentation. I installed it on macOS; Linux should work the same way, and I have not tried Windows.
```
# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow
# install from pypi using pip
pip install apache-airflow
# initialize the database
airflow initdb
# start the web server, default port is 8080
airflow webserver -p 8080
```
- Airflow needs a home directory, set via an environment variable; the default is ~/airflow, but you can choose another path
- Install from PyPI with pip
- Initialize the database
- Start the web server; the default port is 8080
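The export in step 1 only lasts for the current shell session; to make it stick you would typically append it to your shell profile. A minimal sketch, using a throwaway file in /tmp as a stand-in for your real ~/.bash_profile:

```shell
# stand-in for ~/.bash_profile, so this sketch does not touch a real profile
PROFILE=/tmp/demo_profile
# persist the setting; the default ~/airflow is used here
echo 'export AIRFLOW_HOME=~/airflow' >> "$PROFILE"
# a new shell sourcing the profile then sees the variable
. "$PROFILE"
echo "$AIRFLOW_HOME"
```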
2. Configuration
After the installation commands above finish, Airflow creates the directory pointed to by $AIRFLOW_HOME and writes a default configuration file, airflow.cfg, into it. You can inspect its contents by opening $AIRFLOW_HOME/airflow.cfg directly, or in the web UI via the menu Admin->Configuration. The webserver's PID file is stored at $AIRFLOW_HOME/airflow-webserver.pid, or at /run/airflow/webserver.pid if it is shared with the system. (Mine is at $AIRFLOW_HOME/airflow-webserver.pid.)
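airflow.cfg is a plain INI-style file, so individual settings can also be pulled out of it from the shell. A minimal sketch, using a small sample file in /tmp as a stand-in for your real $AIRFLOW_HOME/airflow.cfg:

```shell
# write a tiny sample config (stand-in for $AIRFLOW_HOME/airflow.cfg)
cat > /tmp/sample_airflow.cfg <<'EOF'
[core]
executor = SequentialExecutor

[webserver]
web_server_port = 8080
EOF
# extract one value the same way you would from the real file
grep '^executor' /tmp/sample_airflow.cfg | sed 's/^executor *= *//'
# -> SequentialExecutor
```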
By default Airflow uses a SQLite database, which requires no backend setup, so you can start using Airflow right away. SQLite is paired with the SequentialExecutor, which runs task instances one after another. This combination is limited, but it lets you quickly try out Airflow's features and gives you access to both the UI and the command-line tools.
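Both of these defaults are visible in the [core] section of airflow.cfg; on a fresh install the relevant lines look roughly like this (the SQLite path below is a placeholder, since the real one is derived from your $AIRFLOW_HOME):

```
[core]
# run task instances one at a time
executor = SequentialExecutor
# metadata database; the SQLite file lives under $AIRFLOW_HOME by default
sql_alchemy_conn = sqlite:////Users/you/airflow/airflow.db
```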
The following commands trigger a few task instances; once you run them, you can watch the jobs' status change in the DAG view.
```
# run your first task instance
airflow run example_bash_operator runme_0 2015-01-01
# run a backfill over 2 days
airflow backfill example_bash_operator -s 2015-01-01 -e 2015-01-02
```
3. Extra Packages
Airflow offers many optional extra packages, listed in the table below:
subpackage | install command | enables
---|---|---
all | pip install apache-airflow[all] | All Airflow features known to man
all_dbs | pip install apache-airflow[all_dbs] | All databases integrations
async | pip install apache-airflow[async] | Async worker classes for gunicorn
devel | pip install apache-airflow[devel] | Minimum dev tools requirements
devel_hadoop | pip install apache-airflow[devel_hadoop] | Airflow + dependencies on the Hadoop stack
celery | pip install apache-airflow[celery] | CeleryExecutor
crypto | pip install apache-airflow[crypto] | Encrypt connection passwords in metadata db
druid | pip install apache-airflow[druid] | Druid.io related operators & hooks
gcp_api | pip install apache-airflow[gcp_api] | Google Cloud Platform hooks and operators (using google-api-python-client)
jdbc | pip install apache-airflow[jdbc] | JDBC hooks and operators
hdfs | pip install apache-airflow[hdfs] | HDFS hooks and operators
hive | pip install apache-airflow[hive] | All Hive related operators
kerberos | pip install apache-airflow[kerberos] | Kerberos integration for kerberized Hadoop
ldap | pip install apache-airflow[ldap] | LDAP authentication for users
mssql | pip install apache-airflow[mssql] | Microsoft SQL Server operators and hook, support as an Airflow backend
mysql | pip install apache-airflow[mysql] | MySQL operators and hook, support as an Airflow backend
password | pip install apache-airflow[password] | Password authentication for users
postgres | pip install apache-airflow[postgres] | PostgreSQL operators and hook, support as an Airflow backend
qds | pip install apache-airflow[qds] | Enable QDS (Qubole Data Service) support
rabbitmq | pip install apache-airflow[rabbitmq] | RabbitMQ support as a Celery backend
s3 | pip install apache-airflow[s3] | S3KeySensor, S3PrefixSensor
samba | pip install apache-airflow[samba] | Hive2SambaOperator
slack | pip install apache-airflow[slack] | SlackAPIPostOperator
vertica | pip install apache-airflow[vertica] | Vertica hook support as an Airflow backend
cloudant | pip install apache-airflow[cloudant] | Cloudant hook
redis | pip install apache-airflow[redis] | Redis hooks and sensors
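Extras can also be combined in a single install command; the combination below is just an example, and the quotes keep shells like zsh from treating the square brackets as a glob pattern:

```
pip install "apache-airflow[celery,postgres]"
```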