0X-1 Pitfall log
1.
Version information:
Python 3.6
django-celery 3.3
rabbitmq 3.6.10-1
Running tasks with the versions above hits the following error:
File "/usr/local/lib/python3.6/dist-packages/celery/backends/amqp.py", line 201, in drain_events
if timeout and now() - time_start >= timeout:
TypeError: '>=' not supported between instances of 'float' and 'str'
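First guess: Python 2 allows comparing a float with a str, whereas Python 3 does not and raises an exception.
That analysis turned out to be wrong. The real cause is that dispatching a function wrapped with the @app.task decorator as a task (for example through delay()) returns a <class 'celery.result.AsyncResult'> object, not the value returned inside the function; having the server return that object directly triggers the error.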
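The difference between a result handle and the value itself can be shown without Celery. The sketch below is only an analogy built on the standard library's concurrent.futures, not Celery's API: submit() stands in for delay(), and the returned Future stands in for AsyncResult.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=1) as pool:
    handle = pool.submit(add, 4, 4)         # analogous to add.delay(4, 4)
    is_value = isinstance(handle, int)      # the handle is not the number 8
    is_handle = isinstance(handle, Future)  # it is a handle to a pending result
    value = handle.result(timeout=5)        # analogous to AsyncResult.get()

print(is_value, is_handle, value)
```

In Celery terms, a view should return something derived from the handle, such as its id, or the value from get() once a result backend is configured, rather than the AsyncResult object itself.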
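2. Pitfall on 2019-7-6
Scenario: two tasks are dispatched to update different fields of the same record in a table. Either task A's update or task B's update fails to stick, so some field is always left without its new value. The problem only appears with more than one worker: with a single worker all tasks run serially, whereas with several workers tasks run in parallel.
After some reading it turned out that the save method of the Django ORM is responsible. When an object is modified and saved, the UPDATE writes every field of the record, not just the field that changed. If another field was updated elsewhere before this transaction commits, the commit writes the stale value of that field back to the database.
Fix: pass the fields to update when calling save, so that only those fields are written: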
obj.save(update_fields=['name'])
Reference: https://blog.csdn.net/yongche_shi/article/details/49096043
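The lost update can be reproduced in plain Python. The following is a minimal simulation, not Django code: db_row, load, save and run are hypothetical stand-ins for a table row, a query, Model.save and two interleaved tasks.

```python
import copy

# A stand-in for one database row (hypothetical data).
db_row = {"name": "old-name", "status": "old-status"}


def load():
    """Return an in-memory copy of the row, like fetching a model instance."""
    return copy.copy(db_row)


def save(obj, update_fields=None):
    """Write back every field, or only those listed in update_fields."""
    fields = update_fields if update_fields is not None else list(obj)
    for field in fields:
        db_row[field] = obj[field]


def run(fields_a=None, fields_b=None):
    """Interleave two tasks that each change a different field."""
    db_row.update(name="old-name", status="old-status")
    task_a = load()                  # both tasks read the row
    task_b = load()                  # before either one writes
    task_a["name"] = "new-name"
    task_b["status"] = "new-status"
    save(task_a, fields_a)
    save(task_b, fields_b)           # a full save rewrites the stale name
    return dict(db_row)


full_save = run()
partial_save = run(["name"], ["status"])
print(full_save)
print(partial_save)
```

With full saves, task B writes back the old name it read earlier, so task A's change disappears; with update_fields each task touches only its own column and both changes survive.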
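0X00 What is Celery
A task queue is a mechanism for distributing work across threads or machines.
A task queue's input is a unit of work called a task. Dedicated worker processes constantly monitor the queue for new tasks to process.
Celery communicates via messages, usually with a broker mediating between clients and workers. The process starts when a client adds a message to the queue; the broker then delivers that message to a worker.
A Celery system can contain multiple workers and brokers, which gives it high availability and horizontal scalability.
Celery is written in Python, but the protocol can be implemented in any language. So far there are RCelery for Ruby, node-celery for node.js, and a PHP client; language interoperability can also be achieved using webhooks.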
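The client, broker and worker roles described above can be sketched with the standard library alone. This is a toy model, not how Celery is implemented: the broker is a queue.Queue, the workers are threads, and the results dict is a hypothetical stand-in for a result backend.

```python
import queue
import threading

broker = queue.Queue()         # stands in for the broker's message queue
results = {}                   # stands in for a result backend
results_lock = threading.Lock()


def add(x, y):
    return x + y


def worker():
    """Take task messages off the queue until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        task_id, func, args = message
        outcome = func(*args)
        with results_lock:
            results[task_id] = outcome


workers = [threading.Thread(target=worker) for _ in range(2)]
for thread in workers:
    thread.start()

# The client only adds messages; it never calls add() itself.
for task_id, args in enumerate([(4, 4), (1, 2), (10, 5)]):
    broker.put((task_id, add, args))

for _ in workers:
    broker.put(None)           # one shutdown sentinel per worker
for thread in workers:
    thread.join()

print(results)
```

Adding more worker threads spreads the same messages over more consumers, which is the horizontal scaling idea in miniature.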
0X01 DEMO
1. Write an application, tasks.py
from celery import Celery
app = Celery('tasks', broker='amqp://guest@localhost//')
# 'tasks' is the name of the current module; broker specifies the message broker to use
@app.task
def add(x, y):
    return x + y
2. Start the worker server
celery worker -A tasks -l info
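3. With the worker running, tasks.py can be imported and used directly, provided Python is started in the directory that contains it (the worker does not change the caller's sys.path)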
from tasks import add
result = add.delay(4, 4)
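"""
The task has now been executed by the worker started earlier; you can verify this from the worker's console output.
Calling a task returns an AsyncResult instance, which can be used to check the task's state, wait for it to finish, or get its return value (or, if the task failed, the exception and traceback). This is not enabled by default, though: you need to configure a Celery result backend, as in the configuration in step 4.
"""
result.ready()  # check whether the task has finished
result.result  # get the task's result
4. Load the Celery configuration from a Python file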
app.config_from_object('celeryconfig')
Configuration file celeryconfig.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json']
CELERY_ENABLE_UTC = True
Verify that the configuration file works and contains no syntax errors by importing it:
python -m celeryconfig
0X02 Using django-celery
1. Install
apt-get install rabbitmq-server
pip install celery
pip install django-celery
2. Load the app
INSTALLED_APPS = [
    'djcelery',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
0X03 Using django-celery-beat
A demo: https://github.com/celery/celery/tree/master/examples/django/
celery-4.3.0
django-celery-beat-1.5.0
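1. Migrate the database
python3 manage.py migrate
2. Start the web server
3. Start the celery worker and beat
python3 -m celery worker -A celery_test -l info --uid=113 --gid=116
celery -A celery_test beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach # --detach runs it in the background
Too many bugs turned up in use, so it is shelved for now. Periodic tasks can be implemented at the Linux level with crontab instead.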
0X04 Latest configuration
Using celery only:
pip install celery==4.3.0
# The BROKER and BACKEND can be Redis or RabbitMQ
yum install rabbitmq-server
# yum install redis
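Why django-celery is not used:
Installing django-celery pulls in celery 3.1 by default, while the latest celery release is 4.3; using the latest version is the safer way to reduce bugs. Celery alone already covers the need to dispatch work to a task queue, so a Django-level wrapper is not really needed. django-celery-beat, on the other hand, is quite useful: it supports dispatching tasks periodically and lets the period be changed.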
Directory layout of the Django project:
celery_test/
├── celery_test
│   ├── celery_config.py  # celery configuration file
│   ├── celery.py         # the celery app instance
│   ├── __init__.py       # loads celery in __init__ so that it starts with Django
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── main
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tasks.py          # tasks; each app keeps them in its own tasks.py
│   ├── tests.py
│   └── views.py
└── manage.py
1. celery_config.py
# -*- coding: utf-8 -*-
# @Time : 2019/7/3 17:17
# @Author : Zcs
# @File : celery_config.py
# NOTE: `config` is not defined in this snippet; it is assumed to be the project's own settings mapping
CELERY_BROKER_URL = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # use RabbitMQ
CELERY_RESULT_BACKEND = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # use RabbitMQ
CELERY_TIMEZONE = 'Asia/Shanghai'  # time zone
CELERY_TASK_SERIALIZER = 'pickle'  # task serializer
CELERY_RESULT_SERIALIZER = 'pickle'  # result serializer
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_RESULT_EXPIRES = 3600
# CELERY_WORKER_LOG_FORMAT = '%(asctime)s [%(module)s %(levelname)s] %(message)s'
# CELERY_WORKER_TASK_LOG_FORMAT = '%(task_id)s %(task_name)s %(message)s'
CELERY_WORKER_TASK_LOG_FORMAT = '%(message)s'
CELERY_WORKER_LOG_FORMAT = '%(message)s'
CELERY_TASK_EAGER_PROPAGATES = True
CELERY_WORKER_REDIRECT_STDOUTS = True
CELERY_WORKER_REDIRECT_STDOUTS_LEVEL = "INFO"
# CELERY_WORKER_HIJACK_ROOT_LOGGER = True
CELERY_WORKER_MAX_TASKS_PER_CHILD = 40
CELERY_TASK_SOFT_TIME_LIMIT = 3600
Full configuration reference: http://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERYD_CONCURRENCY
2. celery.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/28 9:54
# @Author : Zcs
# @File : celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, platforms
from celery_test import settings
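# Allow the worker to be started by the root user
platforms.C_FORCE_ROOT = True
# This setting is needed to work around a bug on 64-bit Windows
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')
# Set the default environment variable for the celery command line; without it celery cannot find the tasks of each app
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_test.settings')
# Create the celery instance
app = Celery('celery_test')
# Load the settings from celery_config; namespace='CELERY' means every setting key starts with CELERY_
app.config_from_object('celery_test.celery_config', namespace='CELERY')
# Load tasks.py from every app
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
# The @shared_task decorator lets you create tasks without a concrete Celery instance;
# the debug task below is registered on the app directly
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))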
3. __init__.py
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ('celery_app',)
4. tasks.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/27 14:44
# @Author : Zcs
# @File : tasks.py
from celery_test.celery import app
import time
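# This decorator turns the function into a task. Dispatching it (e.g. with delay()) returns an
# AsyncResult carrying the task's unique id; the value returned inside the function is not handed
# back to the caller directly but is stored in the BACKEND once the task finishes.
@app.task
def send_mail(arg):
    # time.sleep(20)
    return arg
5. Run the project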
python3 manage.py runserver 0.0.0.0:8000
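celery worker -A project_name -l info --uid=993 --gid=989 # start the worker; -A names the celery app package, which should be "celery_test" in this example
flower --port=5555 --broker='redis://127.0.0.1:6379/2' # run the celery monitor (point --broker at the broker your workers use), then open port 5555 in a browser
6. Call the task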
from django.views import View
from .tasks import send_mail
from django.http import HttpResponse
class main_view(View):
    def get(self, request):
        r = send_mail.delay('a')  # r is not the task's return value but an AsyncResult carrying the task's unique id
        return HttpResponse(r.id)
Some options for running celery worker:
Examples:
$ celery worker --app=proj -l info
$ celery worker -A proj -l info -Q hipri,lopri
$ celery worker -A proj --concurrency=4
$ celery worker -A proj --concurrency=1000 -P eventlet
$ celery worker --autoscale=10,0
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Global Options:
-A APP, --app APP
-b BROKER, --broker BROKER
--result-backend RESULT_BACKEND
--loader LOADER
--config CONFIG
--workdir WORKDIR Optional directory to change to after detaching.
--no-color, -C
--quiet, -q
Worker Options:
-n HOSTNAME, --hostname HOSTNAME
Set custom hostname (e.g., 'w1@%h'). Expands: %h
(hostname), %n (name) and %d, (domain).
-D, --detach Start worker as a background process.
-S STATEDB, --statedb STATEDB
Path to the state database. The extension '.db' may be
appended to the filename. Default: None
-l LOGLEVEL, --loglevel LOGLEVEL
Logging level, choose between DEBUG, INFO, WARNING,
ERROR, CRITICAL, or FATAL.
-O OPTIMIZATION
--prefetch-multiplier PREFETCH_MULTIPLIER
Set custom prefetch multiplier value for this worker
instance.
Pool Options:
-c CONCURRENCY, --concurrency CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL, --pool POOL Pool implementation: prefork (default), eventlet,
gevent or solo.
-E, --task-events, --events
Send task-related events that can be captured by
monitors like celery events, celerymon, and others.
--time-limit TIME_LIMIT
Enables a hard time limit (in seconds int/float) for
tasks.
--soft-time-limit SOFT_TIME_LIMIT
Enables a soft time limit (in seconds int/float) for
tasks.
--max-tasks-per-child MAX_TASKS_PER_CHILD, --maxtasksperchild MAX_TASKS_PER_CHILD
Maximum number of tasks a pool worker can execute
before it's terminated and replaced by a new worker.
--max-memory-per-child MAX_MEMORY_PER_CHILD, --maxmemperchild MAX_MEMORY_PER_CHILD
Maximum amount of resident memory, in KiB, that may be
consumed by a child process before it will be replaced
by a new one. If a single task causes a child process
to exceed this limit, the task will be completed and
the child process will be replaced afterwards.
Default: no limit.
Queue Options:
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
--queues QUEUES, -Q QUEUES
List of queues to enable for this worker, separated by
comma. By default all configured queues are enabled.
Example: -Q video,image
--exclude-queues EXCLUDE_QUEUES, -X EXCLUDE_QUEUES
List of queues to disable for this worker, separated
by comma. By default all configured queues are
enabled. Example: -X video,image.
--include INCLUDE, -I INCLUDE
Comma separated list of additional modules to import.
Example: -I foo.tasks,bar.tasks
Features:
--without-gossip Don't subscribe to other workers events.
--without-mingle Don't synchronize with other workers at start-up.
--without-heartbeat Don't send event heartbeats.
--heartbeat-interval HEARTBEAT_INTERVAL
Interval in seconds at which to send worker heartbeat
--autoscale AUTOSCALE
Enable autoscaling by providing max_concurrency,
min_concurrency. Example:: --autoscale=10,3 (always
keep 3 processes, but grow to 10 if necessary)
Daemonization Options:
-f LOGFILE, --logfile LOGFILE
Path to log file. If no logfile is specified, stderr
is used.
--pidfile PIDFILE Optional file used to store the process pid. The
program won't start if this file already exists and
the pid is still alive.
--uid UID User id, or user name of the user to run as after
detaching.
--gid GID Group id, or group name of the main group to change to
after detaching.
--umask UMASK Effective umask(1) (in octal) of the process after
detaching. Inherits the umask(1) of the parent process
by default.
--executable EXECUTABLE
Executable to use for the detached process.
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler.
Please note that there must only be one instance of
this service. .. note:: -B is meant to be used for
development purposes. For production environment, you
need to start celery beat separately.
-s SCHEDULE_FILENAME, --schedule-filename SCHEDULE_FILENAME, --schedule SCHEDULE_FILENAME
Path to the schedule database if running with the -B
option. Defaults to celerybeat-schedule. The extension
".db" may be appended to the filename. Apply
optimization profile. Supported: default, fair
--scheduler SCHEDULER
Scheduler class to use. Default is
celery.beat.PersistentScheduler
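6. Redis or RabbitMQ can be given password login; only the configuration needs changing
Change the password of RabbitMQ's default guest user (the default password is guest, and the account can only log in from localhost):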
rabbitmqctl change_password guest your_password
After the change, update the settings in celery_config.py:
CELERY_BROKER_URL = 'amqp://guest:your_password@localhost//'
CELERY_RESULT_BACKEND = 'amqp://guest:your_password@localhost//'
Redis:
CELERY_BROKER_URL = 'redis://:your_password@127.0.0.1:6379/2'
CELERY_RESULT_BACKEND = 'redis://:your_password@127.0.0.1:6379/2'
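A password containing characters such as @ or / would break a URL written by hand. The helper below is a small sketch of one way to build these URLs safely; broker_url is a hypothetical name, and it assumes the broker client percent-decodes the credentials, as URL parsers commonly do.

```python
from urllib.parse import quote


def broker_url(scheme, password, host, user="", port=None, vhost=""):
    """Build a broker URL, percent-encoding the user name and password."""
    credentials = "{}:{}".format(quote(user, safe=""), quote(password, safe=""))
    location = host if port is None else "{}:{}".format(host, port)
    return "{}://{}@{}/{}".format(scheme, credentials, location, vhost)


# Hypothetical password with characters that are special inside a URL.
rabbit_url = broker_url("amqp", "p@ss/word", "localhost", user="guest", vhost="/")
redis_url = broker_url("redis", "p@ss/word", "127.0.0.1", port=6379, vhost="2")
print(rabbit_url)
print(redis_url)
```

The resulting strings can be assigned to CELERY_BROKER_URL and CELERY_RESULT_BACKEND in celery_config.py.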
7. Further needs
1) Run several tasks as a chain:
https://celery.readthedocs.io/en/latest/userguide/canvas.html#chains
>>> from celery import chain
>>> from proj.tasks import add, mul
>>> # (4 + 4) * 8 * 10
>>> res = chain(add.s(4, 4), mul.s(8), mul.s(10))
proj.tasks.add(4, 4) | proj.tasks.mul(8) | proj.tasks.mul(10)
https://www.cnblogs.com/wdliu/p/9517535.html
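2) Run several tasks as a chain without passing on the previous task's result
By default chain, the pipe operator, and chord all pass the previous task's result to the next task. For tasks that only need to run in sequence there is no need to pass the result, and passing it by default also makes your own arguments harder to set up. So build the signatures with si() (immutable signatures) instead of s(). This does not stop you from passing the arguments you do want to pass. Example:
task = chain(stop.si(),
             init.si(args),
             start.si())
task()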
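The difference between s() and si() can be modelled in a few lines of plain Python. This is a toy model of the behaviour described above, not Celery's implementation: Sig, s, si and run_chain are hypothetical stand-ins that run everything synchronously in one process.

```python
class Sig:
    """Toy signature: a function, its own args, and an immutable flag."""

    def __init__(self, func, args, immutable=False):
        self.func = func
        self.args = args
        self.immutable = immutable

    def run(self, previous=None, first=False):
        # A mutable signature gets the previous result as its first argument.
        if first or self.immutable:
            return self.func(*self.args)
        return self.func(previous, *self.args)


def s(func, *args):
    return Sig(func, args)


def si(func, *args):
    return Sig(func, args, immutable=True)


def run_chain(*sigs):
    """Run signatures in order, threading results through mutable ones."""
    result = None
    for index, sig in enumerate(sigs):
        result = sig.run(result, first=(index == 0))
    return result


def add(x, y):
    return x + y


def mul(x, y):
    return x * y


log = []


def step(name):
    log.append(name)
    return name


piped = run_chain(s(add, 4, 4), s(mul, 8), s(mul, 10))   # (4 + 4) * 8 * 10
run_chain(si(step, "stop"), si(step, "init"), si(step, "start"))
print(piped, log)
```

Had the second chain been built with s(), step would have received the previous result as an extra first argument and failed with a TypeError, which is the configuration nuisance that si() avoids.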