Using Celery with Django

0X-1 Pitfall Notes

1.

Version info:
Python3.6
django-celery3.3
rabbitmq 3.6.10-1

Running tasks with the versions above hits the following error:

  File "/usr/local/lib/python3.6/dist-packages/celery/backends/amqp.py", line 201, in drain_events
    if timeout and now() - time_start >= timeout:
TypeError: '>=' not supported between instances of 'float' and 'str'

At first glance this looks like a Python 2 vs. Python 3 issue: Python 2 allows comparing a float with a str, while Python 3 does not and raises an exception.

That analysis turned out to be wrong, though. The real cause is that a function wrapped with the @app.task decorator returns a <class 'celery.result.AsyncResult'> object once the task has been dispatched, not the value returned inside the function; if the server returns that object directly, the error above is raised.
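
A minimal sketch of handling that return value (the task name is just illustrative): hand back the task id, or block for the real result, rather than returning the AsyncResult itself:

result = some_task.delay(1, 2)   # AsyncResult, not the function's return value
task_id = result.id              # the task's unique id, a plain string that is safe to return
value = result.get(timeout=10)   # or block until the worker finishes and get the real value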

2. Pitfall hit on 2019-7-6

Scenario: two tasks are dispatched that update different fields of the same row in a table. The problem is that either task A's update or task B's update always fails to land; there is always some field that does not get written. This only occurs when the number of workers is greater than 1, because with a single worker all tasks execute sequentially, while with multiple workers the tasks run in parallel.

After consulting some references, the cause turned out to be the Django ORM's save() method: when you UPDATE a record, save() writes every field of that row, not just the field you changed. If another field was updated elsewhere before this transaction commits, the commit will overwrite it with stale data.

Fix: pass the fields to be updated to save(), so only those columns are written:

obj.save(update_fields=['name'])
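
As a fuller sketch, each of the two concurrent tasks should write only its own column; the app instance follows the project layout shown later in this article, and the Record model with its name/status fields is hypothetical:

from celery_test.celery import app   # app instance, as in the layout shown later
from main.models import Record       # hypothetical model with name and status fields

@app.task
def update_name(pk, name):
    obj = Record.objects.get(pk=pk)
    obj.name = name
    obj.save(update_fields=['name'])    # UPDATE touches only the name column

@app.task
def update_status(pk, status):
    obj = Record.objects.get(pk=pk)
    obj.status = status
    obj.save(update_fields=['status'])  # UPDATE touches only the status column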

Reference: https://blog.csdn.net/yongche_shi/article/details/49096043

0X00 What is Celery

A task queue is a mechanism for distributing work across threads or machines.

The input to a task queue is a unit of work called a task. Dedicated worker processes continuously monitor the queue for new tasks to process.

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, the client adds a message to the queue, and the broker then delivers it to a worker.

A Celery system can consist of multiple workers and brokers, which is how it achieves high availability and horizontal scaling.

Celery is written in Python, but the protocol can be implemented in any language. So far there are RCelery for Ruby, node-celery for Node.js, and a PHP client; language interoperability can also be achieved through webhooks.

0X01 DEMO

1. Write an application, tasks.py

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')
#  'tasks' is the name of the current module; broker specifies the message broker to use

@app.task
def add(x, y):
    return x + y

2. Run the worker

celery worker -A tasks -l info

3. With the worker running, the task can be called from another Python shell by importing it from tasks.py (the module just has to be importable, e.g. start the shell from the same directory):

from tasks import add

result = add.delay(4, 4)
"""
The task has now been executed by the worker started earlier, and you can verify that by looking at the worker's console output.
Calling a task returns an AsyncResult instance, which can be used to check the task's state, wait for it to finish, or get its return value (or, if the task failed, the exception and traceback). This is not enabled by default, though: you need to configure a Celery result backend, which the next section covers.
"""

result.ready()  # has the task finished?
result.result  # the task's return value
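
As a minimal sketch, a result backend can also be enabled directly when creating the app instead of via a config file; here the rpc:// backend, which sends results back over the same RabbitMQ broker, is one possible choice:

from celery import Celery

app = Celery('tasks',
             broker='amqp://guest@localhost//',
             backend='rpc://')  # result backend, enables ready()/result/get()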

4. Load the Celery configuration from a .py file

app.config_from_object('celeryconfig')

The configuration file celeryconfig.py:

BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'

CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_ENABLE_UTC = True

Check that the configuration file is valid:

python -m celeryconfig

0X02 Using django-celery

1. Install

apt-get install rabbitmq-server
pip install celery
pip install django-celery

2. Add it to INSTALLED_APPS

INSTALLED_APPS = [
    'djcelery',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
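
Besides adding 'djcelery' to INSTALLED_APPS, django-celery 3.x also expects its loader to be set up and a broker to be configured in settings.py; a minimal sketch:

# settings.py (sketch for django-celery 3.x)
import djcelery
djcelery.setup_loader()

BROKER_URL = 'amqp://guest@localhost//'  # RabbitMQ as the broker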

0X03 Using django-celery-beat

A demo: https://github.com/celery/celery/tree/master/examples/django/

celery-4.3.0
django-celery-beat-1.5.0

1. Migrate the database

python3 manage.py migrate

2. Run the web server

3. Run the celery worker and celery beat

python3 -m celery worker -A celery_test -l info --uid=113 --gid=116

celery -A celery_test beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach  # --detach runs it in the background
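
With the DatabaseScheduler, periodic tasks are stored in the database and can be created in the Django admin or in code. A minimal sketch that schedules the send_mail task defined later in this article (the interval and name are arbitrary):

import json
from django_celery_beat.models import PeriodicTask, IntervalSchedule

# run main.tasks.send_mail('a') every 10 seconds
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.SECONDS,
)
PeriodicTask.objects.create(
    interval=schedule,
    name='send mail every 10s',
    task='main.tasks.send_mail',
    args=json.dumps(['a']),
)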

In practice I ran into too many bugs, so I'm not using it for now. Periodic tasks can instead be implemented with crontab at the Linux level, as sketched below.
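
A single crontab entry is usually enough for that; the management command name and the project path here are hypothetical:

# crontab -e: run a Django management command every 10 minutes
*/10 * * * * cd /path/to/celery_test && python3 manage.py send_mail_batch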

0X04 Latest Setup

Using plain celery only:

pip install celery==4.3.0
# either Redis or RabbitMQ can serve as the BROKER and the BACKEND
yum install rabbitmq-server
# yum install redis

Reasons for not using django-celery:

Installing django-celery pulls in celery 3.1 by default, while the latest celery is 4.3; to keep bugs to a minimum it is safer to use the latest version. Plain celery is already enough for dispatching tasks to a queue, so the Django-level wrapper is not really needed. django-celery-beat, on the other hand, is quite useful: it supports dispatching periodic tasks and lets you change the schedule.

Directory layout of the Django project:

celery_test/
├── celery_test
│   ├── celery_config.py # celery configuration
│   ├── celery.py # the celery app instance
│   ├── __init__.py # load celery here so it starts together with Django
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── main
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tasks.py # tasks live in each app's tasks.py
│   ├── tests.py
│   └── views.py
├── manage.py

1. celery_config.py

# -*- coding: utf-8 -*-
# @Time    : 2019/7/3 17:17
# @Author  : Zcs
# @File    : celery_config.py
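# NOTE: 'config' below is assumed to be a dict loaded elsewhere (e.g. from a
# local settings file) that provides the RabbitMQ password; it is not defined
# in this snippet.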

CELERY_BROKER_URL = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as the broker
CELERY_RESULT_BACKEND = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as the result backend
CELERY_TIMEZONE = 'Asia/Shanghai'  # timezone
CELERY_TASK_SERIALIZER = 'pickle'  # task serializer
CELERY_RESULT_SERIALIZER = 'pickle'  # result serializer
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_RESULT_EXPIRES = 3600
# CELERY_WORKER_LOG_FORMAT = '%(asctime)s [%(module)s %(levelname)s] %(message)s'
# CELERY_WORKER_TASK_LOG_FORMAT = '%(task_id)s %(task_name)s %(message)s'
CELERY_WORKER_TASK_LOG_FORMAT = '%(message)s'
CELERY_WORKER_LOG_FORMAT = '%(message)s'
CELERY_TASK_EAGER_PROPAGATES = True
CELERY_WORKER_REDIRECT_STDOUTS = True
CELERY_WORKER_REDIRECT_STDOUTS_LEVEL = "INFO"
# CELERY_WORKER_HIJACK_ROOT_LOGGER = True
CELERY_WORKER_MAX_TASKS_PER_CHILD = 40
CELERY_TASK_SOFT_TIME_LIMIT = 3600

Full configuration reference: http://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERYD_CONCURRENCY

2. celery.py

# -*- coding: utf-8 -*-
# @Time    : 2019/6/28 9:54
# @Author  : Zcs
# @File    : celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, platforms
from celery_test import settings

# allow starting the worker as the root user
platforms.C_FORCE_ROOT = True

# needed to work around a bug on 64-bit Windows
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')

# set the default Django settings module for the celery command line; without it celery cannot find the tasks in each app
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_test.settings')

# create the celery app instance
app = Celery('celery_test')

# load the configuration from celery_config
app.config_from_object('celery_test.celery_config')

# configuration keys use the CELERY_ prefix
app.namespace = 'CELERY'

# load tasks.py from every installed app
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# the @shared_task decorator lets you define tasks without a concrete Celery instance

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

3. __init__.py

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)

4. tasks.py

# -*- coding: utf-8 -*-
# @Time    : 2019/6/27 14:44
# @Author  : Zcs
# @File    : tasks.py
from celery_test.celery import app
import time


# The decorator below turns the function into a celery task. Calling it with
# .delay() returns an AsyncResult (carrying the task's unique id); the value
# returned inside the function is not handed back to the caller directly but
# stored in the result backend once the task finishes.
@app.task
def send_mail(arg):
    #time.sleep(20)
    return arg
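
As mentioned in the comment in celery.py, @shared_task lets an app declare tasks without importing the concrete Celery instance, which helps avoid circular imports; an equivalent sketch of the same task:

from celery import shared_task

@shared_task
def send_mail(arg):
    return arg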

5. Run the project

python3 manage.py runserver 0.0.0.0:8000
celery worker -A project_name -l info --uid=993 --gid=989  # start the worker; -A points at the celery app package, which in this example is "celery_test"
flower --port=5555 --broker='redis://127.0.0.1:6379/2'  # run the Flower monitoring UI (adjust the broker URL to your setup) and open port 5555 in a browser

6. Call a task

from django.views import View
from .tasks import send_mail
from django.http import HttpResponse


class main_view(View):

    def get(self, request):
        r = send_mail.delay('a')  # r is an AsyncResult; the response will show the task's unique id, not the task's return value
        return HttpResponse(r)
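
Since the view only hands back the task id, the real result can be fetched later with AsyncResult once the worker has finished; a minimal sketch, assuming the result backend configured above:

from celery.result import AsyncResult
from celery_test.celery import app

def fetch_result(task_id):
    res = AsyncResult(task_id, app=app)
    if res.ready():                # has the worker finished this task?
        return res.get(timeout=5)  # the value returned inside the task function
    return None                    # still pending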

Some of the options for running celery worker:

Examples:

        $ celery worker --app=proj -l info
        $ celery worker -A proj -l info -Q hipri,lopri

        $ celery worker -A proj --concurrency=4
        $ celery worker -A proj --concurrency=1000 -P eventlet
        $ celery worker --autoscale=10,0

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Global Options:
  -A APP, --app APP
  -b BROKER, --broker BROKER
  --result-backend RESULT_BACKEND
  --loader LOADER
  --config CONFIG
  --workdir WORKDIR     Optional directory to change to after detaching.
  --no-color, -C
  --quiet, -q

Worker Options:
  -n HOSTNAME, --hostname HOSTNAME
                        Set custom hostname (e.g., 'w1@%h'). Expands: %h
                        (hostname), %n (name) and %d, (domain).
  -D, --detach          Start worker as a background process.
  -S STATEDB, --statedb STATEDB
                        Path to the state database. The extension '.db' may be
                        appended to the filename. Default: None
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Logging level, choose between DEBUG, INFO, WARNING,
                        ERROR, CRITICAL, or FATAL.
  -O OPTIMIZATION       Apply optimization profile. Supported: default, fair
  --prefetch-multiplier PREFETCH_MULTIPLIER
                        Set custom prefetch multiplier value for this worker
                        instance.

Pool Options:
  -c CONCURRENCY, --concurrency CONCURRENCY
                        Number of child processes processing the queue. The
                        default is the number of CPUs available on your
                        system.
  -P POOL, --pool POOL  Pool implementation: prefork (default), eventlet,
                        gevent or solo.
  -E, --task-events, --events
                        Send task-related events that can be captured by
                        monitors like celery events, celerymon, and others.
  --time-limit TIME_LIMIT
                        Enables a hard time limit (in seconds int/float) for
                        tasks.
  --soft-time-limit SOFT_TIME_LIMIT
                        Enables a soft time limit (in seconds int/float) for
                        tasks.
  --max-tasks-per-child MAX_TASKS_PER_CHILD, --maxtasksperchild MAX_TASKS_PER_CHILD
                        Maximum number of tasks a pool worker can execute
                        before it's terminated and replaced by a new worker.
  --max-memory-per-child MAX_MEMORY_PER_CHILD, --maxmemperchild MAX_MEMORY_PER_CHILD
                        Maximum amount of resident memory, in KiB, that may be
                        consumed by a child process before it will be replaced
                        by a new one. If a single task causes a child process
                        to exceed this limit, the task will be completed and
                        the child process will be replaced afterwards.
                        Default: no limit.

Queue Options:
  --purge, --discard    Purges all waiting tasks before the daemon is started.
                        **WARNING**: This is unrecoverable, and the tasks will
                        be deleted from the messaging server.
  --queues QUEUES, -Q QUEUES
                        List of queues to enable for this worker, separated by
                        comma. By default all configured queues are enabled.
                        Example: -Q video,image
  --exclude-queues EXCLUDE_QUEUES, -X EXCLUDE_QUEUES
                        List of queues to disable for this worker, separated
                        by comma. By default all configured queues are
                        enabled. Example: -X video,image.
  --include INCLUDE, -I INCLUDE
                        Comma separated list of additional modules to import.
                        Example: -I foo.tasks,bar.tasks

Features:
  --without-gossip      Don't subscribe to other workers events.
  --without-mingle      Don't synchronize with other workers at start-up.
  --without-heartbeat   Don't send event heartbeats.
  --heartbeat-interval HEARTBEAT_INTERVAL
                        Interval in seconds at which to send worker heartbeat
  --autoscale AUTOSCALE
                        Enable autoscaling by providing max_concurrency,
                        min_concurrency. Example:: --autoscale=10,3 (always
                        keep 3 processes, but grow to 10 if necessary)

Daemonization Options:
  -f LOGFILE, --logfile LOGFILE
                        Path to log file. If no logfile is specified, stderr
                        is used.
  --pidfile PIDFILE     Optional file used to store the process pid. The
                        program won't start if this file already exists and
                        the pid is still alive.
  --uid UID             User id, or user name of the user to run as after
                        detaching.
  --gid GID             Group id, or group name of the main group to change to
                        after detaching.
  --umask UMASK         Effective umask(1) (in octal) of the process after
                        detaching. Inherits the umask(1) of the parent process
                        by default.
  --executable EXECUTABLE
                        Executable to use for the detached process.

Embedded Beat Options:
  -B, --beat            Also run the celery beat periodic task scheduler.
                        Please note that there must only be one instance of
                        this service. .. note:: -B is meant to be used for
                        development purposes. For production environment, you
                        need to start celery beat separately.
  -s SCHEDULE_FILENAME, --schedule-filename SCHEDULE_FILENAME, --schedule SCHEDULE_FILENAME
                        Path to the schedule database if running with the -B
                        option. Defaults to celerybeat-schedule. The extension
                        ".db" may be appended to the filename. Apply
                        optimization profile. Supported: default, fair
  --scheduler SCHEDULER
                        Scheduler class to use. Default is
                        celery.beat.PersistentScheduler

7. You can require password login for Redis or RabbitMQ by adjusting the configuration

Change the password of RabbitMQ's default guest user (by default the guest account is only allowed to connect from localhost):

rabbitmqctl change_password guest your_password

Then update the settings in celery_config.py accordingly:

CELERY_BROKER_URL = 'amqp://guest:your_password@localhost//'
CELERY_RESULT_BACKEND = 'amqp://guest:your_password@localhost//'

For Redis:

CELERY_BROKER_URL = 'redis://:[email protected]:6379/2'
CELERY_RESULT_BACKEND = 'redis://:[email protected]:6379/2'

8. Further needs

1) Run several tasks as a chain:

https://celery.readthedocs.io/en/latest/userguide/canvas.html#chains

>>> from celery import chain
>>> from proj.tasks import add, mul

>>> # (4 + 4) * 8 * 10
>>> res = chain(add.s(4, 4), mul.s(8), mul.s(10))()
>>> res.get()
640

https://www.cnblogs.com/wdliu/p/9517535.html

2) Chain several tasks without passing the previous task's result along

chain, pipes, and chord all pass the previous task's result on to the next task by default. For tasks that only need to run one after another, passing the result is unnecessary, and the implicit extra argument makes your own parameters harder to set up. So build the asynchronous calls with si() (immutable signatures) instead of s(). This of course does not stop you from passing whatever arguments you actually want, e.g.:

task = chain(stop.si(),
             init.si(args),
             start.si())
task()

 
