Ansible Performance Optimization (2): Tuning DEFAULT_INTERNAL_POLL_INTERVAL to Reduce How Often Results Are Polled

(This article is based on Ansible 2.7.)
base.yml defines the DEFAULT_INTERNAL_POLL_INTERVAL parameter as follows:

DEFAULT_INTERNAL_POLL_INTERVAL:
  name: Internal poll interval
  default: 0.001
  env: []
  ini:
  - {key: internal_poll_interval, section: defaults}
  type: float
  version_added: "2.2"
  description:
    - This sets the interval (in seconds) of Ansible internal processes polling each other.
      Lower values improve performance with large playbooks at the expense of extra CPU load.
      Higher values are more suitable for Ansible usage in automation scenarios,
      when UI responsiveness is not required but CPU usage might be a concern.
    - "The default corresponds to the value hardcoded in Ansible <= 2.1"

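The description is clear: a lower value buys better performance on large playbooks at the cost of extra CPU load. In (operations) automation scenarios, where CPU usage matters more than UI responsiveness and the goal is to make full use of computing resources, a higher value is the better fit.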
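To make the definition concrete, here is a minimal sketch of how the ini mapping reads: the key internal_poll_interval under the [defaults] section, parsed as a float, falling back to 0.001 when unset. It uses Python's standard configparser purely for illustration and is not how Ansible itself loads configuration. The function name read_poll_interval and the sample file contents are assumptions made for this example.

```python
import configparser

# Default taken from the base.yml definition quoted above.
DEFAULT_POLL_INTERVAL = 0.001


def read_poll_interval(cfg_text):
    """Return internal_poll_interval from ini text, or the default.

    Hypothetical helper for illustration only; Ansible has its own
    configuration loader.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers both a missing [defaults] section and a missing key.
    return parser.getfloat(
        "defaults", "internal_poll_interval", fallback=DEFAULT_POLL_INTERVAL
    )


# Sample ansible.cfg contents (an assumption for this sketch).
SAMPLE_CFG = """
[defaults]
internal_poll_interval = 0.1
"""

if __name__ == "__main__":
    print(read_poll_interval(SAMPLE_CFG))  # value set in the sample file
    print(read_poll_interval(""))          # falls back to the default
```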

This parameter is used only in the strategy code, for example in StrategyBase:

    def _wait_on_handler_results(self, iterator, handler, notified_hosts):
        '''
        Wait for the handler tasks to complete, using a short sleep
        between checks to ensure we don't spin lock
        '''

        ret_results = []
        handler_results = 0

        display.debug("waiting for handler results...")
        while (self._pending_results > 0 and
               handler_results < len(notified_hosts) and
               not self._tqm._terminated):

            if self._tqm.has_dead_workers():
                raise AnsibleError("A worker was found in a dead state")

            results = self._process_pending_results(iterator)
            ret_results.extend(results)
            handler_results += len([
                r._host for r in results if r._host in notified_hosts and
                r.task_name == handler.name])
            if self._pending_results > 0:
                time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)

        display.debug("no more pending handlers, returning what we have")

        return ret_results

    def _wait_on_pending_results(self, iterator):
        '''
        Wait for the shared counter to drop to zero, using a short sleep
        between checks to ensure we don't spin lock
        '''

        ret_results = []

        display.debug("waiting for pending results...")
        while self._pending_results > 0 and not self._tqm._terminated:

            if self._tqm.has_dead_workers():
                raise AnsibleError("A worker was found in a dead state")

            results = self._process_pending_results(iterator)
            ret_results.extend(results)
            if self._pending_results > 0:
                time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)

        display.debug("no more pending results, returning what we have")

        return ret_results

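Both methods sleep inside a loop between checks. The default of 1 ms is too short; raising the value to 0.1 (100 ms), by setting internal_poll_interval under [defaults] in ansible.cfg, gives noticeably better results.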
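The effect of the interval on such a check-then-sleep loop can be estimated with a small simulation. The sketch below is not Ansible code: it runs on a virtual clock, assumes each check itself takes no time, and uses a hypothetical function name, simulate_polling. It counts how many times the loop wakes up while waiting for a single result, and how long after completion the result is noticed.

```python
def simulate_polling(task_duration, poll_interval):
    """Simulate a check-then-sleep loop on a virtual clock.

    Returns (number_of_checks, detection_delay_in_seconds).
    Assumption: a check costs no time, so only the sleeps advance the clock.
    """
    # Integer microseconds avoid floating-point drift from repeated addition.
    duration_us = round(task_duration * 1_000_000)
    interval_us = round(poll_interval * 1_000_000)
    if interval_us <= 0:
        raise ValueError("poll_interval must be at least one microsecond")

    now_us = 0
    checks = 0
    while True:
        checks += 1                      # one pass through the wait loop
        if now_us >= duration_us:        # the result is available: stop
            return checks, (now_us - duration_us) / 1_000_000
        now_us += interval_us            # stands in for time.sleep(interval)


if __name__ == "__main__":
    for interval in (0.001, 0.1):
        checks, delay = simulate_polling(2.05, interval)
        print(f"interval={interval}s checks={checks} delay={delay}s")
```

Under these assumptions, a task that takes 2.05 s costs 2051 checks at the 1 ms default but only 22 checks at 100 ms, in exchange for the result being noticed up to one interval late (0.05 s in this case). In a real run each check also does work, so the actual wake-up counts will differ.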

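Summary: this article follows the same idea as "Ansible Performance Optimization (1): Reducing How Often the Worker Process List Is Checked". Both lower CPU usage by lowering the polling frequency. High-frequency polling does more than reduce server capacity and complicate task scheduling. It also produces huge variance in task execution data within a single play: tasks that do exactly the same thing and differ only in their target can sometimes differ in execution time by a factor of several hundred, which seriously hampered our statistical analysis. As the parameter description quoted above says, lowering CPU usage is very beneficial in automation scenarios.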
