Analyzing the "No valid host was found" error when creating VMs in OpenStack

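For newcomers to OpenStack, it is frustrating to spend two days finally getting OpenStack deployed, only to have instance creation fail.
Think about booting a physical server: if its CPU, memory, or storage is faulty, it fails the power-on self-test and simply will not start.
Likewise, the common error "No valid host was found. There are not enough hosts available" raised when creating a VM is usually caused by CPU, memory, or storage problems. Mistakenly creating the VM on an external network also produces this error.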

1. Misconfigured CPU virtualization parameter

Check the nova-compute log:

couldn't obtain the vcpu count from domain id: 769f95ac-d8da-41be-8e29-f326f03a762f, exception: Requested operation is not valid: cpu affinity is not supported

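**Analysis:** The log reports a failure to set CPU affinity (pinning), which points straight at CPU virtualization: the virt_type parameter in /etc/nova/nova.conf is set incorrectly.
**Fix:** On the compute node, edit /etc/nova/nova.conf and set virt_type in the [libvirt] section: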

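 If the compute node is a physical machine, or a VM with nested virtualization (CPU hardware acceleration) enabled: virt_type=kvm
 If the compute node is a VM without nested virtualization: virt_type=qemu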
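One common way to choose between these two values is to check whether the CPU advertises hardware virtualization. The Python sketch below assumes an x86 node. On x86, Intel VT-x appears as the vmx flag and AMD-V as the svm flag in /proc/cpuinfo. Inside a VM, these flags are normally visible only when nested virtualization is enabled. The function name is hypothetical and is not part of Nova.

```python
"""Sketch: suggest virt_type for the [libvirt] section of nova.conf.

Assumption: an x86 compute node, where hardware virtualization shows up
as the 'vmx' (Intel VT-x) or 'svm' (AMD-V) flag in /proc/cpuinfo.
"""


def suggest_virt_type(cpuinfo_text: str) -> str:
    """Return 'kvm' if any CPU advertises vmx/svm, otherwise 'qemu'."""
    for line in cpuinfo_text.splitlines():
        # x86 flag lines look like "flags\t\t: fpu vme ... vmx ..."
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if {"vmx", "svm"} & set(flags.split()):
                return "kvm"
    # No hardware virtualization visible: fall back to plain emulation.
    return "qemu"
```

On the compute node, suggest_virt_type(open("/proc/cpuinfo").read()) returns the suggested value. Non-x86 CPUs name their flags line differently, so there the sketch always falls back to qemu.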

2. Insufficient memory

 Launch an instance with a high-spec flavor.

nova-conductor.log reports:

qemu-kvm:cannot set up guest memory 'pc.ram': Cannot allocate memory\n"]

nova-scheduler.log reports:
Filter RetryFilter returned 0 hosts

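**Analysis:** The log states the cause directly: memory could not be allocated, meaning the host ran out of memory. "Filter RetryFilter returned 0 hosts" means every candidate host had already been tried for this request and had failed. RetryFilter removes hosts that were already attempted, so at that point no hosts are left. In this situation, check whether the hosts actually have enough free resources for the flavor.
**Fix:** Add more memory to the compute node.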
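Before adding memory, it can help to confirm that the node really cannot hold the flavor. The Python sketch below makes two assumptions:

- the Linux kernel reports MemAvailable (in kB) in /proc/meminfo;
- the flavor RAM is given in MB.

The helper names are hypothetical. The sketch deliberately ignores Nova's ram_allocation_ratio and reserved_host_memory_mb settings, so it only approximates what the scheduler sees.

```python
"""Sketch: does a compute node have enough available memory for a flavor?

Assumptions: /proc/meminfo reports MemAvailable in kB, and the flavor RAM
is given in MB. Nova's overcommit and reserved-memory settings are ignored.
"""


def mem_available_mb(meminfo_text: str) -> int:
    """Parse the MemAvailable line of /proc/meminfo and return MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])  # e.g. "MemAvailable:  2048000 kB"
            return kb // 1024
    raise ValueError("MemAvailable not found in /proc/meminfo text")


def fits_flavor(meminfo_text: str, flavor_ram_mb: int) -> bool:
    """True if the currently available memory can hold the guest RAM."""
    return mem_available_mb(meminfo_text) >= flavor_ram_mb
```

Run fits_flavor(open("/proc/meminfo").read(), 8192) on the compute node. If it returns False for the flavor that failed, the error is consistent with the "Cannot allocate memory" line above.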

3. Storage-related causes

  1. No suitable volume size was set when creating an instance from a custom-built Ubuntu image
    The nova log reports:

    Image 1c63d80b-dc44-4785-bf20-4cdb47d7b2c6 is unacceptable: Image virtual size is 18GB and doesn't fit in a volume of size 1GB.
    

    **Analysis and fix:** The image's virtual size is 18 GB, so the volume size specified when creating the instance must be at least 18 GB.

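  2. "Block Device Mapping is Invalid" is another very common error

    **Analysis and fix:** This is usually a block-storage error caused by a misconfigured backend such as Cinder or Ceph. Check whether the Cinder and Ceph services are healthy. A few commonly used commands: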
    ceph -s
    vgs
    cinder service-list
    cinder list
    lsblk
    Narrowing down the cause further requires analyzing the logs.

  3. Insufficient permissions (leaving no usable storage)
    Check the nova-compute log:

    InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/libvirt/images/centos7.0.qcow2 : Unexpected error while running command.
    2017-01-19 11:32:27.736 25979 ERROR nova.compute.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C info /var/lib/libvirt/images/centos7.0.qcow2
    2017-01-19 11:32:27.736 25979 ERROR nova.compute.manager Exit code: 1
    2017-01-19 11:32:27.736 25979 ERROR nova.compute.manager Stdout: u''
    2017-01-19 11:32:27.736 25979 ERROR nova.compute.manager Stderr: u"qemu-img: Could not open '/var/lib/libvirt/images/centos7.0.qcow2': Could not open '/var/lib/libvirt/images/centos7.0.qcow2': Permission denied\n"
    

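    **Analysis:** /var/lib/libvirt/images/centos7.0.qcow2 belongs to a VM I had created earlier on the host OS with virt-manager for an experiment. The image is therefore managed by virt-manager, so OpenStack has no permission to use it to boot the instance (that is, there is no usable storage).
    **Fix:** Uninstall virt-manager on the host OS.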
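Two of the storage checks above can be scripted. The Python sketch below is an illustration under stated assumptions, and both function names are hypothetical.

- **min_volume_gb** expects the JSON printed by qemu-img info --output=json, whose virtual-size field is in bytes. It rounds up to whole GiB, assuming the "GB" in the Nova message means GiB.
- **image_readable** uses os.access, which answers for the user running the script. Run it as the same user as nova-compute, for example via sudo -u nova (the account name is an assumption that depends on the distribution).

```python
import json
import os

# Assumption: the "GB" in Nova's message means GiB (1024**3 bytes).
GIB = 1024 ** 3


def min_volume_gb(qemu_img_info_json: str) -> int:
    """Smallest whole-GiB volume that can hold the image's virtual size.

    Expects the output of: qemu-img info --output=json <image>
    ("virtual-size" is reported in bytes).
    """
    virtual_size = json.loads(qemu_img_info_json)["virtual-size"]
    return (virtual_size + GIB - 1) // GIB  # integer ceiling division


def image_readable(path: str) -> bool:
    """True if the calling user can read the image file at path.

    os.access checks this process's real uid/gid, so run the check as
    the user nova-compute runs as (assumed here to be 'nova').
    """
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

For the Ubuntu image in item 1, a virtual size of exactly 18 GiB gives a minimum volume of 18, the same lower bound as the error message.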

4. Choosing the wrong network when creating the VM

An external network (provider / external / public network) generally cannot be used directly to create a VM, unless that external network is also a shared network.

If a VM is created directly on the external network, the "No valid host was found" error is raised, and the network topology view shows that the VM has no link.

nova-conductor log on the controller node

==> nova-conductor.log <==
2017-10-16 11:02:23.153 2090 ERROR nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Error from last host: compute (node compute): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1905, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2058, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 9b2e0b9e-6d16-4576-b289-075761e3d449 was re-scheduled: Binding failed for port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f, please check neutron logs for more information.\n']


2017-10-16 11:02:23.226 2090 WARNING nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 84, in select_destinations
    filter_properties)

  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 90, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

2017-10-16 11:02:23.227 2090 WARNING nova.scheduler.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Setting instance to ERROR state.

neutron-server log on the controller node

2017-10-16 11:02:22.260 2189 ERROR neutron.plugins.ml2.managers [req-cdcb2a89-98f5-45e7-827f-812a01dc4dd6 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Failed to bind port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f on host compute

nova-compute log on the compute node

2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager [-] Instance failed network setup after 1 attempt(s)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager Traceback (most recent call last):
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1564, in _allocate_network_async
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager     dhcp_options=dhcp_options)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 744, in allocate_for_instance
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager     self._delete_ports(neutron, instance, created_port_ids)
2017-10-16 11:02:23.708 1408 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__

2017-10-16 11:02:24.004 1408 DEBUG nova.compute.utils [req-370255d8-f462-4090-9203-8db574e8f589 0817b805f51e4877a383dd401a318bee b0d993d7027e457189f70bae70f870a9 - - -] [instance: 9b2e0b9e-6d16-4576-b289-075761e3d449] Binding failed for port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f, please check neutron logs for more information. 

openvswitch-agent.log under /var/log/neutron on the compute node

2017-10-16 11:02:23.138 1417 INFO neutron.agent.securitygroups_rpc [req-cdcb2a89-98f5-45e7-827f-812a01dc4dd6 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Security group member updated [u'312e84c2-ff27-460f-802a-10eaabb3bd19']
2017-10-16 11:02:23.652 1417 INFO neutron.agent.securitygroups_rpc [req-2493c53e-7b58-47f5-a2af-5f6c49af0f5e 0ebb9d3a4d52454ca74d1c45a382795a ed964295d9ca4f878fe4c25478aaeca0 - - -] Security group member updated [u'312e84c2-ff27-460f-802a-10eaabb3bd19']
2017-10-16 11:02:24.358 1417 INFO neutron.agent.common.ovs_lib [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] Port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f not present in bridge br-int
2017-10-16 11:02:24.359 1417 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] port_unbound(): net_uuid None not in local_vlan_map
2017-10-16 11:02:24.360 1417 INFO neutron.agent.securitygroups_rpc [req-1241fe20-da43-457f-9ec8-365111248723 - - - - -] Remove device filter for [u'41b8cdb2-3b1d-4661-bb47-f567c57bda5f']
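The failure is spread across four logs on two nodes, so it helps to follow one port UUID through all of them. The Python sketch below is a hypothetical helper. It takes the log texts keyed by file name and reports which files mention each port. It matches only the three port messages quoted above, so other releases or plugins may word things differently.

```python
import re
from collections import defaultdict

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Port-related messages taken from the nova and neutron logs above.
PORT_PATTERNS = [
    re.compile(rf"Binding failed for port ({UUID})"),
    re.compile(rf"Failed to bind port ({UUID})"),
    re.compile(rf"Port ({UUID}) not present in bridge"),
]


def ports_in_logs(logs: dict) -> dict:
    """Map each port UUID to the set of log names that mention it.

    logs: {log file name: full log text}
    """
    seen = defaultdict(set)
    for name, text in logs.items():
        for pattern in PORT_PATTERNS:
            for port in pattern.findall(text):
                seen[port].add(name)
    return dict(seen)
```

With the excerpts above, port 41b8cdb2-3b1d-4661-bb47-f567c57bda5f appears in the nova-conductor, neutron-server and openvswitch-agent logs. That ties the scheduler error back to a single failed port binding.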

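**Analysis:** The logs above contain many port-related errors. The root cause is the one mentioned earlier: a VM cannot be created directly on an external network, because no port can be bound for it. To reach the external network, traffic must go through a router.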

**Fix:** Create the VM on an internal (private) network, or on an external network that is shared.
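The rule above can be written as a small filter over network records. The Python sketch below assumes dicts shaped like the network objects returned by the Neutron networking API (GET /v2.0/networks), using their router:external and shared attributes. The function name is hypothetical.

```python
def bootable_networks(networks: list) -> list:
    """Names of networks an instance can be attached to directly.

    Rule from the text: internal networks are fine, and an external
    network is only usable when it is also shared. Each entry is a dict
    with Neutron's "name", "router:external" and "shared" attributes.
    """
    usable = []
    for net in networks:
        external = net.get("router:external", False)
        shared = net.get("shared", False)
        if not external or shared:
            usable.append(net["name"])
    return usable
```

A network dropped by this filter is a candidate for the "Binding failed for port" symptom shown above when chosen at boot time.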

Thoughts on networking

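Normally a physical machine does not fail to boot because of its own network or NIC problems. Likewise, a VM does not hit "No valid host was found" during startup because of networking. There are two exceptions:

- After a third-party SDN controller is added to OpenStack, certain misconfigurations of the controller can cause "No valid host was found" when creating a VM.
- Creating a VM directly on an external network also triggers the error.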

5. No available service

The logs report an error.

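**Analysis:** The log states the cause directly: ServiceNotFound, meaning no service was found for OpenStack to use.
**Fix:** Some service was either not installed properly, or installed but not started. If it is not running, start it manually.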
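To find which service is missing or stopped, the service list is a good starting point. The Python sketch below assumes the JSON output of openstack compute service list -f json, with Binary, Host, Status and State columns. It only covers Nova services, and the helper name is hypothetical. Block storage has its own listing (cinder service-list, shown earlier).

```python
import json


def down_services(service_list_json: str) -> list:
    """Return 'binary@host' for each service that is not enabled and up.

    Assumes a JSON list of objects with "Binary", "Host", "Status" and
    "State" keys, as printed by: openstack compute service list -f json
    """
    problems = []
    for svc in json.loads(service_list_json):
        if svc["Status"] != "enabled" or svc["State"] != "up":
            problems.append(f'{svc["Binary"]}@{svc["Host"]}')
    return problems
```

Each entry returned points at a service to start (or enable) on the named host.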

Summary

Log-reading tip: focus on the Filter entries in the scheduler log to determine which resource is insufficient.
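The tip can be applied with a minimal Python sketch. It assumes the scheduler writes lines such as "Filter RetryFilter returned 0 hosts" (as in section 2). It counts which filters eliminated every host, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches nova-scheduler lines such as "Filter RetryFilter returned 0 hosts".
ZERO_HOSTS = re.compile(r"Filter (\w+) returned 0 hosts")


def filters_returning_zero(log_lines) -> Counter:
    """Count, per filter name, how often it eliminated every host."""
    counts = Counter()
    for line in log_lines:
        match = ZERO_HOSTS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example: with open("/var/log/nova/nova-scheduler.log") as f: print(filters_returning_zero(f).most_common()). The log path is an assumption and varies by distribution. A RAM, disk or core filter at the top of the list points at the corresponding resource.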
