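The code version analyzed here is nova-2011. Reading through it, the code turns out to be organized fairly cleanly.
We start with the directory tree.
The bin directory was covered earlier: it holds the service startup scripts and tools. The focus here is the nova directory. Nearly every subdirectory under nova is a nova sub-service, and these sub-services build, to varying degrees, on the module files at the top of the nova directory.
service.py: defines the base class used to create services and start service instances; all nova component services are instantiated through it.
rpc.py: communication between OpenStack components relies on the RPC mechanism this module provides, mainly on top of the RabbitMQ service. It defines a number of consumer and producer classes, for example: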
class AdapterConsumer(TopicConsumer):
    """Calls methods on a proxy object based on method and args"""
    def __init__(self, connection=None, topic="broadcast", proxy=None):
        LOG.debug(_('Initing the Adapter Consumer for %s') % topic)
        self.proxy = proxy
        super(AdapterConsumer, self).__init__(connection=connection,
                                              topic=topic)
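This is a topic-based message consumer. Besides the MQ connection object and the topic, its constructor takes a proxy object. What is that? Judging from the code, the calls end up on an instantiated manager such as the compute manager, the object that defines the actual VM lifecycle operations, so proxy is what the consumer uses to operate on VMs. The class has this method: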
    @exception.wrap_exception
    def receive(self, message_data, message):
        """Magically looks for a method on the proxy object and calls it

        Message data should be a dictionary with two keys:
            method: string representing the method to call
            args: dictionary of arg: value

        Example: {'method': 'echo', 'args': {'value': 42}}
        """
        LOG.debug(_('received %s') % message_data)
        msg_id = message_data.pop('_msg_id', None)
        ctxt = _unpack_context(message_data)
        method = message_data.get('method')
        args = message_data.get('args', {})
        message.ack()
        if not method:
            # NOTE(vish): we may not want to ack here, but that means that bad
            #             messages stay in the queue indefinitely, so for now
            #             we just log the message and send an error string
            #             back to the caller
            LOG.warn(_('no method for message: %s') % message_data)
            msg_reply(msg_id, _('No method for message: %s') % message_data)
            return
        node_func = getattr(self.proxy, str(method))
        node_args = dict((str(k), v) for k, v in args.iteritems())
        # NOTE(vish): magic is fun!
        try:
            rval = node_func(context=ctxt, **node_args)
            if msg_id:
                msg_reply(msg_id, rval, None)
        except Exception as e:
            logging.exception("Exception during message handling")
            if msg_id:
                msg_reply(msg_id, None, sys.exc_info())
        return
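When a message arrives from the MQ, its body carries the request context, the method to execute, and the arguments to pass to that method. The method named in the message has to be provided by some manager, hence the getattr(self.proxy, str(method)) call. A method is either a method of a class or a module-level function, and here it is clearly an instance method. Judging from how the manager is constructed, however, getattr cannot reach that method directly. Reading further, the service-creation class in service.py turns out to override __getattr__: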
    def __getattr__(self, key):
        manager = self.__dict__.get('manager', None)
        return getattr(manager, key)
This is the trick that lets the rpc module resolve the right instance method.
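The lookup chain described above can be reproduced in a few lines. This is only a minimal sketch, not nova's code: ComputeManager, Service, and dispatch are hypothetical stand-ins, the context is reduced to a plain argument, and it is written for current Python rather than the Python 2 of the quoted source (hence items() instead of iteritems()).

```python
class ComputeManager(object):
    """Hypothetical manager exposing a single operation."""

    def echo(self, context, value):
        return value


class Service(object):
    """Hypothetical service wrapper that owns a manager."""

    def __init__(self, manager):
        self.manager = manager

    def __getattr__(self, key):
        # __getattr__ only runs when normal lookup fails, so 'manager'
        # itself is found in __dict__ and never recurses through here.
        manager = self.__dict__.get('manager', None)
        return getattr(manager, key)


def dispatch(proxy, message_data, context=None):
    """Look up message_data['method'] on proxy and call it with the args."""
    method = message_data.get('method')
    args = message_data.get('args', {})
    node_func = getattr(proxy, str(method))
    node_args = dict((str(k), v) for k, v in args.items())
    return node_func(context=context, **node_args)


service = Service(ComputeManager())
result = dispatch(service, {'method': 'echo', 'args': {'value': 42}})
print(result)
```

Service defines no echo method of its own, so getattr(service, 'echo') falls through to __getattr__ and returns the manager's bound method. That is why the consumer can be handed the service object and still reach manager operations.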
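manager.py: the base class that every nova component manager inherits from. Each component adds its own operations, while this top-level module only defines a few simple, unimplemented abstract methods, such as the periodic function periodic_tasks and the per-component initialization function init_host, plus the db object used for database operations. Every component directory has its own manager.py that inherits from the one in the parent directory, so this is not hard to follow.
flags.py: defines the environment and configuration settings that nova services need over their lifetime, which makes fixed configuration values easy to use.
exception.py: defines the basic exception classes and helper functions, mostly a set of decorator functions.
log.py: defines the common way nova's logging is used.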
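The manager.py arrangement described above, a base class with empty hooks that each component fills in, might look roughly like the following. This is a sketch under assumptions: the class bodies, the events list, and the plain dict standing in for the db object are all invented for illustration, and only the names periodic_tasks, init_host, and db come from the text.

```python
class Manager(object):
    """Hypothetical base manager: holds a db handle and defines empty hooks."""

    def __init__(self, db=None):
        self.db = db

    def periodic_tasks(self, context=None):
        """Periodic hook; does nothing in the base class."""
        pass

    def init_host(self):
        """Initialization hook; does nothing in the base class."""
        pass


class ComputeManager(Manager):
    """Hypothetical component manager that fills in the hooks."""

    def __init__(self, db=None):
        super(ComputeManager, self).__init__(db=db)
        self.events = []

    def init_host(self):
        self.events.append('init_host')

    def periodic_tasks(self, context=None):
        self.events.append('periodic_tasks')


manager = ComputeManager(db={})  # a dict stands in for the real db object
manager.init_host()
manager.periodic_tasks()
print(manager.events)
```

Because the base class hooks are no-ops, code that drives a manager can call them on any component without checking whether that component overrides them.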
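Next, message handling.
The messaging part is not very complicated. Once the MQ connection settings and the handling for publishers and subscribers are defined, the framework is basically in place.
Most of this lives in the rpc.py module, as follows:
The MQ library used is carrot.
The module starts by defining the MQ connection class: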
class Connection(carrot_connection.BrokerConnection):
    """Connection instance object"""
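This is the general connection class; it connects to the MQ broker service.
Next come the definitions of the message consumer class and the message producer class: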
class Consumer(messaging.Consumer):
    """Consumer base class

    Contains methods for connecting the fetch method to async loops
    """
    def __init__(self, *args, **kwargs):
        for i in xrange(FLAGS.rabbit_max_retries):
            if i > 0:
                time.sleep(FLAGS.rabbit_retry_interval)
            try:
                super(Consumer, self).__init__(*args, **kwargs)
                self.failed_connection = False
                break
            except:  # Catching all because carrot sucks
                fl_host = FLAGS.rabbit_host
                fl_port = FLAGS.rabbit_port
                fl_intv = FLAGS.rabbit_retry_interval
                LOG.exception(_("AMQP server on %(fl_host)s:%(fl_port)d is"
                    " unreachable. Trying again in %(fl_intv)d seconds.")
                    % locals())
                self.failed_connection = True
        if self.failed_connection:
            LOG.exception(_("Unable to connect to AMQP server "
                          "after %d tries. Shutting down."),
                          FLAGS.rabbit_max_retries)
            sys.exit(1)

    def fetch(self, no_ack=None, auto_ack=None, enable_callbacks=False):
        """Wraps the parent fetch with some logic for failed connections"""
        # TODO(vish): the logic for failed connections and logging should be
        #             refactored into some sort of connection manager object
        ...


class Publisher(messaging.Publisher):
    """Publisher base class"""
    pass
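The Publisher class inherits directly from the library's base class without changes, while the Consumer class both inherits from it and modifies it, mainly to handle exceptions. The modules they inherit from are imported as follows: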
from carrot import connection as carrot_connection
from carrot import messaging
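The exception handling that Consumer adds is essentially a bounded retry loop around the connection attempt. The sketch below isolates that pattern without carrot or a real broker. connect_with_retries and FlakyBroker are hypothetical names, and unlike the quoted constructor, which calls sys.exit(1), the sketch raises an exception after the last failed attempt so the caller decides what to do.

```python
import time


def connect_with_retries(connect, max_retries=3, retry_interval=0.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds or max_retries attempts are used.

    Returns (result, attempts). Raises RuntimeError when every attempt fails.
    """
    for attempt in range(max_retries):
        if attempt > 0:
            # Wait between attempts, but not before the first one.
            sleep(retry_interval)
        try:
            return connect(), attempt + 1
        except Exception:
            # Broad on purpose, mirroring the quoted constructor; narrower
            # exception types are preferable when the client documents them.
            continue
    raise RuntimeError('unable to connect after %d tries' % max_retries)


class FlakyBroker(object):
    """Hypothetical stand-in that fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    def connect(self):
        if self.failures > 0:
            self.failures -= 1
            raise IOError('broker unreachable')
        return 'connected'


result, attempts = connect_with_retries(FlakyBroker(2).connect)
print(result, attempts)
```

Passing sleep in as a parameter is a design choice for the sketch only; it lets the retry logic be exercised without real delays.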
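The Publisher and Consumer classes above are two base classes; all other consumer and producer classes inherit from them. Beyond these, there are two important functions:
call: the sender uses it to send a message and must wait for the response before it returns. It is defined as follows: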
def call(context, topic, msg):
    """Sends a message on a topic and wait for a response"""
    LOG.debug(_("Making asynchronous call..."))
    msg_id = uuid.uuid4().hex
    msg.update({'_msg_id': msg_id})
    LOG.debug(_("MSG_ID is %s") % (msg_id))
    _pack_context(msg, context)

    class WaitMessage(object):
        def __call__(self, data, message):
            """Acks message and sets result."""
            message.ack()
            if data['failure']:
                self.result = RemoteError(*data['failure'])
            else:
                self.result = data['result']

    wait_msg = WaitMessage()
    conn = Connection.instance(True)
    consumer = DirectConsumer(connection=conn, msg_id=msg_id)
    consumer.register_callback(wait_msg)

    conn = Connection.instance()
    publisher = TopicPublisher(connection=conn, topic=topic)
    publisher.send(msg)
    publisher.close()

    try:
        consumer.wait(limit=1)
    except StopIteration:
        pass
    consumer.close()
    # NOTE(termie): this is a little bit of a change from the original
    #               non-eventlet code where returning a Failure
    #               instance from a deferred call is very similar to
    #               raising an exception
    if isinstance(wait_msg.result, Exception):
        raise wait_msg.result
    return wait_msg.result
cast: also sends a message, but does not wait for a response; no response is needed. It is defined as follows:
def cast(context, topic, msg):
    """Sends a message on a topic without waiting for a response"""
    LOG.debug(_("Making asynchronous cast..."))
    _pack_context(msg, context)
    conn = Connection.instance()
    publisher = TopicPublisher(connection=conn, topic=topic)
    publisher.send(msg)
    publisher.close()
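The difference between call and cast comes down to whether the message carries a _msg_id that the receiver replies to. The following sketch shows that with in-process queues instead of RabbitMQ. Everything in it is an assumption made for illustration: topic_queue, reply_queues, worker, and the handlers are hypothetical, the context argument is left out, and error propagation (the RemoteError path in the quoted call) is not modeled.

```python
import queue
import threading
import uuid

topic_queue = queue.Queue()  # stands in for the topic queue on the broker
reply_queues = {}            # msg_id -> reply queue for one pending call


def worker(handlers):
    """Consume messages; reply only when the message carries a _msg_id."""
    while True:
        msg = topic_queue.get()
        if msg is None:  # sentinel: stop the worker
            break
        msg_id = msg.pop('_msg_id', None)
        result = handlers[msg['method']](**msg.get('args', {}))
        if msg_id:
            reply_queues[msg_id].put(result)


def call(msg, timeout=5):
    """Send msg tagged with a fresh _msg_id and block until the reply."""
    msg_id = uuid.uuid4().hex
    reply_queues[msg_id] = queue.Queue()
    topic_queue.put(dict(msg, _msg_id=msg_id))
    try:
        return reply_queues[msg_id].get(timeout=timeout)
    finally:
        del reply_queues[msg_id]


def cast(msg):
    """Send msg and return immediately; no reply is expected."""
    topic_queue.put(dict(msg))


seen = []
handlers = {
    'echo': lambda value: value,
    'record': lambda value: seen.append(value),
}
thread = threading.Thread(target=worker, args=(handlers,))
thread.start()

cast({'method': 'record', 'args': {'value': 'fire-and-forget'}})
reply = call({'method': 'echo', 'args': {'value': 42}})

topic_queue.put(None)
thread.join()
print(reply, seen)
```

The per-call reply queue plays the role that the DirectConsumer keyed by msg_id plays in the quoted call: it gives the reply a private channel back to the one caller that is waiting for it.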
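That covers the basic definitions for messaging. The details of sending, receiving, and handling messages will follow in the next installment.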