Handling protobuf data in Python: an example

Google Protocol Buffers

https://github.com/protocolbuffers/protobuf
protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the “old” format.
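From the Google protobuf manual.

Put simply, protobuf data is faster (to parse), simpler (to use, though I don't really agree), and lighter (in size).
Like Google's other tools, protobuf is short on manuals, ecosystem, and the test data you need, so my project's recent switch from JSON to protobuf has been agonizing.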

Environment setup

My local environment:

ProductName:    Mac OS X
ProductVersion: 10.14.6
python 3.7.3

Installing protoc

Installation manual
Step 1: brew install protobuf

Step 2: brew upgrade protobuf

Step 3:
PROTOC_ZIP=protoc-3.7.1-osx-x86_64.zip
curl -OL https://github.com/google/protobuf/releases/download/v3.7.1/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
(Copy and paste each step one at a time.)
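Before moving on, it can help to confirm that the install steps above actually put protoc on your PATH. The following is a minimal sketch, not part of the original setup: the helper name `protoc_version` is my own, and it relies only on the standard `protoc --version` flag.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version(binary: str = "protoc") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # check=False: a non-zero exit simply yields whatever stdout was produced.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = protoc_version()
    print(version if version else "protoc not found on PATH")
```

If this prints "protoc not found on PATH", the unzip step probably did not land in a directory that your shell searches.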

Defining the .proto file

manual
Here is a test proto file I wrote:

syntax = "proto2";
option java_outer_classname = "OpenRtb";
package com.google.openrtb;
message Person {
  required string name = 1;
  required int32 id = 2;
}

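Of course, that version reflects my own setup: I am on proto2 and the protocol is OpenRTB. If you don't need the package and the Java option, you can write it like this:

syntax = "proto2";
message Person {
  required string name = 1;
  required int32 id = 2;
}

The file is named person.proto.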

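Compiling the proto file

In a Python environment, a proto file is compiled to *_pb2.py (for example, person.proto compiles to person_pb2.py), and your Python code then needs to import *_pb2.
Read on for how to actually use it.
Compile command:

protoc --proto_path=<directory containing the proto files, ideally an absolute path> --python_out=<directory for the generated pb2.py files, ideally an absolute path> <path to the .proto file, ideally an absolute path>
# So my compile command is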
protoc --proto_path=/USER/XXXX/protobuf   --python_out=./  /USER/XXXX/protobuf/person.proto
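If you would rather drive the compile step from Python (for example in a build script), the command above can be assembled as an argument list. This is only a sketch: the function names `pb2_module_name`, `build_protoc_command`, and `compile_proto` are hypothetical, and the paths are the placeholder paths from the command above.

```python
import subprocess
from pathlib import Path
from typing import List


def pb2_module_name(proto_file: str) -> str:
    """Map a simple proto file name to its generated module: person.proto -> person_pb2.py."""
    return Path(proto_file).stem + "_pb2.py"


def build_protoc_command(proto_dir: str, out_dir: str, proto_file: str) -> List[str]:
    """Assemble the protoc invocation as an argument list (no shell quoting needed)."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",
        f"--python_out={out_dir}",
        proto_file,
    ]


def compile_proto(proto_dir: str, out_dir: str, proto_file: str) -> None:
    """Run protoc; raises CalledProcessError if compilation fails."""
    subprocess.run(build_protoc_command(proto_dir, out_dir, proto_file), check=True)


if __name__ == "__main__":
    cmd = build_protoc_command(
        "/USER/XXXX/protobuf", "./", "/USER/XXXX/protobuf/person.proto"
    )
    print(" ".join(cmd))
    print(pb2_module_name("/USER/XXXX/protobuf/person.proto"))
```

Passing a list to `subprocess.run` avoids shell quoting problems when the directories contain spaces.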

The contents of person.proto are:

syntax = "proto2";
option java_outer_classname = "OpenRtb";
package com.google.openrtb;
message Person {
  required string name = 1;
  required int32 id = 2;
}

The output is:
person_pb2.py

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: person.proto

import sys
_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor.FileDescriptor(
  name='person.proto',
  package='com.google.openrtb',
  syntax='proto2',
  serialized_options=_b('B\007OpenRtb'),
  serialized_pb=_b('\n\x0cperson.proto\x12\x12\x63om.google.openrtb\"\"\n\x06Person\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\n\n\x02id\x18\x02 \x02(\x05\x42\tB\x07OpenRtb')
)




_PERSON = _descriptor.Descriptor(
  name='Person',
  full_name='com.google.openrtb.Person',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='name', full_name='com.google.openrtb.Person.name', index=0,
      number=1, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='id', full_name='com.google.openrtb.Person.id', index=1,
      number=2, type=5, cpp_type=1, label=2,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=36,
  serialized_end=70,
)

DESCRIPTOR.message_types_by_name['Person'] = _PERSON
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

Person = _reflection.GeneratedProtocolMessageType('Person', (_message.Message,), dict(
  DESCRIPTOR = _PERSON,
  __module__ = 'person_pb2'
  # @@protoc_insertion_point(class_scope:com.google.openrtb.Person)
  ))
_sym_db.RegisterMessage(Person)


DESCRIPTOR._options = None
# @@protoc_insertion_point(module_scope)

Using protobuf

Test code

from google.protobuf import json_format
# Adjust this import to wherever your generated person_pb2.py lives.
from ppydsp.protobuf import person_pb2

# Convert the data to protobuf format
person = person_pb2.Person()
person.id = 123
person.name = "abc"
p = person.SerializeToString()
print(p)
# b'\n\x03abc\x10{'

# Note: if you use the Sanic framework, return the bytes from a handler like this:
#     return response.raw(p)
# raw() is meant for byte-stream data such as protobuf;
# its content-type defaults to application/octet-stream.

# Convert the protobuf data to JSON
persons = person_pb2.Person()
persons.ParseFromString(p)
p = json_format.MessageToJson(persons)
print(p)
# {
#   "name": "abc",
#   "id": 123
# }
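To see why the serialized Person comes out as b'\n\x03abc\x10{', it helps to encode it by hand. The sketch below is my own illustration of the protobuf wire format, not code from the protobuf library: each field starts with a tag of (field_number << 3) | wire_type, strings are length-prefixed (wire type 2), and int32 values are varints (wire type 0). The function names are hypothetical, and negative integers are deliberately left out.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(name: str, person_id: int) -> bytes:
    """Hand-encode Person{name = 1 (string), id = 2 (int32)}."""
    raw_name = name.encode("utf-8")
    return (
        encode_tag(1, 2) + encode_varint(len(raw_name)) + raw_name  # name
        + encode_tag(2, 0) + encode_varint(person_id)               # id
    )


encoded = encode_person("abc", 123)
print(encoded)  # b'\n\x03abc\x10{'

as_json = json.dumps({"name": "abc", "id": 123})
print(len(encoded), len(as_json))  # 7 26
```

The tag for name is 0x0a, which prints as \n, and 123 is 0x7b, which prints as {. For this tiny message the protobuf payload is 7 bytes against 26 characters of JSON, which is the "lighter" claim from the introduction in concrete form.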

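As you can see, the attributes of person (name and id) are assigned one at a time in the test. Real-world messages often have many fields (the OpenRTB protocol has several hundred), so this can be optimized.
Some links in this article may require a VPN to open, but I have copied the important information into the article, so the impact is small.
Test code: https://github.com/SchopenhauerZhang/py_protobuf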
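On the point about assigning fields one at a time: one possible optimization is to fill a message from a dict in a single call. The sketch below uses a stand-in class, `FakePerson`, so that it runs without the generated person_pb2 module. Both `FakePerson` and the helper `populate` are hypothetical names of mine. With the real generated class, the protobuf Python API already covers this: the constructor accepts keyword arguments, as in person_pb2.Person(name="abc", id=123), and json_format.ParseDict(values, person_pb2.Person()) fills a message from a dict.

```python
from typing import Any, Mapping


class FakePerson:
    """Stand-in for person_pb2.Person so the sketch runs without generated code."""

    def __init__(self) -> None:
        self.name = ""
        self.id = 0


def populate(message: Any, values: Mapping[str, Any]) -> Any:
    """Assign every key in `values` to the matching scalar field of `message`."""
    for field, value in values.items():
        if not hasattr(message, field):
            raise AttributeError(f"unknown field: {field}")
        setattr(message, field, value)
    return message


person = populate(FakePerson(), {"name": "abc", "id": 123})
print(person.name, person.id)  # abc 123
```

A plain setattr loop like this only suits scalar fields. Nested and repeated protobuf fields cannot be assigned directly, which is one reason to prefer ParseDict for anything as large as OpenRTB.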

C++ tutorial

https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/index.html

protobuf performance tests

Performance comparisons of protobuf against JSON and XML:
1. https://www.infoq.cn/article/json-is-5-times-faster-than-protobuf
2. https://tech.meituan.com/2015/02/26/serialization-vs-deserialization.html (https://code.google.com/archive/p/thrift-protobuf-compare/wikis/Benchmarking.wiki)

Further reading

Introduction to using SerializeToString:
http://cn.voidcc.com/question/p-crpmraiz-va.html
json_format manual:
https://developers.google.com/protocol-buffers/docs/reference/python/
