Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.
What are protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
How do they work?
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.
Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);
Then, later on, you could read your message back in:
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
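To make the "old binaries simply ignore the new field" behavior concrete, here is a minimal hand-written sketch of how a reader could skip fields it does not recognize. It assumes the standard wire format, in which every field starts with a varint tag holding the field number and a 3-bit wire type. OldPerson, ReadVarint, and ParseOldPerson are hypothetical names used for illustration only; they are not part of the generated API, and the sketch handles neither groups nor the nested phone field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// What an "old" reader knows about: only name (1), id (2), and email (3).
struct OldPerson {
  std::string name;
  std::int32_t id = 0;
  std::string email;
};

// Read a base-128 varint starting at pos. Returns false on truncated input.
bool ReadVarint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
    value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
  }
  return false;
}

// Parse the known fields and skip everything else based on its wire type.
bool ParseOldPerson(const std::string& in, OldPerson& person) {
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::uint64_t tag = 0;
    if (!ReadVarint(in, pos, tag)) return false;
    const std::uint64_t field_number = tag >> 3;
    const std::uint64_t wire_type = tag & 0x7;

    if (wire_type == 0) {  // varint
      std::uint64_t value = 0;
      if (!ReadVarint(in, pos, value)) return false;
      if (field_number == 2) person.id = static_cast<std::int32_t>(value);
      // Any other varint field is unknown to this reader and is skipped.
    } else if (wire_type == 2) {  // length-delimited (strings, bytes, messages)
      std::uint64_t length = 0;
      if (!ReadVarint(in, pos, length) || length > in.size() - pos) return false;
      const std::string payload = in.substr(pos, length);
      pos += length;
      if (field_number == 1) person.name = payload;
      else if (field_number == 3) person.email = payload;
      // Unknown length-delimited fields are skipped using the length prefix.
    } else if (wire_type == 1) {  // fixed 64-bit: skip 8 bytes
      if (in.size() - pos < 8) return false;
      pos += 8;
    } else if (wire_type == 5) {  // fixed 32-bit: skip 4 bytes
      if (in.size() - pos < 4) return false;
      pos += 4;
    } else {
      return false;  // groups are not handled in this sketch
    }
  }
  return true;
}
```

Because the wire type alone says how many bytes a field occupies, the reader can step over a field without knowing what it means. A message that carries a name, an id, and two extra fields numbered 5 and 6 (hypothetical additions by a newer writer) still parses, and the name and id come out as expected.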
You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.
Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
- are simpler
- are 3 to 10 times smaller
- are 20 to 100 times faster
- are less ambiguous
- generate data access classes that are easier to use programmatically
For example, let's say you want to model a person with a name and an email. In XML, you need to do:
<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>
while the corresponding protocol buffer message (in protocol buffer text format) is:
# Textual representation of a protocol buffer.
# This is not the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
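The 28-byte and 69-byte figures can be checked by hand. The sketch below assumes the standard wire format for string fields (a one-byte tag for small field numbers, a varint length, then the raw bytes) and encodes only the name and email fields used in this comparison. The function names are hypothetical and exist only for this illustration; real applications would use the generated Person class instead. The sketch says nothing about parse times, which depend on the parser and the machine.

```cpp
#include <cstdint>
#include <string>

// Append an unsigned integer as a base-128 varint (7 bits per byte, low group first).
void AppendVarint(std::string& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Append one length-delimited field (wire type 2), which is how strings are encoded.
void AppendStringField(std::string& out, std::uint32_t field_number,
                       const std::string& text) {
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);  // tag
  AppendVarint(out, text.size());                                          // length
  out += text;                                                             // raw bytes
}

// Hand-encode the example Person: name is field 1 and email is field 3.
std::string EncodePersonByHand(const std::string& name, const std::string& email) {
  std::string out;
  AppendStringField(out, 1, name);
  AppendStringField(out, 3, email);
  return out;
}

// The same record as XML with all whitespace between elements removed.
std::string PersonAsCompactXml(const std::string& name, const std::string& email) {
  return "<person><name>" + name + "</name><email>" + email + "</email></person>";
}
```

For "John Doe" and "jdoe@example.com", the encoded message is 2 + 8 bytes for the name plus 2 + 16 bytes for the email, which gives 28 bytes, while the compact XML string is 69 bytes long.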
Also, manipulating a protocol buffer is much easier:
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
Whereas with XML you would have to do something like:
cout << "Name: "
<< person.getElementsByTagName("name")->item(0)->innerText()
<< endl;
cout << "E-mail: "
<< person.getElementsByTagName("email")->item(0)->innerText()
<< endl;
However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).
Sounds like the solution for me! How do I get started?
Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.
Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.
Introducing proto3
Our most recent version 3 release introduces a new language version, Protocol Buffers language version 3 (aka proto3), as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current release lets you generate protocol buffer code in Java, C++, Python, Java Lite, Ruby, JavaScript, Objective-C, and C#. In addition, you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf GitHub repository (https://github.com/golang/protobuf). More languages are in the pipeline.
Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.
You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!
(If the names proto2 and proto3 seem a little confusing, it's because when we originally open-sourced protocol buffers it was actually Google's second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).
A bit of history
Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:
if (version == 3) {
  ...
} else if (version > 4) {
  if (version == 5) {
    ...
  }
  ...
}
Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.
Protocol buffers were designed to solve many of these problems:
- New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
- Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.).
However, users still needed to hand-write their own parsing code.
As the system evolved, it acquired a number of other features and uses:
- Automatically-generated serialization and deserialization code avoided the need for hand parsing.
- In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
- Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.
Protocol buffers are now Google's lingua franca for data – at time of writing, there are 306,747 different message types defined in the Google code tree across 348,952 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.