Protobuf data-structures
Protocol Buffers, or Protobuf (protobuf.dev), is a language-neutral, platform-neutral technology for serializing structured data.
Compared to XML or JSON, Protobuf provides an explicit schema-definition language, a more efficient binary encoding, and automatic code generation for multiple programming languages.
Why Protobuf
- Explicit Schema-Definition Language: Protobuf uses a schema-definition language to define the structure of the data, giving every field a well-defined type and making nested relationships explicit. The language also supports backward and forward compatibility, so you can add or remove fields without breaking existing code, provided that field numbers are never reused.
- Efficient Encoding: Protobuf serializes data into a compact binary format that is typically smaller and faster to parse than the equivalent text-based XML or JSON.
- Code Generation Toolkit (protoc): Protobuf includes a compiler, protoc, that generates native-code libraries in multiple programming languages (e.g., Python, Java, C++), relieving the programmer of writing and maintaining custom data structures and serialization methods.
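To make the "Efficient Encoding" point concrete, here is a minimal sketch that hand-encodes the same data as the Person example used later on this page. It follows the varint and length-delimited rules of the Protobuf wire format and compares the result with compact JSON. The helper names (encode_varint, encode_field) are hypothetical; real code would call SerializeToString() on a generated class instead. The sketch handles only non-negative integers and strings, and unlike a real proto3 serializer it does not skip fields left at their default value.

```python
import json

# Wire types from the Protobuf encoding spec: 0 = VARINT, 2 = LEN (length-delimited).
VARINT, LEN = 0, 2


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number: int, value) -> bytes:
    """Hypothetical helper: encode one field as tag + payload (int -> VARINT, str -> LEN)."""
    if isinstance(value, int):
        return encode_varint((field_number << 3) | VARINT) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((field_number << 3) | LEN) + encode_varint(len(data)) + data


# Same data as the Person example: name = 1, id = 2, email = 3.
person = {"name": "John Doe", "id": 1234, "email": "johndoe@example.com"}
field_numbers = {"name": 1, "id": 2, "email": 3}

proto_bytes = b"".join(encode_field(field_numbers[k], v) for k, v in person.items())
json_bytes = json.dumps(person, separators=(",", ":")).encode("utf-8")

print(len(proto_bytes), len(json_bytes))  # prints: 34 59
```

Most of the savings come from replacing repeated key names with one-byte tags and from writing integers as varints (1234 takes two bytes instead of four characters).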
Use Cases
- gRPC: Protobuf is the default serialization format for gRPC, a high-performance, language-agnostic RPC framework. This combination allows for robust and efficient communication between microservices.
- Data Storage: Protobuf is often used for storing structured data in databases, where efficient storage and retrieval are critical.
- Configuration Files: A Protobuf message can be printed (and parsed back) in a human-friendly text format, conventionally stored in files with the .txtpb extension. Like JSON, this text format is easy for users to read and edit, with two additional benefits:
  - The schema of the data is explicit and strict, as defined in the associated .proto file.
  - The txtpb data can be parsed directly into native-code objects, using the classes generated by the protoc toolkit.
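As a rough illustration of what the text format looks like, the sketch below renders flat scalar fields as `field: value` lines. The helper to_text_format is hypothetical and deliberately minimal: it ignores nested messages and only escapes backslashes and double quotes. With generated classes, the library's google.protobuf.text_format.MessageToString(message) and text_format.Parse(text, message) handle the full format.

```python
def to_text_format(fields: dict) -> str:
    """Hypothetical helper: render scalar fields as Protobuf text-format lines."""
    lines = []
    for name, value in fields.items():
        if isinstance(value, bool):  # check bool before int, since bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Minimal escaping only: backslashes and double quotes.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"{name}: {rendered}")
    return "\n".join(lines)


print(to_text_format({"name": "John Doe", "id": 1234, "email": "johndoe@example.com"}))
# prints:
# name: "John Doe"
# id: 1234
# email: "johndoe@example.com"
```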
Schema Definition Language
Protobuf provides a flexible and efficient language for defining the structure of your data. In a .proto file, you can specify various data types, such as int32 for integers, string for text, and bytes for raw binary data.
You can also use the repeated keyword to define fields that can contain multiple values of the same type, similar to an array or list in other programming languages.
Finally, Protobuf allows for the nesting of messages, enabling you to create complex data structures by embedding one message type within another.
More info on the official website: proto2, proto3.
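The following sketch shows how repeated and nested fields end up on the wire. The schema in the comment (Member and Team) is hypothetical. A repeated string is simply written once per element. A nested message is serialized on its own and then embedded as a length-delimited field. Strings were chosen because, in proto3, only repeated scalar numeric fields are packed by default, so packing does not apply here.

```python
# Hypothetical schema, for illustration only:
#   message Member { string name = 1; repeated string tags = 2; }
#   message Team   { repeated Member members = 1; }


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, continuation bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def encode_len(field_number: int, payload: bytes) -> bytes:
    """Length-delimited field (wire type 2): tag, payload length, payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_member(name: str, tags: list) -> bytes:
    out = encode_len(1, name.encode("utf-8"))
    for tag in tags:  # repeated field: one tag/value pair per element
        out += encode_len(2, tag.encode("utf-8"))
    return out


def encode_team(members: list) -> bytes:
    # Each nested Member is encoded separately, then embedded as field 1 of Team.
    return b"".join(encode_len(1, encode_member(n, t)) for n, t in members)


team = encode_team([("Ann", ["dev", "ops"]), ("Bob", [])])
print(team)  # prints: b'\n\x0f\n\x03Ann\x12\x03dev\x12\x03ops\n\x05\n\x03Bob'
```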
How Protobuf Works
Data Definition
Create a file called person.proto with the following content:
syntax = "proto3";
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
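The numbers after each field (= 1, = 2, = 3) are field numbers, not default values. On the wire, each field is preceded by a tag computed from its field number and wire type, which is why these numbers must stay stable once data has been written. The sketch below computes the tags for Person; compute_tag is a hypothetical helper name.

```python
# Wire types from the Protobuf encoding spec: 0 = VARINT (int32), 2 = LEN (string).
VARINT, LEN = 0, 2

# (field name, wire type, field number) for the Person message above.
PERSON_FIELDS = [("name", LEN, 1), ("id", VARINT, 2), ("email", LEN, 3)]


def compute_tag(field_number: int, wire_type: int) -> int:
    """Tag = (field_number << 3) | wire_type; it is itself written as a varint."""
    return (field_number << 3) | wire_type


for name, wire_type, number in PERSON_FIELDS:
    print(f"{name}: tag 0x{compute_tag(number, wire_type):02X}")
# prints:
# name: tag 0x0A
# id: tag 0x10
# email: tag 0x1A
```

Because the tag is a varint, field numbers 1 to 15 fit in a single tag byte, which is why they are usually reserved for the most frequently used fields.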
Code Generation
Now, use the protoc compiler to generate Python classes from your .proto file.
protoc --python_out=. person.proto
This command creates a person_pb2.py file in the current directory.
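If you want to regenerate the classes from a Python build script, a thin wrapper around the same protoc command is enough. This is a sketch under stated assumptions: protoc_command, expected_module and generate are hypothetical names, protoc must be on the PATH, and expected_module assumes the .proto file sits in the current directory (for files in subdirectories, protoc mirrors the path relative to the import directory).

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def protoc_command(proto_file: str, out_dir: str = ".") -> list[str]:
    """Build the same command as `protoc --python_out=. person.proto`."""
    return ["protoc", f"--python_out={out_dir}", proto_file]


def expected_module(proto_file: str, out_dir: str = ".") -> Path:
    """person.proto -> person_pb2.py (assumes the .proto file is in the current directory)."""
    return Path(out_dir) / (Path(proto_file).stem + "_pb2.py")


def generate(proto_file: str, out_dir: str = ".") -> Path:
    """Run protoc and return the path of the generated module."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc not found on PATH")
    subprocess.run(protoc_command(proto_file, out_dir), check=True)  # raises on failure
    return expected_module(proto_file, out_dir)


print(protoc_command("person.proto"))  # prints: ['protoc', '--python_out=.', 'person.proto']
```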
Use the Generated Code
Finally, you can use the generated code to create a Person object and serialize/deserialize it.
import person_pb2
# Create a new Person object
person = person_pb2.Person()
person.name = "John Doe"
person.id = 1234
person.email = "johndoe@example.com"
# Serialize the object to bytes in the Protobuf binary format
serialized_person = person.SerializeToString()
print("Serialized Person:", serialized_person)
# Parse the bytes back into a new Person object
new_person = person_pb2.Person()
new_person.ParseFromString(serialized_person)
print("Deserialized Person:")
print(f"Name: {new_person.name}")
print(f"ID: {new_person.id}")
print(f"Email: {new_person.email}")
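To show roughly what ParseFromString does under the hood, the sketch below splits serialized bytes into (field number, value) pairs without using the generated class. The input is the byte string that person.SerializeToString() should produce for the example above, derived by hand from the wire-format rules (it matches what the serialization step prints, but treat it as an assumption). The helper names are hypothetical, and only the VARINT and LEN wire types are handled.

```python
from __future__ import annotations


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_fields(data: bytes) -> list[tuple[int, object]]:
    """Split serialized bytes into (field_number, value) pairs.

    LEN payloads are returned as raw bytes: only the schema says whether they
    hold UTF-8 text, raw bytes, or a nested message.
    """
    fields, pos = [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, value))
    return fields


serialized = b"\n\x08John Doe\x10\xd2\t\x1a\x13johndoe@example.com"
print(decode_fields(serialized))
# prints: [(1, b'John Doe'), (2, 1234), (3, b'johndoe@example.com')]
```

Because every field carries its wire type, a reader built from an older schema can step over a field number it does not recognize. This is the mechanism behind the forward compatibility mentioned under Why Protobuf.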
Learn more
- Protobuf Overview
- Protobuf syntax: proto2, proto3
- Protobuf tutorials for each programming language