Protobuf data-structures

Protocol Buffers, or Protobuf (protobuf.dev), is a language-neutral, platform-neutral technology for serializing structured data.

Compared to XML or JSON, Protobuf provides an explicit schema definition language, a more efficient binary encoding, and automatic code generation for multiple programming languages.

Why Protobuf

  • Explicit Schema-Definition Language: Protobuf uses a schema-definition language to define the structure of the data, so that every field has a well-defined type and nested relationships are explicit. The language also supports backward and forward compatibility: fields can be added or removed without breaking existing code, as long as field numbers already in use are not changed or reused.

  • Efficient Encoding: Protobuf serializes data into a compact binary encoding that is typically smaller and faster to parse than text formats such as XML or JSON.

  • Code Generation Toolkit (protoc): Protobuf includes a compiler, protoc, that generates native classes in multiple programming languages (e.g., Python, Java, C++), relieving the programmer from writing and maintaining custom data structures and serialization methods.
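To make the compactness point concrete, here is a minimal sketch that hand-encodes a small record in Protobuf's binary wire format (varint tags, length-prefixed strings) and compares its size with the JSON equivalent. The helper names and the field numbers 1, 2 and 3 are assumptions chosen for illustration, and the helpers only cover non-negative integers and strings; real programs should rely on the classes generated by protoc.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: the value is stored as a varint."""
    return encode_tag(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: a length prefix followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


record = {"name": "John Doe", "id": 1234, "email": "johndoe@example.com"}

binary = (
    encode_string_field(1, record["name"])
    + encode_int_field(2, record["id"])
    + encode_string_field(3, record["email"])
)
as_json = json.dumps(record).encode("utf-8")

print(len(binary), "bytes in the binary encoding")
print(len(as_json), "bytes as JSON")
```

The binary form is smaller mainly because field names are replaced by one-byte tags and the integer is stored as a two-byte varint instead of decimal text.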

Use Cases

  • gRPC: Protobuf is the default serialization format for gRPC, a high-performance, language-agnostic RPC framework. This combination allows for robust and efficient communication between microservices.

  • Data Storage: Protobuf is often used for storing structured data in databases, where efficient storage and retrieval are critical.

  • Configuration Files: When data is described by a Protobuf message, it can be written (and read back) in a human-friendly text format called txtpb. Like JSON, this text format is easy for users to read and edit, with two additional benefits:

    1. The schema of the data is explicit and strict, as defined in the associated .proto file.
    2. The txtpb data can be parsed directly into native-code objects, using the classes generated by the protoc toolkit.
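As a sketch of that round trip, the snippet below reads and writes text-format data with the google.protobuf.text_format module from the Python protobuf package. To stay self-contained it uses the Timestamp message that ships with that package rather than a generated class; with your own schema you would pass an instance of your generated message instead. The field values are made up for illustration.

```python
from google.protobuf import text_format
from google.protobuf.timestamp_pb2 import Timestamp

# Text-format data, as it might appear in a configuration file.
config_text = """
# Comments are allowed in the text format.
seconds: 1700000000
nanos: 500
"""

# Parse the text directly into a typed message object.
ts = text_format.Parse(config_text, Timestamp())
print(ts.seconds, ts.nanos)

# Write the message back out in the same human-readable format.
print(text_format.MessageToString(ts))

# The schema is strict: a field that the message does not define is rejected.
try:
    text_format.Parse("minutes: 5", Timestamp())
    rejected = False
except text_format.ParseError as err:
    rejected = True
    print("Rejected:", err)
```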

Schema Definition Language

Protobuf provides a flexible and efficient language for defining the structure of your data. In a .proto file, you can specify various data types, such as int32 for integers, string for text, and bytes for raw binary data.

You can also use the repeated keyword to define fields that can contain multiple values of the same type, similar to an array or list in other programming languages.

Finally, Protobuf allows for the nesting of messages, enabling you to create complex data structures by embedding one message type within another.
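As a rough mental model only, the shape that such a schema describes resembles the plain Python classes below. The PhoneNumber type and the avatar and phones fields are hypothetical and not part of this document's example; real Protobuf classes are generated by protoc rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhoneNumber:
    """Plays the role of a message that is nested inside another message."""
    number: str = ""  # would be a `string` field in a .proto file


@dataclass
class Person:
    name: str = ""      # string
    id: int = 0         # int32
    avatar: bytes = b"" # bytes (raw binary data)
    # A `repeated PhoneNumber` field behaves like a list of nested messages.
    phones: List[PhoneNumber] = field(default_factory=list)


person = Person(name="John Doe", id=1234)
person.phones.append(PhoneNumber(number="555-0100"))
print(person)
```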

For more information, see the proto2 and proto3 documentation on the official website.

How Protobuf works

Data Definition

Create a file called person.proto with the following content:

person.proto
syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

Code Generation

Now, use the protoc compiler to generate Python classes from your .proto file.

protoc --python_out=. person.proto

This command will create a person_pb2.py file in the current directory.

Use the Generated Code

Finally, you can use the generated code to create and serialize/deserialize a Person object.

demo.py
import person_pb2

# Create a new Person object
person = person_pb2.Person()
person.name = "John Doe"
person.id = 1234
person.email = "johndoe@example.com"

# Serialize the object to a binary string (a Python bytes object)
serialized_person = person.SerializeToString()
print("Serialized Person:", serialized_person)

# Deserialize the binary string back to a Person object
new_person = person_pb2.Person()
new_person.ParseFromString(serialized_person)
print("Deserialized Person:")
print(f"Name: {new_person.name}")
print(f"ID: {new_person.id}")
print(f"Email: {new_person.email}")
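To see what the serialized bytes contain, the following sketch decodes them by hand. It is a simplified illustration rather than the library's parser: read_varint and decode_fields are hypothetical helpers that understand only varint and length-delimited fields, and the byte string is what the example above should produce for these field values.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_fields(data: bytes):
    """Return a list of (field_number, value) pairs found in the payload."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint, used for int32
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited, used for string
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, value))
    return fields


# Assumed output of person.SerializeToString() for the values used above.
serialized_person = b"\n\x08John Doe\x10\xd2\t\x1a\x13johndoe@example.com"

for field_number, value in decode_fields(serialized_person):
    print(field_number, value)
```

The payload carries field numbers and values but no field names, which is why a field can be renamed in the .proto file without affecting existing data, while its number must stay stable.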

Learn more