RFC-0171/MessageSerialization

Message Serialization

status: stable

Maintainer(s): Cayle Sharrock Stanley Bondi

Licence

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of this document must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS", AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Language

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY" and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (covering RFC2119 and RFC8174) when, and only when, they appear in all capitals, as shown here.

Disclaimer

This document and its content are intended for information purposes only and may be subject to change or update without notice.

This document may include preliminary concepts that may or may not be in the process of being developed by the Tari community. The release of this document is intended solely for review and discussion by the community of the technological merits of the potential system outlined herein.

Goals

The aim of this Request for Comment (RFC) is to describe the message serialization formats for message payloads used in the Tari network.

RFC-0710: Tari Communication Network and Network Communication Protocol

Description

One way of interpreting the Tari network is that it is a large peer-to-peer messaging application. The entities chatting on the network include:

Wallets
Base nodes
Validator nodes
Other client applications (e.g. chat)

The types of messages that these entities send might include:

Text messages
Transaction messages
Block propagation messages
Asset creation instructions
Asset state change instructions
State Checkpoint messages

For successful communication to occur, the following needs to happen:

The message is translated from its memory storage format into a standard payload format that will be transported over the wire.
The communication module wraps the payload into a message format, which may entail any/all of
- adding a message header to describe the type of payload;
- encrypting the message;
- signing the message;
- adding destination/recipient metadata.
The communication module then sends the message over the wire.
The recipient receives the message and unwraps it, possibly performing any/all of the following:
- decryption;
- verifying signatures;
- extracting the payload;
- passing the serialized payload to modules that are interested in that particular message type.
The message is deserialized into the correct data structure for use by the receiving software

This document only covers the first and last steps, i.e. serializing data from in-memory objects to a format that can be transmitted over the wire. The other steps are handled by the Tari communication protocol.

In addition to machine-to-machine communication, we also standardize on human-to-machine communication. Use cases for this include:

Handcrafting instructions or transactions. The ideal format here is a very human-readable format.
Copying transactions or instructions from cold wallets. The ideal format here is a compact but easy-to-copy format.
Peer-to-peer text messaging. This is just a special case of what has already been described, with the message structure containing a unicode message_text field.

When sending a message from a human to the network, the following happens:

The message is deserialized into the native structure.
Additional validation can be performed.
The usual machine-to-machine process is followed, as described above.

Binary Serialization Formats

The ideal properties for binary serialization formats are:

widely used across multiple platforms and languages, but with excellent Rust support;
compact binary representation; and
serialization "Just Works"(TM) with little or no additional coding overhead.

Several candidates fulfill these properties to some degree.

bincode

Pros:
- Fast and compact
- Serde support
Cons:
- (Almost) any type changes result in incompatibilities
- language support outside of rust is limited

Message Pack

Pros:
- Compact
- Fast
- Multiple language support
- Good Rust/Serde support
- Native byte encoding (compared to JSON)
Cons:
- No metadata support
- Self-describing overhead

MessagePack has almost the exact same characteristics as JSON, without the syntactical overhead. It is also self-describing, which can be a pro and a con. Like JSON, field names are encoded as strings which adds significant overhead over p2p comms. However, when used in conjunction with msgpack-schemas this overhead is removed and binary representations become characteristically similar to protobuf.

Protobuf

Protobuf is a widely-used binary serialization format that was developed by Google and has excellent Rust support. The Protobuf byte format encodes tag numbers as varints that map to a known schema fields.

Pros:
- Compact
- Fast
- Multiple language support
- Good Rust/Serde support
- Some schema changes are backward compatible
Cons
- Schema must be defined for each message type
- Does not fit into the serde ecosystem, meaning it is hard to swap out later.

In the latest protoV3 spec, all fields are optional. which forces the implementation to check for the presence of required data and allows for. This means that you can change your schema significantly and there is a

It's fairly easy to reason about backwards-compatibility for schema changes once you understand protobuf encoding. Essentially, since all fields in protov3 are optional, a message will usually be able to successfully decode even if message tags are added/changed/removed. It therefore depends on the application whether changes are backward-compatible.

Generally, the following rules apply:

you should not change existing field types to a non-compatible type. For example, changing a uint32 to a uint64 is fine, but changing a uint32 to a string is not.
you should not change existing tag numbers should not be changed, field names may change as needed since they are not included in the byte format.
you may delete optional or repeated fields
you may add new optional or repeated fields as long as you use a new field number

Cap'n Proto

Similar to Protobuf, but claims to be much faster. Rust is supported.

Hand-rolled Serialization

Hintjens recommends using hand-rolled serialization for bulk messaging. While Pieter usually offers sage advice, I'm going to argue against using custom serializers at this stage for the following reasons:

We're unlikely to improve hugely over existing serialization formats.
Since Serde does 95% of our work for us, there's a significant development overhead (and new bugs) involved with a hand-rolled solution.
We'd have to write de/serializers for every language that wants Tari bindings; whereas every major language has a protobuf implementation.

Tari message formats

Wire message format

The decision was taken to use Protobuf encoding for messages on the Tari peer-to-peer wire protocol, as it ticks these boxes:

it is possible to modify message schemas in future versions without breaking network communication with previous versions of node software,
de/encoding is compact and fast, and
it has great Rust support through the prost crate.

Other serialization formats

For human-readable formats, it makes little sense to deviate from JSON. For copy-paste semantics, the extra compression that Base64 offers over raw hex or Base58 makes it attractive.

The standard binary representation used in databases (e.g. blockchain storage, wallets) will make use of bincode. In these cases, a straightforward #[derive(Deserialize, Serialize)] is all that is required to implement de/encoding for the data structure.

However, other structures might need fine-tuning, or hand-written serialization procedures. To capture both use cases, it is proposed that a MessageFormat trait be defined:

#![allow(unused)]
fn main() {
pub trait MessageFormat: Sized {
    fn to_binary(&self) -> Result<Vec<u8>, MessageFormatError>;
    fn to_json(&self) -> Result<String, MessageFormatError>;
    fn to_base64(&self) -> Result<String, MessageFormatError>;

    fn from_binary(msg: &[u8]) -> Result<Self, MessageFormatError>;
    fn from_json(msg: &str) -> Result<Self, MessageFormatError>;
    fn from_base64(msg: &str) -> Result<Self, MessageFormatError>;
}
}

This trait will have default implementations to cover most use cases (e.g. a simple call through to serde_json). Serde also offers significant ability to tweak how a given struct will be serialized through the use of attributes.

Change Log

Date	Change	Author
29 Mar 2019	First draft	CjS77
24 Jul 2019	Technical editing	anselld
13 Oct 2022	Update	sdbondi
26 Oct 2022	Stabilise RFC	CjS77