Encoding
Encoding is the act of converting an in-memory Strata value into Strata Core Binary (.scb).
In Rust, encoding is fully deterministic, side-effect free, and canonical by construction.
There is exactly one valid byte representation for any given Strata value.
Core guarantee
For any value V:
Encoding
Valways produces the same byte sequenceEncoding does not depend on runtime, platform, or environment
Encoding never guesses or normalizes input
Encoding either succeeds canonically or fails explicitly
If two encoders produce different bytes for the same value, at least one of them is wrong.
Entry point
Encoding begins with a Strata Value and produces a byte vector.
encode(&value) -> VecThe encoder walks the value recursively and writes bytes directly to an output buffer.
Canonical by construction
Canonicality is enforced during encoding, not after.
The encoder guarantees:
Fixed tags for each value type
Canonical varint encoding
Canonical key ordering for maps
Exact byte preservation for strings and bytes
There is no separate "canonicalization" step.
Tag emission
Every value begins with a single-byte tag that identifies its type.
Examples:
Null → 0x00
Bool false → 0x01
Bool true → 0x02
Int → 0x10
String → 0x20
Bytes → 0x21
List → 0x30
Map → 0x40
Tags are fixed by the specification and must never change within a version.
Integer encoding
Integers are encoded as:
Tag byte (Int)
Canonical SLEB128 payload
Rules:
Only signed 64-bit integers are allowed
No alternate encodings
No leading redundancy
No truncation
If an integer cannot be encoded canonically, encoding fails.
String encoding
Strings are encoded as:
Tag byte (String)
Length encoded as canonical ULEB128
Raw UTF-8 bytes
Rules:
UTF-8 must already be valid
No normalization
No escaping
No transformation
Rust strings are UTF-8 by construction, but the checks exist for cross-language parity.
Bytes encoding
Bytes are encoded as:
Tag byte (Bytes)
Length encoded as canonical ULEB128
Raw bytes copied verbatim
No interpretation is applied.
Bytes are not strings and are never decoded as text.
List encoding
Lists are encoded as:
Tag byte (List)
Element count as canonical ULEB128
Each element encoded sequentially
Rules:
Order is preserved exactly
Nested structures are encoded depth-first
Empty lists are valid and canonical
Any change in order produces different bytes.
Map encoding
Maps are encoded as:
Tag byte (Map)
Entry count as canonical ULEB128
Entries encoded in sorted key order
Rules:
Keys must be strings
Keys are sorted by UTF-8 byte order
Each key is encoded exactly like a string
Values are encoded immediately after their keys
Rust uses BTreeMap to guarantee ordering at the data structure level.
Duplicate keys
Encoding assumes maps are already resolved.
If duplicate keys existed during parsing:
The last value wins
Only the final value is encoded
No trace of duplication remains
Encoding reflects the final semantic state only.
Error behavior
Encoding can fail only in strictly defined cases:
Invalid UTF-8 (non-Rust implementations)
Invalid integer range
Internal invariant violations
Encoding never silently fixes or ignores errors.
Framing vs encoding
Encoding produces pure Strata Core Binary.
Framing is a separate concern and must not alter encoded bytes.
Encoding defines truth. Framing defines transport.
Stability guarantee
Within a finalized version:
Encoding rules are frozen
Byte layouts must never change
Hashes derived from encoded bytes remain stable forever
Breaking encoding rules requires a new version and a new Northstar.
Summary
Rust encoding is:
Deterministic
Canonical
Explicit
Non-negotiable
If encoding feels strict, that is intentional.
Correctness is cheaper than ambiguity.
Last updated
Was this helpful?