Canonical encoding
Canonical encoding is the foundation of Strata.
It is the reason Strata exists, and the reason every other guarantee holds.
In Strata, a value has exactly one valid binary representation. No alternatives. No "equivalent" encodings. No normalization steps after the fact.
If two encoders produce different bytes for the same logical value, at least one of them is wrong.
What "canonical" means in Strata
Canonical encoding means:
A Strata value maps to one and only one byte sequence
Encoding is fully deterministic
Decoding does not repair, normalize, or reinterpret data
Hashing is performed over canonical bytes only
There is no concept of:
permissive encoding
equivalent representations
platform-dependent behavior
runtime-dependent output
Canonical encoding is not a guideline. It is a hard invariant.
Why canonical encoding matters
Canonical encoding enables guarantees most data formats cannot provide:
Stable hashing across languages and runtimes
Verifiable equality without semantic comparison
Cross-language reproducibility
Auditability and long-term storage correctness
Protocol safety without hidden behavior
Without canonical encoding:
hashes diverge
signatures become unstable
caches fragment
distributed systems silently disagree
Strata chooses correctness over convenience.
Canonical vs "normalized" formats
Many formats claim determinism but rely on normalization:
keys reordered after parsing
values coerced during encoding
floats normalized implicitly
decoders accepting multiple forms
Normalization happens after ambiguity has already entered the system.
Strata rejects this model entirely.
In Strata:
ambiguity is not representable
invalid input is rejected
correctness is enforced at encode time
Scope of canonical rules
Canonical rules apply to:
Binary encoding of values
Ordering of map keys
Integer representation
String encoding (UTF-8)
Byte sequences
Hash input definition
Canonical rules do not apply to:
Transport framing
Streaming boundaries
Envelopes or wrappers
Compression or encryption
Application-level protocols
These layers are explicitly outside the canonical core.
Encoding vs decoding
Encoding and decoding have different responsibilities.
Encoding
Encoding is where truth is enforced.
Only canonical representations may be emitted
Invalid values are rejected
Duplicate map keys are forbidden
Non-canonical states cannot be produced
Decoding
Decoding is observational.
Non-canonical ordering may be preserved
Duplicate keys may exist for inspection
No normalization is applied
Malformed input fails explicitly
Encoding enforces truth. Decoding reveals reality.
Canonical encoding and hashing
All Strata hashes are computed over canonical encoded bytes.
This means:
Hashes do not depend on language
Hashes do not depend on platform
Hashes do not depend on runtime behavior
Hashes are stable for the lifetime of a version line
If two implementations produce different hashes for the same value, canonical rules have been violated.
Stability guarantee
Canonical encoding rules are frozen within a version line.
For example:
All v0.3.x releases share identical canonical encoding
Bytes and hashes must never change within that line
Any change to canonical rules requires a new minor version and a new Northstar
This is a requirement, not a goal.
Summary
Canonical encoding in Strata means:
One value -> one byte sequence
One byte sequence -> one hash
Zero ambiguity
Zero normalization
Zero silent behavior
If correctness matters, canonical encoding is not optional.
It is the contract.
Last updated
Was this helpful?