Encoding vs decoding

Encoding and decoding in Strata are not symmetrical operations.

They serve different purposes, enforce different rules, and carry different guarantees.

Understanding this distinction is critical to using Strata correctly.


Encoding

Encoding transforms a structured value into canonical Strata Core Binary (.scb).

It is a strict, opinionated, and enforcing operation.

Value -> encode -> .scb


What encoding guarantees

Encoding guarantees that:

  • the output is canonical

  • the output is unambiguous

  • the output is deterministic

  • the output is valid Strata Core Binary

  • each value has exactly one possible byte representation

If encoding succeeds, the bytes are correct by definition.


What encoding enforces

Encoding enforces:

  • canonical map ordering

  • canonical integer encoding

  • canonical length encoding

  • canonical tag usage

  • rejection of invalid values

Encoding does not preserve:

  • source formatting

  • original key order

  • comments

  • syntactic sugar

  • author intent

Only semantics survive.
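
As a minimal sketch of what canonical map ordering and canonical length encoding buy you, the toy encoder below writes a map of string keys to non-negative integers. Everything in it is an assumption made for illustration: the tag byte 0x40, the varint layout, and sorting keys by their UTF-8 bytes are not taken from the Strata specification.

```python
def encode_varint(n: int) -> bytes:
    """Minimal-length unsigned varint (toy stand-in for canonical integer encoding)."""
    if n < 0:
        raise ValueError("negative integers are outside this toy format")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_map(m: dict[str, int]) -> bytes:
    """Encode a map with keys in one fixed order, regardless of insertion order."""
    out = bytearray([0x40])          # hypothetical map tag
    out += encode_varint(len(m))
    # Assumed ordering rule: sort keys by their UTF-8 byte sequence.
    for key in sorted(m, key=lambda k: k.encode("utf-8")):
        raw = key.encode("utf-8")
        out += encode_varint(len(raw)) + raw + encode_varint(m[key])
    return bytes(out)


# Two sources that differ only in key order produce identical bytes.
assert encode_map({"b": 2, "a": 1}) == encode_map({"a": 1, "b": 2})
```

The original key order is gone after encoding, which is the point: only the value's meaning reaches the bytes.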


Encoding failure

Encoding fails if:

  • the value is invalid

  • a rule would be violated

  • canonical representation cannot be produced

Encoding never guesses. Encoding never coerces. Encoding never normalizes silently.

Failure is explicit.
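
The sketch below illustrates "never guesses, never coerces" with a toy integer encoder. The names EncodeError and encode_int, the tag byte 0x01, and the fixed 8-byte layout are hypothetical and exist only for this example; they are not Strata's API or wire format.

```python
class EncodeError(ValueError):
    """Raised when a value cannot be encoded as-is (hypothetical name)."""


def encode_int(value: object) -> bytes:
    # No coercion: 7.0, "7" and True are rejected rather than converted to 7.
    if isinstance(value, bool) or not isinstance(value, int):
        raise EncodeError(f"expected int, got {type(value).__name__}")
    # No guessing: out-of-range values fail instead of wrapping or truncating.
    if not -(2**63) <= value < 2**63:
        raise EncodeError("integer out of 64-bit range")
    # Toy layout: one tag byte followed by 8 big-endian signed bytes.
    return b"\x01" + value.to_bytes(8, "big", signed=True)


for bad in (7.0, "7", True, 2**63):
    try:
        encode_int(bad)
    except EncodeError as exc:
        print(f"rejected {bad!r}: {exc}")
```

Each rejected input raises instead of producing bytes, so a caller can never mistake a coerced value for the one it passed in.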


Decoding

Decoding transforms raw bytes into a structured value.

Decoding is interpretive, not enforcing.


What decoding guarantees

Decoding guarantees that:

  • the bytes are structurally valid

  • the structure can be interpreted

  • the value can be reconstructed

Decoding does not guarantee:

  • canonical origin

  • uniqueness of representation

  • semantic correctness beyond structure

Decoding answers one question only:

"Do these bytes represent a valid Strata value?"


Strict decoding

Strata decoding is still strict:

  • invalid tags are rejected

  • truncated inputs are rejected

  • malformed varints are rejected

  • invalid UTF-8 is rejected

  • trailing bytes are rejected

Invalid data is rejected outright, so it never round-trips.
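
The rejection rules above can be sketched with a toy decoder for a single string value. The tag byte 0x02, the varint length prefix, and the names DecodeError, read_varint and decode_string are assumptions for illustration, not Strata's actual layout or API.

```python
class DecodeError(ValueError):
    """Raised when bytes are not a structurally valid value (hypothetical name)."""


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise DecodeError("truncated varint")
        if shift > 63:
            raise DecodeError("varint too long")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_string(data: bytes) -> str:
    if not data:
        raise DecodeError("empty input")
    if data[0] != 0x02:                      # hypothetical string tag
        raise DecodeError(f"invalid tag 0x{data[0]:02x}")
    length, pos = read_varint(data, 1)
    end = pos + length
    if end > len(data):
        raise DecodeError("truncated input")
    if end < len(data):
        raise DecodeError("trailing bytes")
    try:
        return data[pos:end].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DecodeError("invalid UTF-8") from exc


assert decode_string(b"\x02\x02hi") == "hi"
```

Inputs such as b"\x09\x02hi" (unknown tag), b"\x02\x05hi" (truncated), b"\x02\x02hi!" (trailing byte) and b"\x02\x01\xff" (invalid UTF-8) all raise DecodeError in this sketch.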


What decoding does not enforce

Decoding does not enforce:

  • canonical map ordering

  • canonical source structure

  • minimal representations

Those guarantees exist only at encoding time.


Canonical boundary

The canonical boundary is encoding, not decoding.

Decoding reveals. Encoding defines.


Hashing relationship

Hashing operates on encoded bytes, never on decoded values.

Never:

bytes -> decode -> Value -> hash

If hashing depended on decoding behavior, determinism would be lost.
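
A minimal sketch of hashing over encoded bytes follows. The encoder is a toy (sorted entries, 4-byte length prefixes) and SHA-256 is an assumed hash function; neither is taken from the Strata specification, and the function names are hypothetical.

```python
import hashlib


def encode(record: dict[str, str]) -> bytes:
    """Toy canonical encoder: entries sorted by key bytes, every field length-prefixed."""
    out = bytearray()
    for key, value in sorted(record.items(), key=lambda kv: kv[0].encode("utf-8")):
        for field in (key.encode("utf-8"), value.encode("utf-8")):
            out += len(field).to_bytes(4, "big") + field
    return bytes(out)


def content_hash(record: dict[str, str]) -> str:
    # The hash input is always the encoder's output, never an in-memory value
    # and never bytes that merely happened to decode successfully.
    return hashlib.sha256(encode(record)).hexdigest()


# Insertion order differs, the encoded bytes do not, so the hashes agree.
assert content_hash({"a": "1", "b": "2"}) == content_hash({"b": "2", "a": "1"})
```

Because the encoder is the only path to the hash input, two parties holding the same value always compute the same digest.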


Round-tripping

A valid round-trip looks like this:

Value -> encode -> .scb -> decode -> Value

The reverse is not guaranteed:

bytes -> decode -> Value -> encode -> identical bytes

Encoding after decoding may produce different bytes if the original input was non-canonical.

This is intentional.
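
The asymmetry can be sketched with a deliberately tiny format: one count byte followed by (key, value) byte pairs. The format and both functions are invented for this example and are not Strata Core Binary. The encoder sorts keys; the decoder accepts pairs in any order, matching the rule that decoding does not enforce canonical map ordering.

```python
def encode(m: dict[int, int]) -> bytes:
    """Toy encoder: count byte, then pairs sorted by key (keys and values in 0..255)."""
    out = bytearray([len(m)])
    for key in sorted(m):
        out += bytes([key, m[key]])
    return bytes(out)


def decode(data: bytes) -> dict[int, int]:
    """Toy decoder: structurally strict, but it accepts pairs in any order."""
    if not data or len(data) != 1 + 2 * data[0]:
        raise ValueError("truncated input or trailing bytes")
    result: dict[int, int] = {}
    pairs = data[1:]
    for i in range(0, len(pairs), 2):
        key, value = pairs[i], pairs[i + 1]
        if key in result:
            raise ValueError("duplicate key")
        result[key] = value
    return result


value = {2: 20, 1: 10}
canonical = encode(value)                 # pairs written in key order: 1, then 2

# Value -> encode -> decode -> Value always holds.
assert decode(canonical) == value

# Structurally valid bytes with the pairs in the other order decode to the
# same value, but re-encoding does not reproduce them.
non_canonical = bytes([2, 2, 20, 1, 10])
assert decode(non_canonical) == value
assert encode(decode(non_canonical)) == canonical
assert encode(decode(non_canonical)) != non_canonical
```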


Why this asymmetry exists

This asymmetry exists to:

  • eliminate ambiguity

  • centralize correctness

  • simplify reasoning

  • protect hashing and signatures

  • allow safe decoding of hostile input

Encoding is law. Decoding is observation.


Common mistake

"If decoding accepts it, encoding should reproduce it."

No.

If decoding accepted it but encoding changes it, the input was not canonical.
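
That rule doubles as a canonicality test: bytes are canonical exactly when re-encoding the decoded value reproduces them. The sketch below demonstrates the check using a fixed, key-sorted JSON serialization purely as a stand-in for a canonical encoder. JSON is not Strata's format, and is_canonical is a hypothetical helper name.

```python
import json


def encode(value) -> bytes:
    """Stand-in canonical encoder: sorted keys, no optional whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode(data: bytes):
    """Stand-in decoder: accepts any valid JSON, canonical or not."""
    return json.loads(data)


def is_canonical(data: bytes) -> bool:
    # If decoding accepts the bytes but encoding changes them,
    # the input was not canonical.
    return encode(decode(data)) == data


assert is_canonical(b'{"a":1,"b":2}')
assert not is_canonical(b'{"b":2,"a":1}')   # same value, different key order
assert not is_canonical(b'{ "a": 1 }')      # same structure, extra whitespace
```

All three inputs decode without error, but only the first is what the encoder itself would have produced.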


Summary

  • encoding enforces truth

  • decoding interprets data

  • canonical form exists only after encoding

  • hashing depends on encoding, not decoding

  • asymmetry is a feature, not a bug

In Strata, encoding is authority.
