Strata

Deterministic Data. Canonical truth.

What Strata is

Strata is a deterministic binary data format with canonical encoding. Every value has exactly one valid binary representation; identical logical values must produce identical bytes across all implementations.

It is a protocol and specification for representing structured data in a way that is reproducible across implementations, platforms, and time.


Why it exists

Most data formats allow multiple representations of the same logical value.

  • JSON objects can order keys differently.

  • Protocol Buffers permit field reordering.

  • MessagePack supports multiple integer encodings.

These formats prioritize flexibility, evolution, and developer convenience over byte-level determinism.

This flexibility is a liability when correctness depends on exact bytes: content addressing, cryptographic hashing, digital signatures, and distributed consensus.

Strata eliminates representational ambiguity by design. If two values are logically equal, their encodings are byte-identical.


Core guarantees

Canonical encoding. Every value has exactly one valid binary representation. Stable hashing. Hashes are computed over canonical encoded bytes. The encoding is the canonical form. Cross-language determinism. Independent implementations in different languages produce identical bytes for identical input. Strict validation. Non-canonical encodings are rejected during canonical encoding, not silently accepted.


Explicit non-goals

Strata Core Binary (.scb) is intentionally not optimized for the following:

  • Human readability.

  • Compression ratio.

  • Schema evolution.

It does not support optional fields, default values, backward-compatible changes, or floating-point numbers.

These constraints are deliberate. Flexibility in representation introduces ambiguity. Strata chooses precision over convenience.

Strata Text (.st) exists as a human-readable authoring format that compiles into canonical Strata Core Binary.


Value model

Strata supports a minimal, fixed set of value types, each with an unambiguous binary representation:

null - absence of value bool - true or false int - signed 64-bit integers bytes - arbitrary byte sequences string - UTF-8 text list - ordered sequences map - key-value pairs with string keys, sorted by UTF-8 byte order

There are no floats, no optional types, and no undefined behavior.


Stability & Versioning

Strata is versioned. Once a version is finalized, its encoding rules are frozen.

There are no deprecations, no migrations, and no silent changes within a version line.

Code written against a finalized Strata version will produce identical bytes and hashes for the lifetime of that version line. This is a requirement, not a goal.