Hashing

Hashing in Strata is not a utility feature. It is a core contract.

A Strata hash is a cryptographic commitment to a canonical value, not to an in-memory structure, not to source text, and not to transport framing.


Core rule

A Strata hash is defined as:

BLAKE3-256(canonical Strata Core Binary bytes)

Nothing more. Nothing less.

If two values are logically equal, they must hash identically. If two values hash identically, they must decode to the same value.


Hash target

Hashing always operates on canonical encoded bytes.

Hashing never operates on:

  • Strata Text (.st)

  • Parsed ASTs

  • Runtime structures

  • Transport frames

  • Framed streams

  • Pretty-printed or inspected output

All hashing is downstream of canonical encoding.


Entry point

The Rust reference implementation exposes hashing at the value level.

Internally, hashing is equivalent to:

This sequence is not optional.


Algorithm choice

Strata uses BLAKE3-256.

Reasons:

  • Cryptographically strong

  • Deterministic

  • Fast in software

  • Stable across platforms

  • Well-specified

The algorithm is part of the hashing contract.

Changing it requires a new version and a new Northstar.


Stability guarantees

For a finalized Strata version:

  • Hashes are stable forever

  • Identical values always hash identically

  • Hashes do not depend on:

    • CPU architecture

    • Endianness

    • Compiler version

    • Optimization level

    • Programming language

If two independent implementations disagree on a hash, at least one is wrong.


Structural sensitivity

Hashes are sensitive to structure, not intent.

Examples:

  • List order affects the hash

  • Map key order does not affect the hash (canonical sorting)

  • Changing any value changes the hash

  • Changing nesting changes the hash

This is intentional.

Strata hashes describe exact structure, not semantic similarity.


Non-canonical inputs

Hashing never accepts non-canonical input.

If bytes are non-canonical:

  • They must be decoded

  • Then re-encoded canonically

  • Then hashed

Hashing raw, non-canonical bytes is forbidden at the API level.

This prevents hash divergence and ambiguity.


Hashing raw bytes

If raw bytes are already known to be canonical, they may be hashed directly.

This is a performance optimization, not a semantic change.

The bytes must still be:

  • Valid Strata Core Binary

  • Canonically encoded

  • Complete and single-value

Otherwise, the result is undefined behavior.


Cross-language contract

The hashing contract is global.

All implementations must agree on:

  • Canonical encoding rules

  • Hash algorithm

  • Hash length

  • Byte order

No implementation may "optimize" hashing semantics.


Northstar enforcement

Hashing behavior is enforced by Northstar tests:

  • T1: Hash survives encode → decode → encode across languages

  • T2: Hash survives raw wire transport

  • T3: Hash survives framed streaming transport

If any Northstar fails, hashing is broken.


What hashing does NOT guarantee

Hashing does not guarantee:

  • Schema compatibility

  • Backward compatibility

  • Semantic equivalence

  • Business meaning

It guarantees byte-level identity only.


Summary

Hashing in Strata is:

  • Deterministic

  • Canonical

  • Structural

  • Cross-language

  • Non-negotiable

If hashing is wrong, Strata is wrong.

There is no fallback.

Last updated

Was this helpful?