Strata Text parser
Strata Text (.st) exists for humans.
The parser exists to turn that human-friendly syntax into an exact, deterministic Value.
The parser is not permissive, not heuristic, and not forgiving. It is a compiler frontend with explicit failure semantics.
Role of the parser
The Strata Text parser performs exactly one job:
Convert Strata Text into a Strata Value model without ambiguity.
It does not:
Infer types
Coerce values
Normalize structure
Apply canonical ordering
Perform hashing
Parsing is upstream of all canonical behavior.
Pipeline position
The full Strata pipeline looks like this:
The parser stops at Value.
Canonical guarantees begin after parsing.
Lexer responsibility
The lexer transforms raw text into tokens.
Token categories include:
Literals: null, true, false, integers, strings, bytes
Identifiers
Punctuation:
{ } [ ] : ,End-of-file
Whitespace and comments are ignored.
Supported comments:
Line comments starting with
//Line comments starting with
#
Comments never appear in the token stream.
Lexical constraints
The lexer enforces strict lexical rules:
Integers must fit in signed 64-bit
Bytes literals must be valid hex
Strings must be valid UTF-8
Strings cannot span multiple lines
Identifiers are ASCII-only
Invalid characters cause immediate failure.
There is no recovery.
Integer parsing
Integers are parsed as signed 64-bit values.
Rules:
Optional leading
-Base-10 digits only
No underscores
No suffixes
No floats
If the value exceeds the i64 range, parsing fails.
This is intentional.
Bytes literals
Bytes are expressed explicitly using hex syntax.
Format:
Rules:
Must start with
0xMust contain an even number of hex digits
Must contain at least one byte
Case-insensitive hex digits allowed
Malformed bytes literals are rejected.
String literals
Strings use double quotes.
Supported escapes:
"
\
\n
\r
\t
\uXXXX (exactly four hex digits)
Rules:
Strings must be valid UTF-8
No multiline strings
No implicit normalization
Unicode codepoints are inserted as-is.
Grammar overview
The parser is recursive-descent and LL(1).
Value grammar (simplified):
Containers:
List:
[ value, value, ... ]Map:
{ key: value, ... }
Trailing commas are allowed.
Map syntax and shorthand
Maps support two forms.
Explicit form:
Shorthand form:
Shorthand expands to:
Shorthand nesting is allowed.
Duplicate keys
During parsing:
Duplicate keys are allowed
Last-write-wins semantics apply
Example:
Resulting Value:
This behavior is not canonical.
Duplicate keys are resolved during parsing to allow inspection and debugging.
Canonical encoding later enforces uniqueness.
Structural guarantees
Parsing guarantees:
Correct Value construction
Preserved structure
Exact literal interpretation
Deterministic output for identical input
Parsing does not guarantee:
Canonical ordering
Canonical encoding
Hash stability
Those belong to later stages.
Error model
All parser failures are explicit and structured.
Errors include:
Unexpected token
Malformed bytes literal
Integer out of range
Invalid escape sequence
Extra input after value
Each error includes:
Exact byte offset
Line number
Column number
There are no panics on user input.
End-of-input enforcement
After parsing a value, the parser requires EOF.
Extra tokens after a valid value cause failure.
This prevents partial parses and silent truncation.
Round-trip expectations
Valid .st input must satisfy:
Where:
Value' is structurally equivalent to the parsed Value
Ordering differences may exist
Semantic meaning is preserved
This property is enforced by round-trip tests.
Why the parser is strict
Strata Text is a source language, not a configuration hack.
Strict parsing ensures:
Predictability
Tooling safety
Deterministic compilation
Cross-language portability
Relaxing the parser would leak ambiguity into canonical layers.
Summary
The Strata Text parser is:
Deterministic
Strict
Explicit
Non-magical
It converts human-written text into an exact Value.
Everything that matters happens after.
Last updated
Was this helpful?