Preserves Schema

Tony Garnock-Jones
June 2021. Version 0.1.3.

This document proposes a Schema language for the Preserves data model.


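A Preserves schema connects Preserves Values to host-language data structures. Each definition within a schema can be processed by a compiler to produce a host-language type definition, a parsing function from Values to instances of that type, and a serialization function from instances of the type back to Values.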

Every parsed Value retains enough information to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be serialized successfully.

Example. Sending the schema

version 1 .
Date = <date @year int @month int @day int>.
Person = <person @name string @birthday Date>.

to the TypeScript schema compiler produces types,

type Date = {"year": number, "month": number, "day": number};
type Person = {"name": string, "birthday": Date};

constructors,

function Date({year, month, day}: {year: number, month: number, day: number}): Date;
function Person({name, birthday}: {name: string, birthday: Date}): Person;

partial parsing functions which throw on parse failure,

function asDate(v: _val): Date;
function asPerson(v: _val): Person;

total parsing functions which yield undefined on parse failure,

function toDate(v: _val): undefined | Date;
function toPerson(v: _val): undefined | Person;

and total serialization functions,

function fromDate(_v: Date): _val;
function fromPerson(_v: Person): _val;
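
To illustrate how the partial parser, total parser and serializer relate to one another, the following minimal sketch hand-writes stand-ins for asDate, toDate and fromDate. It is not compiler output: the Rec and Val types are a deliberately simplified, hypothetical value model used in place of _val, and the type is named SchemaDate only to avoid clashing with JavaScript's built-in Date.

```typescript
// Hypothetical, simplified stand-in for a Preserves record (NOT the real runtime type).
type Rec = { label: string; fields: unknown[] };
type Val = Rec | string | number;

type SchemaDate = { year: number; month: number; day: number };

const isInt = (x: unknown): x is number =>
  typeof x === "number" && Number.isInteger(x);

// Total parser: yields undefined on parse failure.
function toDate(v: Val): SchemaDate | undefined {
  if (typeof v !== "object" || v.label !== "date" || v.fields.length !== 3) {
    return undefined;
  }
  const [year, month, day] = v.fields;
  if (!isInt(year) || !isInt(month) || !isInt(day)) return undefined;
  return { year, month, day };
}

// Partial parser: throws on parse failure.
function asDate(v: Val): SchemaDate {
  const d = toDate(v);
  if (d === undefined) throw new TypeError("Value does not match Date");
  return d;
}

// Total serializer: every SchemaDate can be turned back into a value.
function fromDate(d: SchemaDate): Val {
  return { label: "date", fields: [d.year, d.month, d.day] };
}
```

In this sketch the partial parser is defined in terms of the total one, so the two can never disagree about which values are accepted.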


Bundle. A collection of schemas, each named by a module path.

Definition. A named pattern within a schema. When compiled, a definition will usually produce a type (plus associated constructors and predicates), a parser function, and a serializer function.

Metaschema. The Preserves metaschema is a schema describing the abstract syntax of all schema instances (including itself).

Module path. A sequence of symbols, denoting a leaf in a tree with symbol-labelled edges.

Pattern. A pattern describes a collection of Values as well as providing names for the portions of matching Values that should be captured in a host-language data type.

Schema abstract syntax tree (AST). Schema-manipulating tools will usually work with schema AST; that is, with Values conforming to the metaschema or instances of the corresponding host-language data structures.

Schema domain-specific language (DSL). While human beings can work directly with Preserves documents matching the metaschema, the schema DSL provides an easier-to-read and -write language for working with schemas that can be translated into instances of the schema AST.

Schema. A collection of definitions, plus an optional schema-wide reference to a schema describing embedded values.

Identifiers and Capitalization Conventions

Throughout, id is used in the grammar to denote an identifier, which is a symbol that matches the regular expression ^[a-zA-Z][a-zA-Z_0-9]*$. This is a lowest-common-denominator constraint that allows for a reasonable mapping to the identifiers of many programming languages.
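
The identifier constraint can be checked mechanically. The sketch below simply wraps the regular expression quoted above; the names ID_PATTERN and isId are chosen for this example and are not part of any Preserves tooling.

```typescript
// The identifier pattern quoted in the text above.
const ID_PATTERN = /^[a-zA-Z][a-zA-Z_0-9]*$/;

// True when the text of a symbol is acceptable as an id.
function isId(s: string): boolean {
  return ID_PATTERN.test(s);
}
```

Under this pattern, names such as variantLabel and pattern0 are accepted, while names that start with a digit or an underscore, or that contain a hyphen, are rejected.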

Identifiers are case-sensitive. Schemas should be written with an awareness of the fact that some programming languages cannot preserve case differences. Avoid using two identifiers in the same context that differ only in case.
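
A schema author or linting tool could detect such clashes ahead of time. The following is a small sketch of one way to do so; the function name caseCollisions is hypothetical, and the check is an illustration rather than something the schema compilers are known to perform.

```typescript
// Group identifiers by their lower-cased form and report every group
// that contains more than one distinct spelling.
function caseCollisions(ids: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const key = id.toLowerCase();
    const group = groups.get(key) ?? [];
    if (group.indexOf(id) === -1) group.push(id);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```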

Schemas should be written using the following capitalization conventions:

Concrete (DSL) Syntax

In this section, we use an ABNF-like notation to define a textual syntax that is easy for people to read and write. Most of the examples in this document are written using this syntax. In the following section, we will define the abstract syntax that this surface syntax translates into.

Schema files and bundles.

Each schema should be placed in a single file. Schema files usually end with extension .prs, and consist of a sequence of Preserves Values[1] separated into clauses by the Preserves Symbol “.”.

A bundle of schema files is a directory tree containing .prs files.


Clause            = (Version / EmbeddedTypeName / Include / Definition) "."

Version           = "version" "1"
EmbeddedTypeName  = "embeddedType" ("#f" / Ref)
Include           = "include" string
Definition        = id "=" (OrPattern / AndPattern / Pattern)

Version specification. Mandatory. Names the version of the schema language used in the file. This version of the specification is referred to in schema files as version 1.

Embedded type name. Optional. If given as #f (the default), it declares that values parsed by the schema do not contain embedded Values of any particular type. If given as a Ref, a reference to a definition in this or a neighbouring schema, it declares that embedded Values must themselves conform to the named definition.

Include. Experimental. Includes the contents of a neighbouring file as if it were textually inserted in place of this clause. The file path may be relative to the current file, or absolute.

Definition. Each definition clause implicitly connects a pattern with a type name and a set of associated functions.

Union definitions.

OrPattern = AltPattern "/" AltPattern *("/" AltPattern)

The right-hand-side of a definition may supply two or more alternatives. When parsing, the alternatives are tried in order; the result of the first successful alternative is the result of the entire parse.

The type corresponding to an OrPattern is a union type, a variant type, or an algebraic sum type, depending on the host language.
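
The ordered-choice behaviour can be sketched as a small parser combinator. This is an illustration under assumptions, not the compiler's generated code: Parser, firstOf and the Shape type are hypothetical names, and the "_variant" tag merely mirrors the convention visible in the generated TypeScript types in the appendix.

```typescript
type Parser<T> = (v: unknown) => T | undefined;

// Try each alternative in order; the first success is the overall result.
function firstOf<T>(...alts: Parser<T>[]): Parser<T> {
  return (v) => {
    for (const alt of alts) {
      const r = alt(v);
      if (r !== undefined) return r;
    }
    return undefined;
  };
}

// A tagged union, one member per alternative.
type Shape =
  | { _variant: "int"; value: number }
  | { _variant: "number"; value: number };

const asInt: Parser<Shape> = (v) =>
  typeof v === "number" && Number.isInteger(v)
    ? { _variant: "int", value: v }
    : undefined;

const asNumber: Parser<Shape> = (v) =>
  typeof v === "number" ? { _variant: "number", value: v } : undefined;

const parseShape = firstOf(asInt, asNumber);
const parseShapeReversed = firstOf(asNumber, asInt);
```

Because the alternatives overlap, their order matters: with asInt first, an integer input is tagged "int", whereas in the reversed parser the more general alternative shadows it.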

Each alternative within an OrPattern must have a definition-unique name. The name can either be given explicitly as @name (see the discussion of NamedPattern below) or inferred. It can only be inferred from the label of a record pattern, from the name of a reference to another definition, or from the text of a “sufficiently identifierlike” literal pattern, that is, one that matches a string, symbol, number or boolean:

AltPattern = "@" id SimplePattern
           / "<" id PatternSequence ">"
           / Ref
           / LiteralPattern  -- with a side condition

Intersection definitions.

AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)

The right-hand-side of a definition may supply two or more patterns, the intersection of whose denotations is the denotation of the overall definition. When parsing, every pattern is tried; if all succeed, the resulting information is combined into a single record type.

When serializing, the terms resulting from serializing at each pattern are merged together.
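
The parsing half of this behaviour can be sketched as follows. The sketch is an assumption-laden illustration: dictionaries are modelled as plain JavaScript objects, symbols as strings, and the names Dict, DictParser, allOf, parseAB and parseC are hypothetical. It loosely follows the shape of a definition like {a: int, b: string} combined with an optional c entry.

```typescript
type Dict = { [key: string]: unknown };
type DictParser<T> = (d: Dict) => T | undefined;

// Every pattern must succeed; the captured fields are merged into one record.
function allOf<A extends object, B extends object>(
  pa: DictParser<A>,
  pb: DictParser<B>
): DictParser<A & B> {
  return (d) => {
    const a = pa(d);
    const b = pb(d);
    return a === undefined || b === undefined ? undefined : { ...a, ...b };
  };
}

const parseAB: DictParser<{ a: number; b: string }> = (d) => {
  const a = d["a"];
  const b = d["b"];
  return typeof a === "number" && Number.isInteger(a) && typeof b === "string"
    ? { a, b }
    : undefined;
};

type MaybeC = {
  c:
    | { _variant: "present"; value: string }
    | { _variant: "invalid" }
    | { _variant: "absent" };
};

// Never fails: it classifies the "c" entry instead of rejecting the input.
const parseC: DictParser<MaybeC> = (d) => {
  if (!("c" in d)) return { c: { _variant: "absent" } };
  const c = d["c"];
  return typeof c === "string"
    ? { c: { _variant: "present", value: c } }
    : { c: { _variant: "invalid" } };
};

const parseMyDict = allOf(parseAB, parseC);
```

Serialization is deliberately left out of the sketch, since merging the terms produced by each pattern is the part whose semantics are least settled.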


Intersections are an experimental feature. They can be used to express optional dictionary entries:[2]

MyDict = {a: int, b: string} & @c MaybeC .
MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .

It is not yet clear whether they pull their weight. In particular, the semantics of serializing a value defined by intersection are not completely clear.


Pattern = SimplePattern / CompoundPattern

Patterns come in two kinds: simple patterns and compound patterns.

Simple patterns

SimplePattern = AnyPattern
              / AtomKindPattern
              / EmbeddedPattern
              / LiteralPattern
              / SequenceOfPattern
              / SetOfPattern
              / DictOfPattern
              / Ref

The any pattern matches any input Value:

AnyPattern = "any"

Specifying the name of a kind of Atom matches that kind of atom:

AtomKindPattern = "bool" / "float" / "double" / "int" / "string" / "bytes" / "symbol"

Embedded input Values are matched with embedded patterns. The portion under the #! prefix is the interface schema for the embedded value.[3] The result of a match is an instance of the schema-wide embeddedType, if one is supplied.

EmbeddedPattern = "#!" SimplePattern

A literal pattern may be expressed in any of three ways: non-symbol atoms stand for themselves directly; symbols, prefixed with an equal sign, are matched literally; and any Value at all may be quoted by placing it in a <<lit> ... > record:

LiteralPattern = "="symbol / "<<lit>" value ">" / non-symbol-atom

Brackets containing an item pattern and a literal ellipsis match a sequence of items, each matching the nested item pattern. Sets and uniform dictionaries are similar.

SequenceOfPattern = "[" SimplePattern "..." "]"
SetOfPattern = "#{" SimplePattern "}"
DictOfPattern = "{" SimplePattern ":" SimplePattern "...:..." "}"
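
As a rough sketch of how the uniform collection patterns behave, the combinators below parse a JavaScript array or Map by applying the item patterns to every element. The host representation (arrays for sequences, Map for dictionaries) and the names P, int, str, seqOf and dictOf are assumptions made for this example; sets would follow the same scheme.

```typescript
type P<T> = (v: unknown) => T | undefined;

const int: P<number> = (v) =>
  typeof v === "number" && Number.isInteger(v) ? v : undefined;
const str: P<string> = (v) => (typeof v === "string" ? v : undefined);

// [p ...]: every element must match the item pattern.
function seqOf<T>(item: P<T>): P<T[]> {
  return (v) => {
    if (!Array.isArray(v)) return undefined;
    const out: T[] = [];
    for (const x of v) {
      const r = item(x);
      if (r === undefined) return undefined;
      out.push(r);
    }
    return out;
  };
}

// {k: v ...:...}: every key and every value must match its pattern.
function dictOf<K, V>(key: P<K>, value: P<V>): P<Map<K, V>> {
  return (v) => {
    if (!(v instanceof Map)) return undefined;
    const out = new Map<K, V>();
    for (const [k, x] of Array.from(v.entries())) {
      const pk = key(k);
      const px = value(x);
      if (pk === undefined || px === undefined) return undefined;
      out.set(pk, px);
    }
    return out;
  };
}
```

Note that an empty sequence or dictionary matches trivially, since there is no element that could fail the item pattern.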

Finally, a reference to some other definition, in this schema or a neighbouring schema within this bundle, is made by mentioning the possibly-qualified name of the definition as a bare symbol:

Ref = symbol

Periods “.” in such symbols are special: each period-separated portion of a reference name must be an id, an identifier.
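
A possible reading of a dotted reference is sketched below. It rests on an assumption that this document does not spell out: that the final portion is the definition name and the preceding portions form the module path. That reading is at least consistent with the metaschema instance in the appendix, where unqualified names appear with an empty module path. The names parseRef and ParsedRef are hypothetical.

```typescript
const ID = /^[a-zA-Z][a-zA-Z_0-9]*$/;

type ParsedRef = { module: string[]; name: string };

// ASSUMPTION: the last portion is the definition name, the rest is the module path.
function parseRef(sym: string): ParsedRef | undefined {
  const parts = sym.split(".");
  // Each period-separated portion must be an identifier.
  if (!parts.every((p) => ID.test(p))) return undefined;
  const name = parts.pop();
  if (name === undefined) return undefined;
  return { module: parts, name };
}
```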

Compound patterns

CompoundPattern = RecordPattern
                / TuplePattern
                / VariableTuplePattern
                / DictionaryPattern

A record pattern matches an input record. It may be specified as a record with a literal in the label position, or as a quoted <<rec> ... > record with a pattern for each of the label and field-sequence positions:[4]

RecordPattern = "<<rec>" NamedPattern NamedPattern ">"
              / "<" value PatternSequence ">"

PatternSequence = *(NamedPattern) [NamedSimplePattern "..."]

A tuple pattern matches a fixed-length sequence with specific patterns in each position. A variable tuple pattern is the same, but with an additional pattern for matching additional elements following the fixed-position patterns.

TuplePattern = "[" *(NamedPattern) "]"
VariableTuplePattern = "[" *(NamedPattern) NamedSimplePattern "..." "]"
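
The difference between the two forms can be sketched with simple boolean matchers over JavaScript arrays. This is an illustrative sketch only; Matcher, tuple and tuplePrefix are names chosen for the example, and a real implementation would also capture the named portions rather than just answer yes or no.

```typescript
type Matcher = (v: unknown) => boolean;

const isInt: Matcher = (v) => typeof v === "number" && Number.isInteger(v);
const isStr: Matcher = (v) => typeof v === "string";

// [p0 p1 ... pn]: the length must equal the number of patterns.
function tuple(patterns: Matcher[]): Matcher {
  return (v) => {
    if (!Array.isArray(v)) return false;
    const items: unknown[] = v;
    return (
      items.length === patterns.length &&
      patterns.every((p, i) => p(items[i]))
    );
  };
}

// [p0 ... pn rest ...]: a fixed prefix, then zero or more items matching `rest`.
function tuplePrefix(fixed: Matcher[], rest: Matcher): Matcher {
  return (v) => {
    if (!Array.isArray(v)) return false;
    const items: unknown[] = v;
    return (
      items.length >= fixed.length &&
      fixed.every((p, i) => p(items[i])) &&
      items.slice(fixed.length).every((x) => rest(x))
    );
  };
}
```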

A dictionary pattern matches specific literal keys in an input dictionary. If no explicit name is given for a particular NamedSimplePattern, but the key for the pattern is a symbol, then that symbol is used as the name for that dictionary entry.

DictionaryPattern = "{" *(value ":" NamedSimplePattern) "}"

Identifiers and Bindings: NamedPattern and NamedSimplePattern

Compound pattern specifications contain NamedPatterns or NamedSimplePatterns rather than ordinary Patterns:

NamedPattern       = "@" id SimplePattern / Pattern
NamedSimplePattern = "@" id SimplePattern / SimplePattern

Use of an @name prefix generally results in creation of a field with the given name in the overall record type for a definition. The type of value contained in the field will correspond to the Pattern or SimplePattern given.

Appendix: Metaschema

The metaschema defines the structure of the abstract syntax (AST) of schemas, using the concrete DSL syntax described above.

The text below is taken from schema/schema.prs in the source code repository.

A Bundle collects a number of Schemas, each named by a ModulePath:[5]

Bundle = <bundle @modules Modules>.
Modules = { ModulePath: Schema ...:... }.
ModulePath = [symbol ...].

Schema = <schema {
  version: Version
  embeddedType: EmbeddedTypeName
  definitions: Definitions
}>.

A Version names the version of the schema language in use. At present, it must be 1.

; version 1 .
Version = 1 .

An EmbeddedTypeName specifies the type of embedded values within values parsed by a given schema:

EmbeddedTypeName = Ref / #f.
Ref = <ref @module ModulePath @name symbol>.

The Definitions are a named collection of definitions within a schema. Note the special mention of pattern0 and pattern1: these ensure that each <or> or <and> record has at least two members.

Definitions = { symbol: Definition ...:... }.

Definition =
  ; Pattern / Pattern / ...
  / <or [@pattern0 NamedAlternative
         @pattern1 NamedAlternative
         @patternN NamedAlternative ...]>

  ; Pattern & Pattern & ...
  / <and [@pattern0 NamedPattern
          @pattern1 NamedPattern
          @patternN NamedPattern ...]>

  ; Pattern
  / Pattern
  .

NamedAlternative = [@variantLabel string @pattern Pattern].

Each Pattern is either a simple or compound pattern:

Pattern = SimplePattern / CompoundPattern .

Simple patterns are as described above:

SimplePattern =
  ; any
  / =any

  ; special builtins: bool, float, double, int, string, bytes, symbol
  / <atom @atomKind AtomKind>

  ; matches an embedded value in the input: #!p
  / <embedded @interface SimplePattern>

  ; =symbol, <<lit> any>, or plain non-symbol atom
  / <lit @value any>

  ; [p ...] ----> <seqof <ref p>>; see also tuplePrefix below.
  / <seqof @pattern SimplePattern>

  ; #{p} ----> <setof <ref p>>
  / <setof @pattern SimplePattern>

  ; {k: v, ...:...} ----> <dictof <ref k> <ref v>>
  / <dictof @key SimplePattern @value SimplePattern>

  ; symbol, symbol.symbol, symbol.symbol.symbol, ...
  / Ref
  .

AtomKind = =Boolean
         / =Float
         / =Double
         / =SignedInteger
         / =String
         / =ByteString
         / =Symbol .

Compound patterns involve optionally-named subpatterns:

CompoundPattern =
  ; <label a b c> ----> <rec <lit label> <tuple [<ref a> <ref b> <ref c>]>>
  ; except for record labels
  ; <<rec> x y> ---> <rec <ref x> <ref y>>
  / <rec @label NamedPattern @fields NamedPattern>

  ; [a b c] ----> <tuple [<ref a> <ref b> <ref c>]>
  / <tuple @patterns [NamedPattern ...]>

  ; [a b c ...] ----> <tuplePrefix [<ref a> <ref b>] <seqof <ref c>>>
  / <tuplePrefix @fixed [NamedPattern ...] @variable NamedSimplePattern>

  ; {a: b, c: d} ----> <dict {a: <ref b>, c: <ref d>}>
  / <dict @entries DictionaryEntries>
  .

DictionaryEntries = { any: NamedSimplePattern ...:... }.

Explicitly-named subpatterns are always SimplePatterns; but, depending on context, if a name is omitted, the pattern may be a Pattern or may be restricted to SimplePattern as well:

NamedSimplePattern = @named Binding / @anonymous SimplePattern .
NamedPattern = @named Binding / @anonymous Pattern .
Binding = <named @name symbol @pattern SimplePattern>.

Appendix: Metaschema instance

The following is a (lightly-reformatted) Preserves document which is the output of DSL-to-AST compilation of the DSL source text of the metaschema.

<schema {
  version: 1,
  embeddedType: #f,
  definitions: {

    Pattern: <or [
      ["SimplePattern", <ref [] SimplePattern>],
      ["CompoundPattern", <ref [] CompoundPattern>]
    ]>,

    CompoundPattern: <or [
      ["rec", <rec <lit rec> <tuple [
          <named label <ref [] NamedPattern>>,
          <named fields <ref [] NamedPattern>>
      ]>>],
      ["tuple", <rec <lit tuple> <tuple [<named patterns <seqof <ref [] NamedPattern>>>]>>],
      ["tuplePrefix", <rec <lit tuplePrefix> <tuple [
          <named fixed <seqof <ref [] NamedPattern>>>,
          <named variable <ref [] NamedSimplePattern>>
      ]>>],
      ["dict", <rec <lit dict> <tuple [<named entries <ref [] DictionaryEntries>>]>>]
    ]>,

    Modules: <dictof <ref [] ModulePath> <ref [] Schema>>,

    Ref: <rec <lit ref> <tuple [
      <named module <ref [] ModulePath>>,
      <named name <atom Symbol>>
    ]>>,

    Bundle: <rec <lit bundle> <tuple [<named modules <ref [] Modules>>]>>,

    Binding: <rec <lit named> <tuple [
      <named name <atom Symbol>>,
      <named pattern <ref [] SimplePattern>>
    ]>>,

    Definition: <or [
      ["or", <rec <lit or> <tuple [<tuplePrefix [
          <named pattern0 <ref [] NamedAlternative>>,
          <named pattern1 <ref [] NamedAlternative>>
        ] <named patternN <seqof <ref [] NamedAlternative>>>>]>>],
      ["and", <rec <lit and> <tuple [<tuplePrefix [
          <named pattern0 <ref [] NamedPattern>>,
          <named pattern1 <ref [] NamedPattern>>
        ] <named patternN <seqof <ref [] NamedPattern>>>>]>>],
      ["Pattern", <ref [] Pattern>]
    ]>,

    NamedSimplePattern: <or [
      ["named", <ref [] Binding>],
      ["anonymous", <ref [] SimplePattern>]
    ]>,

    EmbeddedTypeName: <or [
      ["Ref", <ref [] Ref>],
      ["false", <lit #f>]
    ]>,

    ModulePath: <seqof <atom Symbol>>,

    AtomKind: <or [
      ["Boolean", <lit Boolean>],
      ["Float", <lit Float>],
      ["Double", <lit Double>],
      ["SignedInteger", <lit SignedInteger>],
      ["String", <lit String>],
      ["ByteString", <lit ByteString>],
      ["Symbol", <lit Symbol>]
    ]>,

    DictionaryEntries: <dictof any <ref [] NamedSimplePattern>>,

    Version: <lit 1>,

    NamedPattern: <or [
      ["named", <ref [] Binding>],
      ["anonymous", <ref [] Pattern>]
    ]>,

    SimplePattern: <or [
      ["any", <lit any>],
      ["atom", <rec <lit atom> <tuple [<named atomKind <ref [] AtomKind>>]>>],
      ["embedded", <rec <lit embedded> <tuple [<named interface <ref [] SimplePattern>>]>>],
      ["lit", <rec <lit lit> <tuple [<named value any>]>>],
      ["seqof", <rec <lit seqof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
      ["setof", <rec <lit setof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
      ["dictof", <rec <lit dictof> <tuple [
          <named key <ref [] SimplePattern>>,
          <named value <ref [] SimplePattern>>
      ]>>],
      ["Ref", <ref [] Ref>]
    ]>,

    NamedAlternative: <tuple [
      <named variantLabel <atom String>>,
      <named pattern <ref [] Pattern>>
    ]>,

    Definitions: <dictof <atom Symbol> <ref [] Definition>>,

    Schema: <rec <lit schema> <tuple [<dict {
      version: <named version <ref [] Version>>,
      embeddedType: <named embeddedType <ref [] EmbeddedTypeName>>,
      definitions: <named definitions <ref [] Definitions>>
    }>]>>
  }
}>

Appendix: Example generated types

The following are the (abridged) TypeScript and Racket generated type definitions for the metaschema.


import * as _ from "@preserves/core";

// ...
export type _embedded = any;
export type _val = _.Value<_embedded>;
// ...

export type Bundle = {"modules": Modules};

export type Modules = _.KeyedDictionary<ModulePath, Schema, _embedded>;

export type Schema = {
    "version": Version,
    "embeddedType": EmbeddedTypeName,
    "definitions": Definitions
};

export type Version = null;

export type EmbeddedTypeName = ({"_variant": "Ref", "value": Ref} | {"_variant": "false"});

export type Definitions = _.KeyedDictionary<symbol, Definition, _embedded>;

export type Definition = (
    {
        "_variant": "or",
        "pattern0": NamedAlternative,
        "pattern1": NamedAlternative,
        "patternN": Array<NamedAlternative>
    } |
    {
        "_variant": "and",
        "pattern0": NamedPattern,
        "pattern1": NamedPattern,
        "patternN": Array<NamedPattern>
    } |
    {"_variant": "Pattern", "value": Pattern}
);

export type Pattern = (
    {"_variant": "SimplePattern", "value": SimplePattern} |
    {"_variant": "CompoundPattern", "value": CompoundPattern}
);

export type SimplePattern = (
    {"_variant": "any"} |
    {"_variant": "atom", "atomKind": AtomKind} |
    {"_variant": "embedded", "interface": SimplePattern} |
    {"_variant": "lit", "value": _val} |
    {"_variant": "seqof", "pattern": SimplePattern} |
    {"_variant": "setof", "pattern": SimplePattern} |
    {"_variant": "dictof", "key": SimplePattern, "value": SimplePattern} |
    {"_variant": "Ref", "value": Ref}
);

export type CompoundPattern = (
    {"_variant": "rec", "label": NamedPattern, "fields": NamedPattern} |
    {"_variant": "tuple", "patterns": Array<NamedPattern>} |
    {
        "_variant": "tuplePrefix",
        "fixed": Array<NamedPattern>,
        "variable": NamedSimplePattern
    } |
    {"_variant": "dict", "entries": DictionaryEntries}
);

export type DictionaryEntries = _.KeyedDictionary<_val, NamedSimplePattern, _embedded>;

export type AtomKind = (
    {"_variant": "Boolean"} |
    {"_variant": "Float"} |
    {"_variant": "Double"} |
    {"_variant": "SignedInteger"} |
    {"_variant": "String"} |
    {"_variant": "ByteString"} |
    {"_variant": "Symbol"}
);

export type NamedAlternative = {"variantLabel": string, "pattern": Pattern};

export type NamedSimplePattern = (
    {"_variant": "named", "value": Binding} |
    {"_variant": "anonymous", "value": SimplePattern}
);

export type NamedPattern = (
    {"_variant": "named", "value": Binding} |
    {"_variant": "anonymous", "value": Pattern}
);

export type Binding = {"name": symbol, "pattern": SimplePattern};

export type Ref = {"module": ModulePath, "name": symbol};

export type ModulePath = Array<symbol>;
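
To show how the "_variant" tag in these union types can be consumed, the sketch below renders a simple pattern from AST form back into DSL-like text. It is an illustration under assumptions: the types are cut-down, self-contained copies of the generated ones (the "lit" variant is omitted, symbols are modelled as strings, and the module path is joined with periods), and the names render and ATOM_KEYWORDS are hypothetical.

```typescript
// Cut-down, self-contained copies of the generated types (assumptions noted above).
type AtomKind = {
  _variant:
    | "Boolean" | "Float" | "Double" | "SignedInteger"
    | "String" | "ByteString" | "Symbol";
};
type Ref = { module: string[]; name: string };
type SimplePattern =
  | { _variant: "any" }
  | { _variant: "atom"; atomKind: AtomKind }
  | { _variant: "embedded"; interface: SimplePattern }
  | { _variant: "seqof"; pattern: SimplePattern }
  | { _variant: "setof"; pattern: SimplePattern }
  | { _variant: "dictof"; key: SimplePattern; value: SimplePattern }
  | { _variant: "Ref"; value: Ref };

// DSL keyword for each atom kind, following the AtomKindPattern grammar rule.
const ATOM_KEYWORDS: Record<AtomKind["_variant"], string> = {
  Boolean: "bool", Float: "float", Double: "double", SignedInteger: "int",
  String: "string", ByteString: "bytes", Symbol: "symbol",
};

function render(p: SimplePattern): string {
  switch (p._variant) {
    case "any": return "any";
    case "atom": return ATOM_KEYWORDS[p.atomKind._variant];
    case "embedded": return "#!" + render(p.interface);
    case "seqof": return "[" + render(p.pattern) + " ...]";
    case "setof": return "#{" + render(p.pattern) + "}";
    case "dictof":
      return "{" + render(p.key) + ": " + render(p.value) + " ...:...}";
    case "Ref": return p.value.module.concat([p.value.name]).join(".");
  }
}
```

Switching on "_variant" lets the TypeScript compiler narrow each branch to the matching member of the union, so the fields accessed in each case are checked statically.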


(struct AtomKind-Symbol () #:prefab)
(struct AtomKind-ByteString () #:prefab)
(struct AtomKind-String () #:prefab)
(struct AtomKind-SignedInteger () #:prefab)
(struct AtomKind-Double () #:prefab)
(struct AtomKind-Float () #:prefab)
(struct AtomKind-Boolean () #:prefab)

(struct Bundle (modules) #:prefab)

(struct CompoundPattern-dict (entries) #:prefab)
(struct CompoundPattern-tuplePrefix (fixed variable) #:prefab)
(struct CompoundPattern-tuple (patterns) #:prefab)
(struct CompoundPattern-rec (label fields) #:prefab)

(struct Definition-Pattern (value) #:prefab)
(struct Definition-and (pattern0 pattern1 patternN) #:prefab)
(struct Definition-or (pattern0 pattern1 patternN) #:prefab)

(struct EmbeddedTypeName-false () #:prefab)
(struct EmbeddedTypeName-Ref (value) #:prefab)

(struct NamedAlternative (variantLabel pattern) #:prefab)

(struct NamedPattern-anonymous (value) #:prefab)
(struct NamedPattern-named (value) #:prefab)

(struct NamedSimplePattern-anonymous (value) #:prefab)
(struct NamedSimplePattern-named (value) #:prefab)

(struct Binding (name pattern) #:prefab)

(struct Pattern-CompoundPattern (value) #:prefab)
(struct Pattern-SimplePattern (value) #:prefab)

(struct Ref (module name) #:prefab)

(struct Schema (definitions embeddedType version) #:prefab)

(struct SimplePattern-Ref (value) #:prefab)
(struct SimplePattern-dictof (key value) #:prefab)
(struct SimplePattern-setof (pattern) #:prefab)
(struct SimplePattern-seqof (pattern) #:prefab)
(struct SimplePattern-lit (value) #:prefab)
(struct SimplePattern-embedded (interface) #:prefab)
(struct SimplePattern-atom (atomKind) #:prefab)
(struct SimplePattern-any () #:prefab)

Appendix: Future work


  1. That is, schema files use Preserves as a kind of S-expression! 

  2. This encoding is not ideal. It passes responsibility for checking for invalid inputs up to the user, rather than handling it completely at the Schema layer. 

  3. Embedded patterns are experimental. One interpretation is that an embedded value denotes a reference to some stateful actor in a potentially-distributed system, and that the interface schema associated with an embedded value describes the messages that may be sent to that actor.

    Examples. #!any may denote a reference to an Actor able to receive any value as a message; #!#t, a reference to an Actor expecting only the “true” message; #!Session, a reference to an Actor expecting any message matching a schema defined as Session in this file. 

  4. Note that <label ps> can be thought of as roughly equivalent to <<rec> <<lit> label> [ps]>. The following two definitions are equivalent:

    D1 =              <foo   @a string @b string @extra any ... >.
    D2 = <<rec> <<lit> foo> [@a string @b string @extra any ...]>.

  5. The semantics of module path references remain to be specified!