Composable linguistic annotation lexicons for ATProto
Documentation • Discussions • Bluesky
Layers is a set of AT Protocol Lexicon schemas under the pub.layers.* namespace. It defines a composable interchange format for linguistic annotations that covers text, audio, video, and image modalities; psycholinguistic and neurolinguistic signals; and offline judgment and experiment data.
Layers subsumes 15+ major annotation data models (CoNLL, CoNLL-U, brat, ELAN, TEI, WebVTT, Universal Dependencies, AMT, SRL, ARK, and others) while maintaining a theory-neutral, modular architecture. All data lives in user-controlled Personal Data Servers (PDSes); Layers provides the schema and protocols for interoperability.
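To make "subsumes" concrete, here is a minimal sketch of how one CoNLL-U sentence could be split into a token sequence plus independent annotation layers. The CoNLL-U column order (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) is standard; every type and field name below (`Token`, `LabelLayer`, `DependencyLayer`, `decomposeConllu`) is a hypothetical stand-in and not the actual pub.layers.* schema.

```typescript
// Hypothetical shapes for illustration only; not the real pub.layers.* record fields.
interface Token { index: number; form: string }
interface LabelLayer { kind: string; labels: { tokenIndex: number; value: string }[] }
interface DependencyLayer {
  kind: "dependency";
  arcs: { head: number; dependent: number; relation: string }[];
}
interface Decomposed { tokens: Token[]; upos: LabelLayer; dependencies: DependencyLayer }

// Split one CoNLL-U sentence into a token sequence and two separate layers.
function decomposeConllu(sentence: string): Decomposed {
  const tokens: Token[] = [];
  const upos: LabelLayer = { kind: "upos", labels: [] };
  const dependencies: DependencyLayer = { kind: "dependency", arcs: [] };

  for (const line of sentence.split("\n")) {
    // Skip blank lines and "# ..." comment lines.
    if (line.trim() === "" || line.charAt(0) === "#") continue;
    const cols = line.split("\t");
    if (cols.length !== 10) throw new Error(`expected 10 columns, got ${cols.length}`);
    const [id, form, , uposTag, , , head, deprel] = cols;
    // Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
    if (!/^\d+$/.test(id)) continue;
    const index = Number(id);
    tokens.push({ index, form });
    upos.labels.push({ tokenIndex: index, value: uposTag });
    // "_" marks an unspecified head; record an arc only when one is given.
    if (head !== "_") {
      dependencies.arcs.push({ head: Number(head), dependent: index, relation: deprel });
    }
  }
  return { tokens, upos, dependencies };
}

const sample = [
  "# text = Dogs bark",
  "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
  "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
].join("\n");

const result = decomposeConllu(sample);
console.log(result.tokens.map((t) => t.form).join(" ")); // prints "Dogs bark"
```

The point of the split is that the part-of-speech layer and the dependency layer refer to the same tokens but can be stored, revised, or replaced independently.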
Layers is organized around a core pipeline of annotation layers, with parallel tracks for experimental and analytical workflows and integration layers connecting to the ATProto ecosystem:
Core Pipeline
Expression (any linguistic unit: document, paragraph, sentence, word, morpheme)
↓
Segmentation (tokenization strategies, token sequences)
↓
Annotation (linguistic labels: POS, NER, semantic roles, etc.)
Parallel Tracks
Ontology · Corpus · Resource · Judgment · Alignment
Integration Layers
Graph · Eprint · Media · Persona · Changelog
See the documentation for the full architecture, including dependency graphs and cross-referencing patterns.
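The core pipeline above can be sketched as three linked values: an expression, a segmentation over it, and an annotation layer over the segmentation. This is an illustrative sketch only. The interface names, field names, and the whitespace strategy are assumptions made for the example; consult the lexicon files for the real record shapes.

```typescript
// All type and field names are hypothetical, chosen for illustration.
interface Expression { id: string; text: string }
// Character offsets into Expression.text; `end` is exclusive.
interface TokenSpan { start: number; end: number }
interface Segmentation { expressionId: string; strategy: string; tokens: TokenSpan[] }
interface AnnotationLayer {
  expressionId: string;
  kind: string;
  labels: { tokenIndex: number; label: string }[];
}

// Segmentation step: one token per run of non-whitespace characters.
function segmentByWhitespace(expr: Expression): Segmentation {
  const tokens: TokenSpan[] = [];
  const pattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(expr.text)) !== null) {
    tokens.push({ start: match.index, end: match.index + match[0].length });
  }
  return { expressionId: expr.id, strategy: "whitespace", tokens };
}

// Recover a token's surface text from its offsets.
function tokenText(expr: Expression, seg: Segmentation, i: number): string {
  const span = seg.tokens[i];
  return expr.text.slice(span.start, span.end);
}

const expr: Expression = { id: "expr-1", text: "Colorless green ideas sleep" };
const seg = segmentByWhitespace(expr);

// Annotation step: labels point at token indices, not at raw text.
const pos: AnnotationLayer = {
  expressionId: expr.id,
  kind: "pos",
  labels: [
    { tokenIndex: 0, label: "ADJ" },
    { tokenIndex: 1, label: "ADJ" },
    { tokenIndex: 2, label: "NOUN" },
    { tokenIndex: 3, label: "VERB" },
  ],
};

for (const { tokenIndex, label } of pos.labels) {
  console.log(`${tokenText(expr, seg, tokenIndex)}/${label}`);
}
```

Because labels reference token indices and tokens reference character offsets, a different tokenization of the same expression can coexist with this one without touching the source text.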
14 lexicon namespaces define 26 record types and 90 lexicon schemas:
| Namespace | Purpose |
|---|---|
| pub.layers.defs | Shared primitives: anchors, selectors, metadata, cross-references |
| pub.layers.expression | Recursive document and expression model |
| pub.layers.segmentation | Tokenization strategies and token sequences |
| pub.layers.annotation | Annotation layers and cluster sets |
| pub.layers.ontology | Annotation type systems, role slots, and theoretical frameworks |
| pub.layers.corpus | Corpus collections with annotation design metadata |
| pub.layers.resource | Lexical entries, stimulus templates, fillings, and collections |
| pub.layers.judgment | Experiment definitions, judgment sets, and agreement reports |
| pub.layers.alignment | Cross-lingual and cross-modal structure correspondence |
| pub.layers.graph | Typed property graph nodes, edges, and edge sets |
| pub.layers.persona | Annotator personas and annotation frameworks |
| pub.layers.media | Audio, video, image, and signal metadata |
| pub.layers.eprint | Scholarly metadata and data provenance links |
| pub.layers.changelog | Structured change tracking with sub-record targeting |
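As a small illustration of how client code might route on these namespaces, the sketch below extracts the namespace from an NSID and checks it against the fourteen names in the table. The helper `namespaceOf` and the record name `exampleRecord` are hypothetical; only the namespace list comes from this README.

```typescript
// The fourteen namespaces listed in the table above.
const LAYERS_NAMESPACES = [
  "defs", "expression", "segmentation", "annotation", "ontology",
  "corpus", "resource", "judgment", "alignment", "graph",
  "persona", "media", "eprint", "changelog",
] as const;

type LayersNamespace = (typeof LAYERS_NAMESPACES)[number];

// Return the pub.layers namespace of an NSID, or null if it is not a known one.
// Hypothetical helper, not part of the Layers project.
function namespaceOf(nsid: string): LayersNamespace | null {
  const segments = nsid.split(".");
  if (segments.length < 3 || segments[0] !== "pub" || segments[1] !== "layers") {
    return null;
  }
  const candidate = segments[2];
  const known = LAYERS_NAMESPACES as readonly string[];
  return known.indexOf(candidate) !== -1 ? (candidate as LayersNamespace) : null;
}

// "exampleRecord" is a made-up record name used only for this demonstration.
console.log(namespaceOf("pub.layers.annotation.exampleRecord")); // prints "annotation"
console.log(namespaceOf("app.bsky.feed.post"));                  // prints null
```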
layers/
├── lexicons/                  # ATProto lexicon schemas (pub.layers.*)
│   └── pub/layers/            # 14 namespace directories, 90 JSON files
│       ├── defs.json          # Shared primitives
│       ├── expression/        # Record types, queries, and namespace defs
│       ├── annotation/
│       ├── ...
│       └── changelog/
├── docs/                      # Docusaurus documentation site (docs.layers.pub)
│   ├── docs/                  # Markdown source files
│   ├── docusaurus.config.ts
│   └── sidebars.ts
├── CHANGELOG.md
└── validate-lexicons.mjs      # Lexicon validation script
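The layout above suggests that each schema's NSID mirrors its file path (for example, pub.layers.defs at lexicons/pub/layers/defs.json). The following sketch shows the kind of structural check a validator could run. It is not the contents of validate-lexicons.mjs, which this README does not describe. The top-level `lexicon`, `id`, and `defs` fields come from the AT Protocol Lexicon format; `nsidToPath`, `checkLexicon`, and the one-file-per-NSID rule are assumptions.

```typescript
// Assumption: one JSON file per NSID, with dots mapped to directories under lexicons/.
function nsidToPath(nsid: string): string {
  return "lexicons/" + nsid.split(".").join("/") + ".json";
}

// Return a list of problems found in a parsed lexicon document (empty if none).
// Hypothetical helper; it checks only top-level structure.
function checkLexicon(doc: unknown, path: string): string[] {
  if (typeof doc !== "object" || doc === null) {
    return ["document is not a JSON object"];
  }
  const problems: string[] = [];
  const record = doc as { [key: string]: unknown };

  // Lexicon documents declare the Lexicon language version, currently 1.
  if (record["lexicon"] !== 1) problems.push("`lexicon` must be 1");

  const id = record["id"];
  if (typeof id !== "string") {
    problems.push("`id` must be a string NSID");
  } else {
    if (id.indexOf("pub.layers.") !== 0) problems.push("`id` is outside pub.layers.*");
    if (nsidToPath(id) !== path) problems.push("`id` does not match file path " + path);
  }

  const defs = record["defs"];
  if (typeof defs !== "object" || defs === null || Array.isArray(defs)) {
    problems.push("`defs` must be an object");
  }
  return problems;
}

const defsPath = nsidToPath("pub.layers.defs");
console.log(defsPath); // prints "lexicons/pub/layers/defs.json"
console.log(checkLexicon({ lexicon: 1, id: "pub.layers.defs", defs: {} }, defsPath)); // prints []
```

A real validator would also need to check each definition against the Lexicon type system; this sketch stops at the envelope.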
npm install
npm test
This project is in the design phase. Open issues or discussions to provide feedback on the lexicon design.
Licensed under CC-BY-SA-4.0.