This repository supports a pairwise + SQLite ontology alignment workflow and now includes a Streamlit app to run the full pipeline with intermediate previews and exports.
Linux
# Debian/Ubuntu
sudo apt-get install libraptor2-dev
# Fedora/RHEL/CentOS
sudo yum install raptor2-devel
# Arch Linux
sudo pacman -S raptor
# openSUSE
sudo zypper install raptor-devel
macOS
brew install raptor
Windows
No native package manager support. Options:
- Install via Conda:
  conda install -c conda-forge raptor2
- Or use WSL and follow the Linux instructions above.
uv sync
- Required: rapper CLI (from the Raptor RDF toolkit) for TTL extraction.
Install examples:
# macOS (Homebrew)
brew install raptor
# Ubuntu/Debian
sudo apt-get install -y raptor2-utils
Run the app:
uv run streamlit run streamlit_app.py
The app covers:
- Overview
- Fetch schemas and ontologies: download external sources and browse the OLS catalog
- Extract term TSV from TTL
- Generate pairwise candidates (local/local or local/OLS)
- Curate candidate decisions (quick actions + batch edits)
- Review curated dataset and export TTL
- View schema documentation (pyLODE), interactive graph, RDFGlance linkout, and Mermaid export before/after curation
- Background sync: candidate TSVs are synced automatically to SQLite and the reconciled exports
- Inspect SQLite tables and run SQL queries
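The "Extract term TSV from TTL" step depends on the rapper CLI. A minimal sketch of that step, assuming the TTL is first converted to N-Triples with rapper and that terms are identified by their rdfs:label triples. The function names are hypothetical, and the regex is deliberately simplified: it ignores blank-node subjects and keeps escape sequences verbatim.

```python
import re
import subprocess

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Matches: <subject-iri> <rdfs:label> "literal" ... (simplified; escapes are kept verbatim)
LABEL_TRIPLE = re.compile(
    r'^<([^>]+)>\s+<' + re.escape(RDFS_LABEL) + r'>\s+"((?:[^"\\]|\\.)*)"'
)


def ttl_to_ntriples(ttl_path: str) -> str:
    """Convert a Turtle file to N-Triples with the rapper CLI (must be on PATH)."""
    result = subprocess.run(
        ["rapper", "-q", "-i", "turtle", "-o", "ntriples", ttl_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def labels_from_ntriples(ntriples: str) -> dict[str, str]:
    """Collect the first rdfs:label seen for each subject IRI."""
    labels: dict[str, str] = {}
    for line in ntriples.splitlines():
        match = LABEL_TRIPLE.match(line.strip())
        if match:
            labels.setdefault(match.group(1), match.group(2))
    return labels
```

Keeping the CLI call separate from the parsing lets the parser be tested without rapper installed.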
OLS catalog terminology:
- Ontologies: distinct vocabularies/knowledge models loaded in OLS (for example CHEBI, EDAM, BIOLINK).
- Classes: concept types in those ontologies (for example chemical entity, analysis, sample).
- Properties: relationship or attribute predicates used to connect or describe entities (for example part_of, has_role, label).
- Individuals: concrete instances (named entities) rather than concept types.
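These entity kinds correspond to separate collections in the OLS REST API. A minimal sketch of building the listing URL for each kind, assuming the public OLS4 base URL https://www.ebi.ac.uk/ols4/api and its /ontologies/{id}/terms, /properties and /individuals paths. How the app itself queries OLS is not described here, so OLS_API and ols_listing_url are hypothetical names.

```python
from typing import Optional
from urllib.parse import quote

OLS_API = "https://www.ebi.ac.uk/ols4/api"  # assumed base URL of the public OLS4 API

# Maps the kinds described above to OLS API collection names (classes are "terms" in OLS).
KIND_TO_COLLECTION = {
    "class": "terms",
    "property": "properties",
    "individual": "individuals",
}


def ols_listing_url(ontology_id: str, kind: Optional[str] = None) -> str:
    """Return the OLS URL for an ontology record, or for one of its entity collections."""
    # OLS ontology ids are lowercase, e.g. "chebi" for CHEBI.
    base = f"{OLS_API}/ontologies/{quote(ontology_id.lower())}"
    if kind is None:
        return base
    try:
        return f"{base}/{KIND_TO_COLLECTION[kind]}"
    except KeyError:
        raise ValueError(f"unknown kind: {kind!r}") from None
```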
Keep these in Git:
- registry/external_sources.tsv
- registry/pair_alignment_candidates_<source>.tsv
Each schema should have one TSV that contains:
- approved review decisions
This versioned TSV is the shared review ledger and the single source of truth (SSOT) for exports.
It is kept deterministically sorted by source_term_iri and written with LF line endings so Git diffs stay row-local.
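A minimal sketch of writing the ledger so that Git diffs stay row-local, assuming rows are held as dictionaries keyed by column name. write_ledger is a hypothetical helper, not necessarily how the app writes the file, and the secondary sort on canonical_term_iri is an assumption added so that several rows for one source term also keep a fixed order.

```python
import csv
from pathlib import Path


def write_ledger(path: Path, rows: list[dict[str, str]], columns: list[str]) -> None:
    """Write ledger rows as TSV, sorted by source_term_iri, with LF line endings."""
    # Secondary key on canonical_term_iri (assumed) keeps ties in a stable order too.
    ordered = sorted(
        rows,
        key=lambda r: (r.get("source_term_iri", ""), r.get("canonical_term_iri", "")),
    )
    # newline="" stops Python from translating "\n" into "\r\n" on Windows.
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=columns,
            delimiter="\t",
            lineterminator="\n",
            extrasaction="ignore",  # drop keys that are not ledger columns
        )
        writer.writeheader()
        writer.writerows(ordered)
```

Passing lineterminator="\n" matters because the csv module defaults to "\r\n" regardless of platform.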
Minimal shared ledger columns:
source_term_source, source_term_iri, source_term_label, source_term_kind, canonical_term_iri, canonical_term_label, canonical_term_source, canonical_term_kind, relation (optional semantic mapping metadata), status, curator, curator_name, reviewer, reviewer_name, date_reviewed, curation_comment
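A quick header check against these columns can catch a malformed ledger before it is committed. A minimal sketch, assuming every listed column except relation is required; missing_columns is a hypothetical helper.

```python
import csv
import io

# Columns listed above; "relation" is optional and therefore not checked here.
REQUIRED_COLUMNS = [
    "source_term_source", "source_term_iri", "source_term_label", "source_term_kind",
    "canonical_term_iri", "canonical_term_label", "canonical_term_source",
    "canonical_term_kind", "status", "curator", "curator_name", "reviewer",
    "reviewer_name", "date_reviewed", "curation_comment",
]


def missing_columns(tsv_text: str) -> list[str]:
    """Return required ledger columns absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file has no header at all
    present = set(header)
    return [col for col in REQUIRED_COLUMNS if col not in present]
```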
Do not version these generated caches/exports:
- registry/alignment_curation.sqlite
- registry/reconciled_mappings.tsv
- registry/reconciled_canonical_groups.tsv
- registry/pair_alignments.tsv
- registry/downloads/
- registry/imports/*_terms.tsv
- registry/exports/
- registry/work/
- registry/ols_ontologies.tsv
- registry/ols_ontologies_meta.json
- registry/mapping_relations_catalog.json
- registry/schema_docs/
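Because the SQLite database is derived from the versioned TSVs, it can be deleted and rebuilt at any time. A minimal sketch of such a rebuild with the standard-library sqlite3 module, assuming a single hypothetical ledger table with every column stored as text; the app's real schema and sync logic may differ.

```python
import csv
import io
import sqlite3


def rebuild_cache(conn: sqlite3.Connection, tsv_text: str, table: str = "ledger") -> int:
    """Replace `table` with the rows of a ledger TSV; return the number of rows loaded."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Column names come from the trusted TSV header; quote them as SQL identifiers.
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Dropping and recreating the table makes the rebuild idempotent, which is what lets the cache stay out of Git.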
registry/work/pair_alignment_candidates_<source>.tsv is the local working queue.
Use it for focused regeneration and in-progress curation. Only approved decisions are synced back to the shared ledger.
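A minimal sketch of that sync rule. It assumes that rows are dictionaries, that a row is identified by the pair (source_term_iri, canonical_term_iri), and that approved rows carry the status value "approved". The key and the status value are assumptions, and sync_approved is a hypothetical name.

```python
Row = dict[str, str]


def sync_approved(ledger: list[Row], work_queue: list[Row]) -> list[Row]:
    """Merge approved work-queue rows into the ledger; other statuses stay local."""

    def key(row: Row) -> tuple[str, str]:
        return (row.get("source_term_iri", ""), row.get("canonical_term_iri", ""))

    merged = {key(row): row for row in ledger}
    for row in work_queue:
        if row.get("status") == "approved":
            merged[key(row)] = row  # an approved decision replaces the ledger entry
    # Return rows in the ledger's deterministic order.
    return sorted(merged.values(), key=key)
```

Rows still marked needs_review never leave the working queue, so regenerating the queue cannot overwrite reviewed decisions in the ledger.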
Use shared schema files (no per-curator filenames) and collaborate through branches/PRs.
- Pull the latest main and create a branch: curation/<schema>-<short-topic>.
- In the app sidebar, select:
  - Source ID (the schema you curate)
  - Curator ORCID (must resolve to a public ORCID name)
- The shared review ledger for that schema is:
registry/pair_alignment_candidates_<source>.tsv
- Generate and curate locally in:
registry/work/pair_alignment_candidates_<source>.tsv
- Commit only the finalized shared ledger plus any intentional manifest edits:
registry/pair_alignment_candidates_<source>.tsv
- Open a PR with a short summary:
- schema curated
- terms reviewed
- notable manual additions/rejections
- The reviewer checks the diff and the app preview, then merges.
Notes:
- curator may be "auto" or a valid ORCID. When it is an ORCID, curator_name stores the public ORCID name.
- reviewer must be a valid ORCID on reviewed rows; reviewer_name stores the public ORCID name.
- Reviewer attribution is stored in the TSV reviewer/reviewer_name/date_reviewed fields.
- SQLite/reconciled exports are local cache files and should not be edited manually or committed.
- Focused local regeneration is safe because needs_review queue rows stay under registry/work/.
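Part of the ORCID check can run offline. An ORCID iD has four hyphen-separated groups of four characters, and its last character is an ISO 7064 MOD 11-2 check digit (X stands for 10). A minimal sketch of the format-and-checksum test, assuming the bare iD form (no https://orcid.org/ prefix) and accepting "auto" for curator only. Resolving the public name still needs an ORCID lookup, which is not shown, and the function names are hypothetical.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value looks like an ORCID iD and its check digit is correct."""
    value = value.strip()
    if not ORCID_PATTERN.match(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]


def is_valid_curator(value: str) -> bool:
    """curator may be "auto" or an ORCID iD; reviewer should use is_valid_orcid."""
    return value == "auto" or is_valid_orcid(value)
```

For example, is_valid_orcid("0000-0002-1825-0097") is True, while changing the final 7 to 8 makes it False.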