*/ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind*
Data is the backbone of society — yet most of it remains locked, misunderstood, and out of reach.
Modern data publishing is flawed. Datasets often ship without context, documentation (when it exists) is written for humans alone, and machine-actionable metadata is rare. The outcome: researchers, developers, policymakers, and AI systems all stall at the same bottleneck, struggling to find or interpret data before they can use it.
Data Artifex is an open-source initiative solving this at the root. We build the metadata- and API-driven tools needed to turn High-Value Datasets into intelligent, machine-actionable digital assets, bridging the gap between static data and machine-driven intelligence. Our goal is a global ecosystem of self-describing, FAIR-compliant data products built for the era of machine intelligence. When data can explain itself, humans stop wrangling and machines start reasoning, accelerating both scientific discovery and humanitarian impact.
This project is currently in an early incubation phase.
This project is open-source and looking for sponsors! Your support helps us maintain infrastructure, improve documentation, and accelerate development.
👉 View our Sponsorship Page or Sponsor us on GitHub.
We aim to modernize the way binary and text data are published, discovered, and utilized across the globe:
- AI-Ready Documentation: Foster metadata that is equally accessible to humans and machines.
- API-First Publishing: Rapidly expose data and catalogs through scalable APIs.
- Machine Intelligence: Unlock automated data discovery and intelligent processing.
- Efficiency: Dramatically reduce the time spent on manual data wrangling.
- Standardized Power: Build on global standards such as DDI, DCAT, and the FAIR principles (see the metadata sketch after this list).
- Natural Interface: Enable natural language-driven data management.
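To make "equally accessible to humans and machines" concrete, here is a minimal sketch of a machine-actionable catalog record expressed as DCAT metadata with rdflib. The dataset, URLs, and field values are illustrative placeholders, not the output of any Data Artifex package.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Hypothetical dataset identifier and download URL, for illustration only.
dataset = URIRef("https://example.org/dataset/unemployment-rates")
csv_file = URIRef("https://example.org/dataset/unemployment-rates.csv")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Describe the dataset itself: human-readable and machine-actionable at once.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Monthly unemployment rates (example)")))
g.add((dataset, DCTERMS.description, Literal("Illustrative socio-economic time series.")))

# Attach a concrete distribution so agents know how to fetch the data.
g.add((csv_file, RDF.type, DCAT.Distribution))
g.add((csv_file, DCAT.mediaType, Literal("text/csv")))
g.add((csv_file, DCAT.downloadURL, csv_file))
g.add((dataset, DCAT.distribution, csv_file))

# Serialize to JSON-LD: the same record serves web pages, APIs, and crawlers.
print(g.serialize(format="json-ld"))
```

The same graph can be re-serialized as Turtle or RDF/XML without touching the record itself, which is the point: one description, many consumers.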
Collaborating with leading organizations, research communities, and data custodians, we are building a collection of specialized Python packages powered by metadata standards, knowledge graphs, and intelligent agents.
| Repository | Description |
|---|---|
| **Release Candidate / MVP** | *Relatively stable* |
| rdf-toolkit | Core engine for RDF and semantic metadata |
| ddi-toolkit | Support and utilities for DDI-CDI and DDI-Codebook |
| dartfx-fairproxy-api | FAIR metadata proxy APIs |
| dartfx-unf | Universal Numeric Fingerprint (UNF) for data hashing (illustrated below) |
| postman-api | Python client for Postman API integration |
| **Beta / Prototypes** | *Functional for early adopters* |
| ckan-toolkit | CKAN catalog harvesting and exploration |
| dataverse-toolkit | Dataverse catalog harvesting and exploration |
| dcat-toolkit | Support and utilities for DCAT |
| mtnards-toolkit | Integration with the MTNA Rich Data Services platform |
| nada-toolkit | NADA (World Bank) catalog harvester |
| postman-toolkit | Postman FAIR collection generation and utilities |
| socrata-toolkit | Integration with the Socrata (Data Insights) platform |
| usbls-toolkit | Harvesting and FAIRification of U.S. Bureau of Labor Statistics time-series raw data |
| **Alpha / Experimental** | *Research and development* |
| dartfx-cli | Command-line shell for Data Artifex tools and packages |
| dartfx-utils | Collection of utilities and shared resources |
| dartfx-workspace | FAIR data workspace management |
| fair-data-machine | A Docker image bundling data FAIRification tools and software |
| qsv-toolkit | Integration with the datHere qsv data wrangling toolkit |
| uscensus-toolkit | Integration with US Census Bureau APIs and data products |
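To give a flavor of what these packages do under the hood, below is a simplified illustration of the idea behind the Universal Numeric Fingerprint from dartfx-unf: normalize values into a canonical form, then hash the result so that semantically identical data yields the same fingerprint. This is a conceptual sketch only, not the actual UNF algorithm or the package's API; the normalization here is deliberately naive.

```python
import base64
import hashlib

def toy_fingerprint(values, digits=7):
    """Toy data fingerprint: canonicalize values, then hash.

    NOTE: a conceptual sketch only -- the real UNF specification defines
    much stricter normalization rules (exponential notation, missing
    values, Unicode handling) that are omitted here.
    """
    parts = []
    for v in values:
        if isinstance(v, float):
            # Round to a fixed number of significant digits so that
            # representation noise (e.g. 0.30000000000000004) is ignored.
            parts.append(f"{v:.{digits - 1}e}")
        else:
            parts.append(str(v))
    canonical = "\n".join(parts).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.b64encode(digest).decode("ascii")

# Identical data in different float representations hashes identically.
print(toy_fingerprint([1.0, 0.1 + 0.2, 100.0]))
print(toy_fingerprint([1.0, 0.3, 100.0]))
```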
Our focus is on datasets that drive global impact, from socio-economic indicators to environmental health. We believe user-friendly APIs are essential for humanitarian efforts, scientific progress, and policymaking.
Data Artifex directly supports the High-Value Data Network — our sister project and community platform advancing the discovery, access, and use of datasets that matter most.
We build on the shoulders of giants. Our strategy leverages:
- FAIR Principles (Findable, Accessible, Interoperable, Reusable)
- CODATA Cross-Domain Integration Framework
- Standards: DDI, DCAT, MLCommons Croissant, Schema.org, RO-Crate, ODRL.
- Tech: JSON Schema, Semantic Web, Python, APIs (see the validation sketch below).
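As a small example of the JSON Schema piece of this stack, the sketch below validates a minimal dataset metadata record using the jsonschema library. The schema and record are hypothetical, chosen only to show the mechanics; neither reflects a schema shipped by Data Artifex.

```python
from jsonschema import validate, ValidationError

# Hypothetical schema for a bare-bones dataset metadata record.
DATASET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "license": {"type": "string"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "license"],
}

record = {
    "title": "Monthly unemployment rates (example)",
    "license": "CC-BY-4.0",
    "keywords": ["labor", "economics"],
}

try:
    validate(instance=record, schema=DATASET_SCHEMA)
    print("metadata record is valid")
except ValidationError as err:
    print(f"invalid metadata: {err.message}")
```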
This project is led by Pascal Heus (@kulnor) and a proactive team of agents and contributors. Pascal, a dedicated and passionate data engineer and information technologist, is driven by a vision to improve the usability of high-value data to support research, scientific innovation, and policymaking, thereby contributing to the greater good of society and a sustainable future. We warmly welcome both human and digital collaborators.