*/ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind*
Data is the backbone of society — yet most of it remains locked, misunderstood, and out of reach.
Modern data publishing is flawed. Datasets often ship without context, documentation (when it exists) is written for humans alone, and machine-actionable metadata is rare. The outcome: researchers, developers, policymakers, and AI systems all stall at the same bottleneck, struggling to find or interpret data before they can use it.
Data Artifex is an open-source initiative solving this at the root. We build the metadata- and API-driven tools needed to turn High-Value Datasets into intelligent, machine-actionable digital assets, bridging the gap between static data and machine-driven intelligence. Our goal is a global ecosystem of self-describing, FAIR-compliant data products built for the era of machine intelligence. When data can explain itself, humans stop wrangling and machines start reasoning, accelerating both scientific discovery and humanitarian impact.
This project is currently in an early incubation phase.
This project is open-source and looking for sponsors! Your support helps us maintain infrastructure, improve documentation, and accelerate development.
👉 View our Sponsorship Page or Sponsor us on GitHub.
We aim to modernize the way binary and text data are published, discovered, and utilized across the globe:
- AI-Ready Documentation: Foster metadata that is equally accessible to humans and machines.
- API-First Publishing: Rapidly expose data and catalogs through scalable APIs.
- Machine Intelligence: Unlock automated data discovery and intelligent processing.
- Efficiency: Dramatically reduce the time spent on manual data wrangling.
- Standardized Power: Build on global standards such as DDI, DCAT, and the FAIR principles (see the metadata sketch after this list).
- Natural Interface: Enable natural language-driven data management.
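To make "equally accessible to humans and machines" concrete, here is a minimal sketch of a machine-actionable catalog record expressed as DCAT metadata with rdflib. The dataset, URLs, and field values are illustrative placeholders, not the output of any Data Artifex package.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Hypothetical dataset identifier and download URL, for illustration only.
dataset = URIRef("https://example.org/dataset/unemployment-rates")
csv_file = URIRef("https://example.org/dataset/unemployment-rates.csv")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Describe the dataset itself: human-readable and machine-actionable at once.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Monthly unemployment rates (example)")))
g.add((dataset, DCTERMS.description, Literal("Illustrative socio-economic time series.")))

# Attach a concrete distribution so agents know how to fetch the data.
g.add((csv_file, RDF.type, DCAT.Distribution))
g.add((csv_file, DCAT.mediaType, Literal("text/csv")))
g.add((csv_file, DCAT.downloadURL, csv_file))
g.add((dataset, DCAT.distribution, csv_file))

# Serialize to JSON-LD: the same record serves web pages, APIs, and crawlers.
print(g.serialize(format="json-ld"))
```

The same graph can be re-serialized as Turtle or RDF/XML without touching the record itself, which is the point: one description, many consumers.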
Collaborating with leading organizations, research communities, and data custodians, we are building a collection of specialized Python packages powered by metadata standards, knowledge graphs, and intelligent agents.
| Repository | Description |
|---|---|
| **Release Candidate / MVP** | *Relatively stable* |
| rdf-toolkit | Core engine for RDF and semantic metadata |
| ddi-toolkit | Support and utilities for DDI-CDI and DDI-Codebook |
| dartfx-fairproxy-api | FAIR metadata proxy APIs |
| dartfx-unf | Universal Numeric Fingerprint (UNF) for data hashing (illustrated below) |
| postman-api | Python client for Postman API integration |
| **Beta / Prototypes** | *Functional for early adopters* |
| ckan-toolkit | CKAN catalog harvesting and exploration |
| dataverse-toolkit | Dataverse catalog harvesting and exploration |
| dcat-toolkit | Support and utilities for DCAT |
| mtnards-toolkit | Integration with the MTNA Rich Data Services platform |
| nada-toolkit | NADA (World Bank) catalog harvester |
| postman-toolkit | Postman FAIR collection generation and utilities |
| socrata-toolkit | Integration with the Socrata (Data Insights) platform |
| usbls-toolkit | Harvesting and FAIRification of U.S. Bureau of Labor Statistics time-series raw data |
| **Alpha / Experimental** | *Research and development* |
| dartfx-cli | Command-line shell for Data Artifex tools and packages |
| dartfx-utils | Collection of utilities and shared resources |
| dartfx-workspace | FAIR data workspace management |
| fair-data-machine | A Docker image bundling data FAIRification tools and software |
| qsv-toolkit | Integration with the datHere qsv data wrangling toolkit |
| uscensus-toolkit | Integration with US Census Bureau APIs and data products |
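To give a flavor of what these packages do under the hood, below is a simplified illustration of the idea behind the Universal Numeric Fingerprint from dartfx-unf: normalize values into a canonical form, then hash the result so that semantically identical data yields the same fingerprint. This is a conceptual sketch only, not the actual UNF algorithm or the package's API; the normalization here is deliberately naive.

```python
import base64
import hashlib

def toy_fingerprint(values, digits=7):
    """Toy data fingerprint: canonicalize values, then hash.

    NOTE: a conceptual sketch only -- the real UNF specification defines
    much stricter normalization rules (exponential notation, missing
    values, Unicode handling) that are omitted here.
    """
    parts = []
    for v in values:
        if isinstance(v, float):
            # Round to a fixed number of significant digits so that
            # representation noise (e.g. 0.30000000000000004) is ignored.
            parts.append(f"{v:.{digits - 1}e}")
        else:
            parts.append(str(v))
    canonical = "\n".join(parts).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.b64encode(digest).decode("ascii")

# Identical data in different float representations hashes identically.
print(toy_fingerprint([1.0, 0.1 + 0.2, 100.0]))
print(toy_fingerprint([1.0, 0.3, 100.0]))
```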
Our focus is on datasets that drive global impact, from socio-economic indicators to environmental health. We believe user-friendly APIs are essential for humanitarian efforts, scientific progress, and policymaking.
Data Artifex directly supports the High-Value Data Network — our sister project and community platform advancing the discovery, access, and use of datasets that matter most.
We build on the shoulders of giants. Our strategy leverages:
- FAIR Principles (Findable, Accessible, Interoperable, Reusable)
- CODATA Cross-Domain Integration Framework
- Standards: DDI, DCAT, MLCommons Croissant, Schema.org, RO-Crate, ODRL.
- Tech: JSON Schema, Semantic Web, Python, APIs (see the validation sketch below).
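As a small example of the JSON Schema piece of this stack, the sketch below validates a minimal dataset metadata record using the jsonschema library. The schema and record are hypothetical, chosen only to show the mechanics; neither reflects a schema shipped by Data Artifex.

```python
from jsonschema import validate, ValidationError

# Hypothetical schema for a bare-bones dataset metadata record.
DATASET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "license": {"type": "string"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "license"],
}

record = {
    "title": "Monthly unemployment rates (example)",
    "license": "CC-BY-4.0",
    "keywords": ["labor", "economics"],
}

try:
    validate(instance=record, schema=DATASET_SCHEMA)
    print("metadata record is valid")
except ValidationError as err:
    print(f"invalid metadata: {err.message}")
```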
This project is led by Pascal Heus (@kulnor) and a proactive team of agents and contributors. Pascal, a dedicated and passionate data engineer and information technologist, is driven by a vision to improve the usability of high-value data to support research, scientific innovation, and policymaking, thereby contributing to the greater good of society and a sustainable future. We warmly welcome both human and digital collaborators.