README

Repository: federator
Description: Provides a mechanism to exchanged data between Integration Architecture (IA) nodes

Overview

This repository contributes to the development of secure, scalable, and interoperable data-sharing infrastructure. It supports NDTP’s mission to enable trusted, federated, and decentralised data-sharing across organisations.

This repository is one of several open-source components that underpin NDTP’s Integration Architecture (IA)—a framework designed to allow organisations to manage and exchange data securely while maintaining control over their own information. The IA is actively deployed and tested across multiple sectors, ensuring its adaptability and alignment with real-world needs.

For a complete overview of the Integration Architecture (IA) project, please see the Integration Architecture Documentation.

Prerequisites

Java 21
This repo uses a maven wrapper so no installation of maven is required.
Docker
Git
Management-node

Quick Start

Follow these steps to get started quickly with this repository. For detailed installation, configuration, and deployment, refer to the relevant MD files.

1. Download and Build

To download from the github repository run the following commands:

git clone https://github.com/National-Digital-Twin/federator.git
cd federator

To run a demo with multiple Federator clients and multiple Federator servers run the following commands from the project root directory:

Compile the java source code: (Replace ./mvnw with mvn to use the maven without the wrapper)

./mvnw clean install

Build the docker containers:

docker compose --file docker/docker-compose-multiple-clients-multiple-server.yml build

2. Run Build Version

Run the docker containers:

docker compose --file docker/docker-compose-multiple-clients-multiple-server.yml up

You should then see the service running within docker containers. These contain multiple clients and multiple servers and their supporting services.
The service will move the data from the topic(s) in the kafka-src to federated topic(s) in kafka-target.

3. Installation

Refer to INSTALLATION.md for detailed installation steps, including required dependencies and setup configurations.

4. Uninstallation

For steps to remove this repository and its dependencies, see UNINSTALL.md.

Features

The federator enables secure data exchange between Integration Architecture nodes, supporting both server (producer) and client (consumer) roles. Key features include:

Data Federation

Secure, scalable data sharing using Kafka as both source and target.
Multiple federator servers and clients per organisation for flexible deployment.
Filtering of Kafka messages for federation is based on the securityLabel in the Kafka message header and the client’s credentials. The default filter performs an exact match between the client’s credentials and the securityLabel header (e.g., Security-Label:nationality=GBR).
Custom filtering logic can be configured; see Configuring a Custom Filter for details.
Communication between federator servers and clients uses gRPC over mTLS for secure, authenticated data transfer.
Federation currently supports RDF payloads, with extensibility hooks for other data formats on a per-topic basis.

File Streaming

File Transfer via gRPC: Stream large files from server to client as chunked messages over the GetFilesStream RPC endpoint.
Multi-Cloud Storage Support: Both producer (server) and consumer (client) support multiple storage backends:
- Server (Producer): Read files from AWS S3, Azure Blob Storage, Google Cloud Storage (GCP), or Local filesystem
- Client (Consumer): Write files to AWS S3, Azure Blob Storage, Google Cloud Storage (GCP), or Local filesystem
Integrity Verification: SHA-256 checksums ensure file integrity during transfer
Resume Support: Continue interrupted transfers using sequence IDs to avoid re-transferring complete files
Graceful Error Handling: Server sends StreamWarning messages for invalid requests without terminating the stream, allowing subsequent files to be processed
S3-Compatible Storage: Support for MinIO and other S3-compatible storage systems
Azure Support: Works with Azure Blob Storage and Azurite emulator for local development
GCP Support: Works with Google Cloud Storage and fake-gcs-server emulator for local development

For detailed file streaming documentation, see File Streaming README.

Common Infrastructure

Integration with Management-Node for centralised configuration, topic management, and authorisation.
Redis is used for offset tracking and short-lived configuration caching.
JWT-based authentication with Identity Provider (e.g., Keycloak) for consumer verification and authorisation.

An overview of the Federator service architecture is shown below:

The diagram above shows how the Federator can be used to exchange data between Integration Architecture Nodes that are running within many different organisations. Each organisation could typically run many servers (producers) and many clients (consumers) to exchange data between their Integration Architecture Nodes.

For example within the above diagram:

Organisation 2 (Org 2) is shown to be running two servers, with one named "Producer Node A1" that is sending messages to the topic named "DP1"
Organisation 1 (Org 1) is shown to be running a client called "Consumer Node B2" which is reading the messages from the topic named "DP1"

It should be further noted that this diagram shows that many servers (or producers) and many clients (or consumers) can be configured within each organisation to exchange data between their Integration Architecture Nodes.

Additional note on connectivity and security:

Multiple Federator Producers and Consumers can exchange data across organisations using gRPC over mTLS.
As long as they are configured to talk to the same Management-Node, they will obtain compatible configuration (topics, roles, filters, endpoints) required for their data exchange.
The Management-Node, together with the Identity Provider, issues the certificates/credentials and tokens that enable mutual TLS and authorisation.
This means any number of Producers and Consumers can safely share data so long as their exchange requirements are defined in, and served by, the Management-Node.

Exchange data between IA nodes

The Federator is designed to allow data exchange between Integration Architecture Nodes. It supports two primary modes of operation:

RDF Message Streaming: Kafka-to-Kafka message federation with filtering based on security labels
File Streaming: Large file transfer with multi-cloud storage support and integrity verification

Both modes use gRPC over mTLS for secure communication and are run in a distributed manner with multiple servers and clients.

Server (Producer) - Simplified View

For RDF Messages:

Reads messages from knowledge topics within the source Kafka broker
Authenticates clients using JWT tokens and verifies authorization
Filters messages based on security labels in message headers using configurable filters
Streams filtered messages to authorized clients via gRPC

For Files:

Reads files from configured storage (S3, Azure, GCP, or Local filesystem)
Authenticates clients using JWT tokens and verifies authorization
Chunks files into manageable pieces with a configurable chunk size
Streams file chunks to authorized clients via gRPC with SHA-256 checksums for integrity verification

Client (Consumer) - Simplified View

For RDF Messages:

Connects and authenticates with known server(s) using JWT tokens via gRPC
Requests message streams for authorized topics
Writes received messages to target Kafka broker with a configured topic prefix (e.g., 'federated')
Tracks offsets in Redis for resume capability

For Files:

Connects and authenticates with known server(s) using JWT tokens via gRPC
Requests file streams, optionally resuming from a previous sequence ID
Assembles received chunks and verifies integrity using SHA-256 checksums
Uploads complete files to configured storage destination (S3, Azure, GCP, or Local)
Tracks file sequence offsets in Redis for resume capability

The underlying communication protocol is gRPC over mTLS, providing secure, authenticated data transfer between servers and clients.

Architecture

Federator Server (Producer)

This app starts the data federation server that starts a gRPC service.

This process contains the federator service supplying RPC endpoints that are called by the client:

GetKafkaConsumer - Stream RDF messages from Kafka topics to clients
GetFilesStream - Stream files as chunks to clients with integrity verification

Both endpoints authenticate clients using JWT tokens and verify authorization against the Management-Node configuration before streaming data.

Federator Client (Consumer)

The client connects to one or more servers and performs the following:

For RDF Message Streaming (GetKafkaConsumer):

Authenticates with the server using JWT tokens
Checks Redis for the current offset for each topic
Requests message stream from the server
Processes messages and writes them to the destination Kafka topic (with configured prefix, e.g., 'federated')
Updates Redis offset tracking as messages are processed
Continues streaming until stopped; retries on failures if configured

For File Streaming (GetFilesStream):

Authenticates with the server using JWT tokens
Checks Redis for the last processed file sequence ID
Requests file stream from the server, optionally resuming from a previous sequence
Receives file chunks, assembles them locally, and verifies integrity using SHA-256 checksums
Uploads complete files to the configured storage destination (S3, Azure, GCP, or Local)
Updates Redis offset tracking as files are successfully processed
Handles StreamWarning messages by logging and advancing offsets to skip unrecoverable errors

Please refer to this context diagram as an overview of the federator service and its components:

This diagram illustrates the main components involved in a typical deployment:

Federator Producer and Federator Consumer communicating over gRPC (mTLS).
Kafka clusters used by producers and consumers.
Redis cache used for short‑lived configuration and offsets/tokens.
Management-Node service that provides configuration to Federators.
Identity Provider (e.g., Keycloak) used for authentication and authorisation.
Postgres databases used by the Management-Node and Identity Provider.

See the Architecture section below for more detail on Producer and Consumer responsibilities.

Testing Guide

Running Unit Tests

Navigate to the root of the project and run mvn test to run the tests for the repository.

Public Funding Acknowledgment

This repository has been developed with public funding as part of the National Digital Twin Programme (NDTP), a UK Government initiative. NDTP, alongside its partners, has invested in this work to advance open, secure, and reusable digital twin technologies for any organisation, whether from the public or private sector, irrespective of size.

License

This repository contains both source code and documentation, which are covered by different licenses:

Code: Originally developed by Telicent UK Ltd, now maintained by National Digital Twin Programme. Licensed under the Apache License 2.0.
Documentation: Licensed under the Open Government Licence (OGL) v3.0.

By contributing to this repository, you agree that your contributions will be licenced under these terms.

See LICENSE, OGL_LICENSE, and NOTICE for details.

Security and Responsible Disclosure

We take security seriously. If you believe you have found a security vulnerability in this repository, please follow our responsible disclosure process outlined in SECURITY.

Contributing

We welcome contributions that align with the Programme’s objectives. Please read our Contributing guidelines before submitting pull requests.

Acknowledgements

This repository has benefited from collaboration with various organisations. For a list of acknowledgments, see ACKNOWLEDGMENTS.

Support and Contact

For questions or support, check our Issues or contact the NDTP team on ndtp@businessandtrade.gov.uk.

Maintained by the National Digital Twin Programme (NDTP).

Name		Name	Last commit message	Last commit date
Latest commit History 146 Commits
.github		.github
.m2		.m2
.mvn/wrapper		.mvn/wrapper
codestyle		codestyle
docker		docker
docs		docs
scripts		scripts
src		src
.czrc		.czrc
.env		.env
.git-blame-ignore-revs		.git-blame-ignore-revs
.gitignore		.gitignore
ACKNOWLEDGEMENTS.md		ACKNOWLEDGEMENTS.md
AUDIT_REPORT.md		AUDIT_REPORT.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
INSTALLATION.md		INSTALLATION.md
LICENSE.md		LICENSE.md
MAINTAINERS.md		MAINTAINERS.md
NOTICE.md		NOTICE.md
OGL_LICENSE.md		OGL_LICENSE.md
README.md		README.md
SECURITY.md		SECURITY.md
UNINSTALL.md		UNINSTALL.md
assembly.xml		assembly.xml
lombok.config		lombok.config
mkdocs.yml		mkdocs.yml
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

README

Overview

Prerequisites

Quick Start

1. Download and Build

2. Run Build Version

3. Installation

4. Uninstallation

Features

Data Federation

File Streaming

Common Infrastructure

Exchange data between IA nodes

Server (Producer) - Simplified View

Client (Consumer) - Simplified View

Architecture

Federator Server (Producer)

Federator Client (Consumer)

Testing Guide

Running Unit Tests

Public Funding Acknowledgment

License

Security and Responsible Disclosure

Contributing

Acknowledgements

Support and Contact

About

Uh oh!

Releases 4

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

README

Overview

Prerequisites

Quick Start

1. Download and Build

2. Run Build Version

3. Installation

4. Uninstallation

Features

Data Federation

File Streaming

Common Infrastructure

Exchange data between IA nodes

Server (Producer) - Simplified View

Client (Consumer) - Simplified View

Architecture

Federator Server (Producer)

Federator Client (Consumer)

Testing Guide

Running Unit Tests

Public Funding Acknowledgment

License

Security and Responsible Disclosure

Contributing

Acknowledgements

Support and Contact

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages