Repository: federator
Description: Provides a mechanism to exchanged data between Integration Architecture (IA) nodes
This repository contributes to the development of secure, scalable, and interoperable data-sharing infrastructure. It supports NDTP’s mission to enable trusted, federated, and decentralised data-sharing across organisations.
This repository is one of several open-source components that underpin NDTP’s Integration Architecture (IA)—a framework designed to allow organisations to manage and exchange data securely while maintaining control over their own information. The IA is actively deployed and tested across multiple sectors, ensuring its adaptability and alignment with real-world needs.
For a complete overview of the Integration Architecture (IA) project, please see the Integration Architecture Documentation.
- Java 21
- This repo uses a maven wrapper so no installation of maven is required.
- Docker
- Git
- Management-node
Follow these steps to get started quickly with this repository. For detailed installation, configuration, and deployment, refer to the relevant MD files.
To download from the github repository run the following commands:
git clone https://github.com/National-Digital-Twin/federator.git
cd federator To run a demo with multiple Federator clients and multiple Federator servers run the following commands from the project root directory:
Compile the java source code: (Replace ./mvnw with mvn to use the maven without the wrapper)
./mvnw clean installBuild the docker containers:
docker compose --file docker/docker-compose-multiple-clients-multiple-server.yml buildRun the docker containers:
docker compose --file docker/docker-compose-multiple-clients-multiple-server.yml upYou should then see the service running within docker containers. These contain multiple clients and multiple servers and their supporting services.
The service will move the data from the topic(s) in the kafka-src to federated topic(s) in kafka-target.
Refer to INSTALLATION.md for detailed installation steps, including required dependencies and setup configurations.
For steps to remove this repository and its dependencies, see UNINSTALL.md.
The federator enables secure data exchange between Integration Architecture nodes, supporting both server (producer) and client (consumer) roles. Key features include:
- Secure, scalable data sharing using Kafka as both source and target.
- Multiple federator servers and clients per organisation for flexible deployment.
- Filtering of Kafka messages for federation is based on the
securityLabelin the Kafka message header and the client’s credentials. The default filter performs an exact match between the client’s credentials and thesecurityLabelheader (e.g.,Security-Label:nationality=GBR). - Custom filtering logic can be configured; see Configuring a Custom Filter for details.
- Communication between federator servers and clients uses gRPC over mTLS for secure, authenticated data transfer.
- Federation currently supports RDF payloads, with extensibility hooks for other data formats on a per-topic basis.
- File Transfer via gRPC: Stream large files from server to client as chunked messages over the
GetFilesStreamRPC endpoint. - Multi-Cloud Storage Support: Both producer (server) and consumer (client) support multiple storage backends:
- Server (Producer): Read files from AWS S3, Azure Blob Storage, Google Cloud Storage (GCP), or Local filesystem
- Client (Consumer): Write files to AWS S3, Azure Blob Storage, Google Cloud Storage (GCP), or Local filesystem
- Integrity Verification: SHA-256 checksums ensure file integrity during transfer
- Resume Support: Continue interrupted transfers using sequence IDs to avoid re-transferring complete files
- Graceful Error Handling: Server sends
StreamWarningmessages for invalid requests without terminating the stream, allowing subsequent files to be processed - S3-Compatible Storage: Support for MinIO and other S3-compatible storage systems
- Azure Support: Works with Azure Blob Storage and Azurite emulator for local development
- GCP Support: Works with Google Cloud Storage and fake-gcs-server emulator for local development
For detailed file streaming documentation, see File Streaming README.
- Integration with Management-Node for centralised configuration, topic management, and authorisation.
- Redis is used for offset tracking and short-lived configuration caching.
- JWT-based authentication with Identity Provider (e.g., Keycloak) for consumer verification and authorisation.
An overview of the Federator service architecture is shown below:
The diagram above shows how the Federator can be used to exchange data between Integration Architecture Nodes that are running within many different organisations.
Each organisation could typically run many servers (producers) and many clients (consumers) to exchange data between their Integration Architecture Nodes.
For example within the above diagram:
- Organisation 2 (Org 2) is shown to be running two servers, with one named "Producer Node A1" that is sending messages to the topic named "DP1"
- Organisation 1 (Org 1) is shown to be running a client called "Consumer Node B2" which is reading the messages from the topic named "DP1"
It should be further noted that this diagram shows that many servers (or producers) and many clients (or consumers) can be configured within each organisation to exchange data between their Integration Architecture Nodes.
Additional note on connectivity and security:
- Multiple Federator Producers and Consumers can exchange data across organisations using gRPC over mTLS.
- As long as they are configured to talk to the same Management-Node, they will obtain compatible configuration (topics, roles, filters, endpoints) required for their data exchange.
- The Management-Node, together with the Identity Provider, issues the certificates/credentials and tokens that enable mutual TLS and authorisation.
- This means any number of Producers and Consumers can safely share data so long as their exchange requirements are defined in, and served by, the Management-Node.
The Federator is designed to allow data exchange between Integration Architecture Nodes. It supports two primary modes of operation:
- RDF Message Streaming: Kafka-to-Kafka message federation with filtering based on security labels
- File Streaming: Large file transfer with multi-cloud storage support and integrity verification
Both modes use gRPC over mTLS for secure communication and are run in a distributed manner with multiple servers and clients.
For RDF Messages:
- Reads messages from knowledge topics within the source Kafka broker
- Authenticates clients using JWT tokens and verifies authorization
- Filters messages based on security labels in message headers using configurable filters
- Streams filtered messages to authorized clients via gRPC
For Files:
- Reads files from configured storage (S3, Azure, GCP, or Local filesystem)
- Authenticates clients using JWT tokens and verifies authorization
- Chunks files into manageable pieces with a configurable chunk size
- Streams file chunks to authorized clients via gRPC with SHA-256 checksums for integrity verification
For RDF Messages:
- Connects and authenticates with known server(s) using JWT tokens via gRPC
- Requests message streams for authorized topics
- Writes received messages to target Kafka broker with a configured topic prefix (e.g., 'federated')
- Tracks offsets in Redis for resume capability
For Files:
- Connects and authenticates with known server(s) using JWT tokens via gRPC
- Requests file streams, optionally resuming from a previous sequence ID
- Assembles received chunks and verifies integrity using SHA-256 checksums
- Uploads complete files to configured storage destination (S3, Azure, GCP, or Local)
- Tracks file sequence offsets in Redis for resume capability
The underlying communication protocol is gRPC over mTLS, providing secure, authenticated data transfer between servers and clients.
This app starts the data federation server that starts a gRPC service.
This process contains the federator service supplying RPC endpoints that are called by the client:
- GetKafkaConsumer - Stream RDF messages from Kafka topics to clients
- GetFilesStream - Stream files as chunks to clients with integrity verification
Both endpoints authenticate clients using JWT tokens and verify authorization against the Management-Node configuration before streaming data.
The client connects to one or more servers and performs the following:
For RDF Message Streaming (GetKafkaConsumer):
- Authenticates with the server using JWT tokens
- Checks Redis for the current offset for each topic
- Requests message stream from the server
- Processes messages and writes them to the destination Kafka topic (with configured prefix, e.g., 'federated')
- Updates Redis offset tracking as messages are processed
- Continues streaming until stopped; retries on failures if configured
For File Streaming (GetFilesStream):
- Authenticates with the server using JWT tokens
- Checks Redis for the last processed file sequence ID
- Requests file stream from the server, optionally resuming from a previous sequence
- Receives file chunks, assembles them locally, and verifies integrity using SHA-256 checksums
- Uploads complete files to the configured storage destination (S3, Azure, GCP, or Local)
- Updates Redis offset tracking as files are successfully processed
- Handles StreamWarning messages by logging and advancing offsets to skip unrecoverable errors
Please refer to this context diagram as an overview of the federator service and its components:
This diagram illustrates the main components involved in a typical deployment:
- Federator Producer and Federator Consumer communicating over gRPC (mTLS).
- Kafka clusters used by producers and consumers.
- Redis cache used for short‑lived configuration and offsets/tokens.
- Management-Node service that provides configuration to Federators.
- Identity Provider (e.g., Keycloak) used for authentication and authorisation.
- Postgres databases used by the Management-Node and Identity Provider.
See the Architecture section below for more detail on Producer and Consumer responsibilities.
Navigate to the root of the project and run mvn test to run the tests for the repository.
This repository has been developed with public funding as part of the National Digital Twin Programme (NDTP), a UK Government initiative. NDTP, alongside its partners, has invested in this work to advance open, secure, and reusable digital twin technologies for any organisation, whether from the public or private sector, irrespective of size.
This repository contains both source code and documentation, which are covered by different licenses:
- Code: Originally developed by Telicent UK Ltd, now maintained by National Digital Twin Programme. Licensed under the Apache License 2.0.
- Documentation: Licensed under the Open Government Licence (OGL) v3.0.
By contributing to this repository, you agree that your contributions will be licenced under these terms.
See LICENSE, OGL_LICENSE, and NOTICE for details.
We take security seriously. If you believe you have found a security vulnerability in this repository, please follow our responsible disclosure process outlined in SECURITY.
We welcome contributions that align with the Programme’s objectives. Please read our Contributing guidelines before submitting pull requests.
This repository has benefited from collaboration with various organisations. For a list of acknowledgments, see ACKNOWLEDGMENTS.
For questions or support, check our Issues or contact the NDTP team on ndtp@businessandtrade.gov.uk.
Maintained by the National Digital Twin Programme (NDTP).
© Crown Copyright 2025. This work has been developed by the National Digital Twin Programme and is legally attributed to the Department for Business and Trade (UK) as the governing entity.
