LEI to SDMX Pipeline

This project provides a pipeline for transforming Legal Entity Identifier (LEI) data into SDMX (Statistical Data and Metadata eXchange) format, with built-in validation and data quality checks.

Overview

The pipeline processes LEI data through several stages:

Data loading from CSV format
Data cleaning and reshaping
Conversion to SDMX format
Structural validation using FMR (Fusion Metadata Registry)
Data quality validation using VTL (Validation and Transformation Language) scripts

Prerequisites

Python 3.9 or higher
Required Python packages:
- vtlengine (installs automatically libraries like pandas and pysdmx)
- requests

Installation

Clone the repository:

git clone https://github.com/Meaningful-Data/lei_sdmx.git
cd lei_sdmx

Install dependencies (using poetry):

poetry install --no-root

Make sure you have Poetry installed. If not, you can install it with pip install poetry.

Usage

The main pipeline can be used as follows:

from pathlib import Path
from lei_sdmx_pipeline import lei_to_sdmx_pipeline

# Configure paths
base_path = Path(__file__).parent
lei_data_path = base_path / "lei_data" / "gleif-goldencopy-lei2-golden-copy.csv"
output_path = base_path / "output" / "lei_to_sdmx.csv"
logs_folder = base_path / "log"

# Configure the pipeline
sdmx_api_endpoint = "https://fmr.meaningfuldata.eu/sdmx/v2"
vtl_script_query = {
    'id': 'LEI_VALIDATIONS',
    'agency': 'MD',
    'version': '1.0',
    'api_endpoint': sdmx_api_endpoint
}

# Run the pipeline
dataset, structural_validation_result, validation_result = lei_to_sdmx_pipeline(
    input_path=lei_data_path,
    row_limit=10000,
    sdmx_api_endpoint=sdmx_api_endpoint,
    vtl_script_query=vtl_script_query,
    output_path=output_path,
    logs_folder=logs_folder
)

# Check results
print(f"Process finished. SDMX dataset saved to {output_path}")
print(f"Logs saved to {logs_folder}")
print("Available validation results:", validation_result.keys())

Note that the function is already implemented in the file lei_sdmx_pipeline.py

Input Data Format

The input CSV file is the LEI golden copy, which can be found here Please bear in mind that you should download a file, and change the parameters in the code to point to the right CSV file.

Output

The pipeline produces:

An SDMX-formatted dataset
Structural validation results from FMR (saved to log/structural_validation_logs.json)
Data quality validation results from VTL scripts (saved to CSV files in the log folder)
A CSV file in SDMX CSV 2.0 format (saved to the specified output path)

Project Structure

lei_sdmx/
├── lei_sdmx_pipeline.py    # Main pipeline implementation
├── utils.py               # Utility functions for FMR validation
├── pyproject.toml         # Poetry dependencies
├── README.md             # This file
├── lei_data/             # Directory for input LEI data
├── output/               # Directory for SDMX output files
└── log/                  # Directory for validation logs

Validation

The pipeline performs two types of validation:

Structural Validation: Ensures the data conforms to the SDMX structure defined in the FMR
VTL Validation: Runs custom validation rules defined in VTL scripts to check data quality

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LEI to SDMX Pipeline

Overview

Prerequisites

Installation

Usage

Input Data Format

Output

Project Structure

Validation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
lei_data		lei_data
log		log
output		output
.gitignore		.gitignore
README.md		README.md
lei_sdmx_pipeline.py		lei_sdmx_pipeline.py
poetry.lock		poetry.lock
poetry.toml		poetry.toml
pyproject.toml		pyproject.toml
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

LEI to SDMX Pipeline

Overview

Prerequisites

Installation

Usage

Input Data Format

Output

Project Structure

Validation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages