Meaningful-Data/lei_sdmx

LEI to SDMX Pipeline

This project provides a pipeline for transforming Legal Entity Identifier (LEI) data into SDMX (Statistical Data and Metadata eXchange) format, with built-in validation and data quality checks.

Overview

The pipeline processes LEI data through several stages:

  1. Data loading from CSV format
  2. Data cleaning and reshaping
  3. Conversion to SDMX format
  4. Structural validation using FMR (Fusion Metadata Registry)
  5. Data quality validation using VTL (Validation and Transformation Language) scripts
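The first two stages can be sketched with the standard library alone. This is only an illustration of loading with a row limit and basic cleaning, not the code in lei_sdmx_pipeline.py: the helper names load_rows and clean_rows, the sample column names, and the cleaning rules are all assumptions made for the example.

```python
import csv
import io
from itertools import islice


def load_rows(handle, row_limit=None):
    """Stage 1 (hypothetical helper): read at most row_limit CSV records."""
    reader = csv.DictReader(handle)
    # islice with a stop of None reads every record.
    return list(islice(reader, row_limit))


def clean_rows(rows):
    """Stage 2 (hypothetical helper): trim whitespace, drop empty cells."""
    cleaned = []
    for row in rows:
        cleaned.append(
            {
                key: value.strip()
                for key, value in row.items()
                if isinstance(value, str) and value.strip()
            }
        )
    return cleaned


# Made-up sample data; the real golden copy has many more columns.
SAMPLE = (
    "LEI,LegalName,Country\n"
    "AAAA0000000000000001, Example One ,ES\n"
    "AAAA0000000000000002,Example Two,\n"
    "AAAA0000000000000003,Example Three,FR\n"
)

rows = clean_rows(load_rows(io.StringIO(SAMPLE), row_limit=2))
print(rows)
```

The row limit plays the same role as the row_limit argument in the usage example: it keeps a trial run small before the full file is processed.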

Prerequisites

  • Python 3.9 or higher
  • Required Python packages:
    • vtlengine (automatically installs libraries such as pandas and pysdmx)
    • requests

Installation

  1. Clone the repository:
git clone https://github.com/Meaningful-Data/lei_sdmx.git
cd lei_sdmx
  2. Install dependencies (using Poetry):
poetry install --no-root

Make sure you have Poetry installed. If not, you can install it with pip install poetry.

Usage

The main pipeline can be used as follows:

from pathlib import Path
from lei_sdmx_pipeline import lei_to_sdmx_pipeline

# Configure paths
base_path = Path(__file__).parent
lei_data_path = base_path / "lei_data" / "gleif-goldencopy-lei2-golden-copy.csv"
output_path = base_path / "output" / "lei_to_sdmx.csv"
logs_folder = base_path / "log"

# Configure the pipeline
sdmx_api_endpoint = "https://fmr.meaningfuldata.eu/sdmx/v2"
vtl_script_query = {
    'id': 'LEI_VALIDATIONS',
    'agency': 'MD',
    'version': '1.0',
    'api_endpoint': sdmx_api_endpoint
}

# Run the pipeline
dataset, structural_validation_result, validation_result = lei_to_sdmx_pipeline(
    input_path=lei_data_path,
    row_limit=10000,
    sdmx_api_endpoint=sdmx_api_endpoint,
    vtl_script_query=vtl_script_query,
    output_path=output_path,
    logs_folder=logs_folder
)

# Check results
print(f"Process finished. SDMX dataset saved to {output_path}")
print(f"Logs saved to {logs_folder}")
print("Available validation results:", validation_result.keys())

Note that this function is already implemented in the file lei_sdmx_pipeline.py.

Input Data Format

The input CSV file is the LEI golden copy, which GLEIF publishes for download. Download the file first, then change the path parameters in the code to point to the right CSV file.
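Before running the pipeline, it may help to confirm that the configured path really points to a CSV file and to look at its column names. The sketch below is not part of the repository; peek_header is a hypothetical helper, and the demo file and its columns are made up so that the example runs without the real download.

```python
import csv
import tempfile
from pathlib import Path


def peek_header(csv_path):
    """Return the column names of a CSV file (hypothetical helper)."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"LEI input file not found: {csv_path}")
    with csv_path.open(newline="", encoding="utf-8") as fh:
        # An empty file yields an empty header list.
        return next(csv.reader(fh), [])


# Demo on a throwaway file; in practice pass the downloaded golden copy path.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "LEI,LegalName\nAAAA0000000000000001,Example One\n", encoding="utf-8"
    )
    header = peek_header(sample)

print(header)
```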

Output

The pipeline produces:

  1. An SDMX-formatted dataset
  2. Structural validation results from FMR (saved to log/structural_validation_logs.json)
  3. Data quality validation results from VTL scripts (saved to CSV files in the log folder)
  4. A CSV file in SDMX CSV 2.0 format (saved to the specified output path)
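After a run, the log folder can be inspected with a few lines of standard-library code. The sketch below assumes only what is stated above: a structural_validation_logs.json file and one or more CSV files in the log folder. The helper name summarise_logs, the demo file names, and the demo contents are hypothetical, and the JSON is loaded as-is because its internal layout is not described here.

```python
import csv
import json
import tempfile
from pathlib import Path


def summarise_logs(logs_folder):
    """Load the structural log and count data rows per CSV (hypothetical helper)."""
    logs_folder = Path(logs_folder)
    summary = {"structural": None, "vtl_row_counts": {}}

    json_path = logs_folder / "structural_validation_logs.json"
    if json_path.exists():
        with json_path.open(encoding="utf-8") as fh:
            summary["structural"] = json.load(fh)

    for csv_path in sorted(logs_folder.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            next(reader, None)  # skip the header row
            summary["vtl_row_counts"][csv_path.name] = sum(1 for _ in reader)
    return summary


# Demo with made-up files; in practice pass the real logs_folder path.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "structural_validation_logs.json").write_text(
        json.dumps({"demo": True}), encoding="utf-8"
    )
    (folder / "rule_1.csv").write_text("LEI,flag\nA,1\nB,0\n", encoding="utf-8")
    demo = summarise_logs(folder)

print(demo)
```

A row count per CSV gives a quick view of how much each validation output contains; what a row means depends on the VTL script that produced it.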

Project Structure

lei_sdmx/
├── lei_sdmx_pipeline.py    # Main pipeline implementation
├── utils.py               # Utility functions for FMR validation
├── pyproject.toml         # Poetry dependencies
├── README.md             # This file
├── lei_data/             # Directory for input LEI data
├── output/               # Directory for SDMX output files
└── log/                  # Directory for validation logs

Validation

The pipeline performs two types of validation:

  1. Structural Validation: Ensures the data conforms to the SDMX structure defined in the FMR
  2. VTL Validation: Runs custom validation rules defined in VTL scripts to check data quality

About

An example of a pipeline for on-boarding the LEI into SDMX
