DHIS2-to-DHIS2 development utility library maintained by Bluesquare Data Services team.
Install this library on its own:
pip install git+https://github.com/BLSQ/openhexa-ds-developments.git#subdirectory=d2d_developmentDescription:
Main class to extract data from DHIS2. It provides unified handlers for extracting data elements, indicators, and reporting rates, saving them to disk in a standardized format.
Configuration Parameters:
When initializing DHIS2Extractor, you can configure the following parameters:
dhis2_client(required): The DHIS2 client instance.download_mode: Controls how files are saved when extracting data. Use"DOWNLOAD_REPLACE"(default) to always overwrite files, or"DOWNLOAD_NEW"to skip downloading if the file already exists.return_existing_file: IfTrueand usingDOWNLOAD_NEW, returns the path to the existing file instead ofNonewhen a file already exists (default:False).logger: Optional custom logger instance.
Example:
extractor = DHIS2Extractor(dhis2_client, download_mode="DOWNLOAD_NEW", return_existing_file=True)Parameters for download_period (DataElementsExtractor, IndicatorsExtractor, ReportingRatesExtractor):
-
data_elements / indicators / reporting_rates (
list[str]):
A list of DHIS2 UIDs to extract.- For
data_elements.download_period, usedata_elements=["de1", "de2"]. - For
indicators.download_period, useindicators=["ind1", ...]. - For
reporting_rates.download_period, usereporting_rates=["rr1", ...].
- For
-
org_units (
list[str]):
List of DHIS2 organisation unit UIDs to extract data for (e.g.,["ou1", "ou2"]). -
period (
str):
The DHIS2 period to extract data for (e.g.,"202401"for January 2024). Must be a valid DHIS2 period string. -
output_dir (
Path):
The directory where the extracted data file will be saved. The file will be nameddata_<period>.parquetby default unless you specify a custom filename. -
filename (
str | None, optional):
Custom filename for the output file. If not provided, the default isdata_<period>.parquet. Using the default is recommended when extracting multiple periods. -
kwargs (
dict, optional):
Additional keyword arguments for advanced extraction options.- For data elements:
last_updated(not yet implemented). - For indicators:
include_cocs(bool, whether to include category option combo, use only together with data element ids). - For reporting rates: currently no extra options.
- For data elements:
Returns:
- The path to the saved Parquet file (
Path), orNoneif no data was extracted or the file already exists andreturn_existing_fileisFalse.
Output Format: The extraction methods always save the data in a table with a fixed column structure. Each extraction creates a separate .parquet file, where each row represents a data point and the columns are always:
- dx: Data element, indicator, or reporting rate UID
- period: Period (e.g.,
"202401") - orgUnit: Organisation unit UID
- categoryOptionCombo: Category option combo UID
- attributeOptionCombo: Attribute option combo UID
- rateMetric: Rate metric (for reporting rates)
- domainType: Data domain (e.g.,
"AGGREGATED") - value: The value for the data point
The file path to the saved Parquet file is returned by the extraction method. You can load the output using pandas, polars, or any tool that supports Parquet files.
Usage Example:
from d2d_development.extract import DHIS2Extractor
from openhexa.sdk import workspace
from openhexa.toolbox.dhis2 import DHIS2
from pathlib import Path
dhis2_client = DHIS2(workspace.get_connection("dhis2-connection"))
extractor = DHIS2Extractor(dhis2_client, download_mode="DOWNLOAD_REPLACE")
# Extract several periods of data elements
for period in ["202401", "202402", "202403"]:
extractor.data_elements.download_period(
data_elements=["de1", "de2"],
org_units=["ou1", "ou2"],
period=period,
output_dir=Path("/output")
)
# Extract one period of indicators
extractor.indicators.download_period(
indicators=["ind1"],
org_units=["ou1"],
period="202401",
output_dir=Path("/tmp")
)
# Extract one period of reporting rates
extractor.reporting_rates.download_period(
reporting_rates=["rr1"],
org_units=["ou1"],
period="202401",
output_dir=Path("/tmp")
)
# Example load the output file
import polars as pl
df = pl.read_parquet(Path(/tmp) / f"data_{period}.parquet") # Default naming
print(df.head())Note:
- The same pattern applies for
extractor.indicators.download_periodandextractor.reporting_rates.download_period, just change the first argument name accordingly. - All extracted files are saved in Parquet format by default.
Description:
Main class to handle pushing data to DHIS2. It validates and pushes formatted data (pandas or polars DataFrame) to a DHIS2 instance.
Input Data Format for DHIS2Pusher
The push_data method expects a pandas or polars DataFrame with the following columns (all required):
- dx: Data element, indicator, or reporting rate UID
- period: Period (e.g.,
"202401") - orgUnit: Organisation unit UID
- categoryOptionCombo: Category option combo UID
- attributeOptionCombo: Attribute option combo UID
- value: The value to be pushed (numeric or string, depending on DHIS2 configuration)
If any of these columns are missing, or if the input is not a pandas or polars DataFrame, a PusherError will be raised.
Configuration Parameters:
When initializing DHIS2Pusher, you can configure the following parameters:
dhis2_client(required): The DHIS2 client instance.import_strategy: Strategy flag passed to the DHIS2 API for data import. Accepts "CREATE", "UPDATE", or "CREATE_AND_UPDATE" (default: "CREATE_AND_UPDATE"). NOTE: This only controls how the DHIS2 server processes the data; it does not affect client-side logic.dry_run: IfTrue, simulates the push without making changes on the server (default:True).max_post: Maximum number of data points per POST request (default:500).logging_interval: Log progress every N data points (default:50000).logger: Optional custom logger instance.
Usage Example:
from d2d_development.push import DHIS2Pusher
from openhexa.sdk import workspace
from openhexa.toolbox.dhis2 import DHIS2
import polars as pl
dhis2_client = DHIS2(workspace.get_connection("dhis2-connection"))
df = pl.DataFrame({
"dx": ["de1"],
"period": ["202401"],
"orgUnit": ["ou1"],
"categoryOptionCombo": ["coc"],
"attributeOptionCombo": ["aoc"],
"value": [123]})
pusher = DHIS2Pusher(
dhis2_client,
import_strategy="CREATE_AND_UPDATE", # or "CREATE", "UPDATE"
dry_run=False,
max_post=1000,
logging_interval=10000,
)
pusher.push_data(df)Accessing Push Summary Information
After calling push_data, the DHIS2Pusher instance provides detailed results of the push operation in its summary attribute. This dictionary contains:
import_counts: Number of data points imported, updated, ignored, or deleted (dict with keys:imported,updated,ignored,deleted).import_options: The options used for the import (strategy, dry run, etc).import_errors: List of errors, conflicts, or error reports returned by DHIS2 or encountered during the push.ignored_data_points: List of data points that were ignored due to missing or invalid fields.delete_data_points: List of data points that were marked for deletion (value is NA/null).
Example:
pusher.push_data(df)
print(pusher.summary)
# Example output:
# {
# 'import_counts': {'imported': 1, 'updated': 0, 'ignored': 0, 'deleted': 0},
# 'import_options': {'importStrategy': 'CREATE_AND_UPDATE', 'dryRun': False, ...},
# 'import_errors': [],
# 'ignored_data_points': [],
# 'delete_data_points': []
# }You can use these fields to programmatically inspect the results of your push, handle errors, or log/report the outcome. For example, to check if any data points were ignored:
if pusher.summary["ignored_data_points"]:
print(f"Ignored {len(pusher.summary['ignored_data_points'])} data points:")
for dp in pusher.summary["ignored_data_points"]:
print(dp)