87 changes: 87 additions & 0 deletions statvar_imports/eurostat/live_births_total_by_month/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Eurostat Live Births Total By Month Import

## Overview
This dataset contains monthly live births data at national and regional levels, sourced from Eurostat. The data tracks the total number of live births per month across various European countries and regions.

- **type of place:** Country, NUTS Regions (Level 0-3)
- **years:** Historical data to present (1960-2025)
- **place_resolution:** Resolved to DCIDs (e.g., `dcid:country/FRA`, `dcid:nuts/AT113`)

## Data Source
**Source URL:**
https://ec.europa.eu/eurostat/databrowser/view/DEMO_FMONTH__custom_270818/default/table?lang=en

**Provenance Description:**
The data is provided by Eurostat, the statistical office of the European Union. It is part of the "Demography and migration" database, specifically the "Live births by month" (DEMO_FMONTH) dataset.

## Refresh Type
Automatic Refresh

The refresh is automated using the provided `run.sh` script, which handles both data download and processing.

## How To Run Import
To execute the complete import process (download and processing), run:
```bash
./run.sh
```

### Script Details:
- **Download**: Uses `curl` to fetch the latest SDMX-CSV data from Eurostat's dissemination API.
- **Processing**: Uses `stat_var_processor.py` to map raw data to Data Commons StatVarObservations using the PV map and metadata configuration.

## Processing Instructions
To process the Eurostat live births data and generate statistical variables, run the following commands from the import directory (`statvar_imports/eurostat/live_births_total_by_month`):

Download the input file:

```bash
mkdir -p input_files
curl -L --retry 3 "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/DEMO_FMONTH/?format=SDMX-CSV&compressed=false" -o input_files/live_births_total_by_month_data_input.csv
```
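
Before processing, it can help to confirm that the downloaded file has the columns the PV map keys on. The following is a minimal sketch, assuming the SDMX-CSV header uses the column names that appear as keys in `live_births_total_by_month_pvmap.csv`. `missing_columns` is a hypothetical helper and not part of the import scripts, and the sample row contains made-up values.

```python
import csv
import io

# Columns that the PV map turns into observation properties.
# Assumption: taken from the keys in live_births_total_by_month_pvmap.csv.
REQUIRED_COLUMNS = ["DATAFLOW", "month", "unit", "geo", "TIME_PERIOD", "OBS_VALUE"]


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Self-contained demo on an in-memory sample (made-up values). Real use would
# open input_files/live_births_total_by_month_data_input.csv instead.
sample = io.StringIO(
    "DATAFLOW,LAST UPDATE,freq,month,unit,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG,CONF_STATUS\n"
    "ESTAT:DEMO_FMONTH(1.0),,A,M01,NR,AT,2020,7000,,\n"
)
print(missing_columns(sample))  # an empty list means all required columns are present
```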

Run on the test data:

```bash
python3 ../../../tools/statvar_importer/stat_var_processor.py \
"--input_data=./test_data/live_births_total_by_month_data_input.csv" \
"--pv_map=./live_births_total_by_month_pvmap.csv" \
"--output_path=./test_data/live_births_total_by_month_output" \
"--config_file=./live_births_total_by_month_metadata.csv" \
"--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

Run on the full input data:

```bash
python3 ../../../tools/statvar_importer/stat_var_processor.py \
"--input_data=./input_files/*.csv" \
"--pv_map=./live_births_total_by_month_pvmap.csv" \
"--config_file=./live_births_total_by_month_metadata.csv" \
"--generate_statvar_name=True" \
"--skip_constant_csv_columns=False" \
"--output_columns=observationDate,observationAbout,variableMeasured,value,observationPeriod,unit" \
"--output_path=./live_births_total_by_month_output" \
"--places_resolved_csv=./places_resolved.csv" \
"--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

## Key Files
- `run.sh`: Main execution script for download and processing.
- `live_births_total_by_month_pvmap.csv`: Property-Value mapping for StatVar definitions and dimensions.
- `live_births_total_by_month_metadata.csv`: Configuration parameters for the processor.
- `places_resolved.csv`: Mapping of place codes to Data Commons DCIDs.
- `live_births_total_by_month_output.csv`: Processed statistical observations.
- `live_births_total_by_month_output.tmcf`: Template MCF mapping the CSV columns to Data Commons schema.
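
The PV map derives each observation date from two input columns: `TIME_PERIOD` supplies the year and `month` supplies a two-digit month, while `TOTAL` rows map to an empty month with a `P1Y` period and `UNK` rows are ignored. The following is a minimal sketch of that rule in plain Python, mirroring the `#Eval` expression in the PV map. `resolve_date` is a hypothetical helper for illustration only; it does not describe how `stat_var_processor.py` is implemented internally.

```python
# Month codes M01..M12, as listed in the PV map.
MONTH_CODES = {f"M{m:02d}" for m in range(1, 13)}


def resolve_date(time_period, month_code):
    """Return (observationDate, observationPeriod) for one input row.

    Returns None for the month code UNK, which the PV map ignores.
    """
    if month_code == "UNK":
        return None
    if month_code == "TOTAL":
        month, period = "", "P1Y"
    elif month_code in MONTH_CODES:
        # M03 -> "03", with a one-month observation period.
        month, period = month_code[1:], "P1M"
    else:
        raise ValueError(f"unexpected month code: {month_code!r}")
    # Same shape as the PV map expression:
    # observationDate = '{Year}-{Month}' if '{Month}' else '{Year}'
    date = f"{time_period}-{month}" if month else f"{time_period}"
    return date, period


print(resolve_date("2020", "M03"))    # ('2020-03', 'P1M')
print(resolve_date("2020", "TOTAL"))  # ('2020', 'P1Y')
```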

## Validation
To validate the generated data, use the Data Commons import tool (lint mode):
```bash
java -jar datacommons-import-tool.jar lint live_births_total_by_month_output.csv live_births_total_by_month_output.tmcf
```
The tool writes its reports (`report.json`, `summary_report.html`) to `dc_generated/`; review them for validation errors and warnings.
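
As a quick complement to the import tool (not a replacement for it), the output CSV can be spot-checked with a few lines of Python. The following is a minimal sketch, assuming the output has the columns listed in `--output_columns` and that dates take the form `YYYY` (with period `P1Y`) or `YYYY-MM` (with period `P1M`), as the PV map suggests. `check_rows` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
import re

# Accepts YYYY or YYYY-MM with a month between 01 and 12.
DATE_RE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2]))?$")


def check_rows(csv_file):
    """Yield (line_number, message) for rows that look malformed."""
    for line_no, row in enumerate(csv.DictReader(csv_file), start=2):
        date = row.get("observationDate") or ""
        if not DATE_RE.match(date):
            yield line_no, f"unexpected observationDate {date!r}"
            continue
        # Monthly dates should carry P1M, yearly dates P1Y.
        expected_period = "P1M" if "-" in date else "P1Y"
        if row.get("observationPeriod") != expected_period:
            yield line_no, f"observationPeriod does not match date {date!r}"
        value = row.get("value") or ""
        try:
            float(value)
        except ValueError:
            yield line_no, f"non-numeric value {value!r}"


# Demo on made-up rows; real use would open live_births_total_by_month_output.csv.
sample = io.StringIO(
    "observationDate,observationAbout,variableMeasured,value,observationPeriod,unit\n"
    "2020-03,country/FRA,ExampleStatVar,100,P1M,Person\n"
    "2020-13,country/FRA,ExampleStatVar,100,P1M,Person\n"
)
for problem in check_rows(sample):
    print(problem)
```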

## Testing
Testing is performed using the `test_data` directory:
- Raw Input: `test_data/live_births_total_by_month_data_input.csv`
- Expected Output: `test_data/live_births_total_by_month_output.csv`
- Expected TMCF: `test_data/live_births_total_by_month_output.tmcf`
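
The test command above appears to write to the same path prefix as the expected files, so a fresh run would overwrite them. One option is to point `--output_path` at a scratch location and compare the two CSVs. The following is a minimal sketch of an order-insensitive comparison; `csv_diff` is a hypothetical helper and the demo rows are made up.

```python
import csv
import io
from collections import Counter


def csv_diff(expected_file, actual_file):
    """Compare two CSVs as multisets of rows, ignoring row order.

    Returns (missing, unexpected): rows found only in the expected file
    and rows found only in the actual file.
    """
    expected = Counter(tuple(row) for row in csv.reader(expected_file))
    actual = Counter(tuple(row) for row in csv.reader(actual_file))
    missing = sorted((expected - actual).elements())
    unexpected = sorted((actual - expected).elements())
    return missing, unexpected


# Demo on in-memory data; real use would open the expected CSV in test_data/
# and the CSV produced by a fresh run in a scratch directory.
expected = io.StringIO("observationDate,value\n2020-01,10\n2020-02,12\n")
actual = io.StringIO("observationDate,value\n2020-02,12\n2020-01,11\n")
missing, unexpected = csv_diff(expected, actual)
print("missing:", missing)
print("unexpected:", unexpected)
```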
@@ -0,0 +1,4 @@
parameter,value
header_rows,1
#input_rows,20

@@ -0,0 +1,49 @@
key,property1,value1,property2,value2,property3,value3,property4,value4

# Global Dataflow properties (Standard Data Commons Demographic Modeling)
"DATAFLOW:ESTAT:DEMO_FMONTH(1.0)",populationType,dcs:BirthEvent,measuredProperty,dcs:count,statType,dcs:measuredValue,,

# --- AUTOMATED CLEANUP: Ignored Columns ---
LAST UPDATE,#ignore,"",,,,,,
freq,#ignore,"",,,,,,
CONF_STATUS,#ignore,"",,,,,,

# Month mapping & observationPeriod variable passing (Grounded on standard ISO-8601 durations)
month:M01,Month,01,observationPeriod,P1M,,,,
month:M02,Month,02,observationPeriod,P1M,,,,
month:M03,Month,03,observationPeriod,P1M,,,,
month:M04,Month,04,observationPeriod,P1M,,,,
month:M05,Month,05,observationPeriod,P1M,,,,
month:M06,Month,06,observationPeriod,P1M,,,,
month:M07,Month,07,observationPeriod,P1M,,,,
month:M08,Month,08,observationPeriod,P1M,,,,
month:M09,Month,09,observationPeriod,P1M,,,,
month:M10,Month,10,observationPeriod,P1M,,,,
month:M11,Month,11,observationPeriod,P1M,,,,
month:M12,Month,12,observationPeriod,P1M,,,,
month:TOTAL,Month,"",observationPeriod,P1Y,,,,
month:UNK,#ignore,"",,,,,,

# Unit dimension (dcs:Person is standard for demographic counts)
unit:NR,unit,dcs:Person,,,,,,

# Time column capture (Stores year for the strict date checker)
TIME_PERIOD,Year,{Data},,,,,,

geo,observationAbout,{Data},,,,,,

# Measure and Date Resolution
OBS_VALUE,#Eval,"observationDate='{Year}-{Month}' if '{Month}' else '{Year}'",value,{Number},,,,

# Safely Ignoring Transient and Composite Status Flags to Prevent Ingestion Warnings
OBS_FLAG:b,#ignore,"",,,,,,
OBS_FLAG:e,#ignore,"",,,,,,
OBS_FLAG:p,#ignore,"",,,,,,
OBS_FLAG:f,#ignore,"",,,,,,
OBS_FLAG:n,#ignore,"",,,,,,
OBS_FLAG:u,#ignore,"",,,,,,
OBS_FLAG:be,#ignore,"",,,,,,
OBS_FLAG:bep,#ignore,"",,,,,,

# Missing values
OBS_FLAG:M,MissingValue,"",,,,,,
34 changes: 34 additions & 0 deletions statvar_imports/eurostat/live_births_total_by_month/manifest.json
@@ -0,0 +1,34 @@
{
"import_specifications": [
{
"import_name": "EuroStat_Live_Births_Total_By_Month",
"curator_emails": [
"support@datacommons.org"
],
"provenance_url": "https://ec.europa.eu/eurostat/databrowser/view/DEMO_FMONTH__custom_270818/default/table?lang=en",
"provenance_description": "Number of live births by month across reporting European countries and regions, from Eurostat (DEMO_FMONTH)",
"scripts": [
"run.sh"
],
"import_inputs": [
{
"template_mcf": "live_births_total_by_month_output.tmcf",
"cleaned_csv": "live_births_total_by_month_output.csv"
}
],
"source_files": [
"input_files/*.csv"
],
"cron_schedule": "5 1 1,15 * *",
"resource_limits": {"cpu": 4, "memory": 8, "disk":100},
"config_override": {
"invoke_import_validation": true,
"invoke_import_tool": true,
"invoke_differ_tool": true,
"skip_input_upload": false,
"skip_gcs_upload": false,
"cleanup_gcs_volume_mount": false
}
}
]
}