
alejandrocorchonfranco/housing-vulnerability-logit-model


housing-vulnerability-logit-model

Econometric analysis of the determinants of high rent burden among Spanish households using EPF 2023 microdata. Includes data preparation, logistic regression modeling, evaluation metrics, and interpretation of key factors linked to housing vulnerability.

Overview

This repository contains an econometric analysis of the determinants of high rent burden among Spanish households. Using microdata from the Household Budget Survey (EPF) 2023, the study examines which socioeconomic, demographic, geographic and housing-related factors drive a household to allocate an unusually high share of total expenditure to rent.

The core of the project is a logistic regression (Logit) model that identifies households with a high rent burden, defined as those whose share of total expenditure devoted to rent on the main dwelling lies more than one standard deviation above the mean share.
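In symbols, with notation introduced here only for illustration ($s_i$ is household $i$'s rent share, $\bar{s}$ and $\sigma_s$ are the mean and standard deviation of that share, and $x_i$ collects the explanatory variables listed under Data), the dependent variable and the Logit probability are:

$$
s_i = \frac{\text{rent}_i}{\text{total expenditure}_i}, \qquad
y_i = \begin{cases} 1 & \text{if } s_i > \bar{s} + \sigma_s \\ 0 & \text{otherwise} \end{cases}, \qquad
\Pr(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-x_i^\top \beta}}
$$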

This project is part of my academic portfolio for Economics.


Objectives

  • Quantify how each factor affects the likelihood of having a high rent-to-expenditure ratio.
  • Identify significant predictors of housing vulnerability.
  • Evaluate the predictive performance of a logistic regression model using microdata.
  • Provide insights relevant for public policies aimed at improving rental affordability.

Data

  • Source: Encuesta de Presupuestos Familiares (EPF) 2023 – INE
  • Dependent variable: High rent burden (binary)
  • Key explanatory variables:
    • Household income
    • Dwelling size (useful surface)
    • Age of the household reference person
    • Number of household members
    • Education level
    • Labour status
    • Municipal size & population density
    • Sex of household reference person

The original EPF microdata cannot be redistributed due to INE licensing restrictions.
This repository therefore includes only simulated or sample data and preprocessing scripts.


Methodology

1. Data Preparation

  • Construction of the rent burden variable (rent expenditure / total household expenditure).
  • Categorization into low, medium and high rent burden using the mean ± 1 standard deviation as cut-offs (low: below the mean − 1 SD; medium: within ±1 SD of the mean; high: above the mean + 1 SD).
  • Binary dependent variable: 1 = high rent burden, 0 = otherwise.
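A minimal Python sketch of these three steps, assuming the rent and total-expenditure values have already been extracted from the EPF files as plain lists. The function names are hypothetical, and using the sample standard deviation (n − 1 denominator) and strict inequalities at the cut-offs are assumptions, not details confirmed by the project:

```python
from statistics import mean, stdev


def rent_burden_shares(rent, total_expenditure):
    """Rent burden = rent on the main dwelling / total household expenditure."""
    return [r / t for r, t in zip(rent, total_expenditure)]


def classify_burden(shares):
    """Label each share 'low', 'medium' or 'high' using mean +/- 1 SD as cut-offs."""
    m, s = mean(shares), stdev(shares)  # sample SD, n - 1 denominator (assumption)
    labels = []
    for x in shares:
        if x > m + s:
            labels.append("high")
        elif x < m - s:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def high_burden_indicator(labels):
    """Binary dependent variable: 1 = high rent burden, 0 = otherwise."""
    return [1 if label == "high" else 0 for label in labels]
```

With hypothetical values rent = [300, 450, 600, 900, 500, 1200] and total expenditure = [2000, 2500, 2400, 2000, 2500, 2000], the shares have mean 0.305 and SD ≈ 0.180, so only the last household (share 0.60) lies above the upper cut-off of about 0.485 and the indicator is [0, 0, 0, 0, 0, 1].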

2. Econometric Model

A Logit model is estimated to identify determinants of high rent burden.
Key steps include:

  • Variable selection based on economic and sociodemographic literature.
  • Omnibus likelihood-ratio test of overall model significance (full model against the intercept-only model).
  • Parameter estimation and hypothesis testing.
  • Influence analysis using Cook’s distance.
  • Re-estimation after filtering high-influence observations.
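These estimation steps can be sketched in plain Python without a statistical package, assuming a design matrix `X` whose rows start with 1 for the intercept. The sketch fits the Logit by Newton-Raphson, runs the omnibus likelihood-ratio test against the intercept-only model, computes Cook's distance with the usual one-step approximation for logistic regression, and re-estimates after dropping influential cases. All function names are hypothetical, and the default cut-off D > 4/n is a common rule of thumb, not the threshold the authors necessarily used:

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def probs(X, beta):
    """Fitted probabilities 1 / (1 + exp(-x'beta))."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]

def information(X, p):
    """Fisher information X'WX with W = diag(p_i (1 - p_i))."""
    k = len(X[0])
    return [[sum(pi * (1 - pi) * row[j] * row[l] for row, pi in zip(X, p))
             for l in range(k)] for j in range(k)]

def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Maximum likelihood via Newton-Raphson: beta <- beta + (X'WX)^-1 X'(y - p)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        p = probs(X, beta)
        score = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                 for j in range(len(beta))]
        step = solve(information(X, p), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def loglik(y, p):
    """Bernoulli log-likelihood of outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))

def chi2_sf(stat, df):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def omnibus_test(X, y, beta):
    """LR chi-square of the fitted model against the intercept-only model."""
    ybar = sum(y) / len(y)
    stat = 2.0 * (loglik(y, probs(X, beta)) - loglik(y, [ybar] * len(y)))
    df = len(beta) - 1
    return stat, df, chi2_sf(stat, df)

def cooks_distance(X, y, beta):
    """One-step Cook's distance for a logit: r_i^2 h_i / (k (1 - h_i)^2)."""
    p = probs(X, beta)
    info = information(X, p)
    k, out = len(beta), []
    for row, yi, pi in zip(X, y, p):
        w = pi * (1 - pi)
        h = w * sum(a * b for a, b in zip(row, solve(info, row)))  # leverage h_ii
        r = (yi - pi) / math.sqrt(w)  # Pearson residual
        out.append(r * r * h / (k * (1 - h) ** 2))
    return out

def refit_without_influential(X, y, beta, cutoff=None):
    """Drop cases with Cook's D above the cut-off (4/n by default, an assumption) and refit."""
    d = cooks_distance(X, y, beta)
    cut = 4.0 / len(y) if cutoff is None else cutoff
    keep = [i for i, di in enumerate(d) if di <= cut]
    return fit_logit([X[i] for i in keep], [y[i] for i in keep]), keep
```

A typical call sequence is `beta = fit_logit(X, y)`, then `omnibus_test(X, y, beta)`, then `refit_without_influential(X, y, beta)`. In small samples, dropping cases can leave the outcome perfectly separated by a regressor, in which case the maximum-likelihood estimate does not exist and the Newton iterations will not converge.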

3. Model Evaluation

  • Confusion matrix and classification metrics.
  • Sensitivity, specificity, precision and error rates.
  • ROC curve and AUC calculation.
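The evaluation step can be sketched as follows, assuming the fitted probabilities are already available and that a household is classified as high-burden when its probability is at least 0.5 (a common default; the cut-off actually used is not stated). Class 1 (high rent burden) is treated as positive here. The AUC is computed both as the trapezoidal area under the ROC points and as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one; the two coincide. Function names are hypothetical:

```python
def confusion_matrix(y_true, p_hat, cutoff=0.5):
    """Counts (tp, fp, tn, fn) with class 1 = high rent burden treated as positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Standard rates derived from the confusion matrix."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
        "error_rate": (fp + fn) / n,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the cut-off through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Lowering `cutoff` raises sensitivity and lowers specificity (and vice versa), so re-running `confusion_matrix` for a few cut-offs makes the trade-off behind the reported rates visible.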

Main Findings

  • Significant predictors:

    • Dwelling size (negative effect): Larger dwellings reduce the probability of high rent burden.
    • Age of the household reference person (negative effect): households with a younger reference person are more likely to face a high rent burden.
    • Household income: the effect per unit of income is close to zero, but large income differences matter in practical terms.
  • Non-significant variables: education, household size, labour status, sex, population density, and municipal size do not show significant effects after controlling for the other variables.

  • Model performance:

    • Accuracy: 85.1%
    • Sensitivity: 98.6%
    • Specificity: 8.5%
    • ROC AUC: 0.756

The model detects almost all vulnerable households (few false negatives) but classifies most non-vulnerable households as vulnerable (many false positives), so it overestimates vulnerability among non-vulnerable households.
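Accuracy is a weighted average of sensitivity and specificity, with weight $\pi$ equal to the share of the evaluation sample in the class treated as positive. Solving for $\pi$ with the reported figures:

$$
\text{Accuracy} = \pi \cdot \text{Sensitivity} + (1 - \pi) \cdot \text{Specificity}
\;\Longrightarrow\;
\pi = \frac{\text{Accuracy} - \text{Specificity}}{\text{Sensitivity} - \text{Specificity}} = \frac{0.851 - 0.085}{0.986 - 0.085} = \frac{0.766}{0.901} \approx 0.85
$$

So about 85% of the evaluated households belong to the positive class. By Cantelli's inequality, at most half of a sample can lie more than one standard deviation above that sample's mean; if the high-burden threshold was computed on the same households used for evaluation, an 85% share fits the non-high-burden group, so it is worth confirming which category was coded as positive when reading the sensitivity and specificity above.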


Conclusions

The analysis provides evidence that housing vulnerability is concentrated among households with a younger reference person and those living in smaller dwellings. Although the effect of a one-unit change in income is minimal, large income differences matter in practice.
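The point about income can be made concrete with odds ratios: in a Logit, raising a regressor by Δ units multiplies the odds of high rent burden by exp(βΔ), so a coefficient that is negligible per unit can add up over a large income gap. A minimal sketch with hypothetical function names; the coefficient and baseline values in the example are purely illustrative and are not estimates from this project:

```python
import math


def odds_ratio(beta, delta):
    """Factor by which the odds of high rent burden change when the regressor rises by delta."""
    return math.exp(beta * delta)


def probability_change(linear_index, beta, delta):
    """Change in P(high burden) when the regressor rises by delta, other terms held fixed.

    linear_index is the baseline value of x'beta for the household considered.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    return logistic(linear_index + beta * delta) - logistic(linear_index)
```

For example, with a hypothetical β = −0.0001, the odds ratio for one extra unit is about 0.9999, but for 10,000 extra units it is exp(−1) ≈ 0.37; starting from a baseline probability of 0.5 (linear index 0), the probability falls by about 0.23.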


👥 Authors

  • Alejandro Corchón
  • Fiorella Raguseo
