A comprehensive Python toolkit for analyzing geographic proximity network effects on property vacancy rates in San Francisco.
"Do properties near vacant properties show distance-weighted higher vacancy rates than properties farther away with identical characteristics?"
This research tests if vacancy decisions spread through observable neighborhood patterns (network effects) rather than each owner arriving at the same conclusion independently (parallel reasoning).
This repository provides a complete, production-ready framework for:
- Data Collection: Geocoding property addresses and collecting vacancy indicators
- Network Construction: Building spatial networks based on geographic proximity
- Spatial Analysis: Testing for spatial autocorrelation using Moran's I
- Statistical Testing: Chi-square, logistic regression, and effect size calculations
- Network Amplification: Quantifying how proximity to vacant properties amplifies vacancy rates
- Visualization: Geographic maps, network diagrams, and statistical plots
- Python 3.8 or higher
- pip package manager
# Clone the repository
git clone https://github.com/NWelde/Network-analysis.git
cd Network-analysis
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install package and dependencies
pip install -e .
For development with testing tools:
pip install -e ".[dev]"
The scripts automatically detect and transform Zillow data format:
# Run full analysis on Zillow dataset
python scripts/run_full_analysis.py dataset_zillow-zip-search_2025-12-12_12-13-45-217.csv
# Or use the quick script
python run_zillow_analysis.py
# Generate report with visualizations
python scripts/generate_report.py dataset_zillow-zip-search_2025-12-12_12-13-45-217.csv
# Or run the full analysis on the bundled sample data
python scripts/run_full_analysis.py data/sample/sample_properties.csv
- Console Output: Summary statistics and key findings
- Log File: outputs/analysis.log
- Report: outputs/reports/analysis_report.md
- Figures: outputs/figures/*.png
from src.data_collection.property_scraper import load_property_data, clean_property_data
from src.network.network_builder import build_property_network
from src.analysis.spatial_autocorrelation import calculate_morans_i
from src.analysis.amplification import calculate_network_amplification
# Load and clean data
properties = load_property_data("data/sample/sample_properties.csv")
properties = clean_property_data(properties)
# Build network
network = build_property_network(
    properties,
    threshold_blocks=2.0,
    weight_method="inverse",
)
# Spatial autocorrelation
morans_results = calculate_morans_i(network, permutations=999)
print(f"Moran's I: {morans_results['morans_i']:.4f}")
print(f"P-value: {morans_results['p_value_sim']:.4f}")
# Network amplification
amplification = calculate_network_amplification(network)
print(f"Multiplier: {amplification['multiplier_vs_without']:.2f}x")
Customize analysis parameters in config/config.yaml:
distance:
  miles_per_block: 0.1
  max_distance_blocks: 2
edge_weights:
  method: "inverse" # Options: binary, linear, inverse, gaussian
  threshold_blocks: 2
analysis:
  moran_permutations: 999
  significance_level: 0.05
Your property data CSV should include:
| Column | Type | Description | Required |
|---|---|---|---|
| Address | string | Property address | Yes |
| Lat | float | Latitude | Yes |
| Lon | float | Longitude | Yes |
| Vacant | boolean | Vacancy status | Yes |
| Price | float | Property price | No |
| Bedrooms | int | Number of bedrooms | No |
| Property_Type | string | Type of property | No |
See data/sample/sample_properties.csv for an example.
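As a rough illustration of the required columns above, the following sketch reads a CSV and checks that `Address`, `Lat`, `Lon`, and `Vacant` are present. It is not the repository's loader: `read_properties` and `parse_bool` are hypothetical helpers, the accepted boolean spellings are an assumption, and the sample address is made up.

```python
import csv
import io

REQUIRED_COLUMNS = ("Address", "Lat", "Lon", "Vacant")


def parse_bool(text):
    """Convert a CSV cell to bool (accepted spellings are an assumption)."""
    value = text.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise ValueError(f"cannot interpret {text!r} as a boolean")


def read_properties(handle):
    """Read property rows from an open CSV file, validating required columns."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        rows.append(
            {
                "Address": row["Address"],
                "Lat": float(row["Lat"]),
                "Lon": float(row["Lon"]),
                "Vacant": parse_bool(row["Vacant"]),
            }
        )
    return rows


# In-memory stand-in for a real file; the address is invented for illustration.
sample = io.StringIO(
    "Address,Lat,Lon,Vacant\n"
    "123 Example St,37.7749,-122.4194,True\n"
)
rows = read_properties(sample)
print(rows[0]["Vacant"])
```

Optional columns such as `Price` or `Bedrooms` are ignored here; a fuller loader would carry them through when present.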
Properties are connected if within a specified distance threshold (default: 2 blocks ≈ 0.2 miles). Edge weights are calculated using:
- Inverse Distance: weight = 1 / distance
- Gaussian: weight = exp(-distance² / (2σ²))
- Linear Decay: weight = 1 - (distance / max_distance)
- Binary: weight = 1 if within threshold
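The four weighting schemes can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in `src/network`: `haversine_miles` and `edge_weight` are hypothetical names, `sigma=1.0` is an arbitrary default, and the small `eps` guard against division by zero for co-located properties is an assumption of this sketch.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * radius * math.asin(math.sqrt(a))


def edge_weight(distance, method="inverse", max_distance=2.0, sigma=1.0, eps=1e-6):
    """Weight for two properties `distance` blocks apart.

    Returns 0.0 beyond `max_distance`, meaning no edge is created.
    """
    if distance > max_distance:
        return 0.0
    if method == "binary":
        return 1.0
    if method == "linear":
        return 1.0 - distance / max_distance
    if method == "inverse":
        # eps avoids dividing by zero for co-located properties (assumption).
        return 1.0 / max(distance, eps)
    if method == "gaussian":
        return math.exp(-(distance ** 2) / (2 * sigma ** 2))
    raise ValueError(f"unknown weight method: {method!r}")


# Convert miles to blocks with the configured 0.1 miles per block.
miles = haversine_miles(37.7749, -122.4194, 37.7760, -122.4194)
blocks = miles / 0.1
print(edge_weight(blocks, method="inverse"))
```

Note that linear decay gives a weight of exactly 0 at the threshold, so a property sitting right at `max_distance` is connected under the binary scheme but contributes nothing under the linear one.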
Moran's I statistic tests for spatial clustering:
- I > 0: Positive spatial autocorrelation (clustering)
- I ≈ 0: Random spatial pattern
- I < 0: Negative spatial autocorrelation (dispersion)
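For intuition, Moran's I can be computed directly from its definition, I = (n / S0) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where S0 is the sum of all weights. The sketch below is a pure-Python illustration, not the PySAL-based routine this project uses; `morans_i` and `permutation_p_value` are hypothetical names, and the one-sided pseudo p-value is a simplification.

```python
import random


def morans_i(values, weights):
    """Moran's I for `values` given a dense n x n weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    den = sum(d * d for d in dev)  # zero if all values are identical
    return (n / s0) * (num / den)


def permutation_p_value(values, weights, permutations=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    return (at_least + 1) / (permutations + 1)


# Six properties along one street; each is linked to its adjacent neighbors.
n = 6
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

clustered = [1, 1, 1, 0, 0, 0]    # vacancies grouped at one end
alternating = [1, 0, 1, 0, 1, 0]  # vacancies evenly dispersed

print(morans_i(clustered, chain))
print(morans_i(alternating, chain))
print(permutation_p_value(clustered, chain))
```

The clustered pattern yields a positive I and the alternating pattern a negative one, matching the interpretation in the list above.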
Compares vacancy rates between properties:
- With vacant neighbors: Properties that have ≥1 vacant neighbor within threshold
- Without vacant neighbors: Properties with no vacant neighbors
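Amplification Multiplier = (Vacancy rate with vacant neighbors) / (Vacancy rate without vacant neighbors)
A multiplier > 1 is consistent with network effects beyond parallel reasoning; a multiplier of 1 means vacant neighbors make no difference to the vacancy rate.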
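The comparison can be sketched as follows. This is a minimal illustration rather than the project's `calculate_network_amplification`: the function name, the adjacency-dictionary input, and the returned keys are all assumptions made for the example.

```python
def network_amplification(vacant, neighbors):
    """Compare vacancy rates for properties with and without a vacant neighbor.

    vacant:    list of bools, one per property
    neighbors: dict mapping a property index to the indices within threshold
    """
    with_group, without_group = [], []
    for i, is_vacant in enumerate(vacant):
        has_vacant_neighbor = any(vacant[j] for j in neighbors.get(i, ()))
        (with_group if has_vacant_neighbor else without_group).append(is_vacant)

    def rate(group):
        return sum(group) / len(group) if group else None

    rate_with, rate_without = rate(with_group), rate(without_group)
    multiplier = None
    # The ratio is undefined when a group is empty or the baseline rate is zero.
    if rate_with is not None and rate_without:
        multiplier = rate_with / rate_without
    return {
        "rate_with": rate_with,
        "rate_without": rate_without,
        "multiplier": multiplier,
    }


# Eight hypothetical properties in three small clusters.
vacant = [True, True, False, False, True, False, False, False]
neighbors = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4], 4: [3],
    5: [6, 7], 6: [5, 7], 7: [5, 6],
}
result = network_amplification(vacant, neighbors)
print(result)
```

In this toy data, 2 of the 4 properties with a vacant neighbor are vacant (50%) against 1 of the 4 without (25%), so the multiplier is 2.0.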
The analysis produces:
- Moran's I Statistic: Spatial clustering measure with p-value
- Vacancy by Distance Table: Vacancy rates at 0-1, 1-2, 2-3, 3+ blocks
- Chi-square Test: Tests independence of vacancy and neighbor status
- Network Amplification: Multiplier showing network effect strength
- Visualizations: Network graphs, heatmaps, and statistical plots
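To make the chi-square item concrete, here is a standard-library sketch of the Pearson test of independence on a 2x2 table (has a vacant neighbor or not, by vacant or not), together with the phi coefficient as an effect size. The function name and the counts are made up for illustration, and whether the project applies a continuity correction is not stated here; this sketch applies none.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows: has a vacant neighbor / has none. Columns: vacant / occupied.
    Returns (chi2, p_value, phi). No continuity correction is applied.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)  # effect size for a 2x2 table
    return chi2, p_value, phi


# Hypothetical counts, not results from this project.
chi2, p_value, phi = chi_square_2x2(20, 5, 5, 20)
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}, phi = {phi:.2f}")
```

If you use `scipy.stats.chi2_contingency` instead, be aware that it applies Yates' continuity correction to 2x2 tables by default, so its statistic will be somewhat smaller than the uncorrected value computed here.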
=== SF VACANCY NETWORK ANALYSIS RESULTS ===
Spatial Clustering (Moran's I):
Moran's I: 0.45
P-value: 0.001
Interpretation: ✓ Significant spatial clustering detected
Vacancy by Distance:
0-1 blocks: 62% (n=15)
1-2 blocks: 38% (n=20)
2-3 blocks: 18% (n=10)
3+ blocks: 12% (n=5)
Network Effect Test (Chi-square):
χ² = 34.56, p < 0.001
✓ Having vacant neighbors significantly predicts vacancy
Network Amplification:
Baseline vacancy rate: 15.0%
With vacant neighbors: 55.0%
Amplification: 40.0 percentage points
Multiplier: 3.67x
Conclusion: Geographic proximity network amplifies vacancy
by 3.67x beyond parallel reasoning baseline.
sf-vacancy-network-analysis/
├── config/
│ └── config.yaml # Configuration settings
├── data/
│ ├── raw/ # Raw data files
│ ├── processed/ # Processed data
│ └── sample/ # Sample data for testing
├── src/
│ ├── data_collection/ # Data loading and geocoding
│ ├── network/ # Network construction
│ ├── analysis/ # Statistical analysis
│ ├── visualization/ # Plotting functions
│ └── utils/ # Utilities
├── tests/ # Unit tests
├── notebooks/ # Jupyter notebooks
├── scripts/ # Analysis scripts
├── outputs/ # Results and figures
├── requirements.txt # Dependencies
├── setup.py # Package setup
└── README.md # This file
Run the test suite:
pytest tests/ -v
Run with coverage:
pytest tests/ --cov=src --cov-report=html
- Docstrings: All functions have Google-style docstrings
- Type Hints: Full type annotations throughout
- Logging: Comprehensive logging at INFO level
- Comments: Minimal but meaningful comments where needed
If you use this code in your research, please cite:
SF Vacancy Network Analysis
GitHub: https://github.com/NWelde/Network-analysis
Year: 2025
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
SF Vacancy Network Analysis Team
- PySAL for spatial analysis tools
- NetworkX for network construction
- The urban economics and network science communities
For questions or feedback, please open an issue on GitHub.
Keywords: vacancy networks, spatial autocorrelation, Moran's I, network effects, urban economics, San Francisco real estate, geographic proximity, spatial analysis