SF Vacancy Network Analysis

A comprehensive Python toolkit for analyzing geographic proximity network effects on property vacancy rates in San Francisco.

Research Question

"Do properties near vacant properties show distance-weighted higher vacancy rates than properties farther away with identical characteristics?"

This research tests whether vacancy decisions spread through observable neighborhood patterns (network effects) rather than arising from each owner independently reaching the same conclusion (parallel reasoning).

Overview

This repository provides an end-to-end framework for:

  • Data Collection: Geocoding property addresses and collecting vacancy indicators
  • Network Construction: Building spatial networks based on geographic proximity
  • Spatial Analysis: Testing for spatial autocorrelation using Moran's I
  • Statistical Testing: Chi-square, logistic regression, and effect size calculations
  • Network Amplification: Quantifying how proximity to vacant properties amplifies vacancy rates
  • Visualization: Geographic maps, network diagrams, and statistical plots

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Quick Install

# Clone the repository
git clone https://github.com/NWelde/Network-analysis.git
cd Network-analysis

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install package and dependencies
pip install -e .

Development Install

For development with testing tools:

pip install -e ".[dev]"

Quick Start

1. Run Analysis on Zillow Dataset (Recommended)

The scripts automatically detect the Zillow data format and transform it into the expected input format:

# Run full analysis on Zillow dataset
python scripts/run_full_analysis.py dataset_zillow-zip-search_2025-12-12_12-13-45-217.csv

# Or use the quick script
python run_zillow_analysis.py

# Generate report with visualizations
python scripts/generate_report.py dataset_zillow-zip-search_2025-12-12_12-13-45-217.csv

2. Run Analysis on Sample Data (for testing)

python scripts/run_full_analysis.py data/sample/sample_properties.csv

3. View Results

  • Console Output: Summary statistics and key findings
  • Log File: outputs/analysis.log
  • Report: outputs/reports/analysis_report.md
  • Figures: outputs/figures/*.png

Usage

Basic Analysis Pipeline

from src.data_collection.property_scraper import load_property_data, clean_property_data
from src.network.network_builder import build_property_network
from src.analysis.spatial_autocorrelation import calculate_morans_i
from src.analysis.amplification import calculate_network_amplification

# Load and clean data
properties = load_property_data("data/sample/sample_properties.csv")
properties = clean_property_data(properties)

# Build network
network = build_property_network(
    properties,
    threshold_blocks=2.0,
    weight_method="inverse",
)

# Spatial autocorrelation
morans_results = calculate_morans_i(network, permutations=999)
print(f"Moran's I: {morans_results['morans_i']:.4f}")
print(f"P-value: {morans_results['p_value_sim']:.4f}")

# Network amplification
amplification = calculate_network_amplification(network)
print(f"Multiplier: {amplification['multiplier_vs_without']:.2f}x")

Configuration

Customize analysis parameters in config/config.yaml:

distance:
  miles_per_block: 0.1
  max_distance_blocks: 2
  
edge_weights:
  method: "inverse"  # Options: binary, linear, inverse, gaussian
  threshold_blocks: 2
  
analysis:
  moran_permutations: 999
  significance_level: 0.05
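
To illustrate how the distance settings combine, the sketch below converts a great-circle distance into blocks using miles_per_block and compares it with max_distance_blocks. The haversine formula, the Earth radius constant, and the function names are assumptions made for illustration; the repository's own distance code may differ.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def distance_in_blocks(p, q, miles_per_block=0.1):
    """Distance between two (lat, lon) tuples, expressed in blocks."""
    return haversine_miles(p[0], p[1], q[0], q[1]) / miles_per_block


def within_threshold(p, q, max_distance_blocks=2, miles_per_block=0.1):
    """True if the two points are close enough to be connected."""
    return distance_in_blocks(p, q, miles_per_block) <= max_distance_blocks


a = (37.7749, -122.4194)
b = (37.7751, -122.4190)
print(round(distance_in_blocks(a, b), 2), within_threshold(a, b))
```

With the defaults shown in the config, a threshold of 2 blocks corresponds to 2 × 0.1 = 0.2 miles.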

Data Format

Your property data CSV should include:

Column         Type      Description          Required
Address        string    Property address     Yes
Lat            float     Latitude             Yes
Lon            float     Longitude            Yes
Vacant         boolean   Vacancy status       Yes
Price          float     Property price       No
Bedrooms       int       Number of bedrooms   No
Property_Type  string    Type of property     No

See data/sample/sample_properties.csv for an example.
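
As a rough sketch of what checking this format could look like, the snippet below reads a CSV with the standard library and verifies the four required columns. The function names and the accepted boolean spellings are assumptions for illustration and are not taken from the repository's loader.

```python
import csv
import io

REQUIRED_COLUMNS = ("Address", "Lat", "Lon", "Vacant")


def parse_bool(value):
    """Interpret common spellings of a boolean; raise ValueError otherwise."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def read_properties(handle):
    """Read rows from an open CSV file, checking and typing required columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "Address": row["Address"],
            "Lat": float(row["Lat"]),
            "Lon": float(row["Lon"]),
            "Vacant": parse_bool(row["Vacant"]),
        })
    return rows


# Made-up rows used only to exercise the reader.
sample = io.StringIO(
    "Address,Lat,Lon,Vacant,Price\n"
    "1 Example St,37.7749,-122.4194,True,950000\n"
    "2 Example St,37.7751,-122.4190,False,\n"
)
properties = read_properties(sample)
print(len(properties), properties[0]["Vacant"])
```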

Methodology

1. Network Construction

Properties are connected if within a specified distance threshold (default: 2 blocks ≈ 0.2 miles). Edge weights are calculated using:

  • Inverse Distance: weight = 1 / distance
  • Gaussian: weight = exp(-distance² / (2σ²)), where σ is a bandwidth parameter
  • Linear Decay: weight = 1 - (distance / max_distance)
  • Binary: weight = 1 if within threshold, 0 otherwise
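
The four weighting schemes can be sketched as a single function. This is a minimal illustration, not the repository's implementation: the function name, the sigma default, and the min_distance floor (used to avoid dividing by zero when two properties share coordinates) are assumptions.

```python
import math


def edge_weight(distance, method="inverse", max_distance=2.0,
                sigma=1.0, min_distance=1e-6):
    """Weight for two properties `distance` blocks apart.

    Returns 0.0 when the pair is farther apart than `max_distance`,
    meaning no edge would be created.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance > max_distance:
        return 0.0
    if method == "binary":
        return 1.0
    if method == "linear":
        return 1.0 - distance / max_distance
    if method == "inverse":
        # min_distance is a hypothetical floor for co-located properties.
        return 1.0 / max(distance, min_distance)
    if method == "gaussian":
        return math.exp(-distance ** 2 / (2 * sigma ** 2))
    raise ValueError(f"unknown weight method: {method!r}")


for name in ("binary", "linear", "inverse", "gaussian"):
    print(name, round(edge_weight(0.5, method=name), 4))
```

Inverse weighting gives nearby pairs the most influence, while linear decay falls to exactly zero at the threshold.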

2. Spatial Autocorrelation

Moran's I statistic tests for spatial clustering:

  • I > 0: Positive spatial autocorrelation (clustering)
  • I ≈ 0: Random spatial pattern
  • I < 0: Negative spatial autocorrelation (dispersion)
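
To make the sign interpretation concrete, here is a small pure-Python sketch of Moran's I, computed as I = (n / S0) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)², where S0 is the sum of all weights. The function names, the dictionary representation of weights, and the one-sided permutation p-value are assumptions for illustration; the toolkit itself acknowledges PySAL, whose results may differ in detail.

```python
import random


def morans_i(values, weights):
    """Moran's I for a list of values and a {(i, j): weight} dictionary."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    s0 = sum(weights.values())
    if denom == 0 or s0 == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * cross / denom


def permutation_p_value(values, weights, permutations=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)


def chain_weights(n):
    """Symmetric binary weights linking each node to its next neighbor."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


w = chain_weights(4)
print(morans_i([1, 1, 0, 0], w))  # clustered values: positive I
print(morans_i([1, 0, 1, 0], w))  # alternating values: negative I
```

On this four-node chain the clustered pattern gives I of about 0.33 and the alternating pattern gives I = -1.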

3. Network Amplification

Compares vacancy rates between properties:

  • With vacant neighbors: Properties that have ≥1 vacant neighbor within threshold
  • Without vacant neighbors: Properties with no vacant neighbors

Amplification Multiplier = (Rate with vacant neighbors) / (Rate without vacant neighbors)

A multiplier > 1 means vacancy is more common among properties with vacant neighbors, which is consistent with network effects beyond parallel reasoning.
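
The comparison above can be sketched in a few lines. The data structures (a dictionary of vacancy flags and a dictionary of neighbor lists), the function name, and the returned keys are hypothetical and chosen for illustration; they are not the return format of calculate_network_amplification.

```python
def amplification(vacant, neighbors):
    """Compare vacancy rates for properties with and without a vacant neighbor.

    vacant:    {property_id: bool}
    neighbors: {property_id: iterable of neighboring property_ids}
    """
    with_flags, without_flags = [], []
    for node, is_vacant in vacant.items():
        has_vacant_neighbor = any(vacant[n] for n in neighbors.get(node, ()))
        group = with_flags if has_vacant_neighbor else without_flags
        group.append(is_vacant)

    rate_with = sum(with_flags) / len(with_flags) if with_flags else None
    rate_without = sum(without_flags) / len(without_flags) if without_flags else None
    # The multiplier is undefined if either group is empty or the
    # comparison rate is zero; None signals that case.
    if rate_with is None or not rate_without:
        multiplier = None
    else:
        multiplier = rate_with / rate_without
    return {"rate_with": rate_with, "rate_without": rate_without,
            "multiplier": multiplier}


# Toy network: edges 0-1, 1-2, 3-4; property 5 has no neighbors.
vacant = {0: True, 1: True, 2: False, 3: False, 4: False, 5: True}
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
result = amplification(vacant, neighbors)
print(result)
```

In this toy network two of three properties with a vacant neighbor are vacant, against one of three without, so the multiplier is 2.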

Key Outputs

The analysis produces:

  1. Moran's I Statistic: Spatial clustering measure with p-value
  2. Vacancy by Distance Table: Vacancy rates at 0-1, 1-2, 2-3, 3+ blocks
  3. Chi-square Test: Tests independence of vacancy and neighbor status
  4. Network Amplification: Multiplier showing network effect strength
  5. Visualizations: Network graphs, heatmaps, and statistical plots
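
For the chi-square output, the following sketch shows the calculation on a 2x2 table of neighbor status against vacancy. It uses the plain Pearson statistic without a continuity correction, and the p-value formula applies only to one degree of freedom; the function name and the example counts are made up for illustration, and the repository's test may be configured differently.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table[0] = [vacant, not vacant] for properties with vacant neighbors
    table[1] = [vacant, not vacant] for properties without vacant neighbors
    Returns (chi2, p_value) with one degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2([[20, 10], [10, 20]])
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```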

Example Output

=== SF VACANCY NETWORK ANALYSIS RESULTS ===

Spatial Clustering (Moran's I):
  Moran's I: 0.45
  P-value: 0.001
  Interpretation: ✓ Significant spatial clustering detected

Vacancy by Distance:
  0-1 blocks: 62% (n=15)
  1-2 blocks: 38% (n=20)
  2-3 blocks: 18% (n=10)
  3+ blocks:  12% (n=5)

Network Effect Test (Chi-square):
  χ² = 34.56, p < 0.001
  ✓ Having vacant neighbors significantly predicts vacancy

Network Amplification:
  Baseline vacancy rate: 15.0%
  With vacant neighbors: 55.0%
  Amplification: 40.0 percentage points
  Multiplier: 3.67x

Conclusion: Geographic proximity network amplifies vacancy
by 3.67x beyond parallel reasoning baseline.

Repository Structure

sf-vacancy-network-analysis/
├── config/
│   └── config.yaml              # Configuration settings
├── data/
│   ├── raw/                     # Raw data files
│   ├── processed/               # Processed data
│   └── sample/                  # Sample data for testing
├── src/
│   ├── data_collection/         # Data loading and geocoding
│   ├── network/                 # Network construction
│   ├── analysis/                # Statistical analysis
│   ├── visualization/           # Plotting functions
│   └── utils/                   # Utilities
├── tests/                       # Unit tests
├── notebooks/                   # Jupyter notebooks
├── scripts/                     # Analysis scripts
├── outputs/                     # Results and figures
├── requirements.txt             # Dependencies
├── setup.py                     # Package setup
└── README.md                    # This file

Testing

Run the test suite:

pytest tests/ -v

Run with coverage:

pytest tests/ --cov=src --cov-report=html

Documentation

  • Docstrings: All functions have Google-style docstrings
  • Type Hints: Full type annotations throughout
  • Logging: Comprehensive logging at INFO level
  • Comments: Minimal but meaningful comments where needed

Citation

If you use this code in your research, please cite:

SF Vacancy Network Analysis
GitHub: https://github.com/NWelde/Network-analysis
Year: 2025

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Authors

SF Vacancy Network Analysis Team

Acknowledgments

  • PySAL for spatial analysis tools
  • NetworkX for network construction
  • The urban economics and network science communities

Contact

For questions or feedback, please open an issue on GitHub.


Keywords: vacancy networks, spatial autocorrelation, Moran's I, network effects, urban economics, San Francisco real estate, geographic proximity, spatial analysis
