Deploy the trained segmentation model for pan-arctic inference (60-74°N) on 2025 PlanetScope basemap imagery to produce an RTS survey map. The pipeline prioritizes precision over recall to minimize false alarms in the final product.
Data handling and model operations at inference must exactly match those used in training. The best 'recipe' will be provided once training and experiments are complete.
| Resource | Specification |
|---|---|
| Cloud | Google Cloud Platform |
| VM Type | GPU-enabled VM (specific type TBD with PDG team) |
| Storage | Google Cloud Storage bucket: abruptthawmapping |
| Collaboration | PDG workflow optimization team (Luigi/Todd) |
```
gs://abruptthawmapping/
├── models/
│   └── rts-v2/
│       ├── best_model.pth
│       ├── normalization_stats.json
│       └── config.yaml
├── inference/
│   ├── 2025-Q3/
│   │   ├── tiles/      # Raw prediction tiles
│   │   │   ├── tile_0001.tif
│   │   │   └── ...
│   │   ├── merged/     # Merged prediction rasters
│   │   │   ├── region_yakutia.tif
│   │   │   └── ...
│   │   ├── vectors/    # Vectorized polygons
│   │   │   ├── rts_predictions.gpkg
│   │   │   └── ...
│   │   └── logs/
│   │       └── inference_log.json
│   └── ...
└── basemaps/
    └── 2025-Q3/
        └── ... (input imagery)
```
Base Image: Same as training — see computing/docker_training.md for the authoritative Dockerfile and base image.
Additional Inference Requirements:
| Package | Purpose |
|---|---|
| google-cloud-storage | GCS bucket access |
| geopandas | Vector operations |
| shapely | Geometry handling |
| pyproj | Coordinate transformations |
Docker Configuration for Inference:
| Flag | Purpose |
|---|---|
| `--gpus all` | Enable GPU access |
| `-v /path/to/cache:/cache` | Local cache for tiles |
| `--env GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json` | GCS authentication |
GCS Authentication:
- Create service account with Storage Object Viewer and Storage Object Creator roles
- Download JSON key file
- Mount key file into container and set environment variable
| Attribute | Value |
|---|---|
| Product | Global Quarterly PlanetScope Basemap |
| Year | 2025 |
| Quarter | Q3 (July-September) |
| Bands | RGB |
| Resolution | ~3 m |
| Coverage | 60-74°N (pan-arctic) |
| CRS | EPSG:3857 |
| Parameter | Estimate |
|---|---|
| Total area | ~20 million km² |
| Tile size | 512×512 @ 3m = ~2.36 km² per tile |
| Estimated tiles | ~8-10 million tiles (without overlap) |
| With 50% overlap | ~32-40 million tile inferences |
| Parameter | Value | Rationale |
|---|---|---|
| Tile size | 512×512 pixels | Matches training tile size |
| Spatial coverage | ~1.5 km × 1.5 km | At 3m resolution |
| CRS | EPSG:3857 | Consistent with training |
| Format | GeoTIFF | Preserves georeferencing |
Overlapping tiles ensure RTS at tile boundaries are detected where both headwall and floor are visible.
| Parameter | Value | Rationale |
|---|---|---|
| Overlap (stride) | 256 pixels (50%) | Ensures most partial RTS captured in adjacent tile |
| Step size | 256 pixels | tile_size - overlap |
Overlap rationale: An RTS split at a tile boundary may show only floor in tile A and only headwall in tile B. With 50% overlap, an intermediate tile C will likely contain both features.
The inference tile grid is pre-filtered externally (land-only, permafrost zones) before the inference pipeline runs. The inference code receives a pre-filtered tile list and processes it as-is — no filtering logic inside the inference container.
- Define bounding box for inference region (or per-region bounding boxes)
- Generate tile grid with specified overlap
- Apply land/permafrost filtering externally (outside this pipeline)
- Save the filtered tile grid as CSV with tile IDs and bounding boxes → this is the `--tile-list` input to the inference script
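The grid-generation step above can be sketched as follows. Tile size and stride match the tiling spec (512 px at ~3 m, 256 px stride); the function and file names (`generate_grid`, `write_tile_list`) are illustrative, not the production script:

```python
import csv

TILE_PX = 512     # tile size in pixels (matches training)
RES_M = 3.0       # ~3 m PlanetScope resolution
STRIDE_PX = 256   # 50% overlap

def generate_grid(xmin, ymin, xmax, ymax):
    """Return (tile_id, x0, y0, x1, y1) tiles covering the bbox with 50% overlap.

    Coordinates are EPSG:3857 meters; tile IDs are sequential placeholders.
    """
    tile_m = TILE_PX * RES_M    # 1536 m on a side
    step_m = STRIDE_PX * RES_M  # 768 m step between tile origins
    tiles = []
    i = 0
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            tiles.append((f"tile_{i:06d}", x, y, x + tile_m, y + tile_m))
            i += 1
            x += step_m
        y += step_m
    return tiles

def write_tile_list(tiles, path):
    """Write the grid as the CSV consumed via --tile-list."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["tile_id", "xmin", "ymin", "xmax", "ymax"])
        w.writerows(tiles)
```

Land/permafrost filtering would then drop rows from this CSV before it is handed to the inference container.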
Implement a simple histogram-matching script or a "mini-normalization" check: before running full inference on a new region, compute the mean/std of a small 2025 sample and compare it against the 2024 normalization_stats.json.
Critical: Use the exact normalization statistics from training.
- Load `normalization_stats.json` from the model directory
- Verify dataset version matches the expected training data version
- Apply mean subtraction and std division to each input tile
Use the exact normalization methods and statistics identically to training.
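A minimal sketch of this step, assuming `normalization_stats.json` stores per-band `mean` and `std` lists (the actual schema is whatever training produced):

```python
import json
import numpy as np

def load_stats(path):
    """Load per-band mean/std saved at training time (assumed JSON schema)."""
    with open(path) as f:
        stats = json.load(f)
    mean = np.asarray(stats["mean"], dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(stats["std"], dtype=np.float32).reshape(-1, 1, 1)
    return mean, std

def normalize(tile, mean, std):
    """Apply the training-time normalization to a (bands, H, W) tile."""
    return (tile.astype(np.float32) - mean) / std
```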
RTS range from ~50m to 2+ km. A single resolution cannot optimally detect all sizes:
- Native 3m: Good for small-medium RTS, may miss context for large RTS
- Downscaled: Larger effective field of view captures large RTS
| Scale | Effective Resolution | Field of View | Target RTS |
|---|---|---|---|
| 1.0 | 3m (native) | 1.5 km | Small-medium (50m-500m) |
| 0.5 | 6m | 3 km | Medium-large (200m-1km) |
Start with scale 1.0 only; add scale 0.5 if large RTS prove problematic.
For each tile location:
Scale 1.0 (native):
- Load 512×512 tile at native resolution
- Normalize using training statistics
- Run inference → probability map P_1.0
Scale 0.5:
- Load 1024×1024 region centered on tile location
- Downsample to 512×512 (bilinear interpolation)
- Normalize using training statistics
- Run inference → probability map at 512×512
- Upsample prediction back to 1024×1024
- Crop center 512×512 → P_0.5
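The scale-0.5 path can be sketched as below for a single band. To keep the sketch dependency-free, 2×2 mean-pooling and nearest-neighbor upsampling stand in for the bilinear resampling named above; `model` is any callable returning a 512×512 probability map:

```python
import numpy as np

def pool2(a):
    """2x2 mean-pool a (H, W) array -- a stand-in for bilinear downsampling."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def infer_scale_05(region_1024, model):
    """Scale-0.5 pass: downsample a 1024x1024 region, run the model at
    512x512, upsample the prediction, and crop the central 512x512 -> P_0.5."""
    small = pool2(region_1024)                      # 1024 -> 512
    prob = model(small)                             # 512x512 probability map
    up = prob.repeat(2, axis=0).repeat(2, axis=1)   # 512 -> 1024 (nearest)
    return up[256:768, 256:768]                     # center 512x512
```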
| Setting | Transforms | Speed Multiplier |
|---|---|---|
| Disabled | None | 1× |
| Minimal | Identity, hflip | 2× |
| Standard | Identity, hflip, vflip, rot180 | 4× |
Recommendation: For pan-arctic inference, use Minimal TTA (2×) as balance between accuracy and compute cost. Full TTA on 40M+ tiles is expensive.
For each input tile:
- Original → predict → P_orig
- Horizontal flip → predict → flip back → P_hflip
- Average: P_tta = (P_orig + P_hflip) / 2
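The steps above can be sketched directly; `minimal_tta` is an illustrative name, and `model` maps a (H, W) array to a probability map:

```python
import numpy as np

def minimal_tta(tile, model):
    """Minimal TTA: average predictions over identity and horizontal flip."""
    p_orig = model(tile)
    p_hflip = model(tile[:, ::-1])[:, ::-1]  # flip, predict, flip back
    return (p_orig + p_hflip) / 2.0
```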
Order of operations:
- For each scale: a. Apply TTA transforms b. Average TTA predictions at this scale
- Take maximum across scales
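The order of operations can be captured in one function. Each TTA transform is a (forward, inverse) pair, and the per-scale inputs are assumed to be already resampled onto a common output grid (as in the scale-0.5 crop step) so the pixelwise maximum is well defined:

```python
import numpy as np

def fuse(tiles_by_scale, model, tta_transforms):
    """Per scale: average TTA predictions; then take pixelwise max across scales.

    tiles_by_scale: one pre-resampled input array per scale.
    tta_transforms: list of (forward, inverse) callables, e.g. identity + hflip.
    """
    per_scale = []
    for tile in tiles_by_scale:
        preds = [inv(model(fwd(tile))) for fwd, inv in tta_transforms]
        per_scale.append(np.mean(preds, axis=0))  # TTA average at this scale
    return np.maximum.reduce(per_scale)           # max across scales
```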
Total inference passes per tile location: n_scales × n_tta_transforms
| Configuration | Passes per Location |
|---|---|
| 2 scales, no TTA | 2 |
| 2 scales, minimal TTA | 4 |
| 2 scales, standard TTA | 8 |
| Parameter | Value | Notes |
|---|---|---|
| Batch size | 64-128 | Tune based on GPU memory |
| Tile loading | Async prefetch | Overlap I/O with compute |
| GPU utilization target | >90% | Monitor with nvidia-smi |
- Initialize: Load model, load normalization stats, set model to eval mode
- Tile iteration:
- Load batch of tiles from GCS (with prefetching)
- Normalize batch
- For each scale: run inference (with TTA if enabled)
- Fuse scales
- Save predictions to GCS
- Progress tracking: Log completed tiles, estimated time remaining
- Checkpointing: Save progress every N tiles for resumability
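A skeleton of the batched loop with periodic checkpointing; all callbacks (`load_batch`, `predict`, `save`, `checkpoint`) are illustrative stand-ins for the GCS I/O and model code:

```python
def run_inference(tile_ids, load_batch, predict, save, checkpoint,
                  batch_size=64, ckpt_every_batches=10):
    """Process tiles in batches; checkpoint progress every N batches."""
    done = []
    for b, start in enumerate(range(0, len(tile_ids), batch_size), 1):
        ids = tile_ids[start:start + batch_size]
        batch = load_batch(ids)        # async prefetch in production
        save(ids, predict(batch))      # multi-scale + TTA inside predict
        done.extend(ids)
        if b % ckpt_every_batches == 0:
            checkpoint(list(done))
    checkpoint(list(done))             # final checkpoint on completion
    return done
```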
The inference job must be resumable after interruption:
- Maintain a manifest of completed tiles in `inference_log.json`
- On restart, load the manifest and skip completed tiles
- Use atomic writes to GCS (write to temp, then rename)
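A minimal local-filesystem sketch of the manifest logic. `os.replace` provides the atomic rename locally; the GCS equivalent would write to a temporary object and then copy/rename it into place. The manifest schema (`completed_tiles` key) is an assumption for illustration:

```python
import json
import os

def load_manifest(path):
    """Return the set of completed tile IDs, or an empty set on first run."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f)["completed_tiles"])

def mark_done(path, completed):
    """Atomic manifest update: write to a temp file, then rename into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"completed_tiles": sorted(completed)}, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems
```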
| Attribute | Value |
|---|---|
| Format | Cloud-Optimized GeoTIFF (COG) |
| Data type | Float32 |
| Range | [0.0, 1.0] |
| NoData value | -1.0 |
| CRS | EPSG:3857 |
| Resolution | 3m (native) |
| Compression | Deflate |
| Attribute | Value |
|---|---|
| Format | Cloud-Optimized GeoTIFF (COG) |
| Data type | UInt8 |
| Values | 0 (background), 1 (RTS) |
| NoData value | 255 |
| CRS | EPSG:3857 |
| Resolution | 3m |
| Compression | Deflate |
Threshold applied: Use calibrated threshold from training (documented in model config).
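Thresholding with NoData propagation can be sketched as follows; `to_mask` is an illustrative helper, and the threshold value comes from the model config:

```python
import numpy as np

def to_mask(prob, threshold, nodata_prob=-1.0):
    """Binarize a Float32 probability raster (NoData -1.0) into the UInt8
    product: 0 background, 1 RTS, 255 NoData."""
    mask = (prob >= threshold).astype(np.uint8)
    mask[prob == nodata_prob] = 255
    return mask
```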
| Attribute | Value |
|---|---|
| Format | GeoPackage (.gpkg) |
| Geometry | Polygon (MultiPolygon for fragmented) |
| CRS | EPSG:3857 |
Attributes per polygon:
| Field | Type | Description |
|---|---|---|
| rts_id | Integer | Unique identifier |
| area_m2 | Float | Polygon area in square meters (geodesic) |
| perimeter_m | Float | Polygon perimeter in meters (geodesic) |
| centroid_lat | Float | Centroid latitude (WGS84) |
| centroid_lon | Float | Centroid longitude (WGS84) |
| mean_prob | Float | Mean probability within polygon |
| max_prob | Float | Maximum probability within polygon |
| detection_scale | String | Scale(s) that detected this RTS |
| tile_ids | String | Comma-separated tile IDs containing this RTS |
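A sketch of per-detection attribute computation for a few of the fields above. SciPy connected components stand in for full vectorization (the product itself uses GeoPackage polygons), and the pixel-count area (9 m² per 3 m pixel) is a planar stand-in for the geodesic area in the final table:

```python
import numpy as np
from scipy import ndimage

def component_stats(mask, prob, pixel_area_m2=9.0):
    """Compute rts_id, area_m2, mean_prob, max_prob per connected component
    of a binary mask, given the matching probability raster."""
    labels, n = ndimage.label(mask == 1)  # 4-connected components
    rows = []
    for rts_id in range(1, n + 1):
        sel = labels == rts_id
        rows.append({
            "rts_id": rts_id,
            "area_m2": float(sel.sum() * pixel_area_m2),
            "mean_prob": float(prob[sel].mean()),
            "max_prob": float(prob[sel].max()),
        })
    return rows
```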
Save with each inference run:
inference_log.json:
| Field | Description |
|---|---|
| model_version | Model identifier |
| model_checkpoint | Path to model weights |
| normalization_stats_hash | MD5 hash of normalization file |
| inference_date | ISO timestamp |
| basemap_version | 2025-Q3 |
| scales_used | [1.0, 0.5] |
| tta_config | "minimal" |
| threshold | 0.XX (from calibration) |
| n_tiles_processed | Total tiles |
| n_tiles_with_detection | Tiles with any RTS prediction |
| total_rts_area_km2 | Sum of predicted RTS area |
| processing_time_hours | Wall clock time |
| gpu_type | e.g., "NVIDIA H100" |
| Check | Action if Failed |
|---|---|
| Tile has valid data (not all NoData) | Skip tile, log warning |
| Prediction values in [0, 1] | Clip and log error |
| Tile georeferencing valid | Stop and investigate |
| GPU memory stable | Reduce batch size |
Performed before releasing results (detailed in post-inference.md):
- Visual inspection of sample predictions
- Comparison with known RTS locations
- False positive analysis
- Regional performance assessment
| Technique | Description |
|---|---|
| Tile caching | Cache frequently accessed tiles locally |
| Prefetching | Load next batch while current batch processes |
| COG format | Cloud-Optimized GeoTIFF enables efficient partial reads |
| Batch GCS operations | Upload predictions in batches, not per-tile |
| Technique | Description |
|---|---|
| Mixed precision (FP16) | 2× throughput on tensor cores |
| Batch size tuning | Maximize GPU utilization |
| Multiple streams | Overlap data transfer and compute |
| Model compilation | torch.compile() for additional speedup |
| Configuration | Tiles/Second (est.) | Time for 40M tiles |
|---|---|---|
| 1 scale, no TTA, batch=64 | ~100-200 | 2-4 days |
| 2 scales, minimal TTA, batch=64 | ~50-100 | 4-8 days |
| 2 scales, standard TTA, batch=64 | ~25-50 | 8-16 days |
Note: Estimates are rough; actual performance depends on I/O bandwidth, tile complexity, and GCS latency.
The inference pipeline integrates with the existing PDG (Permafrost Discovery Gateway) workflow infrastructure developed for DARTS inference.
Integration points:
- Input: Basemap tiles from GCS
- Output: Prediction tiles and vectors to GCS
- Logging: Compatible format for PDG monitoring
- Parallelization: Workflow handles VM orchestration
The inference container exposes a CLI interface for PDG workflow integration:
```
python scripts/inference.py --config configs/inference.yaml --tile-list tiles.csv
```

- `--config`: YAML file specifying model path, GCS paths, scales, TTA config, threshold
- `--tile-list`: CSV file with tile IDs and bounding boxes to process (pre-filtered by PDG/RTS team)
- Output: Prediction tiles written to the GCS path defined in config; `inference_log.json` updated on completion
Tile-level parallelism (managed by PDG workflow):
- RTS team generates the full filtered tile grid (CSV)
- PDG team (Luigi/Todd) partitions the CSV into chunks and spawns VMs
- Each VM runs the inference container with its assigned tile list chunk
- RTS team merges outputs after all chunks complete
Within-VM parallelism:
- Single GPU processes tiles in batches
- Multiple CPU workers handle I/O prefetching
- No multi-GPU within single VM (simplifies code)
| Responsibility | Owner |
|---|---|
| Tile grid generation (filtered CSV) | RTS team |
| VM orchestration + tile partitioning | PDG team (Luigi/Todd) |
| Inference Docker container | RTS team |
| Output merging | RTS team |
| Quality control | RTS team |
Interface contract (to finalize with PDG team):
- Input: `configs/inference.yaml` + `tiles.csv` (tile_id, bbox columns)
- Output: Prediction tiles at `{config.output_path}/{tile_id}.tif`; log at `{config.output_path}/inference_log.json`
- Model artifacts uploaded to GCS (model, normalization stats, config)
- Docker image built and pushed to container registry
- Tile grid generated and validated
- GCS permissions configured (service account)
- Test inference on small region successful
- Throughput estimate matches budget
- Progress monitoring active
- GPU utilization >90%
- No error accumulation in logs
- Checkpoint saves working
- All tiles processed (compare manifest to grid)
- Merged rasters generated
- Vectorization complete
- Metadata logged
- Sanity checks passed
- Ready for quality control (post-inference.md)
| Issue | Possible Cause | Solution |
|---|---|---|
| OOM errors | Batch size too large | Reduce batch size |
| Slow inference | I/O bottleneck | Enable prefetching, use local cache |
| Inconsistent predictions | Wrong normalization | Verify normalization_stats.json hash |
| Missing tiles in output | Job interrupted | Check manifest, restart from checkpoint |
| High false positive rate | Threshold too low | Re-calibrate threshold on validation set |
| Predictions all zero | Model loading error | Verify model checkpoint, test on known positive |