Supply Chain Late Delivery Risk Prediction & Shipping Cost Optimization

An end-to-end analytics project built entirely in Microsoft Excel — from exploratory analysis through machine learning, Monte Carlo simulation, and constrained optimization.

Uses real supply chain data (180k+ orders from DataCo Global) to predict late deliveries, quantify cost exposure, and find the optimal shipping mode allocation.

The Problem

DataCo Global has a late delivery rate of 54.8% — more than half of all orders arrive late. This drives penalty costs, customer churn, and operational inefficiency. The company uses four shipping modes (Standard Class, First Class, Second Class, Same Day), but the current allocation is not cost-optimized.

The question: Can we predict which orders will be late before they ship, quantify the monthly cost risk, and find a better shipping mode mix?

The Approach

Phase	Method	Sheet
Exploration	Summary statistics, pivot breakdowns, conditional formatting	`1_Data_EDA`
Feature Engineering	Binary encoding, derived features, median-based buckets	`2_Feature_Engineering`
Prediction	Logistic regression (gradient descent, 3,000 iterations)	`3_Logistic_Regression`
Risk Quantification	Monte Carlo simulation (1,000 scenarios, NORMINV + RAND)	`4_Monte_Carlo_Sim`
Optimization	Excel Solver (GRG Nonlinear, constrained allocation)	`5_Solver_Optimization`
Communication	Executive dashboard with KPIs and recommendations	`6_Dashboard`

Key Results

70.8% prediction accuracy using only pre-shipment features (no data leakage)
First Class shipping has a coefficient of +3.89 in the logistic model — selecting it nearly guarantees a late delivery
Same Day shipping has a coefficient of -2.70 — it actively reduces late delivery risk
Monthly cost exposure averages ~$99k with a 95th percentile VaR of ~$128k
Solver optimization can reduce the weighted late rate from 54.5% to under 45%, saving an estimated $5–15k/month

Data

Source: DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS (Mendeley Data, CC BY 4.0)

180,519 orders from DataCo Global
53 original features covering provisioning, production, sales, and distribution
Products: Clothing, Sports, Electronics
Markets: Europe, LATAM, Pacific Asia, USCA, Africa
Sampled to 8,000 orders (stratified) for Excel workability

Citation: Constante, F., Silva, F., & Pereira, A. (2019). DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS. Mendeley Data, V3. doi: 10.17632/8gx2fvg2k6.3

On Data Leakage

This matters. The dataset includes Days for shipping (real) and Delivery Status — both are post-hoc variables that are only known after an order has already been delivered. Using them would give you near-perfect accuracy (the day difference deterministically encodes lateness), but that model is useless in production because you can't know actual shipping days before the order ships.

I deliberately excluded these features. The model uses only what's available at order time: scheduled shipping days, product price, quantity, sales amount, discount rate, shipping mode, customer segment, and market. This drops accuracy from ~100% to 70.8%, but it's a model you could actually deploy.

Workbook Structure

Supply_Chain_Optimization_Completed.xlsx
│
├── 1_Data_EDA
│   ├── 8,000 cleaned order records (15 columns)
│   ├── Summary statistics panel
│   ├── Late rate by shipping mode / segment / market
│   └── Bar chart visualization
│
├── 2_Feature_Engineering
│   ├── 18 model-ready features (all formula-linked to Sheet 1)
│   ├── Binary encoding: 4 shipping modes → 3 dummies
│   ├── Binary encoding: 3 segments → 2 dummies
│   ├── Binary encoding: 5 markets → 4 dummies
│   └── Derived: price bucket, high quantity flag
│
├── 3_Logistic_Regression
│   ├── 18 trained coefficients (gradient descent)
│   ├── Sigmoid function: P(Late) = 1/(1+exp(-z))
│   ├── Confusion matrix: TP=2392, FP=326, TN=3274, FN=2008
│   ├── Metrics: Accuracy=70.8%, Precision=88.0%, Recall=54.4%
│   └── 100-row prediction sample with live formulas
│
├── 4_Monte_Carlo_Sim
│   ├── 10 editable assumptions (blue cells)
│   ├── 1,000 simulated monthly scenarios
│   ├── Random demand via NORMINV(RAND(), mean, stdev)
│   ├── Late rate draws with variability
│   └── Risk summary: mean, VaR 95%, VaR 99%, P(cost > $100k)
│
├── 5_Solver_Optimization
│   ├── 4 shipping modes with cost and late rate parameters
│   ├── Decision variables (yellow cells): allocation percentages
│   ├── Current vs Optimized cost comparison
│   ├── Constraints: sum=100%, min 5%, max 80%, late rate ≤ 45%
│   └── Step-by-step Solver setup instructions
│
└── 6_Dashboard
    ├── 4 KPIs: Late Rate, Accuracy, VaR 95%, Savings
    ├── 5 key findings
    ├── Actionable recommendation
    └── Cost breakdown table (linked to Sheet 5)

How to Use

Open in Excel (not Google Sheets — Solver and NORMINV won't work there)
Press F9 to regenerate Monte Carlo random draws
Go to Sheet 5 → Data tab → Solver
Set up Solver per the on-sheet instructions
Click Solve → the yellow cells update → savings populate across Sheets 5 and 6

Technical Notes

152,566 live formulas, zero hardcoded calculation values
All calculations are dynamic — change an assumption, everything recalculates
Logistic regression coefficients were trained via gradient descent in Python, then placed as blue input cells in Excel. The prediction formulas (sigmoid, threshold) are native Excel
Monte Carlo uses Excel's built-in NORMINV and RAND functions — no VBA, no add-ins
Solver uses GRG Nonlinear method for the constrained optimization

Tools

Microsoft Excel (Solver add-in)
Python (data sampling and coefficient training only — not required to use the workbook)

What I'd Do Next

Add a sensitivity analysis sheet — vary the late penalty cost and see how the optimal allocation shifts
Build a scenario comparison table (best case / base case / worst case)
Connect to live order data via Power Query for real-time risk monitoring
Test additional classifiers (decision tree, k-NN) and compare in a model selection sheet

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Supply_Chain_Optimization_Completed.xlsx		Supply_Chain_Optimization_Completed.xlsx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Supply Chain Late Delivery Risk Prediction & Shipping Cost Optimization

The Problem

The Approach

Key Results

Data

On Data Leakage

Workbook Structure

How to Use

Technical Notes

Tools

What I'd Do Next

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Supply Chain Late Delivery Risk Prediction & Shipping Cost Optimization

The Problem

The Approach

Key Results

Data

On Data Leakage

Workbook Structure

How to Use

Technical Notes

Tools

What I'd Do Next

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages