2026-03-04 docs: benchmark OWA 0.877 sync, experiment/test code links, version 0.0.12

eddmpython · eddmpython · commit 693be809f78d · 2026-03-04T16:08:11.000+09:00
diff --git a/docs/benchmarks.ko.md b/docs/benchmarks.ko.md
@@ -61,7 +61,7 @@ Vectrix outperforms Naive2 in **all four M3 categories**, and M3 Mont
 | Item | Version / Spec |
 |------|-------------|
 | Python | 3.10+ |
-| Vectrix | 0.0.10 |
+| Vectrix | 0.0.12 |
 | OS | Windows 11 / Ubuntu 22.04 / macOS 14+ |
 | CPU | x86_64 or ARM64 |
 | RAM | 8 GB or more |
@@ -72,6 +72,38 @@ Vectrix outperforms Naive2 in **all four M3 categories**, and M3 Mont
 pip install vectrix
 ```
 
-M4 benchmark experiment script: `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`
+### Experiment Code
+
+All experiments are fully reproducible Python scripts, with results recorded in their docstrings.
+
+| Experiment | Description | Source |
+|:-----|:-----|:-----|
+| E019 | DOT-Hybrid engine M4 100K verification | [019_dotHybridEngine.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/019_dotHybridEngine.py) |
+| E042 | M4 official OWA verification | [042_m4OfficialOwa.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/042_m4OfficialOwa.py) |
+| E043 | Holdout validation + auto period detection | [043_dotAutoPeriodHoldout.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/043_dotAutoPeriodHoldout.py) |
+| E044 | Daily/Weekly specialist strategies | [044_dailyWeeklySpecialist.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/044_dailyWeeklySpecialist.py) |
+| E045 | Integrated improvement verification | [045_integratedImprovement.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/045_integratedImprovement.py) |
+| E046 | Final integration rule validation | [046_finalIntegration.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/046_finalIntegration.py) |
+
+Full experiment status and research notes: [STATUS.md](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/STATUS.md)
+
+### Tests
+
+573 tests, 5 skipped, covering all engines, models, and pipeline components.
+
+```bash
+pip install vectrix
+pytest tests/ -x -q
+```
+
+| Test Module | Count | Coverage |
+|:------------|:----:|:-----|
+| [test_all_models.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_all_models.py) | 112 | All 30+ forecasting models |
+| [test_new_models.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_new_models.py) | 45 | DTSF, ESN, 4Theta engines |
+| [test_engine_utils.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_engine_utils.py) | 55 | ARIMAX, CV, decomposition |
+| [test_easy.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_easy.py) | 33 | Easy API (forecast, analyze, regress) |
+| [test_business.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_business.py) | 45 | Anomaly, backtesting, metrics, scenarios |
+| [test_adaptive.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_adaptive.py) | 20 | Regime, DNA, self-healing, constraints |
+| [test_regression.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_regression.py) | 22 | OLS, Ridge, Lasso, diagnostics |
 
 M4 data files can be downloaded from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods).
diff --git a/docs/benchmarks.md b/docs/benchmarks.md
@@ -60,6 +60,38 @@ Vectrix consistently outperforms Naive2 across all M3 categories, with the stron
 pip install vectrix
 ```
 
-M4 benchmark experiment: `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`
+### Experiment Code
+
+All experiments are fully reproducible Python scripts with results recorded in docstrings.
+
+| Experiment | Description | Source |
+|:-----------|:------------|:-------|
+| E019 | DOT-Hybrid engine M4 100K verification | [019_dotHybridEngine.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/019_dotHybridEngine.py) |
+| E042 | M4 official OWA verification | [042_m4OfficialOwa.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/042_m4OfficialOwa.py) |
+| E043 | Holdout validation + auto period detection | [043_dotAutoPeriodHoldout.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/043_dotAutoPeriodHoldout.py) |
+| E044 | Daily/Weekly specialist strategies | [044_dailyWeeklySpecialist.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/044_dailyWeeklySpecialist.py) |
+| E045 | Integrated improvement verification | [045_integratedImprovement.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/045_integratedImprovement.py) |
+| E046 | Final integration rule validation | [046_finalIntegration.py](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/046_finalIntegration.py) |
+
+Full experiment status and research notes: [STATUS.md](https://github.com/eddmpython/vectrix/blob/master/src/vectrix/experiments/modelCreation/STATUS.md)
+
+### Test Suite
+
+573 tests, 5 skipped — covering all engines, models, and pipeline components.
+
+```bash
+pip install vectrix
+pytest tests/ -x -q
+```
+
+| Test Module | Count | Coverage |
+|:------------|:-----:|:---------|
+| [test_all_models.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_all_models.py) | 112 | All 30+ forecasting models |
+| [test_new_models.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_new_models.py) | 45 | DTSF, ESN, 4Theta engines |
+| [test_engine_utils.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_engine_utils.py) | 55 | ARIMAX, CV, decomposition |
+| [test_easy.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_easy.py) | 33 | Easy API (forecast, analyze, regress) |
+| [test_business.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_business.py) | 45 | Anomaly, backtesting, metrics, scenarios |
+| [test_adaptive.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_adaptive.py) | 20 | Regime, DNA, self-healing, constraints |
+| [test_regression.py](https://github.com/eddmpython/vectrix/blob/master/tests/test_regression.py) | 22 | OLS, Ridge, Lasso, diagnostics |
 
 > **Tip:** For faster M4 data loading, download the CSV files directly from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods) rather than using `M4.load()`, which can be slow due to wide-to-long data transformation.
diff --git a/landing/src/lib/components/sections/Benchmarks.svelte b/landing/src/lib/components/sections/Benchmarks.svelte
@@ -4,8 +4,8 @@
 
 	const rows = [
 		{ comp: 'M3', yearly: '0.848', quarterly: '0.825', monthly: '0.758', weekly: '—', daily: '—', hourly: '0.819' },
-		{ comp: 'M4', yearly: '0.974', quarterly: '0.797', monthly: '0.987', weekly: '0.737', daily: '1.207', hourly: '1.006' },
-		{ comp: 'M4 Ensemble', yearly: '0.879', quarterly: '0.797', monthly: '0.927', weekly: '0.737', daily: '1.105', hourly: '0.696' }
+		{ comp: 'M4 DOT-Hybrid', yearly: '0.797', quarterly: '0.894', monthly: '0.897', weekly: '0.959', daily: '0.996', hourly: '0.722' },
+		{ comp: 'M4 VX-Ensemble', yearly: '0.879', quarterly: '0.907', monthly: '0.919', weekly: '0.954', daily: '0.996', hourly: '0.696' }
 	];
 
 	function cellClass(val: string): string {
@@ -50,7 +50,7 @@
 		</div>
 
 		<p class="mt-3 text-xs text-vx-text-dim">
-			M4 Ensemble uses VX-Ensemble with DOT + AutoCES + 4Theta + DTSF + ESN. Hourly 0.696 OWA = competition winner level.
+			DOT-Hybrid: single model, AVG OWA 0.877. VX-Ensemble: DOT + AutoCES + 4Theta. Hourly 0.696 OWA = competition winner level.
 		</p>
 		<Button variant="secondary" size="sm" href="{base}/docs/benchmarks" class="mt-6">
 			View full benchmark results →
diff --git a/llms-full.txt b/llms-full.txt
@@ -2,7 +2,7 @@
 
 > Zero-config time series forecasting library for Python. 30+ statistical models with automatic selection, built-in Rust engine (25 accelerated functions), adaptive intelligence, full regression suite, and business analytics.
 
-- Version: 0.0.8
+- Version: 0.0.12
 - Python: >=3.10
 - Core deps: numpy>=1.24, pandas>=2.0, scipy>=1.10
 - Install: `pip install vectrix`
@@ -188,7 +188,7 @@ from vectrix.engine.baseline import NaiveModel, SeasonalNaiveModel, MeanModel, R
 | AutoETS | AutoETS | General purpose, trend+seasonal |
 | AutoARIMA | AutoARIMA | Stationary with complex autocorrelation |
 | Theta | OptimizedTheta | Simple trend extrapolation |
-| DOT | DynamicOptimizedTheta | General purpose (M4 OWA 0.905) |
+| DOT | DynamicOptimizedTheta | General purpose (M4 OWA 0.877) |
 | AutoCES | AutoCES | General purpose (M4 OWA 0.927) |
 | AutoMSTL | AutoMSTL | Multiple seasonality (daily, weekly, yearly) |
 | AutoTBATS | AutoTBATS | Complex multi-seasonal |
@@ -524,7 +524,7 @@ print(TURBO_AVAILABLE)  # True if Rust engine loaded (default for all pip instal
 ```
 
 ### Performance benchmarks (M4 Competition)
-- DOT: OWA 0.905 (general purpose best)
+- DOT: OWA 0.877 (general purpose best)
 - AutoCES: OWA 0.927
 - 4Theta Yearly: OWA 0.879 (= M4 official #11)
 - VX-Ensemble Hourly: OWA 0.696 (winner-level)
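
Review note: every figure synced in this commit is an OWA (Overall Weighted Average), the M4 competition metric that averages a model's sMAPE and MASE after dividing each by the Naive2 benchmark's value. The sketch below shows that arithmetic in plain Python. The function names and the toy series are hypothetical and are not taken from the Vectrix code base, the zero-denominator guard in `smape` is an assumption, and the simple last-value forecast only stands in for Naive2 on a non-seasonal series. In the competition itself, sMAPE and MASE are first averaged over all series before the ratios are formed.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 200 * |y - f| / (|y| + |f|)."""
    terms = []
    for y, f in zip(actual, forecast):
        denom = abs(y) + abs(f)
        # Assumption: a 0/0 term counts as zero error.
        terms.append(0.0 if denom == 0 else 200.0 * abs(y - f) / denom)
    return sum(terms) / len(terms)


def mase(insample, actual, forecast, period=1):
    """Forecast MAE scaled by the in-sample seasonal-naive MAE."""
    diffs = [abs(insample[i] - insample[i - period])
             for i in range(period, len(insample))]
    scale = sum(diffs) / len(diffs)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def owa(smape_model, mase_model, smape_naive2, mase_naive2):
    """Average of relative sMAPE and relative MASE; below 1.0 beats Naive2."""
    return 0.5 * (smape_model / smape_naive2 + mase_model / mase_naive2)


# Toy, non-seasonal example (hypothetical numbers).
history = [10.0, 12.0, 14.0, 16.0]
actual = [18.0, 20.0]
model_forecast = [18.0, 19.0]
benchmark_forecast = [16.0, 16.0]  # last value repeated, standing in for Naive2

toy_owa = owa(
    smape(actual, model_forecast),
    mase(history, actual, model_forecast),
    smape(actual, benchmark_forecast),
    mase(history, actual, benchmark_forecast),
)
print(f"toy OWA = {toy_owa:.3f}")
```

An OWA of 1.0 means parity with Naive2, so the reported 0.877 would correspond to errors roughly 12% lower than the benchmark on average.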
diff --git a/llms.txt b/llms.txt
@@ -2,7 +2,7 @@
 
 > Zero-config time series forecasting library for Python. 30+ statistical models, automatic selection, built-in Rust engine. NumPy/SciPy/Pandas with adaptive intelligence, regression, and business analytics.
 
-- Version: 0.0.8
+- Version: 0.0.12
 - Python: 3.10+
 - Dependencies: numpy, pandas, scipy (core only)
 - Install: `pip install vectrix`
@@ -35,10 +35,26 @@ print(reg.summary())
 - [Installation](https://eddmpython.github.io/vectrix/docs/getting-started/installation/): Setup guide (Rust engine built-in)
 - [Quickstart](https://eddmpython.github.io/vectrix/docs/getting-started/quickstart/): 5-minute tutorial
 
+## Benchmarks
+
+M4 Competition 100,000 time series (DOT-Hybrid single model):
+
+| Frequency | OWA |
+|-----------|-----|
+| Yearly | 0.797 |
+| Quarterly | 0.894 |
+| Monthly | 0.897 |
+| Weekly | 0.959 |
+| Daily | 0.996 |
+| Hourly | 0.722 |
+| **AVG** | **0.877** |
+
+Beats M4 #18 Theta (0.897). Full results: [benchmarks](https://eddmpython.github.io/vectrix/docs/benchmarks/)
+
 ## API Reference
 
 - [Forecasting Guide](https://eddmpython.github.io/vectrix/docs/guide/forecasting/): Detailed forecasting workflows
-- [Model Catalog](https://eddmpython.github.io/vectrix/docs/guide/models/): All 30+ models with parameters
+- [Analysis & DNA](https://eddmpython.github.io/vectrix/docs/guide/analysis/): Time series profiling, 65+ features
 - [Regression Guide](https://eddmpython.github.io/vectrix/docs/guide/regression/): OLS, Ridge, Lasso, Huber, Quantile
 - [Adaptive Intelligence](https://eddmpython.github.io/vectrix/docs/guide/adaptive/): Regime detection, self-healing, DNA
 - [Business Analytics](https://eddmpython.github.io/vectrix/docs/guide/business/): Anomaly, scenarios, backtesting
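
Review note: the **AVG** row in the table added to `llms.txt` appears to be the plain unweighted mean of the six per-frequency OWA values. That is an inference from the numbers in this diff rather than something the commit states. The check below recomputes it, and does the same for the `holdout_val` row of the E043 table in `STATUS.md`. The variable and function names are illustrative only.

```python
# Per-frequency OWA values copied from the llms.txt table in this diff.
llms_table = {
    "Yearly": 0.797,
    "Quarterly": 0.894,
    "Monthly": 0.897,
    "Weekly": 0.959,
    "Daily": 0.996,
    "Hourly": 0.722,
}

# holdout_val row of the E043 table in STATUS.md (reported AVG: 0.8761).
holdout_val_row = [0.8064, 0.8940, 0.8965, 0.9457, 0.9918, 0.7223]


def unweighted_average(values):
    """Plain arithmetic mean; every frequency counts equally."""
    values = list(values)
    return sum(values) / len(values)


llms_avg = unweighted_average(llms_table.values())
holdout_avg = unweighted_average(holdout_val_row)
print(f"llms.txt table mean: {llms_avg:.4f}")
print(f"holdout_val mean:    {holdout_avg:.4f}")
```

Because each frequency counts equally here, this average is not weighted by the number of series per frequency, which is consistent with the caveat in `STATUS.md` that results may shift once Monthly (48K series) carries more weight.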
diff --git a/src/vectrix/__init__.py b/src/vectrix/__init__.py
@@ -82,7 +82,7 @@
 )
 from .vectrix import Vectrix
 
-__version__ = "0.0.11"
+__version__ = "0.0.12"
 __all__ = [
     "Vectrix",
     "ForecastResult",
diff --git a/src/vectrix/experiments/modelCreation/STATUS.md b/src/vectrix/experiments/modelCreation/STATUS.md
@@ -241,7 +241,7 @@
 - **Conclusion**: the ensemble itself is worse than DOT-only (DOT is already optimized)
 
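 ### E031-E040 Overall Conclusions
-1. **DOT-Hybrid (0.885) is the practical limit of a pure statistical model**
+1. **DOT-Hybrid (0.877, after applying holdout) is the practical limit of a pure statistical model**
 2. **Best meta-learning result = 0.873** (requires scikit-learn, not yet integrated)
 3. **Ensembles are worse than DOT-only**: DOT is already sufficiently optimized
 4. **Reaching M4 #1 (0.821) requires a DL hybrid**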
@@ -299,8 +299,71 @@
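 - **Honest ranking**: roughly 14th to 15th by the official M4 standard (better than Theta at 0.897)
 - Note: based on an 11K sample; on the full 100K the result may shift slightly because Monthly (48K) carries more weight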
 
+## 043~046: DOT Holdout Validation Experiments (2026-03-04)
+
+### 043 DOT Auto Period Detection + Holdout Validation
+
+| Variant | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | **AVG** |
+|------|--------|-----------|---------|--------|-------|--------|---------|
+| baseline | 0.7971 | 0.9053 | 0.9200 | 0.9587 | 0.9949 | 0.7223 | **0.8831** |
+| auto_period | 0.8019 | 0.9053 | 0.9200 | 0.9952 | 1.0220 | 0.7223 | **0.8944** |
+| **holdout_val** | 0.8064 | **0.8940** | **0.8965** | **0.9457** | 0.9918 | 0.7223 | **0.8761** |
+| combined | 0.8084 | 0.8940 | 0.8965 | 0.9831 | 1.0187 | 0.7223 | **0.8872** |
+
+- **auto_period: rejected.** ACF detects spurious short periods (2, 3) in noise; Daily worsens by +2.7% and Weekly by +3.8%
+- **holdout_val: conditionally adopted.** Quarterly -1.25% and Monthly -2.55% (improved); Yearly +1.2% regression (less training data)
+- **combined: rejected.** auto_period cancels out the holdout gains
+
+### 044 Daily/Weekly Specialist
+
+- **Weekly classic_only: adopted** (-2.18%). At period=1, classic DOT outperforms Hybrid
+- **Daily classic_only: rejected** (+0.98%)
+- **Core3 ensemble Daily/Weekly: rejected** (+21%/+8%). CES/4Theta are harmful at period=1
+
+### 045 Integrated Improvement (holdout + Weekly classic)
+
+- **AVG 0.8831→0.8748 (-0.94%)**: overall improvement
+- **Yearly +1.16% regression**: holdout shrinks the training data available for short series
+
+### 046 Final Integration (split by period)
+
+- **period<=1 classic + period>1 holdout: rejected.** Yearly suffers a critical +11.26% regression
+- **Key finding**: for Yearly (period=1), the Hybrid 8-way search is better at finding the trend, so classic cannot be applied
+- **Final rule**: apply holdout validation only when period>1 (only Quarterly/Monthly improve)
+
+### E043-E046 Overall Conclusions
+1. **Holdout validation helps only on seasonal data with period>1** (Quarterly -1.25%, Monthly -2.55%)
+2. **ACF auto period detection is harmful**: it detects spurious periods in noise
+3. **It is safest to leave period=1 data alone**: Yearly/Daily/Weekly all keep the existing approach
+4. **The Core3 ensemble is harmful at period=1**: CES/4Theta are weak on non-seasonal data
+
+### Changes applied to dot.py (v0.0.12)
+- `_fitHybrid()`: holdout validation only when `period > 1 and n >= period * 4`
+- Added the `_predictVariantSteps()` helper method
+- Refit on the full data after holdout
+- **DOT-Hybrid AVG OWA: 0.885 → 0.877** (only period>1 improves; the rest unchanged)
+- Tests: 573 passed, 5 skipped
+
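+## Completed Steps
+- [x] Modularized 3 models under engine/ (fit/predict/residuals interface)
+- [x] Registered model info in types.py
+- [x] Added the new models to _selectNativeModels in vectrix.py
+- [x] Confirmed all 573 existing tests pass
+- [x] 012 M4 100K benchmark completed
+- [x] 013~015 experiments on 3 novel ensemble/forecasting principles (all rejected)
+- [x] 016~018 DOT enhancement + SCUM experiments completed
+- [x] Integrated DOT-Hybrid into engine/dot.py (period<24: DOT++, period>=24: classic)
+- [x] Added Rust dot_hybrid_objective (26th function)
+- [x] 019 integrated engine M4 100K verification completed (OWA 0.885)
+- [x] 031~040 10 experiments on FFORMA meta-learning + model selection optimization completed
+- [x] Removed auto_arima from the default model pool
+- [x] 041 conditional ensemble verification → core3-first ensemble applied to the engine (AVG 0.885→0.879)
+- [x] 042 M4 official OWA verification → found a problem in the benchmark methodology
+- [x] 043~046 DOT holdout validation experiments → period>1 holdout applied to the engine (AVG 0.885→0.877)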
+
 ## Next Steps
 - [ ] Explore DL hybrids (NeuralForecast/TimesFM) → target M4 #1 (0.821)
+- [ ] Improve Daily OWA 0.996 (strategy for non-seasonal period=1 data)
 - [ ] Improve 4Theta seasonality handling (weak on Quarterly/Monthly/Weekly/Daily)
 - [ ] Improve DTSF performance on short series (weak when n<100)
 - [ ] Auto-tune ESN reservoir size (slow on long series)
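
Review note: the `STATUS.md` hunk above describes the rule now in `dot.py`: run holdout validation only when `period > 1 and n >= period * 4`, then refit on the full data. The sketch below illustrates that gate with two toy forecasters. It is not the Vectrix implementation: `select_by_holdout`, `naive`, and `make_seasonal_naive` are hypothetical helpers, and holding out exactly one season and scoring by mean absolute error are assumptions, since the diff specifies neither.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A forecaster maps (history, steps) to a list of `steps` forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def use_holdout(n: int, period: int) -> bool:
    """Gate quoted in STATUS.md: seasonal data with at least four full periods."""
    return period > 1 and n >= period * 4


def mean_abs_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def select_by_holdout(y: Sequence[float], period: int,
                      variants: Dict[str, Forecaster]) -> Optional[str]:
    """Return the variant with the lowest holdout error, or None if the gate fails."""
    if not use_holdout(len(y), period):
        return None  # caller keeps its existing selection path
    # Assumption: hold out exactly one season.
    train, held = y[:-period], y[-period:]
    scores = {name: mean_abs_error(held, fc(train, len(held)))
              for name, fc in variants.items()}
    return min(scores, key=scores.get)


def naive(history: Sequence[float], steps: int) -> List[float]:
    """Repeat the last observed value."""
    return [history[-1]] * steps


def make_seasonal_naive(period: int) -> Forecaster:
    """Build a forecaster that repeats the last observed season."""
    def forecast(history: Sequence[float], steps: int) -> List[float]:
        last_season = list(history[-period:])
        return [last_season[i % period] for i in range(steps)]
    return forecast


y = [1.0, 2.0, 3.0, 4.0] * 4  # 16 points, period 4
variants = {"naive": naive, "seasonal_naive": make_seasonal_naive(4)}

best = select_by_holdout(y, 4, variants)
if best is not None:
    # "Refit" on the full series once the variant is chosen.
    final_forecast = variants[best](y, 4)
    print(best, final_forecast)
```

Returning `None` when the gate fails keeps the period=1 path untouched, which mirrors the conclusion in the notes that Yearly/Daily/Weekly data should keep the existing approach.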
diff --git a/src/vectrix/vectrix.py b/src/vectrix/vectrix.py
@@ -69,7 +69,7 @@ class Vectrix:
     Dependencies: numpy, pandas, scipy (required), numba (optional)
     """
 
-    VERSION = "0.0.11"
+    VERSION = "0.0.12"
 
     NATIVE_MODELS = {
         'auto_ets': {
