<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>EvalOps Workbench</title>
<meta name="description" content="A local-first evaluation harness for prompts, tools, and agents with regression tracking and experiment history." />
<meta name="theme-color" content="#07111f" />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&family=IBM+Plex+Mono:wght@400;500&display=swap" rel="stylesheet" />
<link rel="stylesheet" href="/styles.css" />
<style>
:root {
--bg: #07111f;
--bg-alt: #0d1a2c;
--panel: #10233c;
--text: #eff6ff;
--muted: #9fb3c8;
--accent: #5eead4;
--accent-alt: #60a5fa;
--border: rgba(159, 179, 200, 0.18);
}
</style>
</head>
<body>
<div class="page-shell">
<header class="hero">
<nav class="topbar">
<span class="brand">evalops-workbench</span>
<div class="links">
<a href="https://github.com/IgnazioDS/evalops-workbench">GitHub</a>
<a href="https://github.com/IgnazioDS/evalops-workbench/blob/main/docs/architecture.md">Architecture</a>
<a href="https://github.com/IgnazioDS/evalops-workbench/blob/main/docs/roadmap.md">Roadmap</a>
</div>
</nav>
<section class="hero-grid">
<div>
<p class="eyebrow">Shipping System</p>
<h1>EvalOps Workbench</h1>
<p class="lede">A local-first evaluation harness for prompts, tools, and agents with regression tracking and experiment history.</p>
<div class="hero-actions">
<a class="button primary" href="https://github.com/IgnazioDS/evalops-workbench">Open Repository</a>
<a class="button secondary" href="https://github.com/IgnazioDS/evalops-workbench/blob/main/README.md">Read Docs</a>
</div>
</div>
<aside class="card status-panel">
<div class="status-row">
<span>Status</span>
<strong>Researching</strong>
</div>
<div class="status-row">
<span>Category</span>
<strong>Developer Tool</strong>
</div>
<div class="status-row">
<span>Track</span>
<strong>LLM</strong>
</div>
<div class="status-row">
<span>Audience</span>
<strong>Agent builders, prompt engineers, applied AI teams</strong>
</div>
</aside>
</section>
</header>
<main>
<section class="grid two-up">
<article class="card narrative">
<p class="eyebrow">Problem</p>
<h2>Operational pain, made explicit.</h2>
<p>LLM teams lack a lightweight way to compare prompt and tool changes before shipping.</p>
</article>
<article class="card narrative">
<p class="eyebrow">Why Now</p>
<h2>Built for a market that already feels the gap.</h2>
<p>Evaluation is moving from optional best practice to baseline engineering hygiene.</p>
</article>
</section>
<section class="section-head">
<p class="eyebrow">Core Capabilities</p>
<h2>Focused scope, credible surface area.</h2>
</section>
<section class="grid capabilities">
<article class="card capability">
<span class="eyebrow">Capability 1</span>
<h3>Load datasets from JSON or CSV</h3>
<p>Import evaluation cases from plain JSON or CSV files instead of a bespoke format.</p>
</article>
<article class="card capability">
<span class="eyebrow">Capability 2</span>
<h3>Run prompt or agent variants</h3>
<p>Execute alternative prompts or agent configurations against the same dataset.</p>
</article>
<article class="card capability">
<span class="eyebrow">Capability 3</span>
<h3>Score outputs with rubric functions</h3>
<p>Grade every output with rubric functions so scoring is repeatable from run to run.</p>
</article>
<article class="card capability">
<span class="eyebrow">Capability 4</span>
<h3>Compare runs and export regressions</h3>
<p>Diff two runs to surface the cases that got worse, then export those regressions.</p>
</article>
</section>
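<section class="section-head">
<p class="eyebrow">Illustrative Sketch</p>
<h2>The evaluation loop, in miniature.</h2>
<p>A minimal, hypothetical sketch of the loop the four capabilities describe: load cases, run two variants, score them with a rubric function, and export the cases that regressed. It uses Python's standard library only. The function names, the <code>id</code>/<code>input</code>/<code>expected</code> columns, and the exact-match rubric are assumptions made for illustration. They are not the evalops-workbench API.</p>
</section>
<section class="grid">
<article class="card command-panel">
<pre><code>import csv
import io
import json

# Hypothetical sketch: load_cases, exact_match, run_variant and compare_runs
# are illustrative names, not part of the evalops-workbench API.


def load_cases(text, fmt):
    """Parse evaluation cases from a JSON array or from CSV text with a header row."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError("unsupported format: " + fmt)


def exact_match(case, output):
    """Rubric function: 1.0 if the output equals the expected answer, else 0.0."""
    return 1.0 if output.strip() == case["expected"].strip() else 0.0


def run_variant(cases, variant, rubric):
    """Run one prompt or agent variant over every case; map case id to score."""
    return {case["id"]: rubric(case, variant(case["input"])) for case in cases}


def compare_runs(baseline, candidate):
    """List the cases whose score dropped between a baseline and a candidate run."""
    regressions = []
    for case_id, base_score in baseline.items():
        # A case missing from the candidate run counts as a score of 0.0.
        drop = max(0.0, base_score - candidate.get(case_id, 0.0))
        if drop:
            regressions.append({"id": case_id, "drop": drop})
    return regressions


# Assumed dataset columns: id, input, expected.
CSV_DATA = "id,input,expected\nq1,2+2,4\nq2,capital of France,Paris\n"
cases = load_cases(CSV_DATA, "csv")

# Stand-in variants; a real harness would call a model or agent here.
baseline_variant = {"2+2": "4", "capital of France": "Paris"}.get
candidate_variant = {"2+2": "4", "capital of France": "Lyon"}.get

baseline = run_variant(cases, baseline_variant, exact_match)
candidate = run_variant(cases, candidate_variant, exact_match)

# Export regressions as JSON.
regressions = compare_runs(baseline, candidate)
print(json.dumps(regressions))  # prints [{"id": "q2", "drop": 1.0}]</code></pre>
<p>In this sketch a variant is any callable that maps an input to an output. Swapping the dictionary lookup for a real model or agent call would leave the scoring and comparison code unchanged. Persisting each run's score map, for example in the DuckDB store listed under Stack Direction, could let the same comparison reach back across experiment history.</p>
</article>
</section>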
<section class="grid two-up lower">
<article class="card stack-panel">
<p class="eyebrow">Stack Direction</p>
<h2>Implementation posture</h2>
<ul class="stack-list">
<li>Python</li><li>Typer</li><li>DuckDB</li><li>OpenTelemetry</li>
</ul>
</article>
<article class="card command-panel">
<p class="eyebrow">Local Entry Points</p>
<h2>Minimal interface, easy to demo</h2>
<pre><code>uv run evalops-workbench summary
uv run evalops-workbench capabilities
uv run evalops-workbench roadmap
vercel deploy -y</code></pre>
</article>
</section>
</main>
</div>
</body>
</html>