
Commit 6d1d5fe

Update README: llmPredict implementation merged from closed PR apache#2430
1 parent cb9ce4d

1 file changed: scripts/staging/llm-bench/README.md

Lines changed: 18 additions & 8 deletions
@@ -8,11 +8,21 @@ embeddings) with n=50 per workload.
 ## Purpose and Motivation
 
 This project was developed as part of the LDE (Large-Scale Data Engineering)
-course. The `llmPredict` native built-in was added to SystemDS in
-[PR #2430](https://github.com/apache/systemds/pull/2430). This PR
-([#2431](https://github.com/apache/systemds/pull/2431)) contains the
-benchmarking framework that evaluates `llmPredict` against established LLM
-serving solutions, plus the benchmark results.
+course. This PR ([#2431](https://github.com/apache/systemds/pull/2431))
+contains both the `llmPredict` built-in implementation and the benchmarking
+framework. (The original `llmPredict` PR
+[#2430](https://github.com/apache/systemds/pull/2430) has been closed and
+merged into this one.)
+
+**What this PR adds:**
+- `LlmPredictCPInstruction.java` -- dedicated CP instruction class for
+  `llmPredict`, extracted from `ParameterizedBuiltinCPInstruction`
+- Structured error handling: `ConnectException`, `SocketTimeoutException`,
+  `MalformedURLException`, HTTP non-200 with error body readback
+- Negative tests: `testServerUnreachable` and `testInvalidUrl` with message
+  assertions
+- Benchmark framework comparing OpenAI, vLLM, and SystemDS across 5 workloads
+- License headers on all Python files
 
 **Research questions:**
 
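For orientation, the "structured error handling" bullet in the hunk above maps to roughly the following shape of code. This is a minimal illustrative sketch, not the actual `LlmPredictCPInstruction` source: the `post` helper, the endpoint path, the exact message texts, and the plain `RuntimeException` wrapper are assumptions (SystemDS would use its own exception types). Only the exception types caught and the non-200 error-body readback are taken from the PR description; the `/v1/chat/completions` path is the standard OpenAI-compatible endpoint.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class LlmHttpSketch {
    // Hypothetical helper: POST a JSON request to an OpenAI-compatible
    // server and return the raw response body, surfacing structured errors.
    static String post(String baseUrl, String requestJson, int timeoutMs) {
        try {
            URL url = new URL(baseUrl + "/v1/chat/completions");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(requestJson.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            if (status != 200) {
                // Non-200: read the error body back so the caller sees the
                // server's actual message, not just the status code.
                String errBody = readAll(conn.getErrorStream() != null
                        ? conn.getErrorStream() : conn.getInputStream());
                throw new RuntimeException(
                    "llmPredict: HTTP " + status + " from server: " + errBody);
            }
            return readAll(conn.getInputStream());
        } catch (MalformedURLException e) {
            throw new RuntimeException("llmPredict: invalid URL: " + baseUrl, e);
        } catch (ConnectException e) {
            throw new RuntimeException(
                "llmPredict: server unreachable at " + baseUrl, e);
        } catch (SocketTimeoutException e) {
            throw new RuntimeException(
                "llmPredict: request timed out after " + timeoutMs + " ms", e);
        } catch (java.io.IOException e) {
            throw new RuntimeException("llmPredict: I/O error: " + e.getMessage(), e);
        }
    }

    private static String readAll(java.io.InputStream in) throws java.io.IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return r.lines().collect(Collectors.joining("\n"));
        }
    }
}
```

A request body in the spirit of the benchmarks would be a JSON string such as `{"model": "...", "messages": [{"role": "user", "content": "..."}], "temperature": 0.0}`, matching the temperature=0.0 setting noted in the second hunk.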
@@ -26,9 +36,9 @@ serving solutions, plus the benchmark results.
 - Built a Python benchmarking framework that runs standardized workloads
   against all backends under identical conditions (same prompts, same
   evaluation metrics).
-- The `llmPredict` built-in (from PR #2430) goes through the full DML
-  compilation pipeline (parser -> hops -> lops -> CP instruction) and makes
-  HTTP calls to any OpenAI-compatible inference server.
+- The `llmPredict` built-in goes through the full DML compilation pipeline
+  (parser -> hops -> lops -> CP instruction) and makes HTTP calls to any
+  OpenAI-compatible inference server.
 - GPU backends (vLLM, SystemDS) executed on NVIDIA H100 PCIe (81 GB).
   OpenAI ran on local MacBook calling cloud API.
 All runs used 50 samples per workload, temperature=0.0 for reproducibility.
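The negative tests named in the first hunk, `testServerUnreachable` and `testInvalidUrl`, plausibly follow a JUnit pattern like the sketch below. Only the test names and the idea of asserting on the error message come from the diff; the assertion style and the `LlmHttpSketch.post` helper (the hypothetical sketch from the previous block) are illustrative assumptions, not SystemDS's actual test harness.

```java
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
import org.junit.Test;

public class LlmPredictNegativeTestSketch {

    @Test
    public void testServerUnreachable() {
        try {
            // Port 1 is near-certain to refuse connections locally
            // (environment-dependent, fine for a sketch).
            LlmHttpSketch.post("http://localhost:1", "{}", 2000);
            fail("expected an exception for an unreachable server");
        } catch (RuntimeException e) {
            // Message assertion: the error should say the server is unreachable.
            assertTrue(e.getMessage().contains("unreachable"));
        }
    }

    @Test
    public void testInvalidUrl() {
        try {
            // No protocol, so URL construction fails with MalformedURLException.
            LlmHttpSketch.post("not-a-valid-url", "{}", 2000);
            fail("expected an exception for a malformed URL");
        } catch (RuntimeException e) {
            assertTrue(e.getMessage().contains("invalid URL"));
        }
    }
}
```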
