@@ -8,11 +8,21 @@ embeddings) with n=50 per workload.
 ## Purpose and Motivation
 
 This project was developed as part of the LDE (Large-Scale Data Engineering)
-course. The `llmPredict` native built-in was added to SystemDS in
-[PR #2430](https://github.com/apache/systemds/pull/2430). This PR
-([#2431](https://github.com/apache/systemds/pull/2431)) contains the
-benchmarking framework that evaluates `llmPredict` against established LLM
-serving solutions, plus the benchmark results.
+course. This PR ([#2431](https://github.com/apache/systemds/pull/2431))
+contains both the `llmPredict` built-in implementation and the benchmarking
+framework. (The original `llmPredict` PR
+[#2430](https://github.com/apache/systemds/pull/2430) has been closed and
+merged into this one.)
+
+**What this PR adds:**
+- `LlmPredictCPInstruction.java` -- dedicated CP instruction class for
+  `llmPredict`, extracted from `ParameterizedBuiltinCPInstruction`
+- Structured error handling: `ConnectException`, `SocketTimeoutException`,
+  `MalformedURLException`, HTTP non-200 with error-body readback (see the
+  sketches after this list)
+- Negative tests: `testServerUnreachable` and `testInvalidUrl` with message
+  assertions
+- Benchmark framework comparing OpenAI, vLLM, and SystemDS across 5 workloads
+- License headers on all Python files
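+
+To make the error-handling contract concrete, the sketch below shows how an
+HTTP call with structured error handling might look. This is a minimal
+illustration, not the PR's actual code: the `post` helper, timeout values,
+and exception messages are assumptions, and SystemDS would wrap failures in
+its own runtime exception type rather than a bare `RuntimeException`.
+
+```java
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.ConnectException;
+import java.net.HttpURLConnection;
+import java.net.MalformedURLException;
+import java.net.SocketTimeoutException;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+
+public class LlmHttpSketch {
+    // Hypothetical helper: POST a JSON payload to an OpenAI-compatible
+    // endpoint and map each failure mode to a descriptive exception.
+    static String post(String url, String json) {
+        try {
+            HttpURLConnection conn =
+                (HttpURLConnection) new URL(url).openConnection();
+            conn.setRequestMethod("POST");
+            conn.setRequestProperty("Content-Type", "application/json");
+            conn.setConnectTimeout(5_000);   // assumed timeout values
+            conn.setReadTimeout(60_000);
+            conn.setDoOutput(true);
+            conn.getOutputStream().write(json.getBytes(StandardCharsets.UTF_8));
+
+            int status = conn.getResponseCode();
+            if (status != 200) {
+                // Non-200: read the error body back so the caller sees the
+                // server's own explanation, not just the status code.
+                String body = readAll(conn.getErrorStream());
+                throw new RuntimeException(
+                    "Server returned HTTP " + status + ": " + body);
+            }
+            return readAll(conn.getInputStream());
+        }
+        catch (MalformedURLException e) {
+            throw new RuntimeException("Invalid URL: " + url, e);
+        }
+        catch (ConnectException e) {
+            throw new RuntimeException("Server unreachable: " + url, e);
+        }
+        catch (SocketTimeoutException e) {
+            throw new RuntimeException("Request timed out: " + url, e);
+        }
+        catch (IOException e) {
+            throw new RuntimeException("I/O error calling " + url, e);
+        }
+    }
+
+    private static String readAll(InputStream in) throws IOException {
+        return in == null ? ""
+            : new String(in.readAllBytes(), StandardCharsets.UTF_8);
+    }
+}
+```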
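+
+The negative tests can then assert on those messages. Again a sketch under
+assumptions: the test names match the PR's, but the bodies below exercise
+the hypothetical `post` helper above, not the real instruction.
+
+```java
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+import org.junit.Test;
+
+public class LlmPredictNegativeTestSketch {
+    @Test
+    public void testServerUnreachable() {
+        try {
+            // Port 1 is assumed closed, so the connection is refused.
+            LlmHttpSketch.post("http://localhost:1/v1/chat/completions", "{}");
+            fail("expected a connection failure");
+        }
+        catch (RuntimeException e) {
+            // Message assertion: the error names the failure mode.
+            assertTrue(e.getMessage().contains("unreachable"));
+        }
+    }
+
+    @Test
+    public void testInvalidUrl() {
+        try {
+            LlmHttpSketch.post("not-a-url", "{}");
+            fail("expected an invalid-URL failure");
+        }
+        catch (RuntimeException e) {
+            assertTrue(e.getMessage().contains("Invalid URL"));
+        }
+    }
+}
+```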
 
 **Research questions:**
 
@@ -26,9 +36,9 @@ serving solutions, plus the benchmark results.
 - Built a Python benchmarking framework that runs standardized workloads
   against all backends under identical conditions (same prompts, same
   evaluation metrics).
-- The `llmPredict` built-in (from PR #2430) goes through the full DML
-  compilation pipeline (parser -> hops -> lops -> CP instruction) and makes
-  HTTP calls to any OpenAI-compatible inference server.
+- The `llmPredict` built-in goes through the full DML compilation pipeline
+  (parser -> hops -> lops -> CP instruction) and makes HTTP calls to any
+  OpenAI-compatible inference server (see the request sketch after this list).
 - GPU backends (vLLM, SystemDS) executed on an NVIDIA H100 PCIe (81 GB).
   The OpenAI backend ran on a local MacBook calling the cloud API.
   All runs used 50 samples per workload and temperature=0.0 for reproducibility.
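+
+For reference, the wire protocol both `llmPredict` and the benchmark target
+is the OpenAI chat-completions API that servers such as vLLM expose. The
+standalone sketch below shows a minimal such request; the endpoint URL and
+model name are placeholders, and the actual instruction builds its payload
+internally.
+
+```java
+import java.net.URI;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+
+public class OpenAiCompatibleCall {
+    public static void main(String[] args) throws Exception {
+        // Placeholder endpoint: any OpenAI-compatible server (e.g. vLLM).
+        String url = "http://localhost:8000/v1/chat/completions";
+        // temperature=0.0 mirrors the benchmark's reproducibility setting.
+        String payload = """
+            {"model": "my-model",
+             "messages": [{"role": "user", "content": "Hello"}],
+             "temperature": 0.0}""";
+
+        HttpRequest request = HttpRequest.newBuilder()
+            .uri(URI.create(url))
+            .header("Content-Type", "application/json")
+            .POST(HttpRequest.BodyPublishers.ofString(payload))
+            .build();
+
+        HttpResponse<String> response = HttpClient.newHttpClient()
+            .send(request, HttpResponse.BodyHandlers.ofString());
+        System.out.println(response.statusCode() + " " + response.body());
+    }
+}
+```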