
Commit 88505a7

[SPARK-54784] Document the security policy on ml models (#676)
* try
* try
* try
* Apply suggestions from code review

Co-authored-by: Celeste Horgan <17999517+celestehorgan@users.noreply.github.com>

* html

---------

Co-authored-by: Celeste Horgan <17999517+celestehorgan@users.noreply.github.com>
1 parent 3def12e commit 88505a7

2 files changed

Lines changed: 28 additions & 0 deletions

security.md

Lines changed: 14 additions & 0 deletions
@@ -43,6 +43,20 @@ internet or untrusted networks. We recommend access within trusted networks (com
 private cloud environments), using restricted access to the Spark cluster with robust authentication,
 authorization, and network controls.
 
+<h3>Is loading a machine learning model secure? Who is responsible for model security?</h3>
+
+Loading an Apache Spark ML model is equivalent to loading and executing code within the Spark runtime.
+
+Spark ML models might contain serialized objects, custom transformers, user-defined expressions, and execution graphs.
+During model loading, Spark deserializes these components, reconstructs the pipeline, and instantiates runtime objects.
+This process can invoke executable logic on the Spark driver and executors.
+Any model, but particularly one that is compromised or intentionally created with malicious intent,
+might execute arbitrary code, access sensitive data, or compromise cluster nodes.
+
+End users must treat Spark ML models with the same level of caution and security scrutiny as any third-party software.
+This includes verifying the source, validating integrity, and applying appropriate isolation and security controls
+before loading or deploying a model.
+
 <h2>Known security issues</h2>
 
 <h3 id="CVE-2023-32007">CVE-2023-32007: Apache Spark shell command injection vulnerability via Spark UI</h3>

site/security.html

Lines changed: 14 additions & 0 deletions
@@ -189,6 +189,20 @@ <h3>During a security analysis of Apache Spark, I noticed that Spark allows for
 private cloud environments), using restricted access to the Spark cluster with robust authentication,
 authorization, and network controls.</p>
 
+<h3>Is loading a machine learning model secure? Who is responsible for model security?</h3>
+
+<p>Loading an Apache Spark ML model is equivalent to loading and executing code within the Spark runtime.</p>
+
+<p>Spark ML models might contain serialized objects, custom transformers, user-defined expressions, and execution graphs.
+During model loading, Spark deserializes these components, reconstructs the pipeline, and instantiates runtime objects.
+This process can invoke executable logic on the Spark driver and executors.
+Any model, but particularly one that is compromised or intentionally created with malicious intent,
+might execute arbitrary code, access sensitive data, or compromise cluster nodes.</p>
+
+<p>End users must treat Spark ML models with the same level of caution and security scrutiny as any third-party software.
+This includes verifying the source, validating integrity, and applying appropriate isolation and security controls
+before loading or deploying a model.</p>
+
 <h2>Known security issues</h2>
 
 <h3 id="CVE-2023-32007">CVE-2023-32007: Apache Spark shell command injection vulnerability via Spark UI</h3>
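
The policy text added in this commit asks users to validate a model's integrity before loading it. As one possible illustration only (it is not part of the commit or of Spark itself), the sketch below hashes every file in a saved model directory and compares the result with a digest recorded when the model was produced. The helper names `directory_digest` and `verify_model_dir` are hypothetical, and the sketch assumes the model directory is on a local filesystem rather than on distributed storage.

```python
import hashlib
import hmac
from pathlib import Path


def directory_digest(model_dir: str) -> str:
    """Return a SHA-256 hex digest covering every file under model_dir.

    Hypothetical helper, not a Spark API. Files are visited in sorted
    order of their relative POSIX path, and both the path and the
    length-prefixed contents are hashed, so adding, removing, renaming,
    or editing any file changes the digest.
    """
    root = Path(model_dir)
    entries = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    digest = hashlib.sha256()
    for rel_path, path in entries:
        name = rel_path.encode("utf-8")
        data = path.read_bytes()
        # Length prefixes keep (path, contents) boundaries unambiguous.
        digest.update(len(name).to_bytes(8, "big"))
        digest.update(name)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def verify_model_dir(model_dir: str, expected_digest: str) -> None:
    """Raise ValueError unless model_dir matches the trusted digest."""
    actual = directory_digest(model_dir)
    # Constant-time comparison of the two hex strings.
    if not hmac.compare_digest(actual, expected_digest):
        raise ValueError(
            f"Model at {model_dir!r} does not match the expected digest; "
            "refusing to load it."
        )
```

A caller would run `verify_model_dir(path, trusted_digest)` first and only then hand the path to a loader such as `pyspark.ml.PipelineModel.load(path)`. A matching digest only shows that the files are unchanged since the digest was recorded; it says nothing about whether the original model was trustworthy, so the source verification and isolation controls described above still apply.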
