[SPARK-54784] Document the security policy on ml models (#676)

zhengruifeng · celestehorgan · web-flow · commit 88505a758027 · 2026-02-13T08:51:17.000+08:00
* try

* try

* try

* Apply suggestions from code review

Co-authored-by: Celeste Horgan <17999517+celestehorgan@users.noreply.github.com>

* html

---------

Co-authored-by: Celeste Horgan <17999517+celestehorgan@users.noreply.github.com>
diff --git a/security.md b/security.md
@@ -43,6 +43,20 @@ internet or untrusted networks. We recommend access within trusted networks (com
 private cloud environments), using restrict access to the Spark cluster with robust authentication, 
 authorization, and network controls.
 
+<h3>Is loading a machine learning model secure? Who is responsible for model security?</h3> 
+
+Loading an Apache Spark ML model is equivalent to loading and executing code within the Spark runtime.
+
+Spark ML models might contain serialized objects, custom transformers, user-defined expressions, and execution graphs. 
+During model loading, Spark deserializes these components, reconstructs the pipeline, and instantiates runtime objects. 
+This process can invoke executable logic on the Spark driver and executors. 
+Any model, but particularly one that is compromised or intentionally created with malicious intent, 
+might execute arbitrary code, access sensitive data, or compromise cluster nodes.
+
+End users must treat Spark ML models with the same level of caution and security scrutiny as any third-party software. 
+This includes verifying the source, validating integrity, and applying appropriate isolation and security controls 
+before loading or deploying a model.
+
 <h2>Known security issues</h2>
 
 <h3 id="CVE-2023-32007">CVE-2023-32007: Apache Spark shell command injection vulnerability via Spark UI</h3>
diff --git a/site/security.html b/site/security.html
@@ -189,6 +189,20 @@ <h3>During a security analysis of Apache Spark, I noticed that Spark allows for
 private cloud environments), using restrict access to the Spark cluster with robust authentication, 
 authorization, and network controls.</p>
 
+<h3>Is loading a machine learning model secure? Who is responsible for model security?</h3>
+
+<p>Loading an Apache Spark ML model is equivalent to loading and executing code within the Spark runtime.</p>
+
+<p>Spark ML models might contain serialized objects, custom transformers, user-defined expressions, and execution graphs. 
+During model loading, Spark deserializes these components, reconstructs the pipeline, and instantiates runtime objects. 
+This process can invoke executable logic on the Spark driver and executors. 
+Any model, but particularly one that is compromised or intentionally created with malicious intent, 
+might execute arbitrary code, access sensitive data, or compromise cluster nodes.</p>
+
+<p>End users must treat Spark ML models with the same level of caution and security scrutiny as any third-party software. 
+This includes verifying the source, validating integrity, and applying appropriate isolation and security controls 
+before loading or deploying a model.</p>
+
 <h2>Known security issues</h2>
 
 <h3 id="CVE-2023-32007">CVE-2023-32007: Apache Spark shell command injection vulnerability via Spark UI</h3>
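
The new section asks users to validate a model's integrity before loading it. As a minimal sketch of what that step could look like, the Python below hashes a local copy of a saved model directory and compares the result with a digest obtained from a trusted source. The helper names `directory_sha256` and `verify_model_dir` are hypothetical and are not part of Spark; the digest scheme (relative path plus contents, length-prefixed, in sorted order) is an assumption chosen for illustration, not a format Spark defines.

```python
import hashlib
import hmac
from pathlib import Path


def directory_sha256(model_dir: str) -> str:
    """Return a SHA-256 digest covering every file under model_dir.

    Hypothetical helper: the scheme below is illustrative, not a Spark format.
    """
    root = Path(model_dir)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so the digest is deterministic
    # regardless of filesystem listing order.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        # Reads each file fully into memory; fine for a sketch,
        # stream in chunks for very large model files.
        data = path.read_bytes()
        # Length-prefix the name and the content so that file
        # boundaries cannot be shifted without changing the digest.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def verify_model_dir(model_dir: str, expected_sha256: str) -> None:
    """Raise ValueError unless model_dir matches the expected digest."""
    actual = directory_sha256(model_dir)
    # Constant-time comparison of the two hex strings.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"model digest mismatch for {model_dir}")
```

A caller would run `verify_model_dir(path, expected)` first and only then hand `path` to a loader such as `pyspark.ml.PipelineModel.load`. A matching digest only shows that the files are the ones the publisher hashed; it says nothing about whether the publisher is trustworthy, so source verification and isolation still apply.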