From e642f7e0ab11164bb5253163f98df26dc3867b13 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Thu, 30 Apr 2026 04:48:32 +0000 Subject: [PATCH 01/14] [INFRA] Document test base class hierarchy in AGENTS.md Add a "Test Base Classes" section to AGENTS.md describing the layered SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession hierarchy, when to pick each base, common redundant-mixin patterns to avoid, and the linearization gotcha when the first item in an `extends` clause is a pure trait without the SparkFunSuite class chain. Generated-by: Claude opus-4-7 --- AGENTS.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 96f5b7917caea..33ddc01dd54a2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,6 +20,33 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" `. +## Test Base Classes + +When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. The chain is layered — each adds capability on top of the previous: + + SparkFunSuite (core) + <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) + <- QueryTest = SparkFunSuite + QueryTestBase + PlanTest (sql/core) + <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) + +| Test scope | Base | Notes | +|------------|------|-------| +| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, retry, `gridTest`. | +| Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | +| DataFrame helpers — no session | `QueryTest` | Rarely used on its own; mostly a building block for other traits (e.g. `FileBasedDataSourceTest`). | +| SQL/DataFrame integration tests — needs a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | + +Helper traits like `ParquetTest`, `OrcTest`, `FileBasedDataSourceTest`, `DDLCommandTestUtils` already extend `QueryTest` but do NOT provide a `SparkSession`. Combine with `SharedSparkSession` when a session is needed (e.g. `extends ParquetTest with SharedSparkSession`). + +Common mistakes in the `extends` clause: +- `extends QueryTest with SharedSparkSession` — `QueryTest` is redundant (`SharedSparkSession` already extends it). Use `extends SharedSparkSession`. +- `extends QueryTest with ParquetTest` / `with FileBasedDataSourceTest` / `with OrcTest` — `QueryTest` is redundant; these helper traits all extend `QueryTest`. +- `extends OrcTest with QueryTestBase` — `OrcTest -> QueryTest -> QueryTestBase` already. +- `extends QueryTest with CommandSuiteBase with DDLCommandTestUtils` — both `CommandSuiteBase` (via `SharedSparkSession`) and `DDLCommandTestUtils` extend `QueryTest`, so `QueryTest` is redundant. +- `extends ParquetTest with SharedSparkSession` is NOT redundant — the format helper trait and the session trait bring different things. + +Linearization gotcha: the first item in the `extends` clause must transitively extend `SparkFunSuite` (an `abstract class`, not a trait). All four bases above carry that chain. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. + ## Build and Test Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make. From 69e1255a8d7007ec2c49d1a0b00e61125b0f16de Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 11:57:46 +0000 Subject: [PATCH 02/14] [INFRA] Document session providers (TestHiveSingleton, RemoteSparkSession) in AGENTS.md Clarify that QueryTest declares `spark` abstractly and that `SharedSparkSession` is one of several session-providing traits in the repo. List the common providers (`SharedSparkSession`, `TestHiveSingleton`, `RemoteSparkSession`) and note the rare manual-override pattern. Generated-by: Claude opus-4-7 --- AGENTS.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 33ddc01dd54a2..cd7cd7e60eef7 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -33,10 +33,20 @@ When writing a new Scala test suite, pick the lowest base class that provides wh |------------|------|-------| | Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, retry, `gridTest`. | | Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | -| DataFrame helpers — no session | `QueryTest` | Rarely used on its own; mostly a building block for other traits (e.g. `FileBasedDataSourceTest`). | -| SQL/DataFrame integration tests — needs a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | +| SQL/DataFrame helpers — abstract `spark` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. Cannot be instantiated alone — `spark` is abstract and must be supplied by a session-providing trait. | +| SQL/DataFrame integration tests — provides a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | -Helper traits like `ParquetTest`, `OrcTest`, `FileBasedDataSourceTest`, `DDLCommandTestUtils` already extend `QueryTest` but do NOT provide a `SparkSession`. Combine with `SharedSparkSession` when a session is needed (e.g. `extends ParquetTest with SharedSparkSession`). +`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are: + +| Session provider | Module / location | Use case | +|---|---|---| +| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | +| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | +| `RemoteSparkSession` | `sql/connect/client/jvm` | Spark Connect client session. Used by Connect client tests. | + +A handful of suites (e.g. `MapStatusEndToEndSuite`, `ExecutorSideSQLConfSuite`) override `spark` directly and manage the session lifecycle themselves; reach for that pattern only when none of the providers above fit (e.g. local-cluster mode, custom builder configs). + +Helper traits like `ParquetTest`, `OrcTest`, `FileBasedDataSourceTest`, `DDLCommandTestUtils` already extend `QueryTest` but do NOT supply a session. Combine them with one of the session providers above (e.g. `extends ParquetTest with SharedSparkSession`, `extends QueryTest with TestHiveSingleton`). Common mistakes in the `extends` clause: - `extends QueryTest with SharedSparkSession` — `QueryTest` is redundant (`SharedSparkSession` already extends it). Use `extends SharedSparkSession`. From fc612dd3326e1c083a0f43cc50aff1db82f8b84c Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:00:16 +0000 Subject: [PATCH 03/14] [INFRA] Note that RemoteSparkSession is non-SparkFunSuite-based in AGENTS.md Add a "Base chain" column to the session provider table and explain that RemoteSparkSession directly extends AnyFunSuite (not SparkFunSuite), so Connect client suites must combine it with QueryTest to recover the SparkFunSuite-only features (per-test timeout, gridTest, retry, log-capture). Generated-by: Claude opus-4-7 --- AGENTS.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index cd7cd7e60eef7..121454b4b249d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -38,11 +38,13 @@ When writing a new Scala test suite, pick the lowest base class that provides wh `QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are: -| Session provider | Module / location | Use case | -|---|---|---| -| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | -| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | -| `RemoteSparkSession` | `sql/connect/client/jvm` | Spark Connect client session. Used by Connect client tests. | +| Session provider | Module / location | Base chain | Use case | +|---|---|---|---| +| `SharedSparkSession` | `sql/core` | `QueryTest` -> `SparkFunSuite` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | +| `TestHiveSingleton` | `sql/hive` | `SparkFunSuite` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | +| `RemoteSparkSession` | `sql/connect/client/jvm` | `AnyFunSuite` (NOT `SparkFunSuite`) | Spark Connect client session. Used by Connect client tests. | + +`RemoteSparkSession` is the odd one out: it directly extends `AnyFunSuite` (with `// scalastyle:ignore funsuite`) instead of going through `SparkFunSuite`, so on its own it lacks SparkFunSuite-only features (per-test timeout, `gridTest`, `retry`, log-capture). Connect client suites therefore combine it with `QueryTest` (e.g. `class DataFrameSuite extends QueryTest with RemoteSparkSession`) to bring the `SparkFunSuite` chain back in. A handful of suites (e.g. `MapStatusEndToEndSuite`, `ExecutorSideSQLConfSuite`) override `spark` directly and manage the session lifecycle themselves; reach for that pattern only when none of the providers above fit (e.g. local-cluster mode, custom builder configs). From cfa3aa9c23ac77f26affb15b8bcd1fea438ac77e Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:02:58 +0000 Subject: [PATCH 04/14] [INFRA] Document SparkTestSuite as style-agnostic foundation in AGENTS.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add SparkTestSuite as the bottom of the test base class chain — it holds the common Spark test functionality (thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError) and is style-agnostic. SparkFunSuite is AnyFunSuite + SparkTestSuite. Mention the alternative-style demonstration suites under core/.../*SparkTestSuite.scala and update the linearization gotcha to note that SparkTestSuite is itself a pure trait. Generated-by: Claude opus-4-7 --- AGENTS.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 121454b4b249d..eb3093da8a72d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -24,14 +24,16 @@ Avoid introducing non-ASCII characters in code or comments. String literals may When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. The chain is layered — each adds capability on top of the previous: - SparkFunSuite (core) - <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) - <- QueryTest = SparkFunSuite + QueryTestBase + PlanTest (sql/core) - <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) + SparkTestSuite (core; style-agnostic foundation) + <- SparkFunSuite = AnyFunSuite + SparkTestSuite (core; pins the FunSuite style — the default) + <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) + <- QueryTest = SparkFunSuite + QueryTestBase + PlanTest (sql/core) + <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) | Test scope | Base | Notes | |------------|------|-------| -| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, retry, `gridTest`. | +| Non-FunSuite ScalaTest style (rare) | `SparkTestSuite` | Style-agnostic trait holding the common Spark test functionality (thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`, etc.). Mix with any ScalaTest style — see `core/src/test/scala/org/apache/spark/{WordSpec,FunSpec,FlatSpec,...}SparkTestSuite.scala` for examples. Real Spark tests should use `SparkFunSuite` instead. | +| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. = `AnyFunSuite + SparkTestSuite`. Adds per-test timeout, `testRetry`, `gridTest` on top of `SparkTestSuite`. | | Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | | SQL/DataFrame helpers — abstract `spark` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. Cannot be instantiated alone — `spark` is abstract and must be supplied by a session-providing trait. | | SQL/DataFrame integration tests — provides a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | @@ -57,7 +59,7 @@ Common mistakes in the `extends` clause: - `extends QueryTest with CommandSuiteBase with DDLCommandTestUtils` — both `CommandSuiteBase` (via `SharedSparkSession`) and `DDLCommandTestUtils` extend `QueryTest`, so `QueryTest` is redundant. - `extends ParquetTest with SharedSparkSession` is NOT redundant — the format helper trait and the session trait bring different things. -Linearization gotcha: the first item in the `extends` clause must transitively extend `SparkFunSuite` (an `abstract class`, not a trait). All four bases above carry that chain. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. +Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). Of the bases above, `SparkFunSuite`, `PlanTest`, `QueryTest`, and `SharedSparkSession` all carry the `SparkFunSuite` -> `AnyFunSuite` chain. `SparkTestSuite` is a pure trait and does NOT — if you use it directly you must put a ScalaTest style class first (e.g. `class X extends AnyWordSpec with SparkTestSuite`). The same applies to other "pure helper" traits (`*ErrorsBase`, `*Helper`): if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. ## Build and Test From b1d297121392a29712f4e242bf283b31cfdf611d Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:05:05 +0000 Subject: [PATCH 05/14] [INFRA] Remove RemoteSparkSession from session-provider table in AGENTS.md Drop the RemoteSparkSession row and follow-up paragraph from the session provider section. The trait does not formally extend SparkSessionProvider and serves a separate audience (Connect client tests under sql/connect/client/jvm), so it is out of scope for this guide. Generated-by: Claude opus-4-7 --- AGENTS.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index eb3093da8a72d..a082df4b4e213 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -44,9 +44,6 @@ When writing a new Scala test suite, pick the lowest base class that provides wh |---|---|---|---| | `SharedSparkSession` | `sql/core` | `QueryTest` -> `SparkFunSuite` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | | `TestHiveSingleton` | `sql/hive` | `SparkFunSuite` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | -| `RemoteSparkSession` | `sql/connect/client/jvm` | `AnyFunSuite` (NOT `SparkFunSuite`) | Spark Connect client session. Used by Connect client tests. | - -`RemoteSparkSession` is the odd one out: it directly extends `AnyFunSuite` (with `// scalastyle:ignore funsuite`) instead of going through `SparkFunSuite`, so on its own it lacks SparkFunSuite-only features (per-test timeout, `gridTest`, `retry`, log-capture). Connect client suites therefore combine it with `QueryTest` (e.g. `class DataFrameSuite extends QueryTest with RemoteSparkSession`) to bring the `SparkFunSuite` chain back in. A handful of suites (e.g. `MapStatusEndToEndSuite`, `ExecutorSideSQLConfSuite`) override `spark` directly and manage the session lifecycle themselves; reach for that pattern only when none of the providers above fit (e.g. local-cluster mode, custom builder configs). From 39de84b44509b128a6a35b370182f849717237e2 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:06:03 +0000 Subject: [PATCH 06/14] [INFRA] Fix test-base-class chain diagram in AGENTS.md QueryTest extends SparkFunSuite with QueryTestBase with PlanTest, so PlanTest is a parent of QueryTest. Indent QueryTest one level deeper to reflect the actual hierarchy and show QueryTest as PlanTest + QueryTestBase (PlanTest already brings the SparkFunSuite chain). Generated-by: Claude opus-4-7 --- AGENTS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index a082df4b4e213..6eefa295d9d2d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -27,8 +27,8 @@ When writing a new Scala test suite, pick the lowest base class that provides wh SparkTestSuite (core; style-agnostic foundation) <- SparkFunSuite = AnyFunSuite + SparkTestSuite (core; pins the FunSuite style — the default) <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) - <- QueryTest = SparkFunSuite + QueryTestBase + PlanTest (sql/core) - <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) + <- QueryTest = PlanTest + QueryTestBase (sql/core) + <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) | Test scope | Base | Notes | |------------|------|-------| From 0fd8a39587099815408e1f5a61598aafc52f2cf4 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:09:58 +0000 Subject: [PATCH 07/14] [INFRA] Show PlanTestBase / QueryTestBase chain in AGENTS.md Add a parallel diagram for the style-agnostic `*Base` helper traits (`PlanTestBase` -> `QueryTestBase` -> `SharedSparkSessionBase`) and note that `PlanTest` / `QueryTest` / `SharedSparkSession` are the FunSuite-flavored bundles built on top of them, analogous to `SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`. Generated-by: Claude opus-4-7 --- AGENTS.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 6eefa295d9d2d..ff6959f5a939c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -30,6 +30,14 @@ When writing a new Scala test suite, pick the lowest base class that provides wh <- QueryTest = PlanTest + QueryTestBase (sql/core) <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) +Each class-bearing test base above pairs `SparkFunSuite` with a style-agnostic `*Base` trait that actually holds the helpers. Those `*Base` traits form a parallel chain: + + PlanTestBase (sql/catalyst; pure trait — plan-comparison helpers) + <- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark`) + <- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`) + +`PlanTest`, `QueryTest`, and `SharedSparkSession` are the FunSuite-flavored bundles built on top of these (analogous to how `SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`). + | Test scope | Base | Notes | |------------|------|-------| | Non-FunSuite ScalaTest style (rare) | `SparkTestSuite` | Style-agnostic trait holding the common Spark test functionality (thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`, etc.). Mix with any ScalaTest style — see `core/src/test/scala/org/apache/spark/{WordSpec,FunSpec,FlatSpec,...}SparkTestSuite.scala` for examples. Real Spark tests should use `SparkFunSuite` instead. | From 40c1993387220ec07885c7bfaa63399750f7c433 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:12:52 +0000 Subject: [PATCH 08/14] [INFRA] Trim redundant-mixin and manual-session sections from AGENTS.md Drop the "manual session override", helper-trait combination, and "common mistakes" subsections from the Test Base Classes section to keep the guide focused on the layered base hierarchy and the linearization gotcha. Generated-by: Claude opus-4-7 --- AGENTS.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index ff6959f5a939c..de77ebccd22c7 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -53,17 +53,6 @@ Each class-bearing test base above pairs `SparkFunSuite` with a style-agnostic ` | `SharedSparkSession` | `sql/core` | `QueryTest` -> `SparkFunSuite` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | | `TestHiveSingleton` | `sql/hive` | `SparkFunSuite` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | -A handful of suites (e.g. `MapStatusEndToEndSuite`, `ExecutorSideSQLConfSuite`) override `spark` directly and manage the session lifecycle themselves; reach for that pattern only when none of the providers above fit (e.g. local-cluster mode, custom builder configs). - -Helper traits like `ParquetTest`, `OrcTest`, `FileBasedDataSourceTest`, `DDLCommandTestUtils` already extend `QueryTest` but do NOT supply a session. Combine them with one of the session providers above (e.g. `extends ParquetTest with SharedSparkSession`, `extends QueryTest with TestHiveSingleton`). - -Common mistakes in the `extends` clause: -- `extends QueryTest with SharedSparkSession` — `QueryTest` is redundant (`SharedSparkSession` already extends it). Use `extends SharedSparkSession`. -- `extends QueryTest with ParquetTest` / `with FileBasedDataSourceTest` / `with OrcTest` — `QueryTest` is redundant; these helper traits all extend `QueryTest`. -- `extends OrcTest with QueryTestBase` — `OrcTest -> QueryTest -> QueryTestBase` already. -- `extends QueryTest with CommandSuiteBase with DDLCommandTestUtils` — both `CommandSuiteBase` (via `SharedSparkSession`) and `DDLCommandTestUtils` extend `QueryTest`, so `QueryTest` is redundant. -- `extends ParquetTest with SharedSparkSession` is NOT redundant — the format helper trait and the session trait bring different things. - Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). Of the bases above, `SparkFunSuite`, `PlanTest`, `QueryTest`, and `SharedSparkSession` all carry the `SparkFunSuite` -> `AnyFunSuite` chain. `SparkTestSuite` is a pure trait and does NOT — if you use it directly you must put a ScalaTest style class first (e.g. `class X extends AnyWordSpec with SparkTestSuite`). The same applies to other "pure helper" traits (`*ErrorsBase`, `*Helper`): if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. ## Build and Test From b50223b55f8e17a67af369bc8d21df932f96223f Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:20:01 +0000 Subject: [PATCH 09/14] [INFRA] Lead with style-agnostic helper traits in AGENTS.md Reorder the Test Base Classes section so the style-agnostic helper traits (SparkTestSuite, PlanTestBase, QueryTestBase, SharedSparkSessionBase) are introduced first, followed by the class-bearing FunSuite-style bases that bundle each helper with AnyFunSuite. The helpers are where the actual test functionality lives, so leading with them makes the relationship clearer. Generated-by: Claude opus-4-7 --- AGENTS.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index de77ebccd22c7..21f813fffdba2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -22,21 +22,21 @@ Avoid introducing non-ASCII characters in code or comments. String literals may ## Test Base Classes -When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. The chain is layered — each adds capability on top of the previous: +When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. - SparkTestSuite (core; style-agnostic foundation) - <- SparkFunSuite = AnyFunSuite + SparkTestSuite (core; pins the FunSuite style — the default) - <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) - <- QueryTest = PlanTest + QueryTestBase (sql/core) - <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) +The actual test helpers live in style-agnostic traits that do not commit to a ScalaTest style: -Each class-bearing test base above pairs `SparkFunSuite` with a style-agnostic `*Base` trait that actually holds the helpers. Those `*Base` traits form a parallel chain: - - PlanTestBase (sql/catalyst; pure trait — plan-comparison helpers) - <- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark`) + SparkTestSuite (core; thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError) + PlanTestBase (sql/catalyst; plan-comparison helpers — comparePlans, normalizePlan) + <- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark` via `SparkSessionProvider`) <- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`) -`PlanTest`, `QueryTest`, and `SharedSparkSession` are the FunSuite-flavored bundles built on top of these (analogous to how `SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`). +The class-bearing test bases bundle each style-agnostic helper with `AnyFunSuite` (the default ScalaTest style) so concrete suites can extend them directly. They form a layered chain — each adds capability on top of the previous: + + SparkFunSuite = AnyFunSuite + SparkTestSuite (core; pins the FunSuite style — the default) + <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) + <- QueryTest = PlanTest + QueryTestBase (sql/core) + <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) | Test scope | Base | Notes | |------------|------|-------| From 1298dc335d62e190f0f138612af0ed985314f995 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 6 May 2026 12:23:56 +0000 Subject: [PATCH 10/14] [INFRA] Rename Test Base Classes section to Scala-specific in AGENTS.md The section covers Scala test bases only (PySpark tests are handled separately under Build and Test), so the more specific name avoids confusion. Generated-by: Claude opus-4-7 --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 21f813fffdba2..c42ce05c05485 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,7 +20,7 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" `. -## Test Base Classes +## Scala Test Base Classes When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. From c89924b7462838d554ac669f1eb555abf9856444 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Thu, 7 May 2026 06:32:11 +0000 Subject: [PATCH 11/14] [INFRA] Drop style-agnostic *Base chain from AGENTS.md per review Spark uses the AnyFunSuite ScalaTest style throughout, so the SparkTestSuite / PlanTestBase / QueryTestBase / SharedSparkSessionBase chain is not relevant to contributors picking a base for a new test suite. Collapse the section to just the class-bearing chain (SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession), drop the SparkTestSuite row from the decision table, fold its helpers into the SparkFunSuite row, drop the now-redundant base-chain column from the session-provider table, and trim the linearization gotcha to no longer cite SparkTestSuite / AnyWordSpec. Generated-by: Claude opus-4-7 --- AGENTS.md | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index c42ce05c05485..f69793f3e8a3d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -22,38 +22,28 @@ Avoid introducing non-ASCII characters in code or comments. String literals may ## Scala Test Base Classes -When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. +When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style throughout, so the bases below are the chain to choose from. Each adds capability on top of the previous: -The actual test helpers live in style-agnostic traits that do not commit to a ScalaTest style: - - SparkTestSuite (core; thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError) - PlanTestBase (sql/catalyst; plan-comparison helpers — comparePlans, normalizePlan) - <- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark` via `SparkSessionProvider`) - <- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`) - -The class-bearing test bases bundle each style-agnostic helper with `AnyFunSuite` (the default ScalaTest style) so concrete suites can extend them directly. They form a layered chain — each adds capability on top of the previous: - - SparkFunSuite = AnyFunSuite + SparkTestSuite (core; pins the FunSuite style — the default) - <- PlanTest = SparkFunSuite + PlanTestBase (sql/catalyst) - <- QueryTest = PlanTest + QueryTestBase (sql/core) - <- SharedSparkSession = QueryTest + SharedSparkSessionBase (sql/core) + SparkFunSuite (core) + <- PlanTest (sql/catalyst) + <- QueryTest (sql/core) + <- SharedSparkSession (sql/core) | Test scope | Base | Notes | |------------|------|-------| -| Non-FunSuite ScalaTest style (rare) | `SparkTestSuite` | Style-agnostic trait holding the common Spark test functionality (thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`, etc.). Mix with any ScalaTest style — see `core/src/test/scala/org/apache/spark/{WordSpec,FunSpec,FlatSpec,...}SparkTestSuite.scala` for examples. Real Spark tests should use `SparkFunSuite` instead. | -| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. = `AnyFunSuite + SparkTestSuite`. Adds per-test timeout, `testRetry`, `gridTest` on top of `SparkTestSuite`. | +| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. | | Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | | SQL/DataFrame helpers — abstract `spark` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. Cannot be instantiated alone — `spark` is abstract and must be supplied by a session-providing trait. | | SQL/DataFrame integration tests — provides a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | `QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are: -| Session provider | Module / location | Base chain | Use case | -|---|---|---|---| -| `SharedSparkSession` | `sql/core` | `QueryTest` -> `SparkFunSuite` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | -| `TestHiveSingleton` | `sql/hive` | `SparkFunSuite` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | +| Session provider | Module / location | Use case | +|---|---|---| +| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | +| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | -Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). Of the bases above, `SparkFunSuite`, `PlanTest`, `QueryTest`, and `SharedSparkSession` all carry the `SparkFunSuite` -> `AnyFunSuite` chain. `SparkTestSuite` is a pure trait and does NOT — if you use it directly you must put a ScalaTest style class first (e.g. `class X extends AnyWordSpec with SparkTestSuite`). The same applies to other "pure helper" traits (`*ErrorsBase`, `*Helper`): if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. +Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). The four bases above all carry the `SparkFunSuite` chain, so they can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. ## Build and Test From bc821cb33d109042a2496b1b5859814bacdec50b Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Mon, 11 May 2026 06:45:08 +0000 Subject: [PATCH 12/14] [INFRA] Separate session providers from base-class chain in AGENTS.md Per Wenchen's offline comment: SharedSparkSession is a session provider, not another rung in the inheritance ladder, so it shouldn't sit on the same diagram as SparkFunSuite/PlanTest/QueryTest. Stop the base-class chain at QueryTest, drop the SharedSparkSession row from the base-class table, and add a dedicated "Providing a SparkSession for QueryTest" subsection with its own diagram showing how SharedSparkSession and TestHiveSingleton plug into QueryTest's abstract `spark`. Generated-by: Claude opus-4-7 --- AGENTS.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index f69793f3e8a3d..7b755916b2e7d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -27,23 +27,27 @@ When writing a new Scala test suite, pick the lowest base class that provides wh SparkFunSuite (core) <- PlanTest (sql/catalyst) <- QueryTest (sql/core) - <- SharedSparkSession (sql/core) | Test scope | Base | Notes | |------------|------|-------| | Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. | | Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | -| SQL/DataFrame helpers — abstract `spark` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. Cannot be instantiated alone — `spark` is abstract and must be supplied by a session-providing trait. | -| SQL/DataFrame integration tests — provides a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. | +| SQL/DataFrame tests — needs a `SparkSession` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. `spark: SparkSession` is abstract and must be supplied by a session-providing trait (see below). | -`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are: +### Providing a `SparkSession` for `QueryTest` -| Session provider | Module / location | Use case | +`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`, so it cannot be instantiated on its own. A concrete suite mixes in one of the session-providing traits below: + + QueryTest (abstract `spark`) + + SharedSparkSession (sql/core) -> classic in-process `TestSparkSession` + + TestHiveSingleton (sql/hive) -> Hive-backed `TestHive` session + +| Session provider | Module / location | Typical usage | |---|---|---| -| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. | -| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. | +| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | +| `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. | -Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). The four bases above all carry the `SparkFunSuite` chain, so they can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. +Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). All three bases above plus `SharedSparkSession` carry the `SparkFunSuite` chain, so any of them can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. ## Build and Test From d22ff688ff64b6cd5fe4202d2fd30884c27ec036 Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Mon, 11 May 2026 12:41:46 +0000 Subject: [PATCH 13/14] [INFRA] Drop linearization-gotcha paragraph from AGENTS.md per review Cloud-fan: the gotcha references `*ErrorsBase` / `*Helper` traits that aren't mentioned anywhere else in the section, and the four documented bases all safely carry the `SparkFunSuite` chain. Drop the paragraph. Generated-by: Claude opus-4-7 --- AGENTS.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 7b755916b2e7d..6ddb4038c2cca 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -47,8 +47,6 @@ When writing a new Scala test suite, pick the lowest base class that provides wh | `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | | `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. | -Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). All three bases above plus `SharedSparkSession` carry the `SparkFunSuite` chain, so any of them can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait "` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. - ## Build and Test Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make. From e5a92a2be9007ba9aadd07a4c4a5b04b45bee03d Mon Sep 17 00:00:00 2001 From: Ruifeng Zheng Date: Wed, 13 May 2026 09:10:10 +0000 Subject: [PATCH 14/14] [INFRA] Recommend explicit `QueryTest with SharedSparkSession` in AGENTS.md per review Cloud-fan: recommend the explicit `extends QueryTest with SharedSparkSession` form so the sql/core row parallels the sql/hive row (`extends QueryTest with TestHiveSingleton`). Also note that SharedSparkSession already extends QueryTest for historical reasons, so readers aren't confused when they see existing suites omit the explicit mixin. Generated-by: Claude opus-4-7 --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 6ddb4038c2cca..28944c9d78108 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -44,7 +44,7 @@ When writing a new Scala test suite, pick the lowest base class that provides wh | Session provider | Module / location | Typical usage | |---|---|---| -| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | +| `SharedSparkSession` | `sql/core` | Already extends `QueryTest` for historical reasons, but still mix in `QueryTest` explicitly, e.g. `class X extends QueryTest with SharedSparkSession`. Default for tests under `sql/core`. | | `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. | ## Build and Test