[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707
[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707zhengruifeng wants to merge 14 commits into
Conversation
| SparkTestSuite (core; thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError) | ||
| PlanTestBase (sql/catalyst; plan-comparison helpers — comparePlans, normalizePlan) | ||
| <- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark` via `SparkSessionProvider`) | ||
| <- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`) |
There was a problem hiding this comment.
hmmm do we need to mention then? Inside Spark we should always pick AnyFunSuite style for consistency
There was a problem hiding this comment.
Good point — agreed. Dropped the style-agnostic *Base chain (and the related SparkTestSuite row + linearization-gotcha example) in 65bb5e6. The section now only documents the SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession chain.
…les_for_files ### What changes were proposed in this pull request? This PR extends `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`. ### Why are the changes needed? A documentation-only PR that touches only `AGENTS.md` (e.g. apache#55707) currently triggers all CI test jobs because the file is not associated with any submodule, so it falls through to the `root` module. Neither file affects code or tests, and neither is consumed by the docs build, so they should be ignored just like `README.md`. ### Does this PR introduce _any_ user-facing change? No, this is only a testing infra change. ### How was this patch tested? Updated and ran the doctests in `dev/sparktestsupport/utils.py`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7
…les_for_files ### What changes were proposed in this pull request? Extend `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`. ### Why are the changes needed? A documentation-only PR that touches only `AGENTS.md` (e.g. #55707, see [run](https://github.com/zhengruifeng/spark/actions/runs/25479939541/job/74761859785)) currently triggers all CI test jobs because the file is not associated with any submodule, so it falls through to the `root` module. Neither file affects code or tests, and neither is consumed by the docs build, so they should be ignored just like `README.md`. ### Does this PR introduce _any_ user-facing change? No, this is only a testing infra change. ### How was this patch tested? Updated and ran the doctests in `dev/sparktestsupport/utils.py`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7 Closes #55731 from zhengruifeng/skip-tests-for-top-level-md. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…les_for_files Extend `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`. A documentation-only PR that touches only `AGENTS.md` (e.g. #55707, see [run](https://github.com/zhengruifeng/spark/actions/runs/25479939541/job/74761859785)) currently triggers all CI test jobs because the file is not associated with any submodule, so it falls through to the `root` module. Neither file affects code or tests, and neither is consumed by the docs build, so they should be ignored just like `README.md`. No, this is only a testing infra change. Updated and ran the doctests in `dev/sparktestsupport/utils.py`. Generated-by: Claude Opus 4.7 Closes #55731 from zhengruifeng/skip-tests-for-top-level-md. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit bb72aef) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Add a "Test Base Classes" section to AGENTS.md describing the layered SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession hierarchy, when to pick each base, common redundant-mixin patterns to avoid, and the linearization gotcha when the first item in an `extends` clause is a pure trait without the SparkFunSuite class chain. Generated-by: Claude opus-4-7
…sion) in AGENTS.md Clarify that QueryTest declares `spark` abstractly and that `SharedSparkSession` is one of several session-providing traits in the repo. List the common providers (`SharedSparkSession`, `TestHiveSingleton`, `RemoteSparkSession`) and note the rare manual-override pattern. Generated-by: Claude opus-4-7
…ENTS.md Add a "Base chain" column to the session provider table and explain that RemoteSparkSession directly extends AnyFunSuite (not SparkFunSuite), so Connect client suites must combine it with QueryTest to recover the SparkFunSuite-only features (per-test timeout, gridTest, retry, log-capture). Generated-by: Claude opus-4-7
…S.md Add SparkTestSuite as the bottom of the test base class chain — it holds the common Spark test functionality (thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError) and is style-agnostic. SparkFunSuite is AnyFunSuite + SparkTestSuite. Mention the alternative-style demonstration suites under core/.../*SparkTestSuite.scala and update the linearization gotcha to note that SparkTestSuite is itself a pure trait. Generated-by: Claude opus-4-7
…TS.md Drop the RemoteSparkSession row and follow-up paragraph from the session provider section. The trait does not formally extend SparkSessionProvider and serves a separate audience (Connect client tests under sql/connect/client/jvm), so it is out of scope for this guide. Generated-by: Claude opus-4-7
QueryTest extends SparkFunSuite with QueryTestBase with PlanTest, so PlanTest is a parent of QueryTest. Indent QueryTest one level deeper to reflect the actual hierarchy and show QueryTest as PlanTest + QueryTestBase (PlanTest already brings the SparkFunSuite chain). Generated-by: Claude opus-4-7
Add a parallel diagram for the style-agnostic `*Base` helper traits (`PlanTestBase` -> `QueryTestBase` -> `SharedSparkSessionBase`) and note that `PlanTest` / `QueryTest` / `SharedSparkSession` are the FunSuite-flavored bundles built on top of them, analogous to `SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`. Generated-by: Claude opus-4-7
Drop the "manual session override", helper-trait combination, and "common mistakes" subsections from the Test Base Classes section to keep the guide focused on the layered base hierarchy and the linearization gotcha. Generated-by: Claude opus-4-7
Reorder the Test Base Classes section so the style-agnostic helper traits (SparkTestSuite, PlanTestBase, QueryTestBase, SharedSparkSessionBase) are introduced first, followed by the class-bearing FunSuite-style bases that bundle each helper with AnyFunSuite. The helpers are where the actual test functionality lives, so leading with them makes the relationship clearer. Generated-by: Claude opus-4-7
The section covers Scala test bases only (PySpark tests are handled separately under Build and Test), so the more specific name avoids confusion. Generated-by: Claude opus-4-7
Spark uses the AnyFunSuite ScalaTest style throughout, so the SparkTestSuite / PlanTestBase / QueryTestBase / SharedSparkSessionBase chain is not relevant to contributors picking a base for a new test suite. Collapse the section to just the class-bearing chain (SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession), drop the SparkTestSuite row from the decision table, fold its helpers into the SparkFunSuite row, drop the now-redundant base-chain column from the session-provider table, and trim the linearization gotcha to no longer cite SparkTestSuite / AnyWordSpec. Generated-by: Claude opus-4-7
65bb5e6 to
c89924b
Compare
Per Wenchen's offline comment: SharedSparkSession is a session provider, not another rung in the inheritance ladder, so it shouldn't sit on the same diagram as SparkFunSuite/PlanTest/QueryTest. Stop the base-class chain at QueryTest, drop the SharedSparkSession row from the base-class table, and add a dedicated "Providing a SparkSession for QueryTest" subsection with its own diagram showing how SharedSparkSession and TestHiveSingleton plug into QueryTest's abstract `spark`. Generated-by: Claude opus-4-7
| | `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | | ||
| | `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. | | ||
|
|
||
| Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). All three bases above plus `SharedSparkSession` carry the `SparkFunSuite` chain, so any of them can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait <Name>"` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain. |
There was a problem hiding this comment.
do we need this paragraph at all? No *ErrorsBase *Helper is mentioned in this section
There was a problem hiding this comment.
Good point — dropped the paragraph in d22ff68. The four documented bases all carry the SparkFunSuite chain, so the gotcha is unnecessary in this section.
Cloud-fan: the gotcha references `*ErrorsBase` / `*Helper` traits that aren't mentioned anywhere else in the section, and the four documented bases all safely carry the `SparkFunSuite` chain. Drop the paragraph. Generated-by: Claude opus-4-7
|
|
||
| | Session provider | Module / location | Typical usage | | ||
| |---|---|---| | ||
| | `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | |
There was a problem hiding this comment.
shall we recommend people to do extends QueryTest with SharedSparkSession? Then it's more consistent with hive
…NTS.md per review Cloud-fan: recommend the explicit `extends QueryTest with SharedSparkSession` form so the sql/core row parallels the sql/hive row (`extends QueryTest with TestHiveSingleton`). Also note that SharedSparkSession already extends QueryTest for historical reasons, so readers aren't confused when they see existing suites omit the explicit mixin. Generated-by: Claude opus-4-7
What changes were proposed in this pull request?
Add a
Scala Test Base Classessection toAGENTS.mdthat documents the layered Scala test base hierarchy in this repo and how to pick a base class for a new test suite. Spark uses theAnyFunSuiteScalaTest style throughout, and the chain is:QueryTestdeclaresspark: SparkSessionabstractly viaSparkSessionProvider, so a concrete SQL test suite mixes in one of the session-providing traits:The new section also includes:
SharedSparkSessionitself extendsQueryTest(so concrete suites justextends SharedSparkSession), whileTestHiveSingletonis mixed in alongsideQueryTest.extendsclause must transitively extend a class. Pure helper traits (*ErrorsBase,*Helper) cannot be put first.CLAUDE.mdis a symlink toAGENTS.md, so this change is picked up by both AI agent toolchains.Why are the changes needed?
Picking the wrong test base class (e.g. extending
QueryTestdirectly when a session is needed, orSparkFunSuitewhenPlanTestwould do) is a common stumble when adding new Scala test suites. The information is currently spread across the source ofSparkFunSuite,PlanTest,QueryTest, and the session-providing traits, with no single place that summarizes when to use which. Documenting it inAGENTS.mdgives both contributors and AI coding agents a quick reference.Does this PR introduce any user-facing change?
No. Documentation-only change to a developer/agent guide file.
How was this patch tested?
N/A. Documentation-only change; no code or tests are affected.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude opus-4-7