Skip to content

[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707

Open
zhengruifeng wants to merge 14 commits into
apache:masterfrom
zhengruifeng:add-test-base-class-guide
Open

[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707
zhengruifeng wants to merge 14 commits into
apache:masterfrom
zhengruifeng:add-test-base-class-guide

Conversation

@zhengruifeng
Copy link
Copy Markdown
Contributor

@zhengruifeng zhengruifeng commented May 6, 2026

What changes were proposed in this pull request?

Add a Scala Test Base Classes section to AGENTS.md that documents the layered Scala test base hierarchy in this repo and how to pick a base class for a new test suite. Spark uses the AnyFunSuite ScalaTest style throughout, and the chain is:

SparkFunSuite                                                           (core)
  <- PlanTest                                                           (sql/catalyst)
    <- QueryTest                                                        (sql/core)

QueryTest declares spark: SparkSession abstractly via SparkSessionProvider, so a concrete SQL test suite mixes in one of the session-providing traits:

QueryTest                                                               (abstract `spark`)
  + SharedSparkSession (sql/core)        -> classic in-process `TestSparkSession`
  + TestHiveSingleton  (sql/hive)        -> Hive-backed `TestHive` session

The new section also includes:

  • A decision table mapping test scope (plain JVM, Catalyst plans, SQL/DataFrame with a session) to the right base.
  • A session-provider table noting that SharedSparkSession itself extends QueryTest (so concrete suites just extends SharedSparkSession), while TestHiveSingleton is mixed in alongside QueryTest.
  • A linearization gotcha: the first item in an extends clause must transitively extend a class. Pure helper traits (*ErrorsBase, *Helper) cannot be put first.

CLAUDE.md is a symlink to AGENTS.md, so this change is picked up by both AI agent toolchains.

Why are the changes needed?

Picking the wrong test base class (e.g. extending QueryTest directly when a session is needed, or SparkFunSuite when PlanTest would do) is a common stumble when adding new Scala test suites. The information is currently spread across the source of SparkFunSuite, PlanTest, QueryTest, and the session-providing traits, with no single place that summarizes when to use which. Documenting it in AGENTS.md gives both contributors and AI coding agents a quick reference.

Does this PR introduce any user-facing change?

No. Documentation-only change to a developer/agent guide file.

How was this patch tested?

N/A. Documentation-only change; no code or tests are affected.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude opus-4-7

@zhengruifeng zhengruifeng marked this pull request as ready for review May 6, 2026 12:26
@zhengruifeng zhengruifeng requested a review from cloud-fan May 6, 2026 12:26
Comment thread AGENTS.md Outdated
SparkTestSuite (core; thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError)
PlanTestBase (sql/catalyst; plan-comparison helpers — comparePlans, normalizePlan)
<- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark` via `SparkSessionProvider`)
<- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm do we need to mention then? Inside Spark we should always pick AnyFunSuite style for consistency

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — agreed. Dropped the style-agnostic *Base chain (and the related SparkTestSuite row + linearization-gotcha example) in 65bb5e6. The section now only documents the SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession chain.

zhengruifeng added a commit to zhengruifeng/spark that referenced this pull request May 7, 2026
…les_for_files

### What changes were proposed in this pull request?

This PR extends `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to
ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`.

### Why are the changes needed?

A documentation-only PR that touches only `AGENTS.md` (e.g.
apache#55707) currently triggers all CI test jobs because
the file is not associated with any submodule, so it falls through to the `root`
module. Neither file affects code or tests, and neither is consumed by the docs build,
so they should be ignored just like `README.md`.

### Does this PR introduce _any_ user-facing change?

No, this is only a testing infra change.

### How was this patch tested?

Updated and ran the doctests in `dev/sparktestsupport/utils.py`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
HyukjinKwon pushed a commit that referenced this pull request May 8, 2026
…les_for_files

### What changes were proposed in this pull request?

Extend `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`.

### Why are the changes needed?

A documentation-only PR that touches only `AGENTS.md` (e.g. #55707, see [run](https://github.com/zhengruifeng/spark/actions/runs/25479939541/job/74761859785)) currently triggers all CI test jobs because the file is not associated with any submodule, so it falls through to the `root` module. Neither file affects code or tests, and neither is consumed by the docs build, so they should be ignored just like `README.md`.

### Does this PR introduce _any_ user-facing change?

No, this is only a testing infra change.

### How was this patch tested?

Updated and ran the doctests in `dev/sparktestsupport/utils.py`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Closes #55731 from zhengruifeng/skip-tests-for-top-level-md.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request May 8, 2026
…les_for_files

Extend `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`.

A documentation-only PR that touches only `AGENTS.md` (e.g. #55707, see [run](https://github.com/zhengruifeng/spark/actions/runs/25479939541/job/74761859785)) currently triggers all CI test jobs because the file is not associated with any submodule, so it falls through to the `root` module. Neither file affects code or tests, and neither is consumed by the docs build, so they should be ignored just like `README.md`.

No, this is only a testing infra change.

Updated and ran the doctests in `dev/sparktestsupport/utils.py`.

Generated-by: Claude Opus 4.7

Closes #55731 from zhengruifeng/skip-tests-for-top-level-md.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit bb72aef)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Add a "Test Base Classes" section to AGENTS.md describing the layered
SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession hierarchy,
when to pick each base, common redundant-mixin patterns to avoid, and
the linearization gotcha when the first item in an `extends` clause is
a pure trait without the SparkFunSuite class chain.

Generated-by: Claude opus-4-7
…sion) in AGENTS.md

Clarify that QueryTest declares `spark` abstractly and that `SharedSparkSession` is one of several session-providing traits in the repo. List the common providers (`SharedSparkSession`, `TestHiveSingleton`, `RemoteSparkSession`) and note the rare manual-override pattern.

Generated-by: Claude opus-4-7
…ENTS.md

Add a "Base chain" column to the session provider table and explain that
RemoteSparkSession directly extends AnyFunSuite (not SparkFunSuite), so
Connect client suites must combine it with QueryTest to recover the
SparkFunSuite-only features (per-test timeout, gridTest, retry,
log-capture).

Generated-by: Claude opus-4-7
…S.md

Add SparkTestSuite as the bottom of the test base class chain — it
holds the common Spark test functionality (thread audit, fixed
timezone/locale, withTempDir, withLogAppender, checkError) and is
style-agnostic. SparkFunSuite is AnyFunSuite + SparkTestSuite. Mention
the alternative-style demonstration suites under core/.../*SparkTestSuite.scala
and update the linearization gotcha to note that SparkTestSuite is
itself a pure trait.

Generated-by: Claude opus-4-7
…TS.md

Drop the RemoteSparkSession row and follow-up paragraph from the session
provider section. The trait does not formally extend SparkSessionProvider
and serves a separate audience (Connect client tests under
sql/connect/client/jvm), so it is out of scope for this guide.

Generated-by: Claude opus-4-7
QueryTest extends SparkFunSuite with QueryTestBase with PlanTest, so
PlanTest is a parent of QueryTest. Indent QueryTest one level deeper
to reflect the actual hierarchy and show QueryTest as PlanTest +
QueryTestBase (PlanTest already brings the SparkFunSuite chain).

Generated-by: Claude opus-4-7
Add a parallel diagram for the style-agnostic `*Base` helper traits
(`PlanTestBase` -> `QueryTestBase` -> `SharedSparkSessionBase`) and note
that `PlanTest` / `QueryTest` / `SharedSparkSession` are the
FunSuite-flavored bundles built on top of them, analogous to
`SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`.

Generated-by: Claude opus-4-7
Drop the "manual session override", helper-trait combination, and
"common mistakes" subsections from the Test Base Classes section to
keep the guide focused on the layered base hierarchy and the
linearization gotcha.

Generated-by: Claude opus-4-7
Reorder the Test Base Classes section so the style-agnostic helper
traits (SparkTestSuite, PlanTestBase, QueryTestBase,
SharedSparkSessionBase) are introduced first, followed by the
class-bearing FunSuite-style bases that bundle each helper with
AnyFunSuite. The helpers are where the actual test functionality
lives, so leading with them makes the relationship clearer.

Generated-by: Claude opus-4-7
The section covers Scala test bases only (PySpark tests are handled
separately under Build and Test), so the more specific name avoids
confusion.

Generated-by: Claude opus-4-7
Spark uses the AnyFunSuite ScalaTest style throughout, so the
SparkTestSuite / PlanTestBase / QueryTestBase / SharedSparkSessionBase
chain is not relevant to contributors picking a base for a new test
suite. Collapse the section to just the class-bearing chain
(SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession), drop
the SparkTestSuite row from the decision table, fold its helpers into
the SparkFunSuite row, drop the now-redundant base-chain column from
the session-provider table, and trim the linearization gotcha to no
longer cite SparkTestSuite / AnyWordSpec.

Generated-by: Claude opus-4-7
@zhengruifeng zhengruifeng force-pushed the add-test-base-class-guide branch from 65bb5e6 to c89924b Compare May 8, 2026 11:17
Per Wenchen's offline comment: SharedSparkSession is a session
provider, not another rung in the inheritance ladder, so it shouldn't
sit on the same diagram as SparkFunSuite/PlanTest/QueryTest. Stop the
base-class chain at QueryTest, drop the SharedSparkSession row from
the base-class table, and add a dedicated "Providing a SparkSession
for QueryTest" subsection with its own diagram showing how
SharedSparkSession and TestHiveSingleton plug into QueryTest's
abstract `spark`.

Generated-by: Claude opus-4-7
Comment thread AGENTS.md Outdated
| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. |
| `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. |

Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). All three bases above plus `SharedSparkSession` carry the `SparkFunSuite` chain, so any of them can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait <Name>"` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this paragraph at all? No *ErrorsBase *Helper is mentioned in this section

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — dropped the paragraph in d22ff68. The four documented bases all carry the SparkFunSuite chain, so the gotcha is unnecessary in this section.

Cloud-fan: the gotcha references `*ErrorsBase` / `*Helper` traits that
aren't mentioned anywhere else in the section, and the four documented
bases all safely carry the `SparkFunSuite` chain. Drop the paragraph.

Generated-by: Claude opus-4-7
Comment thread AGENTS.md Outdated

| Session provider | Module / location | Typical usage |
|---|---|---|
| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we recommend people to do extends QueryTest with SharedSparkSession? Then it's more consistent with hive

…NTS.md per review

Cloud-fan: recommend the explicit `extends QueryTest with SharedSparkSession`
form so the sql/core row parallels the sql/hive row
(`extends QueryTest with TestHiveSingleton`). Also note that
SharedSparkSession already extends QueryTest for historical reasons, so
readers aren't confused when they see existing suites omit the explicit
mixin.

Generated-by: Claude opus-4-7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants