
IGNITE-28634 Fix ClassCastException in IgniteTxManager.localTx when DHT remote tx is in thread context#13258

Open
brat-kuzma wants to merge 1 commit into apache:master from brat-kuzma:ignite-28634


@brat-kuzma


Problem

IgniteTxManager.localTx() used the generic method tx() with an implicit unchecked cast to IgniteTxLocalAdapter:

@Nullable public IgniteTxLocalAdapter localTx() {
    IgniteTxLocalAdapter tx = tx();  // Java infers T=IgniteTxLocalAdapter → implicit unchecked cast
    return tx != null && tx.local() ? tx : null;
}

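tx() is declared as public <T extends IgniteInternalTx> T tx(). Its return type is erased to IgniteInternalTx, so when the result is assigned to IgniteTxLocalAdapter tx, the compiler infers T = IgniteTxLocalAdapter and inserts an implicit (IgniteTxLocalAdapter) cast at the call site. That cast executes before the tx.local() check, so the check cannot guard it.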
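The effect is not specific to Ignite. Below is a minimal sketch using only JDK types; the class and method names are illustrative, not taken from the Ignite codebase. value() has the same shape as tx(): its (T) cast is unchecked and performs no runtime check, so the real check happens only where the caller's target type forces a narrowing cast.

```java
// Illustrative only: StringBuilder and String are both CharSequence,
// but neither is a subtype of the other (like the two tx branches).
final class ImplicitGenericCast {

    // Same shape as `<T extends IgniteInternalTx> T tx()`: T is erased to
    // CharSequence, so this (T) cast is unchecked and never fails here.
    @SuppressWarnings("unchecked")
    static <T extends CharSequence> T value() {
        return (T) new StringBuilder("remote");
    }

    // Assigning to the bound type: no narrowing cast is needed, nothing fails.
    static int lengthViaBound() {
        CharSequence cs = value();
        return cs.length();
    }

    // Assigning to a narrower type: javac infers T = String and inserts a
    // (String) checkcast at this call site, before any later check could run.
    static String narrowAssignment() {
        try {
            String s = value();
            return s.isEmpty() ? "empty" : "non-empty"; // never reached
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaBound());   // 6
        System.out.println(narrowAssignment()); // ClassCastException
    }
}
```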

During the DHT commit phase, a GridDhtTxRemote can be present in the thread context. GridDhtTxRemote and IgniteTxLocalAdapter sit on separate branches of the type hierarchy (both descend from IgniteTxAdapter, but neither is a subtype of the other), so the cast throws ClassCastException:

Caused by: java.lang.ClassCastException: class GridDhtTxRemote cannot be cast to class IgniteTxLocalAdapter
    at IgniteTxManager.localTx(IgniteTxManager.java:960)
    at GridCacheMapEntry.currentTx(GridCacheMapEntry.java:3316)
    at GridCacheMapEntry.expireTime(GridCacheMapEntry.java:3268)
    at GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:663)

This causes commitIfLocked to fail with IgniteTxHeuristicCheckedException, which can hang the Cache 6 test suite.
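The hierarchy problem can be modeled in a few lines. This is a minimal sketch assuming simplified, hypothetical stand-ins (TxAdapter, TxLocalAdapter, DhtTxRemote) for IgniteTxAdapter, IgniteTxLocalAdapter and GridDhtTxRemote; it shows that a cast between sibling subclasses only compiles when it goes through the common supertype, and then fails at runtime.

```java
// Hypothetical stand-ins, not Ignite classes:
//   TxAdapter      ~ IgniteTxAdapter
//   TxLocalAdapter ~ IgniteTxLocalAdapter
//   DhtTxRemote    ~ GridDhtTxRemote
final class SiblingCast {
    abstract static class TxAdapter {
        abstract boolean local();
    }

    static final class TxLocalAdapter extends TxAdapter {
        @Override boolean local() { return true; }
    }

    static final class DhtTxRemote extends TxAdapter {
        @Override boolean local() { return false; }
    }

    // The cast compiles because a TxAdapter might be a TxLocalAdapter;
    // whether it really is one is checked only at runtime.
    static String castThroughCommonSupertype(TxAdapter tx) {
        try {
            TxLocalAdapter local = (TxLocalAdapter) tx;
            return "ok:" + local.local();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        TxAdapter remote = new DhtTxRemote();

        System.out.println(remote instanceof TxLocalAdapter);                 // false
        System.out.println(castThroughCommonSupertype(remote));               // ClassCastException
        System.out.println(castThroughCommonSupertype(new TxLocalAdapter())); // ok:true

        // A direct cast such as (TxLocalAdapter) new DhtTxRemote() is rejected
        // at compile time, because neither class is a subtype of the other.
    }
}
```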

Fix

Changed the variable type to IgniteInternalTx (the bound of T), so the compiler no longer inserts an implicit cast, and added an explicit instanceof check before casting:

@Nullable public IgniteTxLocalAdapter localTx() {
    IgniteInternalTx tx = tx();
    return tx instanceof IgniteTxLocalAdapter && tx.local() ? (IgniteTxLocalAdapter)tx : null;
}
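To compare the old and new shapes side by side, here is a sketch with hypothetical stand-ins: InternalTx for IgniteInternalTx, TxLocalAdapter for IgniteTxLocalAdapter, DhtTxRemote for GridDhtTxRemote, and a ThreadLocal standing in for the thread context. It is not the actual IgniteTxManager code.

```java
final class LocalTxLookup {
    /** Hypothetical stand-in for IgniteInternalTx. */
    interface InternalTx { boolean local(); }

    /** Hypothetical stand-in for IgniteTxLocalAdapter. */
    static class TxLocalAdapter implements InternalTx {
        @Override public boolean local() { return true; }
    }

    /** Hypothetical stand-in for GridDhtTxRemote. */
    static class DhtTxRemote implements InternalTx {
        @Override public boolean local() { return false; }
    }

    /** Hypothetical per-thread transaction context. */
    static final ThreadLocal<InternalTx> CTX = new ThreadLocal<>();

    /** Same shape as tx(): the caller's target type picks T. */
    @SuppressWarnings("unchecked")
    static <T extends InternalTx> T tx() {
        return (T) CTX.get();
    }

    /** Pre-fix shape: the implicit (TxLocalAdapter) cast runs before local(). */
    static TxLocalAdapter localTxBuggy() {
        TxLocalAdapter tx = tx();
        return tx != null && tx.local() ? tx : null;
    }

    /** Post-fix shape: read through the bound type, check, then cast explicitly. */
    static TxLocalAdapter localTxFixed() {
        InternalTx tx = tx();
        return tx instanceof TxLocalAdapter && tx.local() ? (TxLocalAdapter) tx : null;
    }

    public static void main(String[] args) {
        CTX.set(new DhtTxRemote());
        System.out.println(localTxFixed()); // null
        try {
            localTxBuggy();
        } catch (ClassCastException e) {
            System.out.println("buggy: ClassCastException");
        }

        CTX.set(new TxLocalAdapter());
        System.out.println(localTxFixed() != null); // true

        CTX.remove();
        System.out.println(localTxFixed()); // null
    }
}
```

Because the value is read through the bound type, javac has no reason to insert a narrowing cast, so the instanceof check runs first and a remote transaction in the thread context simply maps to null.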

Testing

GridCacheNearRemoveFailureTest (part of IgniteCacheFailoverTestSuite / Cache 6):

  • Without the fix: ClassCastException reproduced on the first run
  • With the fix: Tests run: 3, Failures: 0, Errors: 0

@ignitetcbot


TCBot Test Analysis

Possible Blockers (17)

  • Thin Client: Java: 2 tests
    • ClientTestSuite: CacheEntryListenersTest.testContinuousQueriesWithConcurrentCompute - Test has low fail rate in base branch 0,0% and is not flaky
    • ClientTestSuite: CacheAsyncTest.testGetAsyncReportsCorrectIgniteFutureStates - Test has low fail rate in base branch 0,0% and is not flaky
  • Queries 2: 0 tests, JVM CRASH, Exit Code, Failure on metric
    • JVM crash is a blocker. Base branch critical fail rate is 4,9%. Failure on metric is a blocker. Base branch critical fail rate is 4,9%. Exit Code is a blocker. Base branch fail rate is 31,4%.
  • Platform C++ CMake (Win x64 | Release): 0 tests, JVM CRASH
    • JVM crash is a blocker. Base branch critical fail rate is 3,9%.
  • PDS 1: 1 tests
    • IgnitePdsTestSuite: DistributedMetaStoragePersistentTest.testUnstableTopology - Test has low fail rate in base branch 0,0% and is not flaky
  • ZooKeeper (Discovery) 1: 11 tests
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_Coordinator1_1 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_Coordinator1 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_Coordinator2 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_Coordinator3 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_Coordinator4 - Test has low fail rate in base branch 1,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore1 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore2 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_NonCoordinator1 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testConnectionRestore_NonCoordinator2 - Test has low fail rate in base branch 0,0% and is not flaky
    • ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySegmentationAndConnectionRestoreTest.testSegmentation1 - Test has low fail rate in base branch 0,0% and is not flaky
    • ... and 1 more test blockers
  • Snapshots 2: 1 tests
    • IgniteSnapshotTestSuite2: IgniteClusterSnapshotRestoreSelfTest.testClusterSnapshotRestoreOnSmallerTopology[encryption=true, onlyPrimay=false] - Test has low fail rate in base branch 0,0% and is not flaky

New Tests (0)

No new tests found.

@ignitetcbot


TCBot Test Analysis

Possible Blockers (0)

No blockers found.

New Tests (0)

No new tests found.
