HDDS-14800. Guard RocksDB iterator against closed DB during volume failure#9904
HDDS-14800. Guard RocksDB iterator against closed DB during volume failure#9904priyeshkaratha wants to merge 8 commits intoapache:masterfrom
Conversation
|
@priyeshkaratha , could you add a unit test for BackgroundContainerDataScanner? |
22bc2d7 to
765500e
Compare
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java
Show resolved
Hide resolved
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java
Outdated
Show resolved
Hide resolved
...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java
Show resolved
Hide resolved
...framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBStoreIteratorWithDBClose.java
Show resolved
Hide resolved
...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java
Outdated
Show resolved
Hide resolved
|
Let's add some test for the race condition scenarios. |
devmadhuu
left a comment
There was a problem hiding this comment.
@priyeshkaratha overall changes LGTM +1. Just few nits. Pls check.
|
|
||
| @Override | ||
| public final boolean hasNext() { | ||
| if (isDbClosed()) { |
There was a problem hiding this comment.
The isClosed() fast-fail only protects hasNext(). Methods like seekToFirst(), seekToLast(), seek() call native RocksDB directly without any check.
| } | ||
| } | ||
|
|
||
| private void waitAndClose() { |
There was a problem hiding this comment.
nit: Please put a comment - "// Wait until all active operations (including open iterators) complete.
// Iterators acquired after DB close is triggered will fast-fail in
// hasNext(), so this loop is expected to complete quickly in practice."
| public void close() { | ||
| rocksDBIterator.close(); | ||
| if (dbRef != null) { | ||
| dbRef.close(); |
There was a problem hiding this comment.
dbRef is final — it is never nulled out. If close() is called twice, dbRef.close() runs twice. Since dbRef holds counter::getAndDecrement, the counter gets decremented twice. Java's Closeable contract requires that calling close() more than once is safe.
| rocksDBIterator.close(); | ||
| if (dbRef != null) { | ||
| dbRef.close(); | ||
| } |
There was a problem hiding this comment.
| } | |
| private final AtomicBoolean iteratorClosed = new AtomicBoolean(false); | |
| @Override | |
| public void close() { | |
| if (iteratorClosed.compareAndSet(false, true)) { | |
| rocksDBIterator.close(); | |
| if (dbRef != null) { | |
| dbRef.close(); | |
| } | |
| } | |
| } |
| } catch (RocksDatabaseException e) { | ||
| LOG.warn("Failed to acquire DB reference for iterator on table {}: {}", | ||
| table.getName(), e.getMessage()); | ||
| return null; |
There was a problem hiding this comment.
Actually here exception swallowed, , no reference counting guard ? Am I missing something here ?
What changes were proposed in this pull request?
When
StorageVolumeCheckerdetects a volume failure and callsfailVolume(), it closes the underlying RocksDB instance whileBackgroundContainerDataScannerorOnDemandContainerScannermay still hold an active iterator over that DB, calling native RocksDB methods on a closed DB can cause a crash. This fix adds two complementary guards toRDBStoreAbstractIterator: a fast-fail check sohasNext()returns false immediately once the DB is closed (stopping the scan loop without touching native code), and reference counting by acquiring a slot on RocksDatabase.counter at iterator creation and releasing it on close(), so the existingwaitAndClose()mechanism waits for all iterators to finish before physically closing the DB. Together these ensure the scan exits cleanly and the DB cannot be destroyed while an iterator is still in use.What is the link to the Apache JIRA
HDDS-14800
How was this patch tested?
Tested using added unit test cases.