HDDS-14239. Simplify Prepare Batch looping for DeleteRange Op processing in RDBBatchOperation #9553
swamirishi wants to merge 3 commits into apache:master from
Conversation
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@swamirishi, are you working on this?
…nges and simple logic Change-Id: I7a06f5a427ba7d1c4f760ff70fcb4093abaa3cbd
@szetszwo @jojochuang Can you take a look at this change?
@swamirishi , in our current usage, is it the case that DeleteRange won't mix with other operations?
No, that is not the case, @szetszwo. It completely depends on the transactions that happen together in the same double buffer batch. This sort of intersection can happen even within the same DirectoryPurgeRequest; look at #9423: the DirectoryDeletingService purge path request could be moving a deleted sub-directory from the DirectoryTable to the deletedDirectoryTable and also deleting the same entry from the deletedDirectoryTable because of optimizeDirDeleteAndSubmitRequest, all in the same transaction.
This PR #9690 improves the efficiency of finding the intersection from O(n) to O(log n).
@swamirishi , I agree. I mean that the current code does not have such transactions, and we are not going to support such code.
Let's talk about correctness before talking about efficiency. As you mentioned in the description, the previous implementation (quoted below) indeed has a bug, so this change is very risky.
It was not a bug, just inefficient. We are good as long as the order of operations is correct.
We had done it the earlier way consciously, since DeleteRange was only being used for SNAPSHOT CREATE and there was going to be no intersection of a DeleteRange with any other operation beyond one transaction, because SNAPSHOT CREATE splits the double buffer batch. The change proposed here is for more generic use cases, like @aryangupta1998 's PR change, which is also a safe change. The earlier implementation had an order of O(n + k), and with this PR we would make it O(n * k), where n is the number of single-key ops and k is the number of DeleteRange operations in a batch (P.S. we don't anticipate having a lot of delete ranges in one batch, so k should ideally be much smaller than n). The next PR brings this down to O(n log k).
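A minimal illustrative sketch (not the actual RDBBatchOperation code) of the two lookup strategies being discussed: scanning every pending range for each buffered single-key op costs O(k) per key, i.e. O(n * k) overall, while keeping non-overlapping ranges sorted by start key allows an O(log k) floor lookup per key:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and structures are not from the actual patch.
// A pending DeleteRange is treated as the half-open interval [startKey, endKey).
final class RangeLookupSketch {

  // O(k) per key: scan every pending range (the approach in this PR).
  static boolean coveredLinear(String key, List<String[]> ranges) {
    for (String[] r : ranges) { // r[0] = startKey, r[1] = endKey
      if (key.compareTo(r[0]) >= 0 && key.compareTo(r[1]) < 0) {
        return true;
      }
    }
    return false;
  }

  // O(log k) per key: with non-overlapping ranges keyed by startKey in a
  // TreeMap, only the range starting at or before the key can possibly cover it.
  static boolean coveredSorted(String key, TreeMap<String, String> rangesByStart) {
    Map.Entry<String, String> floor = rangesByStart.floorEntry(key);
    return floor != null && key.compareTo(floor.getValue()) < 0;
  }
}
```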
@swamirishi , sorry, but this does not look like a safe change since it dramatically changes RDBBatchOperation. In particular, this PR adds the new
That is not even possible. It would mean the double buffer has to understand how the DBResponse interface is going to flush the transaction to the db. One transaction can do put, delete, and delete range. We can never guarantee that a transaction is going to have only DeleteRange operations. All of them need to be flushed in the same RocksDB batch for the atomicity guarantees.
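For reference, a minimal RocksDB Java sketch (the keys here are made up for illustration) of why all three op types end up in one write batch: the single db.write() call is what gives the all-or-nothing guarantee.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

final class AtomicFlushSketch {
  // One transaction's put, delete and deleteRange ops are buffered into a
  // single WriteBatch and committed with one atomic write().
  static void flush(RocksDB db) throws RocksDBException {
    try (WriteBatch batch = new WriteBatch();
         WriteOptions options = new WriteOptions()) {
      batch.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8));       // Put
      batch.delete("key3".getBytes(UTF_8));                              // Delete
      batch.deleteRange("key2".getBytes(UTF_8), "key5".getBytes(UTF_8)); // [key2, key5)
      db.write(options, batch); // all three ops become visible together, or none do
    }
  }
}
```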
Today, we don't have batch DeleteRange and everything works. If we add pure batch DeleteRange tomorrow, would anything stop working?
Change-Id: Ia3db219dab70a7a2faf82009e8f0dac59c665938
Change-Id: Icae0e1299fcb7d37bd5a0f3c2f405537b1d68f3e
We don't have deleteRange today because we reverted #8774, and snapshot create was using the delete range operation. Without this patch we are going to kill snapshot create performance and lose the optimization that improves RocksDB performance in the case of huge deletes.
I see your point. How about extending RDBBatchOperation in a subclass, say RDBBatchOperationWithDeleteRange? The new code goes to RDBBatchOperationWithDeleteRange while keeping RDBBatchOperation unchanged. In case there is a bug, people not using OM snapshots won't be affected.
Are you saying we should have a separate implementation and make it configurable? Then it would mean we would have to branch the logic in OmResponse, where it would have to understand which implementation has been passed.
Configurable is a good idea:
No. We could add an RDBBatchOperationFactory.
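A hypothetical sketch of that factory suggestion (class and method names here are illustrative assumptions, not part of this patch):

```java
// Hypothetical sketch of the factory suggestion above; names are illustrative.
// Callers that do not need DeleteRange keep getting the unchanged
// RDBBatchOperation, while OM snapshot paths can opt into the new subclass.
public final class RDBBatchOperationFactory {

  private RDBBatchOperationFactory() {
  }

  public static RDBBatchOperation newBatchOperation(boolean supportDeleteRange) {
    return supportDeleteRange
        ? new RDBBatchOperationWithDeleteRange() // new DeleteRange-aware logic
        : new RDBBatchOperation();               // existing behavior, untouched
  }
}
```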
What changes were proposed in this pull request?
The PrepareBatchOperation looping was very convoluted in the HDDS-13415 (#8774) implementation. It also misses a case where a putKey/deleteKey still gets written even though a deleteRange executed later in the batch, after the preceding run of continuous deleteRange ops, covers the same key. The following example explains the scenario better.
1. Put Key1
2. DeleteRange Key2 - Key5
3. Put Key2
4. DeleteRange Key1 - Key4
Here Op1 should ideally be cancelled by Op4, but currently both Op1 and Op4 get executed. It would be more optimal to execute only Op4, since Op1 is redundant.
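A minimal illustration (not the actual code) of the containment check behind that claim, treating a DeleteRange as the half-open interval [startKey, endKey):

```java
final class RedundantPutCheck {
  // Illustration only: a buffered Put is redundant if a later DeleteRange
  // covers its key. Real keys are byte arrays; Strings are used here for
  // readability.
  static boolean coveredByLaterRange(String key, String startKey, String endKey) {
    return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
  }
  // Op1 "Put Key1" vs Op4 "DeleteRange Key1 - Key4":
  //   coveredByLaterRange("Key1", "Key1", "Key4") -> true, so Op1 is redundant.
}
```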
Initially, while implementing deleteRange with the batch, I had planned to optimize the deleteRange search by sorting the continuous ranges and doing a binary search to figure out the range matching each entry, to reduce the overall complexity. However, the code was getting more complex without giving us much in return, since deleteRange as an op was going to be infrequent and was only going to be used by snapshot create.
But deleteRange is also going to be used by DirectoryPurgeRequest to remove entries from the file table and directory table. This optimization would help reduce the cost of RocksDB compactions when a key is committed and also deleted within the same double buffer batch.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14239
How was this patch tested?
Updated unit tests. I intend to add another unit test suggested in this comment: #8774 (comment)