Skip to content

Conversation

@ArbaazKhan1
Copy link
Contributor

Closes Issue #5387

Added shared marker to DataFileValue to track files used by single or multiple tablets. Compactions now delete non-shared files directly, instead of creating GC markers. This Pr adds the following changes:

  • Added Shared field to DataFileValue
  • Compactions attempt deletion for non-shared files
  • Clone, split and bulk import now set shared markers as needed

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also need to modify the split code.

// File is not shared - try to delete directly
try {
Path filePath = new Path(file.getMetadataPath());
boolean deleted = ctx.getVolumeManager().deleteRecursively(filePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting files here is not correct because the conditional mutation to the metadata table has not yet been made. Only after successfully making the conditional mutation will we know what was shared or not. Need to do something like the following.

  1. Modify conditional mutation to check that the file values (this checks the new shared field) are the same for the tablet. This atomically ensures that what is thought to be shared is actually shared when the metadata table update happens. Currently the conditional mutation only checks the set of files, not their values.
  2. After successfully submitting the condtional mutation look at the table metadata and the list of files compacting (from commitData) . Compute a set of files that is in commitData.inputPaths and is marked as not shared in the tablet metadata. If a file is not in tablet metadata, add it to the set because its shared status is unknown (this can happen because of failure and retry). This is the set of compacting files that are known for sure to not be shared, in the case of failure and retry this set should be empty.
  3. If the conditional mutation was not successful, this set above should be empty.
  4. Pass the set computed above to PutGcCandidates
  5. In PutGcCandidates delete non shared files and add gc delete markers for shared files. Doing the actual file deletes in PutGcCandidates makes reasoning about retries (caused by process dying in middle of fate step) much easier.

There is existing code for comparing the files values, but its not exactly what we need. Would probably need to create a new requireFiles(Map<StoredTabletFile, DataFileValue> files) method that uses code similar to this in its impl.

}
}

markSourceFilesAsShared(srcTableId, tableId, context, bw);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking the source tablets as shared here has race conditions, these may not be the files in the clone. Also marking them as shared must be done using a conditional writer that only changes the marking if the file exists in the tablet w/ the same value (this check must be done by the conditional mutation).

The following conditions must be true for clone to avoid race conditions.

  1. Files are marked as shared before cloning.
  2. After copying the tablets file they must still exist in the source tablet.

Need to rework the code to do the following algorithm for each tablet.

  1. Before cloning mark all files in the source tablet as shared using a conditional mutation.
  2. Copy the source tablets files to the destination tablet.
  3. Read the source and desitation tablets and ensure the destitnations tablet has a subset of the sources files. If so this tablet was successfully cloned and we know the GC did not delete any files because the source still has them. Done with tablet and do not need to next step.
  4. There was a concurrent change w/ the source tablets files, possible that GC even deleted some of them. Delete the destination tablet and go back to step 1.

The current code does this w/o step 1. Also the current code batches tablet read/writes for efficiency. The code would probably be much easier to understand w/o the batching, but doing one tablet at a time w/o batching would be really slow. So need to modify the code work in step 1 and still do the batching.

log.trace("{}: Started loading files at row: {}", fmtTid, startRow);

List<Map.Entry<KeyExtent,Bulk.Files>> allEntries = new ArrayList<>();
while (lmi.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of pulling all of this into memory, it may be better to make two passes over the file. The first pass computes the shared files and the 2nd pass calls this method with the shared files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow compactions to delete non shared input files.

2 participants