While working on #17713, I identified several additional gaps in GcsPinotFS that should be tracked and fixed. Grouping them here for visibility.
1. Missing null checks on getBlob() callers
GcsPinotFS.getBlob() wraps Storage.get(BlobId) which returns null when the blob does not exist. Several callers do not handle this, leading to potential NPEs
| Method |
Unguarded call |
Impact |
lastModified() |
getBlob(new GcsUri(uri)).getUpdateTime() |
NPE if blob doesn't exist |
touch() |
blob.getUpdateTime(), blob.toBuilder() |
NPE if blob doesn't exist |
touch() |
getBlob(gcsUri).getUpdateTime() |
NPE if blob deleted between update and re-fetch |
open() |
blob.reader() |
NPE if blob doesn't exist |
copyFile() |
blob.copyTo(...) |
NPE on source blob; also leaves an empty destination blob (created on line 482) that needs cleanup |
For comparison, the S3 SDK throws NoSuchKeyException for missing objects, so S3PinotFS gets meaningful errors without explicit null checks. The GCS SDK returns null silently, so GcsPinotFS needs to handle it explicitly.
Methods that already handle null correctly:
2. PinotFS contract violations
touch() does not create files
The PinotFS interface specifies:
Updates the last modified time of an existing file or directory to be current time. If the file system object does not exist, creates an empty file.
LocalPinotFS and S3PinotFS both correctly implement this. S3PinotFS catches NoSuchKeyException and creates an empty object. LocalPinotFS checks !file.exists() and calls file.createNewFile().
GcsPinotFS.touch() does not handle missing files at all. It would NPE instead of creating the file.
lastModified() return value for missing files
The PinotFS interface specifies:
Returns the time the file was last modified or 0L if the file does not exist or if an I/O error occurs
LocalPinotFS.lastModified( returns file.lastModified() which returns 0L for non-existent files. GcsPinotFS would NPE instead.
3. Deprecated getUpdateTime() usage
BlobInfo.getUpdateTime() is deprecated in the GCS SDK in favor of getUpdateTimeOffsetDateTime(). It is used in four places:
lastModified()
touch() (twice)
listFilesWithMetadata()
4. Minor: redundant getBlob call in delete()
The private delete() method calls exists(segmentUri), which internally calls getBlob() for file URIs. Then it calls getBlob(segmentUri) again. This results in two GCS round-trips for a single file deletion. Not a bug but an efficiency concern for high-throughput deletion.
We can have separate PRs for every sub-issue.
While working on #17713, I identified several additional gaps in GcsPinotFS that should be tracked and fixed. Grouping them here for visibility.
1. Missing null checks on
getBlob()callersGcsPinotFS.getBlob()wrapsStorage.get(BlobId)which returns null when the blob does not exist. Several callers do not handle this, leading to potential NPEslastModified()getBlob(new GcsUri(uri)).getUpdateTime()touch()blob.getUpdateTime(), blob.toBuilder()touch()getBlob(gcsUri).getUpdateTime()open()blob.reader()copyFile()blob.copyTo(...)For comparison, the S3 SDK throws
NoSuchKeyExceptionfor missing objects, soS3PinotFSgets meaningful errors without explicit null checks. The GCS SDK returns null silently, soGcsPinotFSneeds to handle it explicitly.Methods that already handle null correctly:
delete():return blob != null && blob.delete()deleteBatch()(fixed in Improve GcsPinotFS deleteBatch resiliency #17713)length(): usescheckState(existsBlob(blob), ...)copyToLocalFile(): usescheckState(existsBlob(blob), ...)2. PinotFS contract violations
touch()does not create filesThe PinotFS interface specifies:
LocalPinotFSandS3PinotFSboth correctly implement this.S3PinotFScatchesNoSuchKeyExceptionand creates an empty object.LocalPinotFSchecks!file.exists()and callsfile.createNewFile().GcsPinotFS.touch()does not handle missing files at all. It would NPE instead of creating the file.lastModified()return value for missing filesThe PinotFS interface specifies:
LocalPinotFS.lastModified(returnsfile.lastModified()which returns 0L for non-existent files.GcsPinotFSwould NPE instead.3. Deprecated getUpdateTime() usage
BlobInfo.getUpdateTime()is deprecated in the GCS SDK in favor ofgetUpdateTimeOffsetDateTime(). It is used in four places:lastModified()touch()(twice)listFilesWithMetadata()4. Minor: redundant getBlob call in
delete()The private
delete()method callsexists(segmentUri), which internally callsgetBlob()for file URIs. Then it callsgetBlob(segmentUri)again. This results in two GCS round-trips for a single file deletion. Not a bug but an efficiency concern for high-throughput deletion.We can have separate PRs for every sub-issue.