feat(cubestore): GCS Workload Identity Federation (WIF/ADC) support #10498
Open
KrishnaRMaddikara wants to merge 3 commits into cube-js:master
…ote storage

Replace cloud_storage crate with object_store crate for GCS remote filesystem.

Problem:
- cloud_storage crate requires SERVICE_ACCOUNT or SERVICE_ACCOUNT_JSON env vars
- Panics without them on GKE with Workload Identity (WIF) configured
- Affects all GKE users running CubeStore with Workload Identity

Fix:
- Switch to object_store crate (already a transitive dep via Arrow/DataFusion)
- GoogleCloudStorageBuilder::from_env() supports full ADC credential chain:
  1. CUBESTORE_GCP_KEY_FILE (backward compat)
  2. CUBESTORE_GCP_CREDENTIALS (backward compat)
  3. GKE Workload Identity via metadata server (the main fix)
  4. GOOGLE_APPLICATION_CREDENTIALS file
  5. gcloud CLI credentials (dev machines)

Backward compatible: existing SA key file deployments continue to work unchanged.

Closes cube-js#9837
Closes cube-js#7279
…nt, comments

- Upload: replace fs::read() full-buffer with streaming PutPayload::from_stream()
- Download: replace bytes().await full-buffer with chunked stream to disk
- Add legacy mapping for SERVICE_ACCOUNT_JSON, GOOGLE_APPLICATION_CREDENTIALS_JSON
- Fix page count ceiling division: (len + 999) / 1000
- Fix head() comment: workaround is for object_store path handling, not GCS consistency
…tResult

- Remove leftover second store.put(data) call in upload_file (data undefined)
- Remove get_result.bytes().await before into_stream(): both consume GetResult
- Fix misleading 'GCS consistency error' wording in check_upload_file
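The double-consume problem fixed in the third commit can be illustrated with a small, std-only sketch. `FetchResult`, `into_bytes`, and `into_chunks` are hypothetical stand-ins, not the object_store types; the only thing carried over from the commit message is that both ways of reading the body take the result by value, so only one of them can be called.

```rust
/// Hypothetical stand-in for a fetched object body (not the object_store type).
struct FetchResult {
    data: Vec<u8>,
}

impl FetchResult {
    /// Buffer the whole body. Takes `self` by value, so the result is consumed.
    fn into_bytes(self) -> Vec<u8> {
        self.data
    }

    /// Yield the body in chunks. Also takes `self` by value.
    /// `chunk_size` must be non-zero (`step_by(0)` panics).
    fn into_chunks(self, chunk_size: usize) -> impl Iterator<Item = Vec<u8>> {
        let data = self.data;
        (0..data.len()).step_by(chunk_size).map(move |start| {
            let end = (start + chunk_size).min(data.len());
            data[start..end].to_vec()
        })
    }
}

fn main() {
    let result = FetchResult { data: vec![1, 2, 3, 4, 5] };
    // Calling `result.into_bytes()` here would move `result`, and the
    // `into_chunks` call below would then fail to compile (use of moved value).
    let chunks: Vec<Vec<u8>> = result.into_chunks(2).collect();
    println!("{:?}", chunks);

    // Buffering remains available, but only on a value that is not streamed.
    let small = FetchResult { data: vec![9, 8] };
    println!("{:?}", small.into_bytes());
}
```

Keeping only the streaming path avoids both the compile error and the full in-memory buffer.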
Author
Both blockers are resolved in commit 59e7538.
Ready for review.
Author
I’ve addressed the follow-up review points in the latest commits on the PR branch. This should now be ready for another review.
Problem
CubeStore's GCS remote storage implementation uses the cloud_storage crate, which requires the SERVICE_ACCOUNT or SERVICE_ACCOUNT_JSON environment variables and panics when they are absent. This makes CubeStore completely non-functional on GKE clusters that use Workload Identity Federation (WIF), the recommended GCP authentication model for Kubernetes workloads, where no service account key file is needed or available.
Affects:
- GOOGLE_APPLICATION_CREDENTIALS (ADC file path)
- gcloud auth application-default login

Solution
Replace cloud_storage with the object_store crate for GCS operations. object_store is already a transitive dependency of CubeStore via Apache Arrow/DataFusion, so this adds no new dependencies.

GoogleCloudStorageBuilder::from_env() supports the full ADC credential chain in priority order:
1. CUBESTORE_GCP_KEY_FILE: existing env var, backward compatible
2. CUBESTORE_GCP_CREDENTIALS: existing env var, backward compatible
3. GKE Workload Identity via metadata server 169.254.169.254 ← main fix
4. GOOGLE_APPLICATION_CREDENTIALS file path (standard ADC)
5. gcloud CLI credentials (developer machines)

Backward Compatibility
Backward compatible for existing CUBESTORE_GCP_KEY_FILE,
CUBESTORE_GCP_CREDENTIALS, and common ADC-based environments.
Legacy JSON aliases (SERVICE_ACCOUNT_JSON,
GOOGLE_APPLICATION_CREDENTIALS_JSON) are mapped with deprecation warnings.
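The explicit-variable part of this resolution, including the legacy aliases, can be sketched as below. This is a std-only illustration, not the code in this PR: `CredentialSource` and `resolve_credentials` are hypothetical names, and treating the legacy aliases as inline credentials equivalent to CUBESTORE_GCP_CREDENTIALS is an assumption. Anything not set explicitly is left to the ADC chain.

```rust
use std::collections::HashMap;

/// Where GCS credentials would come from (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum CredentialSource {
    /// Path to a service account key file (CUBESTORE_GCP_KEY_FILE).
    KeyFile(String),
    /// Inline credentials value (CUBESTORE_GCP_CREDENTIALS or a legacy alias).
    Inline(String),
    /// Nothing explicit was set: defer to the ADC chain.
    ApplicationDefault,
}

/// Legacy names from the PR text; assumed here to carry inline credentials.
const LEGACY_ALIASES: [&str; 2] = ["SERVICE_ACCOUNT_JSON", "GOOGLE_APPLICATION_CREDENTIALS_JSON"];

/// Resolve the credential source from a snapshot of the environment.
/// Returns the chosen source plus any deprecation warnings to log.
fn resolve_credentials(env: &HashMap<String, String>) -> (CredentialSource, Vec<String>) {
    let mut warnings = Vec::new();
    if let Some(path) = env.get("CUBESTORE_GCP_KEY_FILE") {
        return (CredentialSource::KeyFile(path.clone()), warnings);
    }
    if let Some(value) = env.get("CUBESTORE_GCP_CREDENTIALS") {
        return (CredentialSource::Inline(value.clone()), warnings);
    }
    for alias in LEGACY_ALIASES.iter() {
        if let Some(value) = env.get(*alias) {
            warnings.push(format!(
                "{} is deprecated; use CUBESTORE_GCP_CREDENTIALS instead",
                alias
            ));
            return (CredentialSource::Inline(value.clone()), warnings);
        }
    }
    (CredentialSource::ApplicationDefault, warnings)
}

fn main() {
    let env: HashMap<String, String> = std::env::vars().collect();
    let (source, warnings) = resolve_credentials(&env);
    for w in &warnings {
        eprintln!("warning: {}", w);
    }
    println!("{:?}", source);
}
```

Passing the environment in as a map keeps the priority order testable without mutating process-wide state.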
Testing
Tested on GKE with:
- serviceAccount@project.iam.gserviceaccount.com
- roles/storage.objectAdmin on the export bucket

Changes in v2 (follow-up commit)
(prevents OOM on workers with large pre-aggregation files)
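The memory argument behind the streaming change can be shown with a synchronous, std-only analogue. This is a sketch of chunked copying in general, not the object_store calls used in the PR; `copy_chunked` and the 4096-byte chunk size are hypothetical choices for illustration. Peak memory is bounded by the chunk size rather than by the file size.

```rust
use std::io::{self, Read, Write};

/// Copy `reader` into `writer` using one fixed-size buffer.
/// Returns the number of bytes copied.
fn copy_chunked<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    chunk_size: usize,
) -> io::Result<u64> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Write only the bytes that were actually read in this round.
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    // In-memory source and sink stand in for a remote object and a local file.
    let mut src = io::Cursor::new(vec![7u8; 10_000]);
    let mut dst = Vec::new();
    let copied = copy_chunked(&mut src, &mut dst, 4096)?;
    println!("copied {} bytes", copied);
    Ok(())
}
```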
Addressed all review feedback and pushed 2 follow-up commits:

Commit 2 (fix: streaming + legacy aliases):
- Replace fs::read() full-buffer with PutPayload::from_stream(), which prevents OOM on large pre-aggregation files
- Replace bytes().await full-buffer with chunked into_stream() write to disk
- Add SERVICE_ACCOUNT_JSON / GOOGLE_APPLICATION_CREDENTIALS_JSON aliases with deprecation warnings
- Fix page count ceiling division: (len + 999) / 1000
- Fix head() comment: workaround is for object_store path normalization, not GCS consistency

Commit 3 (fix: compile blockers):
- Remove second store.put(data) call in upload_file (leftover from old buffered implementation; data was undefined)
- Remove get_result.bytes().await before into_stream(): both consume GetResult, only the streaming path remains

PR description updated: "Fully backward compatible" → scoped to commonly used credential env vars.
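The page-count fix is plain integer ceiling division. A minimal sketch follows, assuming a page size of 1000 as implied by the expression (len + 999) / 1000; `PAGE_SIZE` and `page_count` are hypothetical names, not identifiers from the PR.

```rust
/// Page size implied by the `(len + 999) / 1000` expression (assumed).
const PAGE_SIZE: usize = 1000;

/// Number of pages needed to hold `len` items, rounding up.
/// Note: `len + PAGE_SIZE - 1` can overflow when `len` is close to `usize::MAX`.
fn page_count(len: usize) -> usize {
    (len + PAGE_SIZE - 1) / PAGE_SIZE
}

fn main() {
    for len in [0usize, 1, 999, 1000, 1001, 2500].iter() {
        println!("{} items -> {} pages", len, page_count(*len));
    }
}
```

Plain `len / 1000` rounds down, so 1001 items would be counted as a single page and the last item would be dropped from paging.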
Closes #9837
Closes #7279