Skip to content

cli: add upload-server destination for debug zip upload#168310

Open
visheshbardia wants to merge 1 commit intocockroachdb:masterfrom
visheshbardia:upload-debugZ-to-blob
Open

cli: add upload-server destination for debug zip upload#168310
visheshbardia wants to merge 1 commit intocockroachdb:masterfrom
visheshbardia:upload-debugZ-to-blob

Conversation

@visheshbardia
Copy link
Copy Markdown
Contributor

Add support for uploading debug zip files directly to blob storage via CRL's upload server. The upload flow creates a session with the server to obtain short-lived GCS credentials, uploads the zip file directly to GCS, and marks the session complete.

The implementation is split across two files for clear separation of concerns:

  • zip_upload_blob.go: reusable blob storage abstraction (blobUploader interface, GCS implementation, progress writer)
  • zip_upload_server.go: upload server session lifecycle (create session, get upload token, complete session, validation)

The existing Datadog upload path is preserved and remains the default. Users select the upload server via --destination=upload-server along with --upload-server-api-key and --upload-server-url flags.

Part of: CRDB-61946
Epic: CRDB-60645

Release note: None

@visheshbardia visheshbardia requested review from a team as code owners April 14, 2026 09:27
@visheshbardia visheshbardia requested review from Abhinav1299 and dhartunian and removed request for a team April 14, 2026 09:27
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io bot commented Apr 14, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@visheshbardia visheshbardia force-pushed the upload-debugZ-to-blob branch 2 times, most recently from ef7418e to aa1db04 Compare April 14, 2026 09:43
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

1 similar comment
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

@visheshbardia visheshbardia force-pushed the upload-debugZ-to-blob branch 2 times, most recently from 05725df to 244bb7c Compare April 14, 2026 15:43
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

@visheshbardia visheshbardia force-pushed the upload-debugZ-to-blob branch from 244bb7c to 5c38ce5 Compare April 14, 2026 16:25
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

1 similar comment
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 14, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

@visheshbardia visheshbardia force-pushed the upload-debugZ-to-blob branch from 5c38ce5 to b6dcb2d Compare April 15, 2026 04:16
func newGCSBlobUploader(
ctx context.Context, bucket, pathPrefix, bearerToken string,
) (*gcsBlobUploader, error) {
uri := fmt.Sprintf(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure but this token can appear in any error message that prints the URI, is there any way to redact it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URI doesn't appear in any error or log path in cloud pkg as of now. Additionally, BEARER_TOKEN is also already registered as a RedactedParam in the GCS provider. If we ever add logging or surface the URI in errors in the future, we can use cloud.SanitizeExternalStorageURI to strip it, as the established pattern across the codebase.

Comment thread pkg/cli/zip_upload_server.go Outdated
fmt.Fprintf(os.Stderr, "Session %s completed\n", sessionID)
}

fmt.Fprintf(os.Stderr, "Successfully uploaded debug zip.\n")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add few useful analytics during and after the upload is happening:

  • Total time elapsed to upload the debug zip successfully after completion.
  • Maybe reporting throughput during the upload process (MB/sec)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added! Now completion shows the stats as below:
image

Comment thread pkg/cli/zip_upload_server.go Outdated
}()

retryOpts := base.DefaultRetryOptions()
retryOpts.MaxRetries = 5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: remove magic number here, define a const for this


// validateUploadServerReadiness checks that the required flags are set
// for upload-server uploads and that the input file is valid.
func validateUploadServerReadiness(inputPath string) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a check here to verify if the file is actually a valid zip? Checking the magic literal file signature of zips here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the magic signature check.

Comment thread pkg/cli/zip_upload_server.go Outdated
Comment on lines +234 to +241
fmt.Fprintf(os.Stderr,
"Warning: failed to complete upload session: %v\n", err,
)
} else {
fmt.Fprintf(os.Stderr, "Session %s completed\n", sessionID)
}

fmt.Fprintf(os.Stderr, "Successfully uploaded debug zip.\n")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly misleading message here, if the completeUploadSession returns an error, we print Warning: failed to complete upload session along with Successfully uploaded debug zip..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified it to show warning along with error message as below:
image

Comment thread pkg/cli/zip_upload_server.go Outdated
// 2. POST /api/v1/sessions/{id}/upload-token → {access_token, bucket, prefix}
var createUploadSession = func(
ctx context.Context, serverURL, apiKey string,
) (sessionID, uploadToken, bucket, prefix, gcsToken string, err error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is returning too many values, can we create a struct to improve readability here? Maybe:

type uploadSessionInfo struct {
    SessionID   string
    UploadToken string
    Bucket      string
    Prefix      string
    GCSToken    string
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, added!

Comment thread pkg/cli/zip_upload_server_test.go Outdated
Comment on lines +137 to +145
tmpDir := t.TempDir()
zipPath := filepath.Join(tmpDir, "debug.zip")
zipContent := []byte("PK-fake-zip-content")
require.NoError(t, os.WriteFile(zipPath, zipContent, 0644))

origOpts := debugZipUploadOpts
origCreate := createUploadSession
origComplete := completeUploadSession
origNewUploader := newBlobUploader
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate code lines in every test function, can we extract it to a helper function?

@visheshbardia visheshbardia force-pushed the upload-debugZ-to-blob branch from b6dcb2d to 154f54a Compare April 17, 2026 07:27
Add support for uploading debug zip files directly to blob storage
via CRL's upload server. The upload flow creates a session with the
server to obtain short-lived GCS credentials, uploads the zip file
directly to GCS, and marks the session complete.

The implementation is split across two files for clear separation
of concerns:
- zip_upload_blob.go: reusable blob storage abstraction (blobUploader
  interface, GCS implementation, progress writer)
- zip_upload_server.go: upload server session lifecycle (create
  session, get upload token, complete session, validation)

The existing Datadog upload path is preserved and remains the default.
Users select the upload server via --destination=upload-server along
with --upload-server-api-key and --upload-server-url flags.

Part of: CRDB-61946
Epic: CRDB-60645

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants