feat: add user metadata support for RLS policies#825
Conversation
Pull Request Test Coverage Report for Build 21612994975Details
💛 - Coveralls |
|
@TylerHillery awesome! Can we also pass in the |
I thought we couldn't do this because we don't get the We could switch the order but then we have to ensure to delete the object if user can't upload and we would be do the "expensive" part uploading when we didn't have to. Much quicker to test RLS first and skip uploading if we don't have to. Here is the /**
* Extracts file information from the request and upload the buffer
* to the remote storage
* @param request
* @param options
*/
async upload(request: UploadRequest) {
// NOTE: canUpload() gets called inside prepare upload
const version = await this.prepareUpload(request)
try {
const file = request.file
const s3Key = this.location.getKeyLocation({
tenantId: this.db.tenantId,
bucketId: request.bucketId,
objectName: request.objectName,
})
// NOTE: we don't get object metadata until here
const objectMetadata = await this.backend.uploadObject(
storageS3Bucket,
s3Key,
version,
file.body,
file.mimeType,
file.cacheControl,
request.signal
)
if (request.file.xRobotsTag) {
objectMetadata.xRobotsTag = request.file.xRobotsTag
}
if (file.isTruncated()) {
throw ERRORS.EntityTooLarge()
}
return this.completeUpload({
...request,
version,
objectMetadata: objectMetadata,
userMetadata: { ...request.userMetadata },
})
} catch (e) {
await ObjectAdminDelete.send({
name: request.objectName,
bucketId: request.bucketId,
tenant: this.db.tenant(),
version: version,
reqId: this.db.reqId,
})
throw e
}
} |
Other fields like etag, lastModified, it's fine if they are empty on the first RLS check |
| .from(bucketName) | ||
| .signUploadObjectUrl(objectName, urlPath as string, uploadSignedUrlExpirationTime, owner, { | ||
| upsert: request.headers['x-upsert'] === 'true', | ||
| userMetadata: userMetadata, |
There was a problem hiding this comment.
I think we don't actually support userMetadata on upload signed urls (which we should) so I'm glad you found this.
I believe we need to store the userMetadata in the returned JWT payload; otherwise, this information will be lost during the signedUpload request. We need to be careful, though, if we decide to store the metadata in the JWT, because it can be abused with large payloads. We can for now restrict the amount of data that we can store in the JWT to something like 1MB i think we already have a constant MAX_CUSTOM_METADATA_SIZE, which can be reused (re-refactored as a validation function, potentially in limits.ts)
But we also need to add the second part (on the upload route) to store the userMetadata received from the JWT. This way, the metadata can't be tempered.
There was a problem hiding this comment.
I will do a follow up PR to handle the upload part.
57b96fb to
05a322a
Compare
I have updated canUpload to accept a metadata parameter that gets passed to the RLS check. Only content-length and content-typ get passed through. Multipart uploads required a little more work as the |
| bucketId: string | ||
| objectName: string | ||
| file: FileUpload | ||
| userMetadata: Record<string, unknown> | undefined |
There was a problem hiding this comment.
why do we diverge from optionality? userMetadata?: Record<string, unknown>
There was a problem hiding this comment.
While implementing this change I found it very easy to forget to pass userMetadata somewhere when it should be passed so I thought I make it required but allow undefined if it's not needed.
I called this out in the PR as wasn't sure if there was a better way and wasn't 100% confident about this solution. If you think optionality is better here I am happy to change back.
calls to ensure userMetadata is passed through. I found it difficult to ensure I caught every cause though, e.g. it was very easy to miss uploadNewObject needed to be updated in the uploadFromRequest. I'm open to feedback to see if there is a more robust solution we could implement that could be caught with typescript type
There was a problem hiding this comment.
I see your point but I would go with optional since more idiomatic. Maybe for cleanup afterwards
There was a problem hiding this comment.
reverted back to optional
| owner_id: owner, | ||
| user_metadata: metadata, | ||
| user_metadata: userMetadata, | ||
| metadata: metadata, |
There was a problem hiding this comment.
don't we need a guard rule for new column similar to user_metadata?
There was a problem hiding this comment.
yes you're right. I added a guard rule but it feels a bit hacky. The reason I didn't add it to rules in normalizedColumns is because metadata column does exist in objects table. It's my understanding any metadata column would get remove regardless of the table it's selecting from.
So instead I checked if latest migration was >= so s3-multipart-uploads-metadata and added it to the list of columns to select.
I almost feel normalizedColumns should be table dependent. This might be better in a follow up PR thought.
05a322a to
5d4edd0
Compare
5d4edd0 to
8c316f6
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a nullable jsonb Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTP_Route as HTTP Route / TUS Entry
participant Uploader
participant Storage
participant DB
participant RLS
Client->>HTTP_Route: Upload request (headers: x-metadata, content-type, content-length)
HTTP_Route->>Uploader: call canUpload/prepareUpload (includes userMetadata, metadata)
Uploader->>Storage: create/prepare multipart (passes userMetadata, metadata)
Storage->>DB: insert/find multipart row (user_metadata, metadata)
DB->>RLS: evaluate policy with user_metadata & metadata
RLS-->>DB: allow/deny
DB-->>Storage: result
alt allowed
Storage->>Uploader: success (tokens/parts)
Uploader-->>Client: signed URL / upload acceptance
else denied
Uploader-->>Client: error (403/400)
end
Assessment against linked issues
Out-of-scope changes
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| properties: { | ||
| 'x-upsert': { type: 'string' }, | ||
| 'content-type': { type: 'string' }, | ||
| 'content-length': { type: 'string' }, |
There was a problem hiding this comment.
don't we need x-metadata as well since we try to read below?
| const customMd = request.headers['x-metadata'] | ||
|
|
||
| if (typeof customMd === 'string') { | ||
| userMetadata = parseUserMetadata(customMd) |
There was a problem hiding this comment.
this parser returns Record<string, string> by casting due to its expectation (and S3 compatibility) but there is no validation. Values could be anything.
This is an existing helper, feel free to add a TODO to be addressed later
There was a problem hiding this comment.
I looked into supabase-js and s3 part of this, type is wrong but handling is correct and we return x-amz-meta-missing header for non-string/non-printable
There was a problem hiding this comment.
Another thing to note is that when it's set when content-type is multipart/form-data, we check/limit its size but not when it comes from a header.
However, the value is 1kb and a header can definitely be larger than that as well saw value wrong, it's 1mb so this is a valid assumption. We could reuse the code, instead of two parsers as a cleaup but not in this PR.
|
|
||
| const uploader = new Uploader(this.storage.backend, this.storage.db, this.storage.location) | ||
|
|
||
| const multipart = await this.shouldAllowPartUpload(UploadId, ContentLength, maxFileSize) |
There was a problem hiding this comment.
this ordering can cause issues for valid uploads because it mutates the state.
For example (numbers are arbitrary)
- in progress 50mb and new part is 5mb
- in progress becomes 55mb
- RLS fails
- next request in progress becomes 60mb
- RLS checks but size fails because it was expecting at most 55mb
There was a problem hiding this comment.
go catch, changed the order to have canUpload called before shouldAllowPartUpload. What I don't like about this is that this adds an extra call to findMultipartUpload as this is also called in shouldAllowPartUpload I thought about moving canUpload within in shouldAllowPartUpload but it uses asSuperUser before starting the transaction which would bypass RLS.
Can you confirm my understanding? Is there anyway to do this that I'm missing that doesn't involve extra findMultipartUpload call?
| const multipart = await this.storage.db | ||
| .asSuperUser() | ||
| .findMultipartUpload(UploadId, 'id,version') | ||
| .findMultipartUpload(UploadId, 'id,version,user_metadata,metadata') |
There was a problem hiding this comment.
we need the column guard for these reads similar to create as well
There was a problem hiding this comment.
added the guard to the findMultipartUpload with a TODO comment to refactor once we change normalizeColumns to be table aware.
| objectName: destinationKey, | ||
| owner, | ||
| isUpsert: upsert, | ||
| userMetadata: userMetadata, |
There was a problem hiding this comment.
similar to metadata, aren't we supposed to use origin.user_metadata if copyMetadata is false because it's the written value eventually?
There was a problem hiding this comment.
yes we are, changed it to the same logic we use for metadata
There was a problem hiding this comment.
I take it back, using the same logic breaks some tests for userMetadata we don't want to merge the two. Either use all of the origin if copy or use the new destination metadata
|
Can we also add more tests please?
metadata checked at sign time but mutable at upload time (to be addressed separately by me when this is merged) |
8c316f6 to
6ea5123
Compare
a473d9b to
f62ddfa
Compare
There was a problem hiding this comment.
Pull request overview
This PR resolves #757 by ensuring user_metadata (and related request metadata) is available during pre-upload RLS permission checks across upload flows, so RLS policies can reference it during inserts.
Changes:
- Extends
Uploader.canUpload/prepareUploadto acceptuserMetadataandmetadataand threads these through standard, signed URL, TUS, and S3/multipart paths. - Persists multipart upload
metadatains3_multipart_uploadswith migration/version guards for backwards compatibility. - Expands/updates integration tests and RLS test spec to cover metadata-driven insert policies and multipart ordering.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/storage/uploader.ts | Updates canUpload/prepareUpload API and threads metadata/userMetadata; adjusts request parsing. |
| src/storage/object.ts | Passes userMetadata/metadata into upload/copy/sign flows for RLS checks. |
| src/storage/protocols/s3/s3-handler.ts | Propagates metadata/userMetadata for S3 multipart + ordering changes around canUpload. |
| src/storage/database/adapter.ts | Updates DB interface for multipart upload persistence. |
| src/storage/database/knex.ts | Stores/fetches multipart metadata with migration guard for older schemas. |
| src/storage/schemas/multipart.ts | Adds metadata to multipart upload schema. |
| src/internal/database/migrations/types.ts | Registers migration ID for multipart metadata column. |
| migrations/tenant/0057-s3-multipart-uploads-metadata.sql | Adds metadata jsonb column to storage.s3_multipart_uploads. |
| src/http/routes/tus/lifecycle.ts | Ensures TUS requests can run canUpload with parsed metadata/user_metadata. |
| src/http/routes/tus/index.ts | Wires datastore into onIncomingRequest callback. |
| src/http/routes/object/getSignedUploadURL.ts | Allows signed-upload RLS checks to include metadata/user_metadata from headers. |
| src/test/rls_tests.yaml | Adds RLS policies/tests for user_metadata + metadata-based checks. |
| src/test/rls.test.ts | Extends test runner to exercise new upload variants and metadata inputs. |
| src/test/s3-protocol.test.ts | Adds multipart ordering regression test + migration compatibility integration tests. |
| src/test/scanner.test.ts | Updates test uploads for adjusted UploadRequest shape. |
| src/test/cdn.test.ts | Updates uploadNewObject usage to pass userMetadata at request level. |
Comments suppressed due to low confidence (1)
src/storage/uploader.ts:479
- The return object from
fileUploadFromRequestsetscontentLengthtwice (first fromfileContentLength, then overwritten by the rawcontent-lengthheader). This both causes a duplicate-key/TS error and changes behavior by overwriting the intentionally-undefined multipart file size with the (unreliable) multipart request content-length. Keep a singlecontentLengthvalue (likelyfileContentLength) and, if the header value is needed for RLS, store it under a different name and validate it separately.
body,
mimeType,
cacheControl,
contentLength: fileContentLength,
isTruncated,
userMetadata,
maxFileSize,
xRobotsTag,
}
}
export function parseUserMetadata(metadata: string) {
try {
const json = Buffer.from(metadata, 'base64').toString('utf8')
return JSON.parse(json) as Record<string, string>
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What kind of change does this PR introduce?
resolves #757 by adding support for referencing
user_metadatafield in RLS policies for insertsWhat is the current behavior?
Currently
user_metadatais not passed intocanUploadchecks, so RLS policies can't reference it during file uploads.What is the new behavior?
Now
user_metadatais passed through all upload paths (standard HTTP, S3, TUS, multipart) so RLS policies can use it.Additional context
canUploadto acceptuserMetatdata. I updated all callers to pass inuserMetadata. Becauseuploadgets called inprepareUploadwhich gets called inuploadI also updated alluploadcalls to ensureuserMetadatais passed through. I found it difficult to ensure I caught every cause though, e.g. it was very easy to missuploadNewObjectneeded to be updated in theuploadFromRequest. I'm open to feedback to see if there is a more robust solution we could implement that could be caught with typescript types.canUploadchecks for multipart uploads. It required changing the order of certain operations around so thatuser_metadatawas available. Wasn't entirely sure if these changes were okay.userMetadatafrom theFileUploadinterface to theUploadRequestbut wasn't sure if this was the right approach so would appreciate some feedback