
feat: Allow large Parse File uploads #9286

Open
dalyaidan1 wants to merge 28 commits into parse-community:alpha from dalyaidan1:parse-file-large-upload

Conversation

@dalyaidan1

@dalyaidan1 dalyaidan1 commented Aug 18, 2024

Pull Request

Issue

Closes: #9283

Approach

During routing, if a file is too large for the V8 engine to handle as a string, it is handled as a Blob instead. This change touches both the FilesRouter and the GridFSBucketAdapter, where file creation and upload happen and the file is handled as a Buffer. I also added a new FilesRouter test spec for these changes.
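For illustration, a minimal sketch of this routing decision (the helper name is hypothetical; the threshold follows the V8 string size limit referenced in this PR):

const MAX_V8_STRING_SIZE_BYTES = 536_870_912; // ~512 MB, the V8 string size limit

// Hypothetical helper: wrap the request body either as a Blob or as a base64 string.
function buildParseFile(filename, body, contentType) {
  if (Buffer.isBuffer(body) && body.length >= MAX_V8_STRING_SIZE_BYTES) {
    // Too large to represent as a base64 string, so hand the raw bytes over as a Blob.
    return new Parse.File(filename, new Blob([body]), contentType);
  }
  // Small enough for the existing base64-string path.
  return new Parse.File(filename, { base64: body.toString('base64') }, contentType);
}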

Tasks

  • Add tests
  • Add changes to documentation (guides, repository pages, code comments)

Summary by CodeRabbit

  • New Features

    • Added support for Blob data in file operations alongside existing Buffer support.
    • Enabled streaming encryption for large file uploads to handle files exceeding size limits.
  • Bug Fixes

    • Improved error handling for cipher operations during file encryption.
    • Enhanced robustness of file size computation for various data types.
  • Tests

    • Expanded test coverage for file uploads across multiple size boundaries.
    • Added comprehensive tests for both string and blob data handling.

@parse-github-assistant

Thanks for opening this pull request!


afterAll(async () => {
  // clean up the server for reuse
  if (server && server.close) {
Member

Wouldn't it be enough to just call await reconfigureServer() to init the server with the default config?

Author

You're right. Missed that; I'm new to testing here.

Signed-off-by: Manuel <5673677+mtrezza@users.noreply.github.com>
@codecov

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.52%. Comparing base (30a836d) to head (a87a452).

Additional details and impacted files
@@            Coverage Diff             @@
##            alpha    #9286      +/-   ##
==========================================
+ Coverage   93.50%   93.52%   +0.02%     
==========================================
  Files         186      186              
  Lines       14810    14834      +24     
==========================================
+ Hits        13848    13874      +26     
+ Misses        962      960       -2     


@mtrezza
Member

mtrezza commented Aug 18, 2024

There seems to be some test coverage missing, could you add that?

@dalyaidan1 dalyaidan1 requested a review from mtrezza August 19, 2024 18:29
@dalyaidan1 dalyaidan1 requested a review from mtrezza August 20, 2024 17:09
@mtrezza mtrezza requested a review from a team September 4, 2024 15:47
Signed-off-by: Manuel <5673677+mtrezza@users.noreply.github.com>
Signed-off-by: Manuel <5673677+mtrezza@users.noreply.github.com>
@mtrezza
Member

mtrezza commented Oct 8, 2024

Could you take a look at the failing postgres tests?

a storage adapter is needed to test postgres
@dalyaidan1
Author

dalyaidan1 commented Nov 2, 2024

Could you take a look at the failing postgres tests?

@mtrezza I changed those failing tests to run on mongo only. Without pulling in a storage adapter such as the fs adapter, those tests will not be able to pass on postgres. Let me know if pulling one in as a dev dependency would be preferable.

All the adapters will also need changes similar to those made to the inbuilt GridFS adapter in order to handle large files. I have adjustments for the fs and s3 adapters and can open PRs in their respective repositories as needed.

@mtrezza
Member

mtrezza commented Nov 3, 2024

I understand the change is really for the GridFSStorageAdapter, and it just so happens that the adapter is bundled with Parse Server. It isn't related to DB adapters, so maybe the tests should be MongoDB only in this PR. Does that make sense? Only if the Postgres adapter supported Parse.File uploads (like MongoDB does with GridFS) would there be Postgres-related changes needed in this PR.

Yes, for other storage adapters the PRs in their respective repos would be great.

@dalyaidan1
Author

Yes, that is correct. There would need to be an inbuilt storage adapter in order to test with Postgres.

I've made the other PRs:

@mtrezza
Member

mtrezza commented Nov 3, 2024

Could you change the test from it to it_only_db('mongo'), I believe that's what it's called.
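For illustration, a minimal sketch of that change, assuming the repo's usual it_only_db helper call style (the test name here is hypothetical):

// before
it('should allow large Parse.File uploads', async () => { /* ... */ });

// after: run only when the test database is MongoDB
it_only_db('mongo')('should allow large Parse.File uploads', async () => { /* ... */ });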

@dalyaidan1
Author

Could you change the test from it to it_only_db('mongo') I believe it's called.

I think I covered this in 18157be already?

The one currently failing test appears to be one of the flaky tests mentioned here.

@parseplatformorg
Contributor

Snyk checks have passed. No issues have been found so far.

Scanner: Open Source Security. Critical: 0, High: 0, Medium: 0, Low: 0. Total: 0 issues.


@coderabbitai

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Walkthrough

This PR addresses the 512MB file upload limitation by replacing base64 string construction with a Blob-backed approach for large buffers, while adding streaming encryption support in the storage adapter to handle large Blob data efficiently.

Changes

  • File Router Implementation (src/Routers/FilesRouter.js): Added size-aware logic to detect when the request body exceeds the V8 string limit (536,870,912 bytes) and use Blob-backed files instead of base64 strings. Updated fileSize computation and the createFile invocation to pass a Blob or Buffer as fileData.
  • Storage Adapter Updates (src/Adapters/Files/GridFSBucketAdapter.js, src/Adapters/Files/FilesAdapter.js): Rewrote GridFSBucketAdapter.createFile to support streaming encryption for Blob data with Transform streams and conditional cipher initialization. Updated the FilesAdapter JSDoc to clarify that the data parameter accepts Buffer or Blob.
  • Router Test Suite (spec/FilesRouter.spec.js): Added a new test suite for FilesRouter on Mongo with a 1GB maxUploadSize configuration. Tests file uploads at size boundaries: under 512MB, exactly 512MB, and just over 512MB using streamed multipart form data.
  • Storage Adapter Tests (spec/GridFSBucketStorageAdapter.spec.js): Refactored tests to be type-parameterized for both string and Blob data types. Introduced createData and getDataAsString helpers for multi-type handling. Expanded coverage for encryption/decryption, error scenarios (cipher failures, bucket errors), and metadata handling across both data types.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FilesRouter
    participant GridFSBucketAdapter
    participant CryptoStream
    participant MongoDB

    Client->>FilesRouter: POST /files/{filename} (large file)
    alt File size >= 512MB
        FilesRouter->>FilesRouter: Detect size > V8 limit
        FilesRouter->>FilesRouter: Convert Buffer to Blob
    else File size < 512MB
        FilesRouter->>FilesRouter: Keep as base64 string
    end
    
    FilesRouter->>GridFSBucketAdapter: createFile(filename, fileData: Blob|Buffer)
    
    alt Data is Blob + Encryption enabled
        GridFSBucketAdapter->>GridFSBucketAdapter: Create Transform stream with cipher
        GridFSBucketAdapter->>CryptoStream: Pipe Blob through cipher
        CryptoStream->>CryptoStream: Apply encryption chunk-by-chunk
        CryptoStream->>CryptoStream: Append IV & auth tag
        CryptoStream->>MongoDB: Upload encrypted stream
    else Data is Buffer + Encryption enabled
        GridFSBucketAdapter->>GridFSBucketAdapter: Single-step cipher operation
        GridFSBucketAdapter->>MongoDB: Upload encrypted buffer
    else No encryption
        GridFSBucketAdapter->>MongoDB: Upload data directly
    end
    
    MongoDB-->>GridFSBucketAdapter: Confirmation
    GridFSBucketAdapter-->>FilesRouter: Success
    FilesRouter-->>Client: 200 OK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check ✅ Passed: The PR title 'feat: Allow large Parse File uploads' directly and concisely summarizes the main change: enabling file uploads larger than the previous V8 string limit, which is the primary objective of this PR.
  • Description check ✅ Passed: The PR description covers the required sections: Issue (closes #9283), Approach (explains Blob/Buffer handling in FilesRouter and GridFSBucketAdapter), and marked relevant Tasks as completed. Documentation and tests were added as required.
  • Linked Issues check ✅ Passed: The PR successfully addresses the core requirement from issue #9283: files larger than ~512MB can now be uploaded by converting large buffers to Blobs instead of strings, preventing the ERR_STRING_TOO_LONG error. Changes span FilesRouter and GridFSBucketAdapter with comprehensive test coverage.
  • Out of Scope Changes check ✅ Passed: All changes are directly aligned with the PR objective of supporting large file uploads. The modifications to FilesRouter, GridFSBucketAdapter, and related test files address only the V8 string limit issue and its implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (4)
src/Routers/FilesRouter.js (1)

236-240: Redundant optional chaining on line 240.

Line 236 already guards that fileObject.file._source?.file instanceof Blob is truthy, so on line 240 _source is guaranteed non-nullish. The ?. on fileObject.file._source?.file is unnecessary and obscures the intent.

Proposed fix
          // set the file data
-          fileData = fileObject.file._source?.file;
+          fileData = fileObject.file._source.file;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Routers/FilesRouter.js` around lines 236 - 240, The optional chaining on
fileObject.file._source?.file is redundant because the preceding check uses
fileObject.file._source?.file instanceof Blob; update the assignment in
FilesRouter.js to use the guaranteed path fileObject.file._source.file when
setting fileData (and keep the fileSize assignment as-is), i.e. remove the "?.",
referencing the same fileObject/fileData/fileSize symbols to make intent
explicit.
spec/FilesRouter.spec.js (1)

46-77: Temp file cleanup is not guaranteed on test failure.

If postFile or an expect throws, fs.unlinkSync is never reached, leaving large temp files (up to ~562 MB) on disk. Use try/finally or an afterEach hook to ensure cleanup.

Proposed fix (example for one test; apply to all three)
     it('should allow Parse.File uploads under 512MB', async () => {
       const filePath = path.join(__dirname, 'file.txt');
-      await fs.promises.writeFile(filePath, Buffer.alloc(1024 * 1024));
-
-      const response = await postFile('file.txt', filePath);
-      expect(response.ok).toBe(true);
-
-      fs.unlinkSync(filePath);
+      try {
+        await fs.promises.writeFile(filePath, Buffer.alloc(1024 * 1024));
+        const response = await postFile('file.txt', filePath);
+        expect(response.ok).toBe(true);
+      } finally {
+        if (fs.existsSync(filePath)) fs.unlinkSync(filePath);
+      }
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/FilesRouter.spec.js` around lines 46 - 77, The tests create large
temporary files but call fs.unlinkSync only on the success path, risking
leftover files if postFile or expect throws; update each test ("should allow
Parse.File uploads under 512MB", "exactly 512MB", "over 512MB") to ensure
cleanup by either (a) declaring filePath in the test scope and wrapping the
postFile/expect sequence in a try/finally that always calls
fs.unlinkSync(filePath) in the finally, or (b) move filePath creation to a
beforeEach/outer scope and add an afterEach that checks for existence and
unlinks it; reference the tests and the postFile/fs.unlinkSync calls when making
the changes.
spec/GridFSBucketStorageAdapter.spec.js (1)

506-533: Mock cipher is missing getAuthTag() — non-blob encrypted path would throw a different error.

The mock on line 512 returns { update, final } but omits getAuthTag(). For the blob path, if update throws in the transform callback, flush (which calls getAuthTag) is never reached, so the test passes. But for the non-blob path (Buffer.concat([cipher.update(data), cipher.final(), iv, cipher.getAuthTag()])), if update throws, getAuthTag is never reached either. This works by coincidence for the update-error test.

However, for the final-error test (lines 536-563), the non-blob path calls cipher.update(data) (succeeds), then cipher.final() (throws). getAuthTag() is never called. So both tests pass, but the mock is incomplete. If anyone adjusts the error-handling order, this could produce confusing TypeError: cipher.getAuthTag is not a function instead of the expected error.

Consider adding getAuthTag to the mock for completeness:

Proposed fix
     spyOn(crypto, 'createCipheriv').and.returnValue({
       update: (_chunk) => { throw error; },
       final: () => { return Buffer.from('encryptedData'); },
+      getAuthTag: () => { return Buffer.alloc(16); },
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/GridFSBucketStorageAdapter.spec.js` around lines 506 - 533, The cipher
mock used in the GridFSBucketAdapter tests is incomplete (crypto.createCipheriv
is mocked to return only update and final), which can spuriously cause
"cipher.getAuthTag is not a function" errors; update the mock returned by the
spy on crypto.createCipheriv in the tests that simulate cipher errors (the tests
calling gfsAdapter.createFile) to also include a getAuthTag() method (e.g.,
returning a Buffer or appropriate value) so both blob and non-blob paths can
call getAuthTag safely, and keep the existing update/final behaviors and the
final crypto.createCipheriv.and.callThrough() restore.
src/Adapters/Files/GridFSBucketAdapter.js (1)

81-156: stream.on('finish', resolve) should be registered before piping starts.

Although practically safe due to Node's single-threaded event loop (pipe is async), it's a best-practice to attach event listeners before initiating the data flow. Moving the finish/error listeners on stream above the pipe/write section makes the intent clearer and avoids a subtle ordering footgun if the code is later refactored.

Proposed restructure sketch
     return new Promise((resolve, reject) => {
       try {
         const iv = ...;
         const cipher = ...;

+        stream.on('finish', resolve);
+        stream.on('error', reject);
+
         if (data instanceof Blob) {
           // ... pipe logic (remove duplicate stream.on('error', reject) inside)
         } else {
           // ... write logic
         }
-
-        stream.on('finish', resolve);
-        stream.on('error', reject);
       } catch (e) {
         reject(e);
       }
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Adapters/Files/GridFSBucketAdapter.js` around lines 81 - 156, Move the
stream event listeners so they are attached before any piping or writes start:
register stream.on('finish', resolve) and stream.on('error', reject) (and any
intermediate error handlers like on the cipherTransform) prior to calling
readableStream.pipe(...)/stream.write(...)/stream.end() inside the Promise block
in GridFSBucketAdapter.js; update the Promise flow around the
readableStream/cipherTransform/stream code paths (the branches that use cipher
and iv, the Blob branch, and the non-Blob write branch) so listeners are
installed first, then start the piping or write operations.

Comment on lines +27 to +31
const headers = {
'X-Parse-Application-Id': Parse.applicationId,
'X-Parse-Master-Key': Parse.masterKey,
'Content-Type': 'multipart/form-data',
};

⚠️ Potential issue | 🟡 Minor

Content-Type: multipart/form-data is semantically incorrect for a raw stream body.

The request sends a raw file stream, not a multipart-encoded body. multipart/form-data requires boundary delimiters. Since express.raw() accepts any content type, this works by accident. Use application/octet-stream (or the actual MIME type) to be accurate.

Proposed fix
       const headers = {
         'X-Parse-Application-Id': Parse.applicationId,
         'X-Parse-Master-Key': Parse.masterKey,
-        'Content-Type': 'multipart/form-data',
+        'Content-Type': 'application/octet-stream',
       };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/FilesRouter.spec.js` around lines 27 - 31, The test's headers object
sets 'Content-Type' to 'multipart/form-data' even though the request body is a
raw file stream; update the headers in FilesRouter.spec.js (the headers constant
used in the test) to use 'application/octet-stream' or the file's real MIME type
so the Content-Type accurately reflects a raw stream rather than
multipart/form-data.

Comment on lines +46 to +55
it('should allow Parse.File uploads under 512MB', async done => {
const filePath = path.join(__dirname, 'file.txt');
await fs.promises.writeFile(filePath, Buffer.alloc(1024 * 1024));

const response = await postFile('file.txt', filePath);
expect(response.ok).toBe(true);

fs.unlinkSync(filePath);
done();
});

🛠️ Refactor suggestion | 🟠 Major

Replace done callback with pure async/await.

All three tests use the async done => pattern. Since the test body is fully await-based, the done callback is unnecessary and can mask unhandled rejections. This applies equally to the tests on lines 57 and 68.

Based on learnings: "New tests in the parse-server repository should use async/await with promise-based patterns rather than callback patterns with done()."

Proposed fix (apply same pattern to all three tests)
-    it('should allow Parse.File uploads under 512MB', async done => {
+    it('should allow Parse.File uploads under 512MB', async () => {
       const filePath = path.join(__dirname, 'file.txt');
       await fs.promises.writeFile(filePath, Buffer.alloc(1024 * 1024));

       const response = await postFile('file.txt', filePath);
       expect(response.ok).toBe(true);

       fs.unlinkSync(filePath);
-      done();
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/FilesRouter.spec.js` around lines 46 - 55, The tests use the legacy
async done callback pattern which can mask promise rejections; update each test
(e.g., the it blocks with descriptions "should allow Parse.File uploads under
512MB" and the two following its) by removing the done parameter (change async
done => to async () =>) and deleting the terminal done() call(s), leaving the
existing await calls so Jest handles promise resolution/rejection natively;
apply the same change to the other two it blocks referenced in the diff so all
three tests use pure async/await.

Comment on lines +29 to 40
const getDataAsString = async (type, data, encoding = 'utf8') => {
switch (type) {
case 'string':
return data.toString(encoding);
case 'blob':
return (data instanceof Blob)
? await data.text(encoding) :
data.toString(encoding);
default:
throw new Error(`Invalid type: ${type}`);
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Blob.prototype.text() method parameters encoding Web API MDN

💡 Result:

Blob.prototype.text() has no parameters—its syntax is simply:

blob.text()

It always decodes the blob’s bytes as UTF-8 (you can’t pass an encoding like "iso-8859-1"). MDN states the data is “always presumed to be in UTF-8 format” and lists Parameters: None. [1] The File API spec defines text() as reading all bytes and then applying UTF-8 decode. [2]

If you need a different encoding, use FileReader.readAsText(blob, encoding) or TextDecoder on an ArrayBuffer instead. [1] [2]

Sources: [1] MDN Blob: text() method https://developer.mozilla.org/en-US/docs/Web/API/Blob/text, [2] W3C File API (Editor’s Draft) 3.3.3 The text() method https://w3c.github.io/FileAPI/


Blob.prototype.text() does not accept an encoding parameter.

Per the Web API spec, Blob.text() takes no arguments and always decodes as UTF-8. The encoding parameter on line 35 is silently ignored. This works by accident since the default is 'utf8', but the function signature is misleading.

Proposed fix
-const getDataAsString = async (type, data, encoding = 'utf8') => {
+const getDataAsString = async (type, data) => {
   switch (type) {
     case 'string':
-      return data.toString(encoding);
+      return data.toString('utf8');
     case 'blob':
       return (data instanceof Blob)
-        ? await data.text(encoding) :
-        data.toString(encoding);
+        ? await data.text() :
+        data.toString('utf8');
     default:
       throw new Error(`Invalid type: ${type}`);
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/GridFSBucketStorageAdapter.spec.js` around lines 29 - 40,
getDataAsString's handling of Blob is incorrect because Blob.prototype.text()
accepts no encoding; update the 'blob' branch in getDataAsString to call await
data.text() with no args for UTF-8, and if a non-UTF-8 encoding is requested use
await data.arrayBuffer() combined with new TextDecoder(encoding).decode(...) to
respect the encoding; keep the existing fallback for non-Blob values returning
data.toString(encoding).
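If a non-UTF-8 encoding is ever needed in these tests, a sketch of the TextDecoder variant described in the prompt above (assuming the requested encoding label is one TextDecoder accepts):

const getDataAsString = async (type, data, encoding = 'utf8') => {
  switch (type) {
    case 'string':
      return data.toString(encoding);
    case 'blob':
      if (data instanceof Blob) {
        // Blob.text() always decodes as UTF-8, so decode the raw bytes explicitly instead.
        return new TextDecoder(encoding).decode(await data.arrayBuffer());
      }
      return data.toString(encoding);
    default:
      throw new Error(`Invalid type: ${type}`);
  }
};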

Comment on lines +128 to +131
for (let i = 0; i < fileNames.length; i++) {
await unencryptedAdapter.deleteFile(fileNames[i]);
expectMissingFile(unencryptedAdapter, fileNames[i]);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the expectMissingFile function definition
rg -n 'expectMissingFile' spec/GridFSBucketStorageAdapter.spec.js -B 2 -A 5 | head -60

Repository: parse-community/parse-server

Length of output: 2111


🏁 Script executed:

# Get the remaining occurrences around lines 376-383
rg -n 'expectMissingFile' spec/GridFSBucketStorageAdapter.spec.js -A 2 | tail -30

Repository: parse-community/parse-server

Length of output: 680


🏁 Script executed:

# Get full context around the problematic lines to confirm the pattern
sed -n '128,133p' spec/GridFSBucketStorageAdapter.spec.js
echo "---"
sed -n '184,189p' spec/GridFSBucketStorageAdapter.spec.js
echo "---"
sed -n '376,385p' spec/GridFSBucketStorageAdapter.spec.js

Repository: parse-community/parse-server

Length of output: 789


Add missing await for expectMissingFile calls in cleanup loops.

The expectMissingFile function (line 7) is async and contains expect assertions, but it's called without await on lines 130, 186, 240, 308, 313, 378, and 383. This allows the test to complete before the assertion runs, risking silent failures or stale state in subsequent iterations.

Locations requiring `await`
  • Line 130: expectMissingFile(unencryptedAdapter, fileNames[i]);
  • Line 186: expectMissingFile(oldEncryptedAdapter, fileNames[i]);
  • Line 240: expectMissingFile(oldEncryptedAdapter, fileNames[i]);
  • Line 308: expectMissingFile(oldEncryptedAdapter, fileNames[i]);
  • Line 313: expectMissingFile(unEncryptedAdapter, fileName3);
  • Line 378: expectMissingFile(oldEncryptedAdapter, fileNames[i]);
  • Line 383: expectMissingFile(unEncryptedAdapter, fileName3);
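For illustration, the first cleanup loop with the missing await added (using the unencryptedAdapter, fileNames, and expectMissingFile symbols already present in the spec):

for (let i = 0; i < fileNames.length; i++) {
  await unencryptedAdapter.deleteFile(fileNames[i]);
  // Await so the assertions inside expectMissingFile run before the next iteration.
  await expectMissingFile(unencryptedAdapter, fileNames[i]);
}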
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/GridFSBucketStorageAdapter.spec.js` around lines 128 - 131, The async
helper expectMissingFile is being invoked without awaiting in several cleanup
loops, causing assertions to run asynchronously; update the test cleanup loops
that call expectMissingFile with await (e.g., inside iterations over fileNames
and where fileName3 is used) so calls like expectMissingFile(unencryptedAdapter,
fileNames[i]), expectMissingFile(oldEncryptedAdapter, fileNames[i]) and
expectMissingFile(unEncryptedAdapter, fileName3) become awaited, ensuring the
async assertions complete before proceeding.

Comment on lines +93 to +134
if (data instanceof Blob) {
let readableStream = data.stream();

// may come in as a web stream, so we need to convert it to a node stream
if (readableStream instanceof ReadableStream) {
readableStream = Readable.fromWeb(readableStream);
}

if (cipher && iv) {
// we need to stream the data through the cipher
const cipherTransform = new Transform({
transform(chunk, encoding, callback) {
try {
const encryptedChunk = cipher.update(chunk);
callback(null, encryptedChunk);
} catch (err) {
callback(err);
}
},
// at the end we need to push the final cipher text, iv, and auth tag
flush(callback) {
try {
this.push(cipher.final());
this.push(iv);
this.push(cipher.getAuthTag());
callback();
} catch (err) {
callback(err);
}
}
});
// pipe the stream through the cipher and then to the gridfs stream
readableStream
.pipe(cipherTransform)
.on('error', reject)
.pipe(stream)
.on('error', reject);
} else {
// if we don't have a cipher, we can just pipe the stream to the gridfs stream
readableStream.pipe(stream)
.on('error', reject)
}

⚠️ Potential issue | 🔴 Critical

Missing error handler on readableStream — promise can hang.

Node's .pipe() does not forward 'error' events. If the readable stream from the Blob errors, neither cipherTransform/stream will receive it, so reject is never called and the returned promise hangs indefinitely.

Add an error listener on readableStream itself:

Proposed fix
          if (readableStream instanceof ReadableStream) {
            readableStream = Readable.fromWeb(readableStream);
          }

+         readableStream.on('error', reject);
+
          if (cipher && iv) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Adapters/Files/GridFSBucketAdapter.js` around lines 93 - 134, The
readableStream from the Blob can emit 'error' which .pipe() won't forward,
causing the promise to hang; add a readableStream.on('error', reject) handler
(and remove it when stream finishes/errors) for both the cipher branch (before
piping into cipherTransform) and the non-cipher branch (before piping into
stream) so any read errors immediately call reject; reference the Blob-derived
readableStream, Readable.fromWeb conversion, the Transform instance
cipherTransform, and the destination gridfs stream variable stream to locate
where to attach the handler.

Comment on lines +96 to +99
// may come in as a web stream, so we need to convert it to a node stream
if (readableStream instanceof ReadableStream) {
readableStream = Readable.fromWeb(readableStream);
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check minimum Node version required by Parse Server
echo "=== engines field from package.json ==="
cat package.json | jq '.engines'

Repository: parse-community/parse-server

Length of output: 187


🌐 Web query:

Node.js Readable.fromWeb stable release version history

💡 Result:

stream.Readable.fromWeb() stability / release history (Node.js)

  • v17.0.0 (Oct 19, 2021): API added (stream.Readable.fromWeb), not yet marked stable. [1][2]
  • v22.17.0 (Apr 24, 2024): API marked stable (backported to the v22 line). [1][3]
  • v24.0.0 (May 6, 2025): API marked stable in the v24 line (same stabilization milestone as v22.17.0). [1][4]

Sources: Node.js Stream API docs “History” table for stream.Readable.fromWeb() [1], Node.js releases page for v17 first-release date [2], Node.js v22.17.0 archive page (first released date) [3], Node.js v24.0.0 release post (release date) [4].


Use Readable.fromWeb() only with Node >=24.11.0, or upgrade minimum Node version requirements.

Readable.fromWeb() is experimental in Node 20.x and unstable in Node 22.x versions below 22.17.0. Parse Server currently supports Node 20.19.0+ and 22.12.0+, both of which include versions with experimental/unstable Readable.fromWeb(). The API was stabilized in Node 22.17.0 and 24.0.0+.

Either guard the code path to skip Readable.fromWeb() for older versions, or raise the minimum Node version requirement to >=22.17.0 (or >=24.11.0 for full v24 coverage).
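A minimal sketch of the guarded conversion suggested above (the helper name and exact version cutoff are illustrative, not part of the PR):

const { Readable } = require('stream');

// Hypothetical helper: convert a Web ReadableStream into a Node Readable.
function webStreamToNodeReadable(webStream) {
  const [major, minor] = process.versions.node.split('.').map(Number);
  const hasStableFromWeb = major > 22 || (major === 22 && minor >= 17);
  if (hasStableFromWeb && typeof Readable.fromWeb === 'function') {
    return Readable.fromWeb(webStream);
  }
  // Fallback for older runtimes: drain the web stream via its reader and wrap it as a Node Readable.
  const reader = webStream.getReader();
  return Readable.from((async function* () {
    while (true) {
      const { done, value } = await reader.read();
      if (done) { break; }
      yield Buffer.from(value);
    }
  })());
}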

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Adapters/Files/GridFSBucketAdapter.js` around lines 96 - 99, The code
currently calls Readable.fromWeb(readableStream) unconditionally; instead, gate
that call by checking the Node.js runtime version (process.versions.node) and
only use Readable.fromWeb when the Node version is new enough (>=22.17.0 or your
chosen minimum such as >=24.11.0); for older Node versions implement a safe
fallback that converts a Web ReadableStream into a Node Readable by creating an
async generator that uses readableStream.getReader() to yield chunks and then
pass that generator into Readable.from — update the conversion logic in
GridFSBucketAdapter (the Readable.fromWeb / ReadableStream branch) to perform
the version check and use the manual async-generator->Readable fallback when the
runtime is too old.

Comment on lines +185 to +195
// If the request body is a buffer and it's size is greater than the V8 string size limit
// we need to use a Blob to avoid the V8 string size limit
const MAX_V8_STRING_SIZE_BYTES = 536_870_912;

let file;

if (Buffer.isBuffer(req.body) && req.body?.length >= MAX_V8_STRING_SIZE_BYTES) {
file = new Parse.File(filename, new Blob([req.body]), contentType);
} else {
file = new Parse.File(filename, { base64: req.body.toString('base64') }, contentType);
}

⚠️ Potential issue | 🟠 Major

Large uploads temporarily double memory usage.

new Blob([req.body]) copies the buffer contents, so for a 512 MB+ upload both req.body (Buffer) and the Blob coexist in memory until the handler completes. This means a single ~600 MB upload consumes ~1.2 GB of process memory.

Consider whether the original req.body reference can be released early (e.g., req.body = null after creating the Blob), or whether the Blob wrapping can be deferred closer to the storage adapter call to shorten the overlap window.
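As a sketch of the second option, assuming nothing downstream reads req.body after the Parse.File is constructed:

if (Buffer.isBuffer(req.body) && req.body.length >= MAX_V8_STRING_SIZE_BYTES) {
  file = new Parse.File(filename, new Blob([req.body]), contentType);
  // Drop the Buffer reference so only the Blob copy stays reachable during the upload.
  req.body = null;
} else {
  file = new Parse.File(filename, { base64: req.body.toString('base64') }, contentType);
}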

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Routers/FilesRouter.js` around lines 185 - 195, The code creates a Blob
from req.body which doubles memory during large uploads; update the FilesRouter
handling around MAX_V8_STRING_SIZE_BYTES and Parse.File so you either (a) defer
Blob creation until right before the storage adapter upload call (move new
Blob([req.body]) close to where Parse.File is handed to the adapter) or (b)
immediately drop the original Buffer reference by setting req.body = null right
after constructing the Blob (ensure filename and contentType are preserved), so
the temporary overlap of req.body and the Blob is minimized.

Comment on lines +234 to +245
let fileData;
// if the file is a blob, get the size from the blob
if (fileObject.file._source?.file instanceof Blob) {
// get the size of the blob
fileObject.fileSize = fileObject.file._source.file.size;
// set the file data
fileData = fileObject.file._source?.file;
} else {
const bufferData = Buffer.from(fileObject.file._data, 'base64');
fileObject.fileSize = Buffer.byteLength(bufferData);
fileData = bufferData;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat package.json | jq '.dependencies["parse"], .devDependencies["parse"]'

Repository: parse-community/parse-server

Length of output: 84


🏁 Script executed:

rg -n '_source\b|_data\b' --type=js src/ | head -30

Repository: parse-community/parse-server

Length of output: 2505


🏁 Script executed:

# Check if Parse.File is instantiated elsewhere and how
rg -n 'new Parse\.File|Parse\.File\(' --type=js src/ | head -20

Repository: parse-community/parse-server

Length of output: 597


🏁 Script executed:

# Look for type definitions or documentation about Parse.File
fd -e d.ts -e ts -e js 'parse' --search-path=node_modules 2>/dev/null | head -10

Repository: parse-community/parse-server

Length of output: 54


🌐 Web query:

Parse.File _source _data internal properties JavaScript SDK documentation

💡 Result:

In the Parse JavaScript SDK, Parse.File’s underscored fields like _source and _data are internal implementation details (not part of the public API), so they’re not documented as supported/stable. The supported/public surface is methods like name(), url(), save(), getData() and toJSON(). [1]

What _source is (internal)

From the SDK implementation, Parse.File stores an internal _source object that describes where the file’s content came from and how it should be uploaded. It’s set by the constructor based on what you pass as data (bytes/base64/blob/uri), e.g. formats like:

  • { format: 'base64', base64: '...', type: '...' }
  • { format: 'file', file: Blob, type: '...' }
  • { format: 'uri', uri: '...', type: '...' } [2]

This mirrors what the constructor docs say you can pass as data (byte array, {base64: ...}, {uri: ...}, browser File/Blob). [1]

What _data is (internal)

In current SDK builds, _data is an internal base64 cache of the file’s contents:

  • It’s populated when you construct the file from bytes/base64 (and after downloading via getData()).
  • getData() returns _data immediately if it’s already present; otherwise it downloads from _url and stores result.base64 into _data. [2]
  • When uploading from a browser Blob/file, _data is cleared (null) after a successful save() (since the SDK didn’t necessarily keep a base64 copy around). [2]

What to use instead of _source / _data

Use the public methods:

  • await file.getData() to obtain base64 (rather than reading _data) [1]
  • file.url() / file.name() [1]
  • file.toJSON() to get the { __type: 'File', name, url } representation stored in objects [2]

Sources:
[1] Parse JS SDK API Reference – Parse.File (v2.6.0) https://parseplatform.org/Parse-SDK-JS/api/2.6.0/Parse.File.html
[2] Parse JS SDK source (ParseFile.js, parse@8.0.3 build on UNPKG) https://app.unpkg.com/parse@8.0.3/files/lib/browser/ParseFile.js


🌐 Web query:

Parse JavaScript SDK Parse.File public API extract file data

💡 Result:

In the Parse JavaScript SDK, Parse.File’s public instance API does not expose the file’s raw bytes/base64 after it’s been created—its documented public methods are essentially name(), save(), and url() (plus options). So the supported way to “extract file data” is to download it from the file URL. [1]

Browser (download as Blob)

const file = obj.get('myFile');          // Parse.File
const res = await fetch(file.url());
const blob = await res.blob();           // <-- file data

Node / Cloud Code (download as Buffer)

const file = request.object.get('myFile');  // Parse.File
const res = await Parse.Cloud.httpRequest({ url: file.url() });
const buf = res.buffer;                     // <-- file data (Buffer)

This pattern (get file.url() then httpRequest) is commonly used in Cloud Code. [2]

Note: file.url() is only available after the file is saved or loaded from a Parse.Object. [1]

Sources: Parse JS SDK Parse.File API docs [1]; example approach via Parse.Cloud.httpRequest [2].

[1] https://parseplatform.org/Parse-SDK-JS/api/2.1.0/Parse.File.html
[2] https://stackoverflow.com/questions/44604937/get-data-from-parse-file-via-beforesave


🏁 Script executed:

# Check if there's any Parse SDK type definitions or documentation in node_modules
find node_modules/parse -name "*.d.ts" -o -name "*.md" 2>/dev/null | head -20

Repository: parse-community/parse-server

Length of output: 54


🏁 Script executed:

# Look at the Parse.File usage pattern more carefully - what fields are publicly documented?
cat node_modules/parse/package.json 2>/dev/null | jq '.version, .main, .types'

Repository: parse-community/parse-server

Length of output: 54


Reliance on undocumented Parse.File internals (_source, _data) creates maintenance risk.

This code directly accesses fileObject.file._source.file, fileObject.file._source.type, and fileObject.file._data—all internal implementation details of Parse.File not exposed in the public API. The Parse SDK web search confirms these are undocumented and not guaranteed stable across versions. Even with Parse SDK pinned at 8.0.3, a future minor/patch update could restructure these internals without breaking semver contracts, silently breaking file uploads.

Consider extracting file data through documented APIs where available, or document the tight coupling to Parse SDK internals with a comment explaining why these private fields are required (e.g., Parse.File.getData() requires saved files, which is not applicable in this upload context).
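If the private fields have to stay, a guarded fallback along these lines (a sketch only; the error message is illustrative) would at least fail loudly if the SDK internals change shape:

let fileData;
// Private Parse.File fields; shape observed in the pinned SDK version, not part of the public API.
const source = fileObject.file._source;
if (source && source.file instanceof Blob) {
  fileData = source.file;
  fileObject.fileSize = source.file.size;
} else if (typeof fileObject.file._data === 'string') {
  fileData = Buffer.from(fileObject.file._data, 'base64');
  fileObject.fileSize = Buffer.byteLength(fileData);
} else {
  throw new Error('Unexpected Parse.File internals; cannot determine upload data');
}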

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Routers/FilesRouter.js` around lines 234 - 245, The code is directly
accessing Parse.File internals (fileObject.file._source and _data) which are
undocumented and brittle; update the upload flow to avoid these private fields
by either (1) passing the original raw file (Blob or Buffer) into the request
payload or into fileObject (e.g., fileObject.rawFile) so this router can read
size and data from that documented source, or (2) if the Parse.File is already
saved, use documented APIs (e.g., fetch the file via fileObject.file.url() over
HTTP) to obtain bytes instead of reading _source/_data; if neither is possible
add a concise comment on why a private field is used and implement a guarded
fallback that throws a clear error when those internals are missing. Ensure
changes touch the code paths that set fileData/fileObject.fileSize and reference
symbols fileObject.file, fileObject.fileSize, and fileData so the coupling to
Parse.File internals is removed or clearly documented and fail-safe.



Development

Successfully merging this pull request may close these issues.

Parse.Files cannot be created if over 512MB

3 participants