archive:create-from and archive:create support for archives larger than 2 GB (#2624)

Open
vincentml wants to merge 8 commits into BaseXdb:main from vincentml:archive-create-from-2gb
vincentml (Contributor) commented Mar 24, 2026

This pull request makes it possible for archive:create and archive:create-from to create zip files larger than 2 GB.

With BaseX 12.2, creating a zip file with archive:create or archive:create-from fails once the total size of the files exceeds about 2 GB, with errors such as "java.lang.ArrayIndexOutOfBoundsException: Maximum array size exceeded (2147483640 > 2147483639)".

For example, this error occurs when zipping a folder whose total size is 3 GB and passing the result of archive:create-from directly to file:write-binary:

declare variable $large_3gb_folder :=
  "path/to/folder";
declare variable $zipFile :=
  "path/to/file.zip";

file:write-binary($zipFile, archive:create-from($large_3gb_folder))

The same error occurs when the result of archive:create-from is bound to a variable before being passed to file:write-binary:

declare variable $large_3gb_folder :=
  "path/to/folder";
declare variable $zipFile :=
  "path/to/file.zip";

let $archive := archive:create-from($large_3gb_folder)
return file:write-binary($zipFile, $archive)

With the changes in this pull request, both queries produce the expected zip file and the error no longer occurs.

The current limit of about 2 GB arises because the file contents are accumulated in memory in an array, and a Java array cannot hold more than Integer.MAX_VALUE (2,147,483,647) elements.
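To make the arithmetic concrete, here is a minimal sketch of the size check. The class and method names (`ArrayLimit`, `fitsInArray`) are hypothetical and are not part of BaseX; the only fact relied on is that Java arrays are indexed by `int`.

```java
/** Hypothetical helper, for illustration only; not part of BaseX. */
final class ArrayLimit {
  private ArrayLimit() { }

  /**
   * Returns true if a payload of the given size could be indexed by a Java int,
   * which is the upper bound for the length of any byte[].
   * JVMs may enforce a slightly lower practical limit.
   */
  static boolean fitsInArray(final long bytes) {
    return bytes >= 0 && bytes <= Integer.MAX_VALUE;
  }
}
```

A 3 GB payload is 3 * 1024^3 = 3,221,225,472 bytes, so `ArrayLimit.fitsInArray(3L << 30)` is false, while a 1 GB payload still fits.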

This pull request solves the problem by no longer collecting the archive in a single array. Instead, data is accumulated in memory up to a threshold, and output switches to a temporary file once the data exceeds that threshold. The threshold is derived from the available memory and capped at the maximum array size. The temporary file, if one is created, is deleted automatically. This approach is optimized for the typical case of small or mid-size archives while still making very large archives possible.
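The buffering strategy described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this pull request: the class name `SpillingOutputStream`, its methods, and the "one quarter of available heap" heuristic in `defaultThreshold` are all assumptions. The cap `Integer.MAX_VALUE - 8` was chosen to match the limit of 2147483639 quoted in the error message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: buffers in memory up to a threshold, then spills to a temporary file. */
final class SpillingOutputStream extends OutputStream {
  private final long threshold;
  private ByteArrayOutputStream memory = new ByteArrayOutputStream();
  private OutputStream file;  // non-null once data has spilled to disk
  private Path tempFile;
  private long size;

  SpillingOutputStream(final long threshold) {
    this.threshold = threshold;
  }

  /** Assumed heuristic: a quarter of the available heap, capped at the maximum array size. */
  static long defaultThreshold() {
    final Runtime rt = Runtime.getRuntime();
    final long available = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    return Math.min(available / 4, Integer.MAX_VALUE - 8);
  }

  @Override
  public void write(final int b) throws IOException {
    write(new byte[] { (byte) b }, 0, 1);
  }

  @Override
  public void write(final byte[] b, final int off, final int len) throws IOException {
    // Switch to the temporary file before the in-memory buffer would exceed the threshold.
    if (file == null && size + len > threshold) spill();
    final OutputStream target = file != null ? file : memory;
    target.write(b, off, len);
    size += len;
  }

  /** Moves the data buffered so far into a new temporary file and releases the memory buffer. */
  private void spill() throws IOException {
    tempFile = Files.createTempFile("archive", ".tmp");
    file = Files.newOutputStream(tempFile);
    memory.writeTo(file);
    memory = null;
  }

  boolean spilled() {
    return file != null;
  }

  long size() {
    return size;
  }

  /** Copies everything written so far to the given stream; call before close(). */
  void writeTo(final OutputStream target) throws IOException {
    if (file == null) {
      memory.writeTo(target);
      return;
    }
    file.flush();
    try (InputStream in = Files.newInputStream(tempFile)) {
      in.transferTo(target);
    }
  }

  /** Closes the stream and deletes the temporary file, if one was created. */
  @Override
  public void close() throws IOException {
    if (file != null) {
      file.close();
      Files.deleteIfExists(tempFile);
    }
  }
}
```

In this sketch, small archives never touch the disk, and large ones are bounded by free disk space rather than by the maximum array size. The size is tracked as a `long` so that it remains valid beyond 2 GB.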
