
File-operation order uses excessive disk space #119

@g5t

Description

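In mcpl_merge_outfiles_mpi, the order of the three file operations (merging, compressing the merged file, and removing the temporary worker files) temporarily requires about three times the final file size in disk space, which can be a problem on systems with limited storage.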

From mcpl/mcpl_core/src/mcpl.c, lines 4530 to 4545 at commit 838417c:

    mcpl_outfile_t outfh = mcpl_merge_files( targetfn.c_str, nproc,
                                             (const char**)fns);
    if ( !mcpl_closeandgzip_outfile(outfh) )
      mcpl_error("mcpl_merge_outfiles_mpi: problems gzipping final output");
    //Remove worker files:
    for ( unsigned long iproc = 0; iproc < nproc; ++iproc ) {
      char * bn = mcpl_basename(fns[iproc]);
      size_t n = 128 + strlen(bn);
      char * buf = mcpl_internal_malloc(n);
      snprintf(buf,n,"MCPL: Removing file %s\n",bn);
      mcpl_internal_delete_file( fns[iproc] );
      mcpl_print(buf);
      free(bn);
      free(buf);
    }
    //Cleanup memory:

The files present on disk at each point during the execution of mcpl_merge_outfiles_mpi are listed below. Disk utilization is given in multiples of the final file size, with the nproc worker files together counting as one unit:

point                             worker files  .mcpl file  .mcpl.gz file  disk utilization
before mcpl_merge_files           nproc         0           0              1
after mcpl_merge_files            nproc         1           0              2
during mcpl_closeandgzip_outfile  nproc         1           1              3
after mcpl_closeandgzip_outfile   nproc         0           1              2
after removing worker files       0             0           1              1

On systems with limited disk space, the compression step can therefore exhaust the available storage and fail, even though the nproc worker files that are still occupying space are no longer needed at that point.
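The peak of 3 in the table can be reproduced with a small model. This is a sketch under stated assumptions, not MCPL code: the function name peak_current_order is hypothetical, sizes are in units of the merged .mcpl file, the worker files are assumed to add up to about one merged file (w), and the .mcpl.gz file is given its own size g (the table conservatively uses g = 1). Each array entry corresponds to one row of the table for the current order.

```c
#include <assert.h>

/* Hypothetical model, not part of MCPL: sizes are in units of the merged
   .mcpl file. w = all worker files together, m = merged .mcpl file,
   g = compressed .mcpl.gz file. */
static double max2(double a, double b) { return a > b ? a : b; }

/* Peak disk usage for the current order: merge, gzip (worker files still
   present), then remove the worker files. One entry per table row. */
double peak_current_order(double w, double m, double g)
{
    assert(w >= 0.0 && m >= 0.0 && g >= 0.0);
    const double usage[5] = {
        w,         /* before mcpl_merge_files */
        w + m,     /* after mcpl_merge_files */
        w + m + g, /* during mcpl_closeandgzip_outfile */
        w + g,     /* after mcpl_closeandgzip_outfile (.mcpl removed) */
        g          /* after removing worker files */
    };
    double peak = 0.0;
    for (int i = 0; i < 5; ++i)
        peak = max2(peak, usage[i]);
    return peak;
}
```

With the table's assumptions, peak_current_order(1.0, 1.0, 1.0) returns 3.0. Because all sizes are non-negative, the maximum is always the "during compression" row, w + m + g, so a smaller .mcpl.gz file lowers the peak but never below w + m.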

Possible solution

Moving the compression step after the worker-file removal reduces the peak disk utilization from three times to only twice the final file size:

    mcpl_outfile_t outfh = mcpl_merge_files( targetfn.c_str, nproc,
                                             (const char**)fns);
    //Remove worker files:
    for ( unsigned long iproc = 0; iproc < nproc; ++iproc ) {
      char * bn = mcpl_basename(fns[iproc]);
      size_t n = 128 + strlen(bn);
      char * buf = mcpl_internal_malloc(n);
      snprintf(buf,n,"MCPL: Removing file %s\n",bn);
      mcpl_internal_delete_file( fns[iproc] );
      mcpl_print(buf);
      free(bn);
      free(buf);
    }
    if ( !mcpl_closeandgzip_outfile(outfh) )
      mcpl_error("mcpl_merge_outfiles_mpi: problems gzipping final output");
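The same ordering can be tried outside MCPL with plain files. This is a minimal sketch with hypothetical stand-ins, not the MCPL API: "merging" simply concatenates the worker files, and "compressing" copies the merged file to a second name, so the compressed size equals the merged size (the conservative g = 1 case). The function merge_remove_then_compress and its helpers are illustrative names; the function tracks the bytes held on disk by these files and returns the peak.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins, not MCPL API: "merging" concatenates the worker
   files and "compressing" copies the merged file to gzname. Only the order
   of the steps matters here; real code would use mcpl_merge_files and
   mcpl_closeandgzip_outfile. */

static long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0; /* a missing file occupies no space */
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    fclose(f);
    return n;
}

static int append_file(FILE *out, const char *path)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return 0;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    return 1;
}

/* Merge, remove the worker files, then "compress". Returns the peak number
   of bytes held by these files on disk, or -1 on error. */
long merge_remove_then_compress(const char *target, const char *gzname,
                                const char **workers, unsigned long nproc)
{
    assert(target && gzname && workers);
    long used = 0, peak;
    for (unsigned long i = 0; i < nproc; ++i)
        used += file_size(workers[i]);

    FILE *out = fopen(target, "wb");
    if (!out)
        return -1;
    for (unsigned long i = 0; i < nproc; ++i)
        if (!append_file(out, workers[i])) {
            fclose(out);
            return -1;
        }
    fclose(out);
    used += file_size(target);
    peak = used; /* worker files + merged file */

    for (unsigned long i = 0; i < nproc; ++i) {
        used -= file_size(workers[i]);
        remove(workers[i]); /* workers removed before compressing */
    }

    FILE *gz = fopen(gzname, "wb");
    if (!gz)
        return -1;
    int ok = append_file(gz, target);
    fclose(gz);
    if (!ok)
        return -1;
    used += file_size(gzname);
    if (used > peak)
        peak = used; /* merged file + "compressed" copy */

    remove(target); /* keep only the compressed file, as in the table */
    return peak;
}
```

With two worker files of 3 and 5 bytes, the function returns 16, twice the 8-byte merged file. Making the copy before removing the workers would instead peak at 24 bytes, matching the factor of 3 in the original order.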

With this change, the files present on disk at each point of the execution would be:

point                             worker files  .mcpl file  .mcpl.gz file  disk utilization
before mcpl_merge_files           nproc         0           0              1
after mcpl_merge_files            nproc         1           0              2
after removing worker files       0             1           0              1
during mcpl_closeandgzip_outfile  0             1           1              2
after mcpl_closeandgzip_outfile   0             0           1              1
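The peak for the proposed order can be written in closed form. This is a sketch under the same assumptions as the table, not MCPL code: peak_reordered is a hypothetical name, w is the combined size of the worker files, m the merged .mcpl file and g the .mcpl.gz file, all in units of the merged file. Only two rows can hold the maximum: right after mcpl_merge_files (workers plus .mcpl) and during compression (.mcpl plus .mcpl.gz); every other row is a single term that cannot exceed either sum.

```c
#include <assert.h>

/* Hypothetical model, not part of MCPL: sizes in units of the merged .mcpl
   file. w = worker files together, m = merged .mcpl, g = .mcpl.gz. */
static double max2(double a, double b) { return a > b ? a : b; }

/* Peak disk usage for the proposed order: merge, remove the worker files,
   then gzip. */
double peak_reordered(double w, double m, double g)
{
    assert(w >= 0.0 && m >= 0.0 && g >= 0.0);
    /* after merge: workers + .mcpl; during gzip: .mcpl + .mcpl.gz */
    return max2(w + m, m + g);
}
```

With the table's assumptions, peak_reordered(1.0, 1.0, 1.0) returns 2.0, and it stays at 2.0 for any g up to 1: a smaller .mcpl.gz file does not lower this peak further, because the state right after mcpl_merge_files already holds both the worker files and the merged file.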
