DELETE/UPDATE/MERGE failures on Iceberg tables leave orphan files in object storage #26922

@acarpente-denodo

Description

When a DELETE, UPDATE, or MERGE command on an Iceberg table fails, Presto should always clean up the new files it created and uploaded to object storage during execution, regardless of where the failure occurs. These files were never committed in a new snapshot, so they are garbage that will never be used.

If the MERGE command fails during the MergeWriterOperator execution, then the method MergeWriterOperator.abort() will clean up the new data files stored in the object storage during the command execution.

The TableFinishOperator creates the new deletion files to record deleted or updated rows and stores them in object storage. The process is the same for DELETE, UPDATE, and MERGE commands.

If a DELETE, UPDATE, or MERGE command fails during the TableFinishOperator execution, however, Presto does not delete the new data and deletion files already stored in object storage. As a result, garbage remains in object storage after the command fails.

Expected Behavior

When a DELETE, UPDATE, or MERGE command fails, Presto should delete the new files it stored in object storage during that command's execution.
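
The expected behavior can be illustrated with a minimal sketch of a "track what you wrote, delete it if the commit fails" pattern. This is not Presto code: FakeObjectStore, TrackedWrite, and all file paths are hypothetical names invented for illustration, and the in-memory set only stands in for real object storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these types are Presto or Iceberg APIs.
final class WriteCleanupSketch {
    /** In-memory stand-in for object storage: just a set of file paths. */
    static final class FakeObjectStore {
        private final Set<String> files = new TreeSet<>();

        void put(String path) { files.add(path); }

        void delete(String path) { files.remove(path); }

        Set<String> list() { return new TreeSet<>(files); }
    }

    /** Tracks the files written by one command so they can be removed on failure. */
    static final class TrackedWrite {
        private final FakeObjectStore store;
        private final List<String> uncommitted = new ArrayList<>();

        TrackedWrite(FakeObjectStore store) { this.store = store; }

        void write(String path) {
            store.put(path);
            uncommitted.add(path);
        }

        /** Runs the commit step; if it throws, deletes every uncommitted file and rethrows. */
        void finish(Runnable commit) {
            try {
                commit.run();
                uncommitted.clear(); // the files now belong to a snapshot
            } catch (RuntimeException e) {
                for (String path : uncommitted) {
                    try {
                        store.delete(path);
                    } catch (RuntimeException cleanupFailure) {
                        // Keep the original failure as the primary error.
                        e.addSuppressed(cleanupFailure);
                    }
                }
                uncommitted.clear();
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        FakeObjectStore store = new FakeObjectStore();
        store.put("data/existing.parquet");

        TrackedWrite update = new TrackedWrite(store);
        update.write("data/new-data.parquet");
        update.write("data/delete_file_new.parquet");
        try {
            // Simulate a failure in the final commit step.
            update.finish(() -> { throw new RuntimeException("forced failure"); });
        } catch (RuntimeException e) {
            System.out.println("command failed: " + e.getMessage());
        }
        System.out.println(store.list()); // only the pre-existing file remains
    }
}
```

The point of the sketch is that cleanup is tied to the commit step itself, so a failure at the very end of the command is handled the same way as a failure earlier in the write path.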

Current Behavior

Presto leaves files in the object storage that will never be used.

Possible Solution

Currently, the only workaround is to clean up these files after the fact by executing the iceberg.system.remove_orphan_files() procedure. This does not prevent the orphan files from being created.
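
Conceptually, an orphan file is one that exists in the table's storage location but is not referenced by any snapshot. The sketch below shows only that set difference, using the file names from the failed DELETE in the reproduction steps of this issue. OrphanFilesSketch and findOrphans are hypothetical names, and this is not how the procedure is implemented in Presto.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the orphan-file idea; not the Presto implementation.
final class OrphanFilesSketch {
    /** Returns the files present in storage that no snapshot references. */
    static Set<String> findOrphans(Set<String> filesInStorage, Set<String> filesReferencedBySnapshots) {
        Set<String> orphans = new TreeSet<>(filesInStorage);
        orphans.removeAll(filesReferencedBySnapshots);
        return orphans;
    }

    public static void main(String[] args) {
        // State after the failed DELETE in the reproduction steps.
        Set<String> inStorage = Set.of(
                "7f3ff2fa-35b3-46b5-83fa-e836050dd08d.parquet",
                "delete_file_e09d21c4-72fe-4b84-a3dc-9a7de687074e.parquet");
        // Only the file written by the INSERT was ever committed.
        Set<String> referenced = Set.of("7f3ff2fa-35b3-46b5-83fa-e836050dd08d.parquet");

        System.out.println(findOrphans(inStorage, referenced));
    }
}
```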

Steps to Reproduce

  1. Simulate a failure while executing a DELETE, UPDATE, or MERGE command. One way is to add a throw new RuntimeException() to the finishWrite() and finishDeleteWithOutput() methods of the com.facebook.presto.iceberg.IcebergAbstractMetadata Java class.
  2. Run Presto.
  3. Create an Iceberg table and insert two rows:
     CREATE TABLE iceberg.default.mytest(a int, b int)
     WITH (
         location = 'hdfs:///user/presto/warehouse/mytest/'
     );

     INSERT INTO iceberg.default.mytest VALUES(1, 0), (2, 0);
  4. List the data files in the object storage:
     $ ls /user/presto/warehouse/mytest/data/
     Found 1 items
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/7f3ff2fa-35b3-46b5-83fa-e836050dd08d.parquet
  5. Delete some rows in the mytest table:
     DELETE FROM iceberg.default.mytest WHERE a > 1
     The command returns the error: FORCED FAILURE DURING DELETE COMMAND EXECUTION
  6. List the data files in the object storage again:
     $ ls /user/presto/warehouse/mytest/data/
     Found 2 items
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/7f3ff2fa-35b3-46b5-83fa-e836050dd08d.parquet
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/delete_file_e09d21c4-72fe-4b84-a3dc-9a7de687074e.parquet

     The delete_file_e09d21c4-72fe-4b84-a3dc-9a7de687074e.parquet file should not be there. Presto must delete this file when the DELETE command fails.
  7. Update some rows in the mytest table:
     UPDATE iceberg.default.mytest SET a = 1111 WHERE a > 1
     The command returns the error: FORCED FAILURE DURING UPDATE OR MERGE COMMAND EXECUTION
  8. List the data files in the object storage again:
     $ ls /user/presto/warehouse/mytest/data/
     Found 4 items
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/7f3ff2fa-35b3-46b5-83fa-e836050dd08d.parquet
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/delete_file_e09d21c4-72fe-4b84-a3dc-9a7de687074e.parquet
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/f031d070-c05e-4296-a6ff-78c61abd750d.parquet
     -rw-rw-rw-   3 presto hadoop  hdfs:///user/presto/warehouse/mytest/data/delete_file_29f5dd84-22af-489d-b873-a95cbf5402a1.parquet

     The delete_file_29f5dd84-22af-489d-b873-a95cbf5402a1.parquet and f031d070-c05e-4296-a6ff-78c61abd750d.parquet files should not be there. Presto must delete these files when the UPDATE command fails.

Metadata

Assignees

Labels

bug, iceberg (Apache Iceberg related)

Type

No type

Projects

Status

🆕 Unprioritized

Milestone

No milestone

Relationships

None yet
