[python/hotfix] Fix IndexError when reading manifest with empty _MIN_VALUES #6971
Closed
XiaoHongbo-Hope wants to merge 3 commits into apache:master from
Contributor
Why does Python not produce an empty _MIN_VALUES? Is this different from Java?
After applying the fix, a new problem appeared. The error output is as follows:
INFO:pypaimon.catalog.rest.rest_token_file_io:end refresh data token for identifier [Identifier(database='adn', object='wide_table_200cols', branch=None)] expiresAtMillis [1767884499000]
2026-01-08 11:01:44,750 - paimon_dataset.py:189 - ERROR - Error reading table using Paimon API: '_VALUE_STATS_COLS'
Traceback (most recent call last):
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 190, in _paimon_table_to_data_files
    raise e
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 178, in _paimon_table_to_data_files
    splits = scan.plan().splits()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/table_scan.py", line 45, in plan
    return self.starting_scanner.scan()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 77, in scan
    file_entries = self.plan_files()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 95, in plan_files
    return self.read_manifest_entries(manifest_files)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 102, in read_manifest_entries
    return self.manifest_file_manager.read_entries_parallel(manifest_files,
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 57, in read_entries_parallel
    for entries in future_results:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 51, in _process_single_manifest
    return self.read(manifest_file.file_name, manifest_entry_filter, drop_stats)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 90, in read
    fields = self._get_value_stats_fields(file_dict, schema_fields)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 134, in _get_value_stats_fields
    if file_dict['_VALUE_STATS_COLS'] is None:
KeyError: '_VALUE_STATS_COLS'
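The `KeyError` comes from subscripting `file_dict['_VALUE_STATS_COLS']` on a manifest record that has no such key, presumably one written before the field existed. Below is a minimal sketch of a tolerant lookup. It assumes that an absent or null `_VALUE_STATS_COLS` means statistics cover every schema field. `resolve_value_stats_fields` is a hypothetical helper, not a pypaimon API, and it takes plain field names instead of pypaimon's field objects.

```python
from typing import Any, Dict, List, Optional


def resolve_value_stats_fields(file_dict: Dict[str, Any], schema_fields: List[str]) -> List[str]:
    """Hypothetical helper: pick the schema fields that a data file has value stats for."""
    # dict.get returns None for a missing key, so records written before the
    # column existed are treated the same way as an explicit null.
    stats_cols: Optional[List[str]] = file_dict.get("_VALUE_STATS_COLS")
    if stats_cols is None:
        # Assumption: a null/absent list means "stats cover every schema field".
        return list(schema_fields)
    wanted = set(stats_cols)
    # Keep schema order; only fields listed in _VALUE_STATS_COLS have stats.
    return [name for name in schema_fields if name in wanted]


print(resolve_value_stats_fields({}, ["a", "b"]))                          # ['a', 'b']
print(resolve_value_stats_fields({"_VALUE_STATS_COLS": ["b"]}, ["a", "b"]))  # ['b']
```

Using `.get` instead of `in` checks keeps the "missing" and "null" cases on one code path, which is the simplest way to stay compatible with older manifests.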
Contributor
Author
👌
Contributor
Author
Actually, Python and Java produce the same result for empty stats: both serialize to 12 bytes, and I added an assertion to show this. I searched the entire git history but did not find the root cause of the 0-byte _MIN_VALUES. The affected data was also written by an older version of pypaimon.
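Why 12 bytes? Here is a minimal Python sketch that assumes Paimon's schemaless BinaryRow layout: a 4-byte big-endian arity prefix, followed by the row's fixed-length region, whose header and null bits are padded to whole 8-byte words. Under that assumption a zero-field row is 4 + 8 = 12 bytes. The helper names below are hypothetical, not pypaimon APIs. `read_arity` shows one way a reader could treat a legacy 0-byte `_MIN_VALUES` as empty stats rather than indexing into an empty buffer.

```python
import struct

HEADER_SIZE_IN_BITS = 8  # assumption: mirrors the Java BinaryRow header constant


def bitset_width_in_bytes(arity: int) -> int:
    """Width of the header + null-bit region, rounded up to whole 8-byte words."""
    return ((arity + 63 + HEADER_SIZE_IN_BITS) // 64) * 8


def serialize_empty_row() -> bytes:
    """Hypothetical: schemaless BinaryRow with zero fields (4-byte arity + header word)."""
    arity = 0
    fixed_part = bytes(bitset_width_in_bytes(arity))  # all-zero header/null bits
    return struct.pack(">i", arity) + fixed_part


def read_arity(data: bytes) -> int:
    """Hypothetical tolerant reader: 0-byte stats are treated as an empty row."""
    if len(data) == 0:
        return 0  # legacy files: no bytes at all means "no stats"
    if len(data) < 4:
        raise ValueError(f"truncated BinaryRow: {len(data)} bytes")
    (arity,) = struct.unpack(">i", data[:4])
    return arity


empty = serialize_empty_row()
print(len(empty))          # 12
print(read_arity(empty))   # 0
print(read_arity(b""))     # 0
```

Special-casing the zero-length input, rather than padding it to 12 bytes, keeps genuinely truncated buffers (1 to 3 bytes) as hard errors.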
Purpose
Linked issue: close #6962
Tests
BinaryRowTest
API and Format
Documentation