Conversation
There was a problem hiding this comment.
Pull request overview
Fixes failing LMDB-related CI tests by ensuring LMDB database connections are explicitly closed between test phases, and by slightly refactoring LMDBDataset’s post-init connection handling.
Changes:
- Add a public
LMDBDataset.close()method that closes the underlying LMDB environment. - Update the LMDB converter/dataset test to close LMDB datasets before re-opening the same database for the precomputed-representation checks.
- Move the
_close_connection()call to the end ofLMDBDataset._post_init().
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/data/test_dataconverters_and_datasets.py |
Explicitly closes LMDB datasets during the LMDB converter test to avoid multiple concurrent opens of the same LMDB path. |
src/graphnet/data/dataset/lmdb/lmdb_dataset.py |
Refactors post-init connection closing order and introduces a public close() method. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -310,6 +309,7 @@ | |||
| dataset_from_lmdb_raw[ix].x, dataset_sqlite[ix].x | |||
| ) | |||
|
|
|||
| dataset_from_lmdb_raw.close() # Close connection | |||
| # Test 2: Check that pre-computed representation matches real-time computed | |||
There was a problem hiding this comment.
dataset_from_lmdb_raw.close() is only reached if all assertions above pass. If an assertion fails, the LMDB environment can remain open and may cause later LMDB tests in the same session to error when creating a new dataset for the same path. Consider wrapping the dataset usage in a try/finally (or using contextlib.closing) so close() is always executed.
| dataset_from_lmdb_precomputed = LMDBDataset( | ||
| path, **opt_precomputed # type: ignore | ||
| ) | ||
|
|
||
| assert len(dataset_from_lmdb_precomputed) == len(dataset_sqlite) | ||
|
|
||
| # Compare pre-computed representation with real-time computed from SQLite | ||
| for ix in range(len(dataset_sqlite)): | ||
| precomputed_event = dataset_from_lmdb_precomputed[ix] | ||
| realtime_event = dataset_sqlite[ix] | ||
|
|
||
| # Compare node features | ||
| assert torch.allclose(precomputed_event.x, realtime_event.x) | ||
|
|
||
| # Compare edge indices if they exist | ||
| if hasattr(precomputed_event, "edge_index") and hasattr( | ||
| realtime_event, "edge_index" | ||
| ): | ||
| assert torch.equal( | ||
| precomputed_event.edge_index, realtime_event.edge_index | ||
| ) | ||
|
|
||
| # Compare number of pulses | ||
| assert precomputed_event.n_pulses == realtime_event.n_pulses | ||
|
|
||
| # Compare truth labels | ||
| for truth_key in TRUTH.DEEPCORE: | ||
| if hasattr(precomputed_event, truth_key) and hasattr( | ||
| realtime_event, truth_key | ||
| ): | ||
| precomputed_truth = getattr(precomputed_event, truth_key) | ||
| realtime_truth = getattr(realtime_event, truth_key) | ||
| if isinstance(precomputed_truth, torch.Tensor): | ||
|
|
||
| assert torch.allclose( | ||
| precomputed_truth, | ||
| realtime_truth.to(precomputed_truth.dtype), | ||
| ) | ||
| else: | ||
| assert precomputed_truth == realtime_truth | ||
| dataset_from_lmdb_precomputed.close() # Close connection |
There was a problem hiding this comment.
dataset_from_lmdb_precomputed.close() is not guaranteed to run if any of the comparisons/assertions in the loop fail. To avoid leaking an open LMDB environment (which can make subsequent tests flaky), wrap the dataset lifetime in a try/finally (or contextlib.closing) so the connection is always closed.
The current checks are running into an error, the inital merge of #852 did not seem to have the test fail but all subsequent PR's have the test failing. (I also reran the test for #852 and the test seems to fail, it is beyond me how it passed initially...)
The error comes from opening multiple connections of the same LMDB database which does not seem to be allowed. This runs into an error at
graphnet/src/graphnet/data/utilities/lmdb_utilities.py
Line 62 in 8477757
Noneis returned forgraphnet/src/graphnet/data/utilities/lmdb_utilities.py
Line 50 in 8477757
graphnet/src/graphnet/data/dataset/lmdb/lmdb_dataset.py
Line 340 in 8477757
The fix I propose is to add a public close() (which just calls the private _close_connection(), this is more convention than anything else) and then call the close() in the test before testing the precomputed dataset version.
Another minor adjustment i made is move the close_connection() call to the end of the post_init() call, this is just a small refactoring.