Skip to content

Conversation

@erikvansebille
Copy link
Member

This PR builds on #2094, following @VeckoTheGecko's suggestion at Parcels-code/parcels-benchmarks#1 (comment) to keep the ParticleData in both an xarray.DataSet structure (ParticleSet._ds) and a dict-of-numpy-arrays (ParticleSet._data).

This new branch has the same performance as #2094 (see Parcels-code/parcels-benchmarks#1 (comment)), but advantage is that users can also access the data as a xarray.DataSet.

This will be useful e.g. in the __repr__ (to be implemented) and the new implementation of ParticleFile

  • Chose the correct base branch (particledata_as_dict)
  • Added tests

@VeckoTheGecko
Copy link
Contributor

I think a difficulty with this is that once a new array is created (e..g, from the deletion of particles), then there will be drift between these two data structures. I don't think this is currently working as intended.

Here's a test:

def test_pset_remove_indices(fieldset):
    npart = 10
    lon_start = np.linspace(0, 1, npart)
    lat_start = np.linspace(1, 0, npart)
    pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart

    pset.remove_indices([0])
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1
________________________________________________________________ test_pset_remove_indices _________________________________________________________________

fieldset = <parcels.fieldset.FieldSet object at 0x3249d7f20>

    def test_pset_remove_indices(fieldset):
        npart = 10
        lon_start = np.linspace(0, 1, npart)
        lat_start = np.linspace(1, 0, npart)
        pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
        assert len(pset._ds.lon) == len(pset._data["lon"]) == npart
    
        pset.remove_indices([0])
>       assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1
E       AssertionError: assert 10 == 9
E        +  where 10 = len(<xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9)
E        +    where <xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9 = <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle).lon
E        +      where <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle) = <[KeyError('pclass') raised in repr()] ParticleSet object at 0x3249d4080>._ds
E        +  and   9 = len(array([0.11111111, 0.22222222, 0.33333334, 0.44444445, 0.5555556 ,\n       0.6666667 , 0.7777778 , 0.8888889 , 1.        ], dtype=float32))

tests/v4/test_particleset_execute.py:142: AssertionError

@erikvansebille
Copy link
Member Author

Ah, good catch! I hadn't realised that deleting a particle would create a new dataset, indeed. And so would adding two ParticleSets.

We could change the code to all update the dict-of-numpy-arrays whenever we change the dataset; or is that too prone to errors?

Adding @VeckoTheGecko's failing test showing that the dict-of-numpys does not track the xarray dataset anymore after a deletion
@erikvansebille
Copy link
Member Author

I just added the test above to the unit tests suite, so that we got failing CI and don't accidentally merge this PR

@erikvansebille
Copy link
Member Author

Clsogn this PR, as the combination of a dict-of-numpy-arrays and an xarray Dataset does not seem robust enough (see #2097 (comment))

@github-project-automation github-project-automation bot moved this from Backlog to Done in Parcels development Jul 22, 2025
@erikvansebille erikvansebille deleted the particledata_as_datastruct_and_dict branch January 21, 2026 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants