-
Notifications
You must be signed in to change notification settings - Fork 116
Multigeometries #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Multigeometries #255
Changes from 3 commits
cc1f5b8
6b25e4b
cd8a88e
39fe278
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,7 +46,8 @@ def gen_zonal_stats( | |
| raster_out=False, | ||
| prefix=None, | ||
| geojson_out=False, | ||
| boundless=True, **kwargs): | ||
| boundless=True, | ||
| split_multi_geom=False, **kwargs): | ||
| """Zonal statistics of raster values aggregated to vector geometries. | ||
|
|
||
| Parameters | ||
|
|
@@ -151,32 +152,68 @@ def gen_zonal_stats( | |
| features_iter = read_features(vectors, layer) | ||
| for _, feat in enumerate(features_iter): | ||
| geom = shape(feat['geometry']) | ||
|
|
||
| if 'Point' in geom.type: | ||
| geom = boxify_points(geom, rast) | ||
|
|
||
| geom_bounds = tuple(geom.bounds) | ||
| #Multiple polygons found, and split_multi_geom option is enabled | ||
| #In this case, we read the polygons in one at a time and concact the raster | ||
| #values together. In order to save memory | ||
| if ('MultiPolygon' in geom.type) and (split_multi_geom): | ||
| fsrc = None | ||
| rv_array = None | ||
| isnodata = None | ||
|
|
||
| #iterate through individual polygons | ||
| for singlePolygon in geom.geoms: | ||
| polygon_bounds = tuple(singlePolygon.bounds) | ||
| polygon_fsrc = rast.read(bounds=polygon_bounds) | ||
|
|
||
| # rasterized geometry | ||
| polygon_rv_array = rasterize_geom(singlePolygon, like=polygon_fsrc, all_touched=all_touched) | ||
|
|
||
| # nodata mask | ||
| polygon_isnodata = (polygon_fsrc.array == polygon_fsrc.nodata) | ||
|
|
||
| # add nan mask (if necessary) | ||
| has_nan = ( | ||
| np.issubdtype(polygon_fsrc.array.dtype, np.floating) | ||
| and np.isnan(polygon_fsrc.array.min())) | ||
| if has_nan: | ||
| polygon_isnodata = (polygon_isnodata | np.isnan(polygon_fsrc.array)) | ||
|
|
||
| if fsrc is None: | ||
| fsrc = np.ravel(polygon_fsrc.array) | ||
| rv_array = np.ravel(polygon_rv_array) | ||
| isnodata = np.ravel(polygon_isnodata) | ||
| else: | ||
| fsrc = np.append(fsrc, polygon_fsrc.array) | ||
| rv_array = np.append(rv_array, polygon_rv_array) | ||
| isnodata = np.append(isnodata, polygon_isnodata) | ||
| masked = np.ma.MaskedArray(fsrc, mask = (isnodata | ~rv_array)) | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit confused as to how this saves any memory. My interpretation is this code is an alternate method for creating the
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will be easier to explain with a diagram but it seems I can't upload a picture here, so let me try with description. I'll use the example of a raster with 4 polygons that are each the size of 1 pixel, making this easier to imagine. Let's say the raster is a 10 x 10 raster, with the 4 polygons falling on the pixels (0,0), (9,0), (0,9), and (9,9). So each polygon is at the corner of the raster. Even though the polygons only cover 4 total pixels If you break the multipolygon into 4 separate polygons, then each call to
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I finally got a chance to look at this more closely and I see the optimization now, thanks for the explanation @yyu1! I'd suggest adding a test to make sure the results are equal. This almost works, there are some floating point differences that you'll have to account for but otherwise they are equivalent: def test_multipolygons_split():
multipolygons = os.path.join(DATA, 'multipolygons.shp')
stats = zonal_stats(multipolygons, raster, stats="*")
stats2 = zonal_stats(multipolygons, raster, stats="*", split_multi_geom=True)
assert stats == stats2Our test multipolygon actually makes a great example due to the null space between the two parts. The only problem with this approach is that it changes the shape of the masked array. While this doesn't affect the built-in stats, there are two other parts of the API that could break here:
So we have a few options to proceed. How about this?
Of course a test to assert the above behavior would also be appreciated. I can help with that if needed. |
||
| else: | ||
|
|
||
| geom_bounds = tuple(geom.bounds) | ||
|
|
||
| fsrc = rast.read(bounds=geom_bounds, boundless=boundless) | ||
| fsrc = rast.read(bounds=geom_bounds, boundless=boundless) | ||
|
|
||
| # rasterized geometry | ||
| rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched) | ||
| # rasterized geometry | ||
| rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched) | ||
|
|
||
| # nodata mask | ||
| isnodata = (fsrc.array == fsrc.nodata) | ||
| # nodata mask | ||
| isnodata = (fsrc.array == fsrc.nodata) | ||
|
|
||
| # add nan mask (if necessary) | ||
| has_nan = ( | ||
| np.issubdtype(fsrc.array.dtype, np.floating) | ||
| and np.isnan(fsrc.array.min())) | ||
| if has_nan: | ||
| isnodata = (isnodata | np.isnan(fsrc.array)) | ||
| # add nan mask (if necessary) | ||
| has_nan = ( | ||
| np.issubdtype(fsrc.array.dtype, np.floating) | ||
| and np.isnan(fsrc.array.min())) | ||
| if has_nan: | ||
| isnodata = (isnodata | np.isnan(fsrc.array)) | ||
|
|
||
| # Mask the source data array | ||
| # mask everything that is not a valid value or not within our geom | ||
| masked = np.ma.MaskedArray( | ||
| fsrc.array, | ||
| mask=(isnodata | ~rv_array)) | ||
| # Mask the source data array | ||
| # mask everything that is not a valid value or not within our geom | ||
| masked = np.ma.MaskedArray( | ||
| fsrc.array, | ||
| mask=(isnodata | ~rv_array)) | ||
|
|
||
| # If we're on 64 bit platform and the array is an integer type | ||
| # make sure we cast to 64 bit to avoid overflow. | ||
|
|
||

Uh oh!
There was an error while loading. Please reload this page.