Conversation: …plumes_processing.py
**chrowe** left a comment:

A few minor change requests and a couple of questions, but overall this looks good. The only thing I didn't check was whether the column names in Carto match those in the files sent to S3. Did you do that?
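For reference, shapefile field names are truncated to 10 characters, which is a common source of exactly this kind of mismatch. A minimal sketch of the check, using placeholder credentials and illustrative table/file names taken from the links below (not verified against the actual script):

```python
import geopandas as gpd
from cartoframes import read_carto
from cartoframes.auth import set_default_credentials

# Placeholder credentials; the real script reads these from its own config
set_default_credentials(username='CARTO_USER', api_key='CARTO_KEY')

# Compare the Carto table's columns against the shapefile sent to S3.
# Note: Carto stores geometry as 'the_geom' (vs. 'geometry' locally) and
# adds 'cartodb_id', so those two differences are expected.
carto_cols = set(read_carto('ocn_027_rw0_nitrogen_plumes_edit', limit=1).columns)
local_cols = set(gpd.read_file('data/ocn_027_rw0_nitrogen_plumes_edit.shp').columns)

print('In shapefile but not Carto:', local_cols - carto_cols)
print('In Carto but not shapefile:', carto_cols - local_cols)
```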
> You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
>
> ###### Note: This dataset processing was done by Claire Hemmerly, and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
You can add any link here if you want.
> You can view the processed Wastewater Plumes in Coastal Areas dataset [on Resource Watch](https://resourcewatch.org/data/explore/11804f04-d9c7-47b9-8d27-27ce6ed6c042).
>
> You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
Since this is a raster file, we don't link to our S3 version. You can just link to the source.

Side note: do you know why the zip file on our S3 is so large? My browser says it is 11 GB, but the source file looks to be 250 MB.
I compressed the zip file and uploaded it to the correct folder. I deleted the one that was 11 GB.
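An uncompressed archive is a likely culprit here: Python's `zipfile` stores members without compression unless `ZIP_DEFLATED` is passed. A minimal sketch, with illustrative paths and file names (the real script's variables may differ):

```python
import os
import zipfile

# Illustrative names; the real script builds these from its own variables
data_dir = 'data'
processed_data_file = 'ocn_027_rw0_nitrogen_plumes_edit'

# ZIP_DEFLATED actually compresses the members; the default (ZIP_STORED)
# just concatenates the files, which can produce an oversized archive
with zipfile.ZipFile(processed_data_file + '.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for ext in ('.shp', '.shx', '.dbf', '.prj'):
        zf.write(os.path.join(data_dir, processed_data_file + ext),
                 arcname=processed_data_file + ext)
```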
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Are you sure we want NA values to be zero? Are there any existing zero values?
The only NA values are in the columns that show % nitrogen input, which we're not using, and the same rows have 0 for the nitrogen input in g/yr, so I think updating the NAs to 0 should be fine. I had trouble creating the processed shapefile when the values were floats instead of integers; I got an error saying the numbers were too big.
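A minimal sketch of a sanity check along those lines, assuming `gdf` is the loaded GeoDataFrame and the g/yr columns sit in `gdf.columns[1:9]` as in the quoted loop (column positions taken from the script, not verified):

```python
# Confirm the claim before filling: count NAs and pre-existing zeros per
# column. If a column has both, fillna(0) makes the two cases
# indistinguishable downstream, which is worth flagging explicitly.
for col in gdf.columns[1:9]:
    n_na = gdf[col].isna().sum()
    n_zero = (gdf[col] == 0).sum()
    print(f'{col}: {n_na} NA values, {n_zero} existing zeros')
```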
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Again, just want to make sure we are accurately translating the data here.
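One way to make that translation check concrete (a sketch, assuming `gdf` and the same column slice as the quoted loop): confirm that the cast changes only the dtype, not the values.

```python
# The g/yr totals should already be whole numbers, so casting to int
# after fillna(0) must not change any value; assert that explicitly
for col in gdf.columns[1:9]:
    filled = gdf[col].fillna(0)
    assert (filled == filled.astype('int')).all(), f'{col} would lose precision'
```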
```python
'''

# load in the polygon shapefile
shapes = os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp')
```
Can you use processed_data_file here instead of shapes, for consistency with our other scripts?

Actually, I think the main thing is that the final file you upload is called processed_data_file.
I've updated this section; let me know if it looks OK.
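A minimal sketch of that naming convention, with placeholder values for the variables the real script defines elsewhere (`data_dir`, `dataset_name`, `raw_data_file_unzipped`):

```python
import os
import geopandas as gpd

# Placeholder values; the real script defines these elsewhere
data_dir = 'data'
dataset_name = 'ocn_027_rw0_nitrogen_plumes'
raw_data_file_unzipped = os.path.join(data_dir, dataset_name)

# read the raw shapefile
gdf = gpd.read_file(os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp'))

# ... processing steps ...

# name the final artifact processed_data_file, per the team convention
processed_data_file = os.path.join(data_dir, dataset_name + '_edit.shp')
gdf.to_file(processed_data_file)
```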
```python
set_default_credentials(username=CARTO_USER, base_url="https://{user}.carto.com/".format(user=CARTO_USER), api_key=CARTO_KEY)

# upload data frame to Carto
to_carto(gdf, dataset_name + '_edit', if_exists='replace')
```
For consistency, can you use processed_data_file instead of gdf?
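Since `to_carto` takes a DataFrame rather than a path, one reading of this request is to upload the frame read back from processed_data_file, so the Carto table is guaranteed to match the artifact sent to S3. A sketch under that assumption, with placeholder credentials:

```python
import geopandas as gpd
from cartoframes import to_carto
from cartoframes.auth import set_default_credentials

# Placeholder values; the real script defines these elsewhere
CARTO_USER, CARTO_KEY = 'CARTO_USER', 'CARTO_KEY'
dataset_name = 'ocn_027_rw0_nitrogen_plumes'
processed_data_file = 'data/' + dataset_name + '_edit.shp'

set_default_credentials(username=CARTO_USER,
                        base_url='https://{user}.carto.com/'.format(user=CARTO_USER),
                        api_key=CARTO_KEY)

# upload the processed file rather than the in-memory gdf
to_carto(gpd.read_file(processed_data_file), dataset_name + '_edit', if_exists='replace')
```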
```python
Upload original data and processed data to Amazon S3 storage
'''

# initialize AWS variables
aws_bucket = 'wri-public-data'
```
This should actually be `aws_bucket = 'wri-projects'` for raster datasets.
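A minimal sketch of the corrected upload, assuming boto3 with the usual AWS credential chain; the local path and S3 key are illustrative, not the script's actual values:

```python
import boto3

# corrected bucket for raster datasets, per the review comment
aws_bucket = 'wri-projects'

s3 = boto3.client('s3')
# illustrative key; the real script builds this from its own variables
s3.upload_file('data/ocn_027_rw0_nitrogen_plumes_edit.zip',
               aws_bucket,
               'resourcewatch/raster/ocn_027_rw0_nitrogen_plumes_edit.zip')
```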
I added the processed data file steps and checked the column names; they should match now.
Checklist for Reviewing a Pre-Processing Script