Conversation: …plumes_processing.py
**chrowe** left a comment:

A few minor change requests and a couple of questions, but overall this looks good. The only thing I didn't check was whether the column names in Carto match those in the files sent to S3. Did you do that?
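For reference, shapefile field names are truncated to 10 characters, which is a common source of exactly this kind of mismatch. A minimal sketch of the check, using placeholder credentials and illustrative table/file names taken from the links below (not verified against the actual script):

```python
import geopandas as gpd
from cartoframes import read_carto
from cartoframes.auth import set_default_credentials

# Placeholder credentials; the real script reads these from its own config
set_default_credentials(username='CARTO_USER', api_key='CARTO_KEY')

# Compare the Carto table's columns against the shapefile sent to S3.
# Note: Carto stores geometry as 'the_geom' (vs. 'geometry' locally) and
# adds 'cartodb_id', so those two differences are expected.
carto_cols = set(read_carto('ocn_027_rw0_nitrogen_plumes_edit', limit=1).columns)
local_cols = set(gpd.read_file('data/ocn_027_rw0_nitrogen_plumes_edit.shp').columns)

print('In shapefile but not Carto:', local_cols - carto_cols)
print('In Carto but not shapefile:', carto_cols - local_cols)
```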
> You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
>
> ###### Note: This dataset processing was done by Claire Hemmerly, and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
You can add any link here if you want.
> You can view the processed Wastewater Plumes in Coastal Areas dataset [on Resource Watch](https://resourcewatch.org/data/explore/11804f04-d9c7-47b9-8d27-27ce6ed6c042).
>
> You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
Since this is a raster file, we don't link to our S3 version. You can just link to the source.

Side note: do you know why the zip file on our S3 is so large? My browser says it is 11 GB, but the source file looks to be 250 MB.
I compressed the zip file and uploaded it to the correct folder. I deleted the one that was 11 GB.
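An uncompressed archive is a likely culprit here: Python's `zipfile` stores members without compression unless `ZIP_DEFLATED` is passed. A minimal sketch, with illustrative paths and file names (the real script's variables may differ):

```python
import os
import zipfile

# Illustrative names; the real script builds these from its own variables
data_dir = 'data'
processed_data_file = 'ocn_027_rw0_nitrogen_plumes_edit'

# ZIP_DEFLATED actually compresses the members; the default (ZIP_STORED)
# just concatenates the files, which can produce an oversized archive
with zipfile.ZipFile(processed_data_file + '.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for ext in ('.shp', '.shx', '.dbf', '.prj'):
        zf.write(os.path.join(data_dir, processed_data_file + ext),
                 arcname=processed_data_file + ext)
```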
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Are you sure we want NA values to be zero? Are there any existing zero values?
The only NA values are in the columns that show % nitrogen input, which we're not using, and the same rows have 0 for the nitrogen input in g/yr, so I think updating the NAs to 0 should be fine. I had trouble creating the processed shapefile when the values were floats instead of integers; I got an error saying the numbers were too big.
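A minimal sketch of a sanity check along those lines, assuming `gdf` is the loaded GeoDataFrame and the g/yr columns sit in `gdf.columns[1:9]` as in the quoted loop (column positions taken from the script, not verified):

```python
# Confirm the claim before filling: count NAs and pre-existing zeros per
# column. If a column has both, fillna(0) makes the two cases
# indistinguishable downstream, which is worth flagging explicitly.
for col in gdf.columns[1:9]:
    n_na = gdf[col].isna().sum()
    n_zero = (gdf[col] == 0).sum()
    print(f'{col}: {n_na} NA values, {n_zero} existing zeros')
```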
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Again, just want to make sure we are accurately translating the data here.
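One way to make that translation check concrete (a sketch, assuming `gdf` and the same column slice as the quoted loop): confirm that the cast changes only the dtype, not the values.

```python
# The g/yr totals should already be whole numbers, so casting to int
# after fillna(0) must not change any value; assert that explicitly
for col in gdf.columns[1:9]:
    filled = gdf[col].fillna(0)
    assert (filled == filled.astype('int')).all(), f'{col} would lose precision'
```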
```python
'''

# load in the polygon shapefile
shapes = os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp')
```
Can you use processed_data_file here instead of shapes, for consistency with our other scripts?

Actually, I think the main thing is that the final file you upload is called processed_data_file.
I've updated this section; let me know if it looks OK.
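A minimal sketch of that naming convention, with placeholder values for the variables the real script defines elsewhere (`data_dir`, `dataset_name`, `raw_data_file_unzipped`):

```python
import os
import geopandas as gpd

# Placeholder values; the real script defines these elsewhere
data_dir = 'data'
dataset_name = 'ocn_027_rw0_nitrogen_plumes'
raw_data_file_unzipped = os.path.join(data_dir, dataset_name)

# read the raw shapefile
gdf = gpd.read_file(os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp'))

# ... processing steps ...

# name the final artifact processed_data_file, per the team convention
processed_data_file = os.path.join(data_dir, dataset_name + '_edit.shp')
gdf.to_file(processed_data_file)
```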
```python
set_default_credentials(username=CARTO_USER, base_url="https://{user}.carto.com/".format(user=CARTO_USER), api_key=CARTO_KEY)

# upload data frame to Carto
to_carto(gdf, dataset_name + '_edit', if_exists='replace')
```
For consistency, can you use processed_data_file instead of gdf?
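Since `to_carto` takes a DataFrame rather than a path, one reading of this request is to upload the frame read back from processed_data_file, so the Carto table is guaranteed to match the artifact sent to S3. A sketch under that assumption, with placeholder credentials:

```python
import geopandas as gpd
from cartoframes import to_carto
from cartoframes.auth import set_default_credentials

# Placeholder values; the real script defines these elsewhere
CARTO_USER, CARTO_KEY = 'CARTO_USER', 'CARTO_KEY'
dataset_name = 'ocn_027_rw0_nitrogen_plumes'
processed_data_file = 'data/' + dataset_name + '_edit.shp'

set_default_credentials(username=CARTO_USER,
                        base_url='https://{user}.carto.com/'.format(user=CARTO_USER),
                        api_key=CARTO_KEY)

# upload the processed file rather than the in-memory gdf
to_carto(gpd.read_file(processed_data_file), dataset_name + '_edit', if_exists='replace')
```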
```python
Upload original data and processed data to Amazon S3 storage
'''

# initialize AWS variables
aws_bucket = 'wri-public-data'
```
This should actually be `aws_bucket = 'wri-projects'` for raster datasets.
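A minimal sketch of the corrected upload, assuming boto3 with the usual AWS credential chain; the local path and S3 key are illustrative, not the script's actual values:

```python
import boto3

# corrected bucket for raster datasets, per the review comment
aws_bucket = 'wri-projects'

s3 = boto3.client('s3')
# illustrative key; the real script builds this from its own variables
s3.upload_file('data/ocn_027_rw0_nitrogen_plumes_edit.zip',
               aws_bucket,
               'resourcewatch/raster/ocn_027_rw0_nitrogen_plumes_edit.zip')
```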
I added the processed data file steps and checked the column names; they should match now.
Checklist for Reviewing a Pre-Processing Script