Conversation
R/read_waterdata_stats_datatable.R
Outdated
| combined <- data.table::rbindlist(combined_list) | ||
| combined <- combined[return_list, on = "rid"] | ||
|
|
||
| col_order <- c("parent_time_series_id", "monitoring_location_id", "monitoring_location_name", "parameter_code", |
There was a problem hiding this comment.
Ideally we can figure out a way to not write these out by hand. On the off-chance they add some columns or change the function will start to fail. (and yes, similar hard-coding has caused some headaches in the past with dataRetrieval). We added some logic (I think just in the current PR that removes max_results) to move all "id" columns to the far right (because users don't want to see those big ol'hashes first) UNLESS they are special like monitoring_location_id.
There was a problem hiding this comment.
The column order is no longer hard-coded in the function, but I don't think the current order we get from unpacking the nested JSON is ideal:
c("parameter_code", "unit_of_measure", "parent_time_series_id",
"value", "percentile", "sample_count", "approval_status", "computation_id",
"computation", "time_of_year", "time_of_year_type", "monitoring_location_id",
"monitoring_location_name", "site_type", "site_type_code", "country_code",
"state_code", "county_code", "geometry") I can merge the newest changes into this branch to see if what you mentioned changes anything. Otherwise, what would you recommend?
…responses, fix data.table joining.
Getting ready for CRAN
Remove constructUseURL
Merge most recent develop changes into statistics
|
Tests appear to be failing when installing vctrs 0.7.0: https://github.com/DOI-USGS/dataRetrieval/actions/runs/21305229784/job/61331526434#step:5:3135. That isn't an explicit dependency of data.table, so I'm not sure why it's installing it in the first place. |
|
I merged the main branch updates into /statistics. Not sure what substantively changed, but the GH action passed successfully 🤷🏻. I think this is ready for a re-review to verify it's matching dR conventions. For example, the columns are probably not in the order we'd like, but I think they're in the de facto order that comes about from unpacking the nested JSON output. I'm not sure how to deal with this other than hard-coding the desired order, which we don't want. Edit: here's the current column order if you don't want to install my version of the package: c("parameter_code", "unit_of_measure", "parent_time_series_id",
"value", "percentile", "sample_count", "approval_status", "computation_id",
"computation", "time_of_year", "time_of_year_type", "monitoring_location_id",
"monitoring_location_name", "site_type", "site_type_code", "country_code",
"state_code", "county_code", "geometry")In my mind, the ideal order would be something like this: c("monitoring_location_id", "monitoring_location_name", "parameter_code", "computation",
"time_of_year", "time_of_year_type", "value", "percentile", "sample_count", "unit_of_measure",
"approval_status", "site_type", "site_type_code", "county_code", "state_code", "country_code",
"parent_time_series_id", "computation_id", "geometry") |
|
|
||
| httr2::request("https://api.waterdata.usgs.gov/statistics/") |> | ||
| httr2::req_url_path_append(paste0("v", version)) |> | ||
| httr2::req_url_path_append(paste0("observation", service)) |
There was a problem hiding this comment.
Add in the user agent info, and probably the gzip encoding and bump up the timeout.
httr2::req_user_agent(default_ua()) |>
httr2::req_headers(`Accept-Encoding` = "gzip") |>
httr2::req_timeout(seconds = 180)|
A few minor things that come out during CHECK:
❯ checking Rd metadata ... WARNING
Rd files with duplicated alias 'dataRetrieval-package':
'dataRetrieval-package.Rd' 'dataRetrieval.Rd'I'm 99% sure it's because the ancient file "dataRetrievals-package.R" has that wayward s. You could first rename that to dataRetrieval-package.R, then re-run the "usethis" code to get the data.table imports, or just paste those importsFrom into the original dataRetrievals-package.R file. Much like coding with dplyr, we have to do some tricks to get rid of that: rid <- data <- ts_id <- values <- NULLI'm not entirely sure I understand what the .j is doing in the code. Maybe double check that line to make sure you understand it, and figure out if it's just another variable like the others, or if it's special.
I've been trying to keep the checks NOTE free (never a good idea to threaten CRAN with anything but perfectly clean submission), so just split those lines up: # - Month-of-year percentiles for June, computed using
# all June data (not just June 1 through June 15).
# Note: the month-of-year percentiles are returned only when the month-day range includes
# the beginning of the month (e.g., "06-01") OK, so those seems like the straightforward things to update from running the package check.
I'm playing around now with the |
| { | ||
| out <- get_statistics_data(list(), "Normals") | ||
|
|
||
| expect_s3_class(out, "sf") |
There was a problem hiding this comment.
Since you set up the mocking, seems like a great time to actually test that the data frame out is exactly like you expect (like, value column is numeric and exactly equal to c(8032.903, 12521.667, 1, 2) or whatever it's suppose to be). We can't do that for actually dataRetrieval calls since the data changes - but with mocking that would be fantastic.
… col with Laura's version, and fix some whitespace
|
Thanks, @ldecicco-USGS. I believe I've addressed all of the points from R CMD CHECK. I also replaced my clean stats col function with yours, which ended up being faster and more readable, IMO. |


Still to-do:
get_params,get_description, andbase_urlfunctions to pull the docs for /statistics too?get_statistics_datainto separate functions to match other read_waterdata patternsread_ogc_datahelpers are needed:return_list <- do.call(rbind, return_list_tmp)as it's a slight bottleneck.And there's Laura's list of to-dos here: https://code.usgs.gov/water/dataRetrieval/-/issues/450