[R] Error message caused by reading sparsely populated data is misleading #35806

@thisisnic

Description

Describe the bug, including details regarding any error messages, version, and platform.

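When a CSV column is empty in the rows used for type inference, Arrow infers its type as null, and the first non-null value later in the file fails with a conversion error. The hint appended to that error (about supplying `skip = 1`) is misleading here, because no schema was supplied. We should either remove the hint from the error message or change/append the helper text for null columns.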

library(arrow)
library(dplyr)

df <- tibble::tibble(x = 1:1000001, y = c(rep(NA, 1000000), 123456))
tf <- tempfile()
write_csv_arrow(df, tf)
open_dataset(tf, format = "csv") %>% collect()
#> Error in `compute.Dataset()`:
#> ! Invalid: In CSV column #1: Row #1000002: CSV conversion error to null: invalid value '123456'
#> ℹ If you have supplied a schema and your data contains a header row, you should supply the argument `skip = 1` to prevent the header being read in as data.
#> Backtrace:
#>      ▆
#>   1. ├─open_dataset(tf, format = "csv") %>% collect()
#>   2. ├─dplyr::collect(.)
#>   3. └─arrow:::collect.Dataset(.)
#>   4.   ├─arrow:::collect.ArrowTabular(compute.Dataset(x), as_data_frame)
#>   5.   │ └─base::as.data.frame(x, ...)
#>   6.   └─arrow:::compute.Dataset(x)
#>   7.     └─base::tryCatch(...)
#>   8.       └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   9.         └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  10.           └─value[[3L]](cond)
#>  11.             └─arrow:::augment_io_error_msg(e, call, schema = schema())
#>  12.               └─arrow:::handle_csv_read_error(msg, call, schema)
#>  13.                 └─rlang::abort(msg, call = call)
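
Until the message is changed, a possible workaround is to declare the column types up front, so that `y` is never inferred as null. The following is a minimal sketch, not a confirmed fix: the name `explicit_schema` and the choice of `int32()` and `float64()` are assumptions made for illustration, and `skip = 1` is used as the existing hint describes, because the schema already names the columns.

```r
library(arrow)
library(dplyr)

# Same sparsely populated data as in the reprex above.
df <- tibble::tibble(x = 1:1000001, y = c(rep(NA, 1000000), 123456))
tf <- tempfile()
write_csv_arrow(df, tf)

# Assumption: these types suit the data; adjust them for real files.
explicit_schema <- schema(x = int32(), y = float64())

# With a schema supplied, skip the header row so it is not read as data.
result <- open_dataset(
  tf,
  format = "csv",
  schema = explicit_schema,
  skip = 1
) %>%
  collect()
```

This is the one situation where the `skip = 1` hint does apply, which is part of why the hint is confusing when it appears without a user-supplied schema.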

Component(s)

R
