
"group1" / "group2" / "column" column path creation notation #1757

@Jolanrensen

Related to #1684

In DataFrame, we currently have a shortcut for creating column paths inside the columns selection DSL:

df.select { "group1"["group2"]["col"] }

and with type:

df.select { "group1"["group2"]["col"]<Int>() }

We discussed an alternative notation akin to the myPath / Path("subFolder") shortcut in the standard library.

Props to @koperagen for finding it :)
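For reference, a minimal sketch of the standard-library shortcut mentioned above, using the `div` operator from `kotlin.io.path`. The folder and file names are made up for illustration.

```kotlin
import kotlin.io.path.Path
import kotlin.io.path.div

fun main() {
    val myPath = Path("project")

    // Path / Path and Path / String both resolve to the stdlib `div` operator.
    val viaPath = myPath / Path("subFolder")
    val viaString = myPath / "subFolder" / "file.txt"

    // Compare against an explicitly built path instead of printing it,
    // so the check does not depend on the platform's separator.
    println(viaPath == Path("project", "subFolder"))
    println(viaString.fileName)
}
```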

Adding this notation would make the API look like:

df.select { "group1" / "group2" / "col" }

and with type:

df.select { ("group1" / "group2" / "col")<Int>() }
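To make the proposal concrete, here is a rough, self-contained sketch of how such a notation could be wired up with operator overloading. `SketchPath` and `TypedSketchPath` are hypothetical stand-ins invented for this example; they are not DataFrame's actual classes, and the real implementation inside the selection DSL may look quite different.

```kotlin
// Hypothetical stand-in for a column path; not the library's own class.
data class SketchPath(val segments: List<String>) {
    override fun toString(): String = segments.joinToString("/")
}

// Hypothetical typed wrapper produced by the `<T>()` call.
data class TypedSketchPath<T>(val path: SketchPath)

// "a" / "b" starts a path from two plain strings.
operator fun String.div(child: String): SketchPath = SketchPath(listOf(this, child))

// path / "c" appends one more segment.
operator fun SketchPath.div(child: String): SketchPath = SketchPath(segments + child)

// (path)<Int>() attaches a type; T is only a compile-time marker in this sketch.
operator fun <T> SketchPath.invoke(): TypedSketchPath<T> = TypedSketchPath(this)

fun main() {
    val untyped = "group1" / "group2" / "col"
    // Parentheses are needed so that <Int>() applies to the whole path.
    val typed = ("group1" / "group2" / "col")<Int>()

    println(untyped)
    println(typed.path == untyped)
}
```

The parentheses in the typed version are required because a call suffix binds tighter than `/`, so without them `<Int>()` would apply to `"col"` alone.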

Why consider this alternative? Pros:

  • Stdlib has it for paths
  • "/" is commonly used as a path separator; we even print it when we refer to a nested column in error messages :)
  • It's one character fewer per segment than "[]" (although the typed version needs two extra characters for the parentheses)
  • It's more readable
  • It makes all parts of the path appear equally important
  • It avoids the confusion between "a"["b", "c"] and "a"["b"]["c"]: writing "a" / "b", "c" is simply impossible
  • It can be mixed with column extensions if we want to: a.b / "c", or typed: (a.b / "c")<Int>()

Why not to adopt it, and the pros of our existing notation:

  • We already use "group1"["group2"]["col"]
  • "a"["b"]["c"] can be read as two "get" operations, like datarow["b"]["c"], similar to maps
  • It's easier to add a type: "a"["b"]["c"]<T>() needs no parentheses, unlike ("a" / "b" / "c")<T>()
  • It can be mixed with column extensions: a.b["c"], or typed: a.b["c"]<Int>(), which is a bit more intuitive, as it's just a get call.
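For comparison, the existing bracket notation can be sketched the same way with the `get` operator. Again, `SketchPath` and `TypedSketchPath` are hypothetical types made up for this example, not the library's implementation. The sketch shows why the typed form needs no parentheses here: indexing and invocation are both postfix, so they chain left to right.

```kotlin
// Hypothetical stand-ins, invented for this illustration only.
data class SketchPath(val segments: List<String>)
data class TypedSketchPath<T>(val path: SketchPath)

// "a"["b"] starts a path; String's own get(Int) does not match a String argument,
// so this extension is the one that gets picked.
operator fun String.get(child: String): SketchPath = SketchPath(listOf(this, child))

// path["c"] appends one more segment.
operator fun SketchPath.get(child: String): SketchPath = SketchPath(segments + child)

// path<Int>() attaches a type; T is only a compile-time marker in this sketch.
operator fun <T> SketchPath.invoke(): TypedSketchPath<T> = TypedSketchPath(this)

fun main() {
    // No parentheses needed: each suffix applies to everything to its left.
    val typed = "a"["b"]["c"]<Int>()
    println(typed.path.segments)
}
```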

Of course, we still have pathOf("group1", "group2", "group3"), which also works outside this DSL and is not going anywhere.

Feel free to leave ideas, examples, thoughts or any other comments below.

Labels: API, enhancement
