Related to #1684
In DataFrame, we currently have a shortcut for creating column paths inside the columns selection DSL:
df.select { "group1"["group2"]["col"] }
and with type:
df.select { "group1"["group2"]["col"]<Int>() }
We discussed an alternative notation akin to the myPath / Path("subFolder") shortcut in the standard library.
Props to @koperagen for finding it :)
Adding this notation would make the API look like:
df.select { "group1" / "group2" / "col" }
and with type:
df.select { ("group1" / "group2" / "col")<Int>() }
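To make the comparison concrete, here is a minimal standalone sketch of how both notations could be backed by Kotlin operator functions. It is not the DataFrame implementation: `SketchPath`, `TypedSketchPath` and `PathDsl` are hypothetical names invented for this example, and the operators are put in a scope object only to imitate shortcuts that are available inside a DSL block.

```kotlin
// Hypothetical stand-ins for illustration; these are NOT the DataFrame library's types.
data class SketchPath(val parts: List<String>) {
    override fun toString(): String = parts.joinToString("/")
}

// A path that additionally carries the expected column type as a type argument.
data class TypedSketchPath<T>(val path: SketchPath)

// The operators live in a scope object, so they only apply inside `with(PathDsl) { ... }`.
object PathDsl {
    // Existing notation: "a"["b"]["c"]
    operator fun String.get(child: String): SketchPath = SketchPath(listOf(this, child))
    operator fun SketchPath.get(child: String): SketchPath = SketchPath(parts + child)

    // Proposed notation: "a" / "b" / "c"
    operator fun String.div(child: String): SketchPath = SketchPath(listOf(this, child))
    operator fun SketchPath.div(child: String): SketchPath = SketchPath(parts + child)

    // Typed variant: path<Int>()
    operator fun <T> SketchPath.invoke(): TypedSketchPath<T> = TypedSketchPath(this)
}

fun main() {
    with(PathDsl) {
        val bracket = "group1"["group2"]["col"]
        val slash = "group1" / "group2" / "col"
        println(bracket == slash) // true: both notations build the same path
        println(slash)            // group1/group2/col

        val typedBracket = "group1"["group2"]["col"]<Int>()
        val typedSlash = ("group1" / "group2" / "col")<Int>()
        println(typedBracket == typedSlash) // true
    }
}
```

The parentheses in the typed `/` version are needed because, in Kotlin, a postfix call such as `<Int>()` binds tighter than the binary `/` operator; without them, the call would apply to `"col"` alone.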
Why consider this alternative (pros):
- Stdlib has it for paths
- "/" is commonly used as a path separator; we even print it when we refer to a nested column in error messages :)
- It's one character shorter than "[]" per path segment (though the typed version needs two extra characters for the parentheses)
- It's more readable
- It makes all parts of the path appear equally important
- It cannot be confused the way "a"["b", "c"] and "a"["b"]["c"] can; "a" / "b", "c" is simply impossible
- It can be mixed with column extensions if we want to:
a.b / "c" or, with type, (a.b / "c")<Int>()
Why not consider it (pros of our existing notation):
- We already use
"group1"["group2"]["col"]
- "a"["b"]["c"] could be seen as two "getting" operations like datarow["b"]["c"], similar to maps
- It's easier to add types to:
"a"["b"]["c"]<T>() compared to ("a"/"b"/"c")<T>()
- It can be mixed with column extensions:
a.b["c"] or, with type, a.b["c"]<Int>(), which is a bit more intuitive, as it's just a get call.
Of course we still have pathOf("group1", "group2", "group3") which also works outside this DSL and is not going anywhere.
Feel free to leave ideas, examples, thoughts or any other comments below.