Converting our "movies" example to the compiler plugin, I came across the following notation:
    .split { title }.by {
        listOf<Any>(
            """\s*\(\d{4}\)\s*$""".toRegex().replace(title, ""),
            "\\d{4}".toRegex().findAll(title).lastOrNull()?.value?.toIntOrNull() ?: -1,
        )
    }.into("title", "year")
This creates the columns title: DataColumn<String> and year: DataColumn<Int>. The problem with this list notation is that the compiler plugin cannot read the types easily. It can only see them as Any, because that's the type of the list.
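To illustrate the typing problem outside of DataFrame, here is a minimal plain-Kotlin sketch. The function names `splitTitle` and `splitTitleTyped` are hypothetical and only serve the illustration. They show that the list form exposes only `Any` statically, while a form that keeps each value separate preserves `String` and `Int`.

```kotlin
// Hypothetical helper mirroring the lambda passed to `by { ... }`:
// the static element type is Any, even though the runtime values are String and Int.
fun splitTitle(title: String): List<Any> = listOf<Any>(
    """\s*\(\d{4}\)\s*$""".toRegex().replace(title, ""),
    "\\d{4}".toRegex().findAll(title).lastOrNull()?.value?.toIntOrNull() ?: -1,
)

// Hypothetical typed variant: each component keeps its own static type.
fun splitTitleTyped(title: String): Pair<String, Int> = Pair(
    """\s*\(\d{4}\)\s*$""".toRegex().replace(title, ""),
    "\\d{4}".toRegex().findAll(title).lastOrNull()?.value?.toIntOrNull() ?: -1,
)

fun main() {
    val parts = splitTitle("Toy Story (1995)")
    // parts[0] is statically Any, so `parts[0].length` would not compile without a cast.
    println(parts)

    val (title, year) = splitTitleTyped("Toy Story (1995)")
    // Here title is a String and year is an Int, with no casts needed.
    println(title.length + year)
}
```

Anything that analyses the code statically, as the compiler plugin does, is in the same position as the first function: it only sees `Any`.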
I tried several alternatives, but they all boiled down to multiple steps:
- add title2 and year, remove the old title, rename title2 to title
- replace or convert the title column to a column group of title/year, ungroup it again
- use split and then cast or requireColumn
But all seem to be way too complicated for what we want here: simply replacing a column with two new ones.
A new API could look like:
    df.replace { title }.by { // AddDsl
        "title" from {
            """\s*\(\d{4}\)\s*$""".toRegex().replace(title, "")
        }
        "year" from {
            "\\d{4}".toRegex().findAll(title).lastOrNull()?.value?.toIntOrNull() ?: -1
        }
    }
or more generally:
    df.replace { name and firstName and age }.by {
        "name" from {
            "$firstName $name ($age)"
        }
        "welcomeMessage" from {
            "Hi, $firstName!"
        }
    }
We could explore other notations or names, of course, but it would narrow down to:
- remove some columns
- add some new ones
in one operation
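
To make the intended semantics concrete, here is a toy model in plain Kotlin. It does not use the DataFrame library. Rows are modelled as maps, and `Row` and `replaceColumns` are hypothetical names invented for this sketch. It assumes the new columns are computed from the original row, before the selected columns are removed.

```kotlin
// Toy row model: column name to value. Not the DataFrame API.
typealias Row = Map<String, Any?>

// Hypothetical helper: drop the `remove` columns and add the `add` columns
// in a single pass. Each new value is computed from the original row.
fun List<Row>.replaceColumns(
    remove: Set<String>,
    add: Map<String, (Row) -> Any?>,
): List<Row> = map { row ->
    (row - remove) + add.mapValues { (_, compute) -> compute(row) }
}

fun main() {
    val people: List<Row> = listOf(
        mapOf("name" to "Doe", "firstName" to "Jane", "age" to 30),
    )
    val result = people.replaceColumns(
        remove = setOf("name", "firstName", "age"),
        add = mapOf(
            "name" to { r: Row -> "${r["firstName"]} ${r["name"]} (${r["age"]})" },
            "welcomeMessage" to { r: Row -> "Hi, ${r["firstName"]}!" },
        ),
    )
    println(result)
}
```

This toy model is untyped, so it sidesteps the issue the proposal is about. In the real API, each `from { }` lambda has its own return type, which is what would let the compiler plugin infer the new column types.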