Conversation
There was a problem hiding this comment.
Claude says: the grammar's gnf_column_path rule indexes into a Sequence[String] with $$[0], which requires the sequence/list indexing support added in that file.
There was a problem hiding this comment.
Right, GetElement was just for tuple types. We never had a need for indexing into lists, until now.
| | "[" STRING* "]" | ||
| construct: $$ = $2 | ||
| deconstruct if builtin.length($$) != 1: | ||
| $2: Sequence[String] = $$ |
There was a problem hiding this comment.
Is there semantics for when gnf_column_path is a sequence of length 1? Is it something we can explicitly disallow?
There was a problem hiding this comment.
I think the common case is a path of length 1 (i.e. just the column name). We do support it, we just don't require wrapping it in [].
minsungc
left a comment
There was a problem hiding this comment.
Quick clarifying question otherwise LGTM!
| CSVLocator locator = 1; | ||
| CSVConfig config = 2; | ||
| repeated CSVColumn columns = 3; | ||
| repeated GNFColumn columns = 3; |
There was a problem hiding this comment.
I wonder if at some point we can unify or better distinguish between columns used for data ingestion vs data export? We currently also have ExportCSVColumn which is actually pretty similar to GNFColumn, just without the types field.
There was a problem hiding this comment.
My work on NOT NULL constraints (inclusion dependencies) may have adjacencies to this
consideration. Please keep me in the loop.
There was a problem hiding this comment.
Huh, yeah good point. Maybe even lower than that, it seems that we have the need for a mapping from RelationId to a String[] path (possibly of length 1) in several places, most recently the SnapshotMapping.
There was a problem hiding this comment.
🎉 cool preview of the future ahead
There was a problem hiding this comment.
Dress for the job you want... ;)
|
Looks like you need to rebuild the printers |
staworko
left a comment
There was a problem hiding this comment.
Nothing on my side just a small question to check if I'm missing something
| rel_edb | ||
| : "(" "rel_edb" relation_id rel_edb_path rel_edb_types ")" | ||
| construct: $$ = logic.RelEDB(target_id=$3, path=$4, types=$5) | ||
| edb |
There was a problem hiding this comment.
As much as I love Rel, I like the removal of references to rel_ from the raicode.
| snapshot | ||
| : "(" "snapshot" rel_edb_path relation_id ")" | ||
| construct: $$ = transactions.Snapshot(destination_path=$3, source_relation=$4) | ||
| snapshot_mapping |
There was a problem hiding this comment.
I don't see what has changed here except for the ability to do a snapshot of multiple
relations at once. Is that the whole change? Reading on.
There was a problem hiding this comment.
That's the whole change, yeah, multiple snapshots in one action 👍 Its a little awkward to have to do this rather than just having multiple snapshot actions in the same epoch, but it simplifies the implementation on the engine side.
| CSVLocator locator = 1; | ||
| CSVConfig config = 2; | ||
| repeated CSVColumn columns = 3; | ||
| repeated GNFColumn columns = 3; |
There was a problem hiding this comment.
My work on NOT NULL constraints (inclusion dependencies) may have adjacencies to this
consideration. Please keep me in the loop.
| (snapshot ["my_edb"] :my_rel) | ||
| (snapshot ["database" "table"] :computed) | ||
| (snapshot ["schema" "namespace" "relation"] :big_signed)) | ||
| (snapshot |
There was a problem hiding this comment.
Yeah, I still don't see any big change other than multiple relation at one snapshot rather
than one per snapshot. Am I missing something?
Naturally, there are conceivable reason why this is smarter e.g., triggering snapshot
creation comes with an overhead. but this doesn't need any additional explanation.
There was a problem hiding this comment.
See my comment above, it has to do with some awkwardness in how to implement snapshot on the engine, because it is kind of both, a read and a write, which doesn't fit well into Arroyo's epoch model. There would be ways to implement it in a nicer way, where could just support many single snapshot actions in one epoch, but not in the short term.
RelEDB → EDB, dropping the "Rel" prefix which we've been using to indicate constructs that will go away once we drop Rel support. This is not the case for EDB references.
CSVColumn → GNFColumn, generalizing to make column declarations reusable between CSV and Iceberg data.
GNFColumncan represent nested paths like/:append/:METADATA$KEY.Generalize
Snapshotto support multiple mappings. The snapshot implementation on the engine is more awkward than I had hoped for, and this allows us to at least do it more efficiently.