Conversation
|
Thanks @rafafrdz! Kicking off CI. |
|
This looks good so far @rafafrdz. CI shows all three On the compatibility side, I noticed that the PR doesn't distinguish between ANSI and legacy mode for invalid URL handling. DataFusion's |
|
Thanks for the suggestion @andygrove What I did:
I also registered |
7c8a8d9 to
9d7720f
Compare
| -- under the License. | ||
|
|
||
| -- ConfigMatrix: parquet.enable.dictionary=false,true | ||
| -- MinSparkVersion: 3.5 |
There was a problem hiding this comment.
Does Spark 3.4 not support parse_url?
| SELECT parse_url(url, 'AUTHORITY') FROM test_parse_url | ||
|
|
||
| query | ||
| SELECT parse_url(url, 'USERINFO') FROM test_parse_url |
There was a problem hiding this comment.
Could you also add a literal arguments query to the SQL file, since literal and column inputs can use different code paths? Something like:
SELECT parse_url('http://spark.apache.org/path?q=1', 'HOST')
| } | ||
|
|
||
| test("parse_url with invalid URL in legacy mode") { | ||
| assume(isSpark40Plus) |
There was a problem hiding this comment.
Could you add a comment explaining why this test is specific to Spark 4?
There was a problem hiding this comment.
my mistake, drop assumed part
| } | ||
|
|
||
| test("parse_url in ANSI mode (Spark 3.5)") { | ||
| assume(!isSpark40Plus) |
There was a problem hiding this comment.
Why does this test exclude Spark 4?
|
Moving this to draft until feedback is addressed |
- Remove try_parse_url SQL queries: try_parse_url is a Comet-internal DataFusion function name used when serializing parse_url with failOnError=false. It is not a registered Spark SQL function and calling it directly via SQL raises UNRESOLVED_ROUTINE on any Spark version. The NULL-on-invalid-URL behaviour is covered by the 'parse_url with invalid URL in legacy mode' Scala test. - Replace ftp port 21 with 2121 in test URLs: the Rust url crate omits well-known default ports (ftp=21) when serialising authority(), while Java URI.getRawAuthority() preserves them verbatim. Using a non-default port avoids this pre-existing semantic gap and keeps the AUTHORITY test case meaningful on both Spark 3.5 and 4.0.
Use Expression directly instead of a generic type parameter T <: Expression. This eliminates the asInstanceOf cast in convertFromInvokeUnchecked and the unchecked method itself, since QueryPlanSerde can now call convertFromInvoke directly without existential type issues.
3b07a93 to
54fc48a
Compare
Summary
parse_urlby mappingParseUrlto the native scalar functionparse_urlin expression serde.parse_urlas supported.Why
This closes one of the missing DataFusion 50 migration functions from issue #2443
Part of #2443