@adamziel Thanks for sharing the draft and APIs!
Interesting! I was only thinking of implementing the full support in the lexer. I don't think it's very hard, but probably not very easy either. That said, do you think it makes sense to get it in in the "naive" form? With the quick fixes I did today on the WP CLI SQLite side, covered with tests, it supports everything apart from
Yeah, if there isn't a use-case, I think it's fine for this to sit here until one emerges. It likely will as a part of the streaming importer work, and we may need to explore a non-naive query stream implementation for that.
@adamziel If we run into another issue with the current WP SQLite CLI parsing, I'll definitely use this in some form. Also, when moving the CLI commands to the SQLite repo, it could make sense to give this a try.
Adapts [WP_MySQL_Naive_Query_Stream](WordPress/sqlite-database-integration#264) to support multiline SQL queries in the `runSql` step. With this PR, the following call works:

```ts
await runSql(php, {
	sql: new File(
		[
			`SELECT * FROM wp_users -- users table
			;
			SELECT * FROM wp_posts;`,
		],
		'no-trailing-newline.sql'
	),
});
```

Before this PR, the `runSql` step assumed that every line of a SQL file was a separate query, so it failed on the call above.

## Implementation details

See WordPress/sqlite-database-integration#264. Tl;dr: we tokenize the query and treat `;` and EOF tokens as query separators.

The stream is only "naive" in that every query must be smaller than 15MB. It might fail for some very large WordPress posts, but should work most of the time. Once the lexer provides an explicit distinction between syntax errors and incomplete input, we'll be able to support arbitrarily large queries.

## Testing Instructions (or ideally a Blueprint)

Tests have been updated to verify multiline query handling, SQL comment preservation, and queries with subqueries. The streaming parser correctly handles edge cases like empty lines, semicolon-only lines, and queries split across chunk boundaries.

cc @JanJakes
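For readers who want a feel for the separator rule in the implementation details above, here is a minimal TypeScript sketch. It is not the `WP_MySQL_Naive_Query_Stream` code: it swaps the real MySQL lexer for a small hand-written scanner, and `NaiveQuerySplitter`, `findSeparator`, and `startsLineComment` are hypothetical names invented for illustration. It assumes that `;` only ends a query outside quotes and comments, that EOF ends the last query, and that a buffer growing past the limit without a complete query is an error.

```ts
// Hypothetical sketch; these names are NOT the WP_MySQL_Naive_Query_Stream API.

/** True when a `#` or `-- ` line comment starts at index i. */
function startsLineComment(sql: string, i: number): boolean {
	if (sql[i] === '#') return true;
	// MySQL only treats `--` as a comment when whitespace (or the end) follows.
	return sql.startsWith('--', i) && /^(\s|$)/.test(sql.slice(i + 2, i + 3));
}

/** Index of the first `;` outside quotes and comments, or -1 if there is none yet. */
function findSeparator(sql: string): number {
	let i = 0;
	while (i < sql.length) {
		const ch = sql[i];
		if (ch === "'" || ch === '"' || ch === '`') {
			// Skip a quoted string or identifier. A doubled quote ('') closes and
			// immediately reopens the string, so it needs no special handling.
			i++;
			while (i < sql.length && sql[i] !== ch) {
				if (sql[i] === '\\' && ch !== '`') i++; // skip the escaped character
				i++;
			}
			i++; // step over the closing quote
		} else if (startsLineComment(sql, i)) {
			const end = sql.indexOf('\n', i);
			i = end === -1 ? sql.length : end + 1;
		} else if (sql.startsWith('/*', i)) {
			const end = sql.indexOf('*/', i + 2);
			i = end === -1 ? sql.length : end + 2;
		} else if (ch === ';') {
			return i;
		} else {
			i++;
		}
	}
	return -1;
}

class NaiveQuerySplitter {
	private buffer = '';
	private inputComplete = false;
	private readonly maxBufferLength: number;

	// The default mirrors the 15MB limit quoted above, counted in UTF-16 code
	// units rather than bytes to keep the sketch short.
	constructor(maxBufferLength: number = 15 * 1024 * 1024) {
		this.maxBufferLength = maxBufferLength;
	}

	append(chunk: string): void {
		this.buffer += chunk;
	}

	markInputComplete(): void {
		this.inputComplete = true;
	}

	/** Returns the next query, or null when more input is needed or none is left. */
	nextQuery(): string | null {
		let end: number;
		while ((end = findSeparator(this.buffer)) !== -1) {
			const query = this.buffer.slice(0, end).trim();
			this.buffer = this.buffer.slice(end + 1);
			if (query !== '') return query; // skip empty statements such as a lone `;`
		}
		if (this.inputComplete) {
			// EOF acts as the final separator.
			const rest = this.buffer.trim();
			this.buffer = '';
			return rest === '' ? null : rest;
		}
		if (this.buffer.length > this.maxBufferLength) {
			throw new Error('Buffered SQL exceeds the limit without a complete query');
		}
		return null;
	}
}

// Usage: the second query is split across two chunks.
const splitter = new NaiveQuerySplitter();
splitter.append('SELECT * FROM wp_users -- users table\n;\nSELECT * FROM wp_');
const first = splitter.nextQuery(); // "SELECT * FROM wp_users -- users table"
const pending = splitter.nextQuery(); // null: the rest may continue in the next chunk
splitter.append('posts;');
splitter.markInputComplete();
const second = splitter.nextQuery(); // "SELECT * FROM wp_posts"
console.log([first, pending, second]);
```

The sketch rescans the buffer from the start on every call, which keeps the state trivial at the cost of repeated work. A real lexer would resume from its last position.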
Proposes a `WP_MySQL_Naive_Query_Stream` to enable stream-processing large SQL files one query at a time without running out of memory.

Usage:

This class is naive because it doesn't understand what a valid query is. We assume an invalid query if we can't get the next token and either the input source is already exhausted or we have over 2MB of buffered SQL. We can't do better until the lexer provides an explicit distinction between syntax errors and incomplete input. I expect this heuristic to be sufficient in many scenarios, but it will of course fail in pathological cases such as `SELECT SELECT SELECT ...` without any semicolons.

Related to Automattic/wp-cli-sqlite-command#13
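The heuristic described above can be written down as a small decision function. This is only a sketch of the rule as stated in the description, in TypeScript rather than the class's own PHP. `classifyStall`, `StallVerdict`, and `MAX_BUFFERED_BYTES` are hypothetical names, and the function assumes it is called only when the lexer could not produce the next token.

```ts
// Hypothetical names; a sketch of the stated heuristic, not the class's actual code.
type StallVerdict = 'needs_more_input' | 'invalid_query';

const MAX_BUFFERED_BYTES = 2 * 1024 * 1024; // the 2MB threshold quoted above

/**
 * Called when the lexer cannot produce the next token. Without a lexer signal
 * that separates "syntax error" from "incomplete input", the only evidence
 * available is whether more input can still arrive and how much is buffered.
 */
function classifyStall(
	inputExhausted: boolean,
	bufferedBytes: number,
	maxBufferedBytes: number = MAX_BUFFERED_BYTES
): StallVerdict {
	if (inputExhausted || bufferedBytes > maxBufferedBytes) {
		return 'invalid_query';
	}
	return 'needs_more_input';
}

console.log(classifyStall(false, 1024)); // needs_more_input
console.log(classifyStall(true, 1024)); // invalid_query
console.log(classifyStall(false, 3 * 1024 * 1024)); // invalid_query
```

The trade-off is visible in the third call: a legitimate query larger than the threshold is reported as invalid, while malformed input is only detected once the buffer fills up or the input ends.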
## Remaining work

- Review this PR, reformat the code, and add some more comments.
cc @JanJakes @sejas