Skip to content

CREATE TABLE AS not checking column unicity #22167

@jccampagne

Description

@jccampagne

Describe the bug

I was going through the examples in the manual (https://datafusion.apache.org/user-guide/sql/select.html)
and running them in my project, which has the particularity of doing CREATE TABLE ... AS ... ( ) automatically.

Here is a minimal SQL reproducible example in Datafusion CLI:

% datafusion-cli
DataFusion CLI v53.1.0
> CREATE TABLE x AS VALUES(1,2);
0 row(s) fetched.
Elapsed 0.002 seconds.

> SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2;
+---------+---------+---------+---------+
| column1 | column2 | column1 | column2 |
+---------+---------+---------+---------+
| 1       | 2       | NULL    | NULL    |
+---------+---------+---------+---------+
1 row(s) fetched.
Elapsed 0.002 seconds.

> CREATE TABLE T AS (SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2);
0 row(s) fetched.
Elapsed 0.002 seconds.

> SELECT * FROM T;
Schema error: Schema contains duplicate qualified field name t.column1
> \d t
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion    | public       | t          | column1     | Int64     | YES         |
| datafusion    | public       | t          | column2     | Int64     | YES         |
| datafusion    | public       | t          | column1     | Int64     | YES         |
| datafusion    | public       | t          | column2     | Int64     | YES         |
+---------------+--------------+------------+-------------+-----------+-------------+
4 row(s) fetched.
Elapsed 0.002 seconds.

I would think the statement CREATE TABLE T AS (SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2); should fail instead of creating a table that you cannot select on.

Postgres comparison

In postgres, the create table as fails on schema validation, preventing the creation of the table:

% psql
psql (14.22 (Homebrew))
Type "help" for help.

jc=# CREATE TABLE x AS VALUES(1,2);
SELECT 1
jc=# SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2;
 column1 | column2 | column1 | column2
---------+---------+---------+---------
       1 |       2 |         |
(1 row)

jc=# CREATE TABLE T AS (SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2);
ERROR:  column "column1" specified more than once

To Reproduce

This should fail

CREATE TABLE x AS VALUES(1,2);
CREATE TABLE T AS (SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2); -- this should fail probably

but it runs and the table is created but the table has an invalid statement:

SELECT * FROM T;
Schema error: Schema contains duplicate qualified field name t.column1

Expected behavior

CREATE TABLE x AS VALUES(1,2);
CREATE TABLE T AS (SELECT * FROM x LEFT JOIN x y ON x.column1 = y.column2);

should fail with something like ERROR: column "column1" specified more than once

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions