Summary
Datafaker was built targeting PostgreSQL, so psycopg2 (sync) and asyncpg (async) are the only database drivers currently listed as dependencies. Supporting Microsoft SQL Server (MS-SQL) requires extending this driver layer.
Problem
There are three related issues in the current code:
- psycopg2 is imported unconditionally at datafaker/utils.py:29, but the import is only used at line 651 to catch a PostgreSQL-specific exception (psycopg2.errors.UndefinedObject). On an MS-SQL installation where psycopg2 is not installed, the import raises ImportError at startup.
- No MS-SQL driver is declared as a dependency in pyproject.toml. The standard SQLAlchemy drivers for MS-SQL are pyodbc (via mssql+pyodbc://) and pymssql (via mssql+pymssql://). Neither is listed.
- The async DSN is built by hardcoded string replacement in datafaker/utils.py:208:
async_dsn = db_dsn.replace("postgresql://", "postgresql+asyncpg://")
This silently yields an unusable async DSN for any non-PostgreSQL connection string: an mssql:// DSN, for example, passes through unchanged and never gets an async driver.
Proposed steps
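- Add an optional MS-SQL dependency group in pyproject.toml. With Poetry, the driver is declared as an optional dependency and then exposed as an extra, e.g. (the version constraint is only a placeholder):
[tool.poetry.dependencies]
pyodbc = { version = "*", optional = true }
[tool.poetry.extras]
mssql = ["pyodbc"]
This keeps the MS-SQL driver optional, so existing PostgreSQL users are not affected.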
- Make the psycopg2 import conditional: either import it only when the active DSN is a PostgreSQL connection, or guard the import with try/except ImportError. Move the psycopg2.errors.UndefinedObject check behind a dialect check so that it degrades gracefully on non-PostgreSQL connections.
- Replace the hardcoded async DSN rewrite with a dialect-aware helper. SQLAlchemy DSNs follow the pattern dialect[+driver]://..., so the rewrite should extract the dialect (ignoring any sync driver already present) and substitute the appropriate async driver:
  - postgresql:// (or postgresql+psycopg2://) → postgresql+asyncpg://
  - mssql:// (or mssql+pyodbc://) → mssql+aioodbc:// (or similar, depending on the chosen async driver)
- Add pyodbc (or pymssql) to the CI matrix so MS-SQL connectivity is tested.
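A minimal sketch of the conditional-import step, assuming the check at utils.py:651 inspects an exception raised through SQLAlchemy (the actual code there is not shown in this issue). The names pg_errors and is_pg_undefined_object are hypothetical:

```python
# Hypothetical sketch for datafaker/utils.py: psycopg2 becomes an optional import.
try:
    # psycopg2.errors (psycopg2 >= 2.8) provides UndefinedObject.
    import psycopg2.errors as pg_errors
except ImportError:
    # psycopg2 is not installed, e.g. in an MS-SQL-only environment.
    pg_errors = None


def is_pg_undefined_object(exc: BaseException, dialect_name: str) -> bool:
    """Return True only for psycopg2's UndefinedObject on a PostgreSQL connection."""
    if dialect_name != "postgresql" or pg_errors is None:
        # Other dialect, or no psycopg2 available: this error cannot occur.
        return False
    # SQLAlchemy wraps DBAPI errors; the driver's own exception is on .orig.
    original = getattr(exc, "orig", exc)
    return isinstance(original, pg_errors.UndefinedObject)
```

At the call site, the dialect name would come from the SQLAlchemy engine via engine.dialect.name, which is "postgresql" for PostgreSQL and "mssql" for MS-SQL.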
Acceptance criteria
- Running datafaker without psycopg2 installed (e.g. in an MS-SQL-only environment) does not raise an ImportError at startup.
- A connection string of the form mssql+pyodbc://... can be passed via SRC_DSN / DST_DSN without producing a malformed async DSN.
- Existing PostgreSQL tests continue to pass unchanged.
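For the dialect-aware async DSN step, a minimal sketch using only string handling. The function make_async_dsn and the ASYNC_DRIVERS table are hypothetical, and the mssql+aioodbc mapping assumes aioodbc is the chosen async driver:

```python
# Hypothetical table: SQLAlchemy dialect name -> async driver to use.
# "aioodbc" assumes that is the async MS-SQL driver chosen for datafaker.
ASYNC_DRIVERS = {
    "postgresql": "asyncpg",
    "mssql": "aioodbc",
}


def make_async_dsn(dsn: str) -> str:
    """Rewrite a dialect[+driver]://... DSN to use the dialect's async driver."""
    scheme, sep, rest = dsn.partition("://")
    if not sep:
        raise ValueError(f"not a URL-style DSN: {dsn!r}")
    # Drop any sync driver already present, e.g. "mssql+pyodbc" -> "mssql".
    dialect = scheme.split("+", 1)[0].lower()
    driver = ASYNC_DRIVERS.get(dialect)
    if driver is None:
        # Fail loudly instead of silently returning an unusable DSN.
        raise ValueError(f"no async driver configured for dialect {dialect!r}")
    # Host, credentials, database and query parameters are kept verbatim.
    return f"{dialect}+{driver}://{rest}"


if __name__ == "__main__":
    print(make_async_dsn("mssql+pyodbc://user:secret@my_dsn"))
    # mssql+aioodbc://user:secret@my_dsn
```

Raising on an unknown dialect, instead of returning the DSN unchanged, turns the silent failure into an explicit error. SQLAlchemy's make_url() and URL.set(drivername=...) could do the same parsing; note that in SQLAlchemy 2.x, str(url) masks the password, so render_as_string(hide_password=False) would be needed to get a usable DSN back.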