Add MS-SQL driver support (pyodbc/pymssql) #93

@myyong

Summary

Datafaker was built targeting PostgreSQL, which means psycopg2 (sync) and asyncpg (async) are the only database drivers currently listed as dependencies. To support Microsoft SQL Server (MS-SQL), the driver layer needs to be extended.

Problem

There are three related issues in the current code:

  1. psycopg2 is imported unconditionally in datafaker/utils.py:29:

    import psycopg2

    This import is only used at line 651 to catch a PostgreSQL-specific exception (psycopg2.errors.UndefinedObject). On an MS-SQL installation where psycopg2 is not installed, this import will fail at startup.

  2. No MS-SQL driver is declared as a dependency in pyproject.toml. The standard SQLAlchemy driver for MS-SQL is pyodbc (via mssql+pyodbc://) or pymssql (via mssql+pymssql://). Neither is listed.

  3. The async DSN is built by hardcoded string replacement in datafaker/utils.py:208:

    async_dsn = db_dsn.replace("postgresql://", "postgresql+asyncpg://")

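    str.replace returns the string unchanged when the literal postgresql:// is absent, so any other connection string, including mssql://... and driver-qualified forms such as mssql+pyodbc://..., passes through silently with its sync driver still in place.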
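The third problem can be reproduced in isolation. This is a minimal sketch: current_rewrite is a hypothetical wrapper around the expression quoted above, and the DSNs are made-up examples.

```python
def current_rewrite(db_dsn: str) -> str:
    # Same expression as the line quoted from datafaker/utils.py.
    return db_dsn.replace("postgresql://", "postgresql+asyncpg://")


# Only the first DSN contains the literal "postgresql://"; the other
# two come back untouched, still naming a sync driver.
for dsn in (
    "postgresql://user@host/db",
    "postgresql+psycopg2://user@host/db",
    "mssql+pyodbc://user@host/db",
):
    print(dsn, "->", current_rewrite(dsn))
```

Note that a driver-qualified PostgreSQL DSN (postgresql+psycopg2://) is skipped as well, so the problem is not limited to MS-SQL.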

Proposed steps

  1. Add an optional MS-SQL dependency group in pyproject.toml, e.g.:

    [tool.poetry.extras]
    mssql = ["pyodbc"]

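    This keeps the MS-SQL driver optional so existing PostgreSQL users are not affected. Poetry only honours an extra if the package is also declared in [tool.poetry.dependencies] with optional = true, so pyodbc needs that entry as well.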

  2. Make the psycopg2 import conditional: either import it only when the active DSN is a PostgreSQL connection, or guard the import with try/except ImportError. Move the psycopg2.errors.UndefinedObject check behind a dialect check so it degrades gracefully on non-PostgreSQL connections.

  3. Replace the hardcoded async DSN rewrite with a dialect-aware helper. SQLAlchemy DSNs follow the pattern dialect+driver://..., so the rewrite should inspect the dialect and substitute the appropriate async driver:

    • postgresql:// → postgresql+asyncpg://
    • mssql:// → mssql+aioodbc:// (or similar, depending on the chosen async driver)

  4. Add pyodbc (or pymssql) to the CI matrix so MS-SQL connectivity is tested.
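Step 3 could be sketched roughly as follows. The helper name to_async_dsn and the ASYNC_DRIVERS table are hypothetical, not existing datafaker code, and aioodbc is only the tentative choice mentioned above. The sketch uses plain string handling so it has no dependencies.

```python
# Hypothetical mapping from SQLAlchemy dialect to async driver.
ASYNC_DRIVERS = {
    "postgresql": "asyncpg",
    "mssql": "aioodbc",  # assumption: final async driver not yet decided
}


def to_async_dsn(db_dsn: str) -> str:
    """Return db_dsn with its driver swapped for the dialect's async driver."""
    scheme, sep, rest = db_dsn.partition("://")
    if not sep:
        raise ValueError("DSN has no '://' separator")
    # The scheme is either "dialect" or "dialect+driver".
    dialect = scheme.split("+", 1)[0]
    try:
        async_driver = ASYNC_DRIVERS[dialect]
    except KeyError:
        # Report only the dialect; the full DSN may contain a password.
        raise ValueError(f"no async driver configured for {dialect!r}") from None
    return f"{dialect}+{async_driver}://{rest}"
```

Unlike the current replace call, this fails loudly on an unknown dialect instead of returning the input unchanged, and it also handles DSNs that already name a sync driver. SQLAlchemy's own make_url could take over the parsing if a dependency-free helper is not required.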

Acceptance criteria

  • Installing datafaker without psycopg2 (e.g. in an MS-SQL-only environment) does not raise an ImportError at startup.
  • A connection string of the form mssql+pyodbc://... can be passed via SRC_DSN / DST_DSN without producing a malformed async DSN.
  • Existing PostgreSQL tests continue to pass unchanged.
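For the first criterion, a guarded import along these lines might be enough. This is a sketch under assumptions: is_undefined_object is a hypothetical helper name, and since the issue does not show how the exception is caught at line 651, the check covers both a raw driver exception and one wrapped by SQLAlchemy.

```python
try:
    # psycopg2.errors is available in psycopg2 2.8 and later.
    from psycopg2 import errors as pg_errors
except ImportError:
    pg_errors = None  # driver absent, e.g. an MS-SQL-only environment


def is_undefined_object(exc: BaseException) -> bool:
    """True only if exc is, or wraps, psycopg2's UndefinedObject error."""
    if pg_errors is None:
        return False
    # SQLAlchemy's DBAPIError keeps the driver exception on `.orig`.
    candidates = (exc, getattr(exc, "orig", None))
    return any(isinstance(c, pg_errors.UndefinedObject) for c in candidates)
```

The existing handler could then catch a broader exception type, call is_undefined_object(exc), and re-raise when it returns False, so non-PostgreSQL connections never touch psycopg2.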
