Replace hardcoded async DSN rewrite with dialect-aware helper #94

@myyong

Summary

The async connection string is built by a hardcoded string replacement in datafaker/utils.py:

    async_dsn = db_dsn.replace("postgresql://", "postgresql+asyncpg://")

The replacement only matches DSNs that start with the literal postgresql:// prefix, which leads to two failure modes:

  1. Silently broken for MS-SQL: an mssql:// DSN passes through unchanged and is handed to create_async_engine, which then fails with a confusing dialect error rather than a clear message.
  2. Silently broken when a driver is already specified: a DSN such as postgresql+psycopg2://... does not match the literal postgresql:// prefix, so the replace is a no-op and the sync driver is passed to the async engine.
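
Both failure modes can be reproduced with plain string handling, without a database or SQLAlchemy installed. The sketch below wraps the replacement quoted above in a hypothetical rewrite_with_replace function; the host, credentials, and database name are placeholders, not values from the project.

```python
def rewrite_with_replace(db_dsn: str) -> str:
    # The hardcoded replacement quoted in the summary.
    return db_dsn.replace("postgresql://", "postgresql+asyncpg://")


# Placeholder DSNs: one bare PostgreSQL, one with an explicit driver, one MS-SQL.
SAMPLE_DSNS = (
    "postgresql://user:secret@db.example/app",
    "postgresql+psycopg2://user:secret@db.example/app",
    "mssql://user:secret@db.example/app",
)

for dsn in SAMPLE_DSNS:
    rewritten = rewrite_with_replace(dsn)
    status = "rewritten" if rewritten != dsn else "unchanged"
    print(f"{status}: {dsn} -> {rewritten}")
```

Only the first DSN is rewritten; the other two come back unchanged, and nothing signals that the rewrite did not happen.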

Proposed steps

  1. Introduce a make_async_dsn(db_dsn: str) -> str helper that parses the DSN with SQLAlchemy's make_url, extracts the dialect (the portion before any +), and rewrites the drivername component using a lookup table:

    from sqlalchemy.engine import make_url
    
    _ASYNC_DRIVER_MAP = {
        "postgresql": "postgresql+asyncpg",
        "mssql": "mssql+aioodbc",
    }
    
    def make_async_dsn(db_dsn: str) -> str:
        url = make_url(db_dsn)
        dialect = url.drivername.split("+")[0]
        async_driver = _ASYNC_DRIVER_MAP.get(dialect)
        if async_driver is None:
            raise ValueError(
                f"No async driver is registered for dialect '{dialect}'."
            )
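        # str(url) masks the password as "***" on SQLAlchemy 2.0, so render it explicitly.
        async_url = url.set(drivername=async_driver)
        return async_url.render_as_string(hide_password=False)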

    This handles postgresql://, postgresql+psycopg2://, mssql://, and mssql+pyodbc://. For unknown dialects it raises a clear ValueError instead of producing a silent no-op.

  2. Replace the call site in create_db_engine (datafaker/utils.py):

    # before
    async_dsn = db_dsn.replace("postgresql://", "postgresql+asyncpg://")
    engine = create_async_engine(async_dsn, **kwargs)
    
    # after
    engine = create_async_engine(make_async_dsn(db_dsn), **kwargs)

  3. Add aioodbc as an optional dependency under the mssql extra (see "Add MS-SQL driver support (pyodbc/pymssql)", #93) so the async MS-SQL driver is available when needed.

  4. Add unit tests covering:

    • postgresql:// → postgresql+asyncpg://
    • postgresql+psycopg2:// → postgresql+asyncpg://
    • mssql:// → mssql+aioodbc://
    • mssql+pyodbc:// → mssql+aioodbc://
    • Unknown dialect raises ValueError
    • Credentials, host, port, and database name are preserved through the rewrite
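
The cases above could be expressed as a small table-driven check. This is only a sketch using the standard library: REWRITE_CASES, connection_target, find_rewrite_failures, and rejects_unknown_dialect are hypothetical names, the DSNs are placeholders, and the helper under test is passed in as an argument so the check does not assume where make_async_dsn lives.

```python
from urllib.parse import urlsplit

# (input DSN, expected async drivername) pairs mirroring the list above.
REWRITE_CASES = [
    ("postgresql://user:secret@db.example:5432/app", "postgresql+asyncpg"),
    ("postgresql+psycopg2://user:secret@db.example:5432/app", "postgresql+asyncpg"),
    ("mssql://user:secret@db.example:1433/app", "mssql+aioodbc"),
    ("mssql+pyodbc://user:secret@db.example:1433/app", "mssql+aioodbc"),
]


def connection_target(dsn):
    """Return every DSN component except the scheme, for before/after comparison."""
    parts = urlsplit(dsn)
    return (parts.username, parts.password, parts.hostname,
            parts.port, parts.path, parts.query)


def find_rewrite_failures(helper):
    """Run helper over REWRITE_CASES and describe every mismatch found."""
    failures = []
    for dsn, expected_driver in REWRITE_CASES:
        result = helper(dsn)
        if urlsplit(result).scheme != expected_driver:
            failures.append(f"{dsn}: wrong driver in {result}")
        if connection_target(result) != connection_target(dsn):
            failures.append(f"{dsn}: connection details changed in {result}")
    return failures


def rejects_unknown_dialect(helper, dsn="oracle://user:secret@db.example:1521/app"):
    """Return True if helper raises ValueError for a dialect outside the lookup table."""
    try:
        helper(dsn)
    except ValueError:
        return True
    return False
```

In a test suite this would reduce to asserting that find_rewrite_failures(make_async_dsn) is empty and that rejects_unknown_dialect(make_async_dsn) is true. Comparing parsed components rather than whole strings also catches a helper that rewrites the driver correctly but masks or drops the password.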

Acceptance criteria

  • create_db_engine produces a valid async DSN for both postgresql:// and mssql:// connection strings.
  • A DSN with an explicit driver (e.g. postgresql+psycopg2://) is correctly rewritten rather than silently passed through.
  • Passing an unsupported dialect raises a ValueError with a message naming the dialect.
  • All existing PostgreSQL async tests continue to pass.
