
publish_pandas

Source: ds_platform_utils.metaflow.pandas.publish_pandas

Writes a pandas DataFrame to Snowflake.

Signature

publish_pandas(
    table_name: str,
    df: pd.DataFrame,
    add_created_date: bool = False,
    chunk_size: int | None = None,
    compression: Literal["snappy", "gzip"] = "snappy",
    warehouse: Literal["XS", "MED", "XL"] | None = None,
    parallel: int = 4,
    quote_identifiers: bool = False,
    auto_create_table: bool = False,
    overwrite: bool = False,
    use_logical_type: bool = True,
    use_utc: bool = True,
    use_s3_stage: bool = False,
    table_definition: list[tuple[str, str]] | None = None,
) -> None

What it does

  • Validates DataFrame input.
  • Writes directly via write_pandas, or through an S3 stage flow for large data.
  • Adds a Snowflake table URL to Metaflow card output.
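A minimal usage sketch inside a Metaflow step (the flow, step, table name, and DataFrame columns below are illustrative assumptions; only publish_pandas and its parameters come from the signature above):

from metaflow import FlowSpec, step
import pandas as pd

from ds_platform_utils.metaflow.pandas import publish_pandas


class PublishExampleFlow(FlowSpec):
    @step
    def start(self):
        # Hypothetical DataFrame standing in for real pipeline output.
        df = pd.DataFrame({"ID": [1, 2, 3], "SCORE": [0.1, 0.5, 0.9]})

        # Publish to Snowflake, creating the table if it is missing and
        # stamping each row with a created_date UTC timestamp column.
        publish_pandas(
            table_name="EXAMPLE_SCORES",   # hypothetical destination table
            df=df,
            add_created_date=True,
            auto_create_table=True,
        )
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    PublishExampleFlow()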

Parameters

Parameter Type Required Description
table_name str Yes Destination Snowflake table name.
df pd.DataFrame Yes DataFrame to publish.
add_created_date bool No If True, adds a created_date UTC timestamp column before publish.
chunk_size int | None No Number of rows per uploaded chunk. If not provided, the chunk size is calculated from the DataFrame size.
compression Literal["snappy", "gzip"] No Compression codec used for staged parquet files.
warehouse str | None No Snowflake warehouse override for this operation. Supports XS/MED/XL shortcuts or a full warehouse name.
parallel int No Number of upload threads used by write_pandas path.
quote_identifiers bool No If False, passes identifiers unquoted so Snowflake applies uppercase coercion.
auto_create_table bool No If True, creates destination table when missing.
overwrite bool No If True, replaces existing table contents.
use_logical_type bool No Controls parquet logical type handling when loading data.
use_utc bool No If True, uses UTC timezone for Snowflake session.
use_s3_stage bool No If True, publishes via S3 stage flow; otherwise uses direct write_pandas.
table_definition list[tuple[str, str]] | None No Optional Snowflake table schema; used by S3 stage flow when table creation is needed.
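A hedged sketch of a larger publish that overrides the warehouse and routes through the S3 stage flow (features_df, the table name, and the chunk size are illustrative; the parameters behave as described in the table above):

import pandas as pd

from ds_platform_utils.metaflow.pandas import publish_pandas

# Hypothetical wide DataFrame standing in for a large result set.
features_df = pd.DataFrame({"FEATURE_A": range(1_000_000), "FEATURE_B": range(1_000_000)})

# Replace the table contents, staging compressed parquet chunks through S3
# on a larger warehouse for the duration of the load.
publish_pandas(
    table_name="FEATURES_WIDE",   # hypothetical destination table
    df=features_df,
    warehouse="XL",               # documented shortcut; a full warehouse name also works
    use_s3_stage=True,
    overwrite=True,
    chunk_size=500_000,           # rows per uploaded chunk; omit to have it calculated
    compression="gzip",
)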

Returns: None

Limitations

  • When use_s3_stage=True, some column data types may not map exactly as expected between pandas/parquet and Snowflake.
  • If needed, provide an explicit table_definition and/or cast columns before publishing to avoid data type mismatches, as in the sketch below.
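A hedged sketch of that mitigation, assuming each table_definition tuple is a (column name, Snowflake type) pair; the column names and types here are illustrative:

import pandas as pd

from ds_platform_utils.metaflow.pandas import publish_pandas

df = pd.DataFrame({"USER_ID": ["1", "2"], "SIGNUP_TS": ["2024-01-01", "2024-01-02"]})

# Cast columns to the pandas dtypes you want parquet to carry into Snowflake.
df["USER_ID"] = df["USER_ID"].astype("int64")
df["SIGNUP_TS"] = pd.to_datetime(df["SIGNUP_TS"], utc=True)

# Spell out the destination schema so the S3 stage flow does not have to infer it.
publish_pandas(
    table_name="USER_SIGNUPS",    # hypothetical destination table
    df=df,
    use_s3_stage=True,
    auto_create_table=True,
    table_definition=[
        ("USER_ID", "NUMBER"),
        ("SIGNUP_TS", "TIMESTAMP_TZ"),
    ],
)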