Merged
146 commits
07a5a0f
feat(ui): add flagged status to test definitions (TG-976 Phase 1)
rboni-dk Feb 6, 2026
84b0ad1
feat(ui): add notes dialog, review column, and sorting improvements (…
rboni-dk Feb 11, 2026
afcb822
Merge remote-tracking branch 'origin/enterprise' into feat/TG-976-fla…
Feb 12, 2026
07b6327
test: extract testable logic from command functions and add 67 unit t…
rboni-dk Feb 13, 2026
b57e70f
fix: resolve ruff linting issues (import sorting, dict literals, unus…
rboni-dk Feb 13, 2026
323cd82
Merge remote-tracking branch 'origin/enterprise' into feat/TG-976-fla…
Feb 16, 2026
e550cdb
Merge remote-tracking branch 'origin/enterprise' into feat/TG-976-fla…
Feb 17, 2026
002d5da
refactor: remove pydantic and streamlit-pydantic dependencies
rboni-dk Feb 19, 2026
c123552
feat(mcp): add MCP server foundation with JWT auth and ping tool
rboni-dk Feb 20, 2026
87f7205
refactor: hide MCP behind feature flag and standardize boolean settings
rboni-dk Feb 24, 2026
2e80e9a
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Feb 24, 2026
09fe3c4
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Feb 24, 2026
4ec8498
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Feb 24, 2026
a26b4a6
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Feb 25, 2026
fb909e8
Merge branch 'main' into 'enterprise'
aarthy-dk Feb 25, 2026
0db6fdb
fix: improve upgrade commands to update revision after each script
aarthy-dk Feb 25, 2026
67749e4
refactor: decouple RBAC from enterprise plugin via PluginHook
rboni-dk Feb 25, 2026
8f69936
feat(mcp): add P0 tools, resources, prompts and model extensions
rboni-dk Feb 23, 2026
c73dfcc
refactor(mcp): rewrite server instructions and handle plugin load errors
rboni-dk Feb 26, 2026
ea6b600
ci(docker): generate third-party-notices file
aarthy-dk Feb 26, 2026
cdc95d8
Merge branch 'feat/TG-989-mcp-server' into 'enterprise'
Feb 26, 2026
4d93d41
Merge remote-tracking branch 'origin/enterprise' into aarthy/ci-impro…
Feb 26, 2026
12992df
Merge branch 'aarthy/ci-improvements' into 'enterprise'
Feb 26, 2026
0e5a630
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Feb 26, 2026
6533624
feat(projects): add a project membership to handle user role in a pro…
luis-dk Feb 5, 2026
6c232ef
feat(plugins): support multiple pages per plugin spec
luis-dk Feb 26, 2026
e654258
feat(auth): scope sidebar projects to user memberships
luis-dk Feb 26, 2026
b0c96a3
feat(ui): add Dialog component; fix Portal stacking context
luis-dk Feb 26, 2026
356ba70
feat(ui): add disabled prop to Toggle; add notIn form validator
luis-dk Feb 26, 2026
08a6a8a
fix(ui): misc JS fixes
luis-dk Feb 26, 2026
9b4e120
fix(ui): mirror utils.js JSDoc type improvement to static copy
luis-dk Feb 26, 2026
c3451b4
Apply 16 suggestion(s) to 2 file(s)
rboni-dk Feb 27, 2026
e2e95df
fix(ui): visual inconsistencies and navigation bug
luis-dk Mar 2, 2026
2761ffa
feat(mcp): polish inspector output and add validation
rboni-dk Mar 2, 2026
3100ae7
fix(ui): hide portal when main content scrolls
luis-dk Mar 2, 2026
b33feb0
Merge branch 'enterprise' of gitlab.com:dkinternal/testgen/dataops-te…
rboni-dk Mar 2, 2026
f1d29e1
fix(navigation): add project/permission checks and redirects
luis-dk Mar 3, 2026
62e490b
feat(catalog): add CSV import/export for metadata
rboni-dk Feb 18, 2026
e760cb6
Merge branch 'project-scoped-users' into 'enterprise'
Mar 3, 2026
9b46fdc
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Mar 3, 2026
6a54f3d
Merge remote-tracking branch 'origin/enterprise' into feat/TG-988-dat…
Mar 3, 2026
8fc8436
Merge remote-tracking branch 'origin/enterprise' into feat/TG-983-ext…
Mar 3, 2026
6f65fce
Merge branch 'feat/TG-983-extract-testable-logic' into 'enterprise'
Mar 3, 2026
6204b71
Merge remote-tracking branch 'origin/enterprise' into feat/TG-988-dat…
Mar 3, 2026
f4e876a
feat(mcp): enforce project-level permission scoping on all tools
rboni-dk Mar 3, 2026
d427220
Merge branch 'enterprise' of gitlab.com:dkinternal/testgen/dataops-te…
rboni-dk Mar 4, 2026
cb1c9dc
fix(ui): address MR review feedback for CSV metadata import (TG-988)
rboni-dk Mar 4, 2026
cacab09
refactor(mcp): hide internal test_type codes from user-facing output
rboni-dk Mar 4, 2026
e694400
fix(ui): skip entire row on CDE error, not just the field (TG-988)
rboni-dk Mar 4, 2026
6abdbb3
Merge branch 'feat/TG-988-data-catalog-csv-import' into 'enterprise'
Mar 5, 2026
aa0d057
Merge remote-tracking branch 'origin/enterprise' into feat/TG-989-mcp…
Mar 5, 2026
99e8781
refactor(mcp): replace global admin bypass with role-based permissions
rboni-dk Mar 5, 2026
2519e58
Merge remote-tracking branch 'origin/enterprise' into feat/TG-976-fla…
rboni-dk Mar 5, 2026
5e0d766
Merge branch 'feat/TG-989-mcp-p0-tools' into 'enterprise'
Mar 5, 2026
e3b0581
fix(ui): fix auth base class kwarg, review column spacing, and grants
rboni-dk Mar 6, 2026
1f2a5f8
fix(ui): address MR review feedback for notes dialog (TG-976)
rboni-dk Mar 6, 2026
7516fac
Merge branch 'feat/TG-976-flagged-test-definitions' into 'enterprise'
Mar 7, 2026
cffe34c
feat: support oauth for databricks
aarthy-dk Mar 6, 2026
1f1e883
Merge branch 'aarthy/databricks-oauth' into 'enterprise'
Mar 9, 2026
f50b454
refactor(profiling): replace yaml files with TG- conditional SQL temp…
aarthy-dk Feb 9, 2026
e13a7f5
fix: discrepancies between flavors in hygiene issues and test types
aarthy-dk Feb 9, 2026
085e45d
feat: add support for Oracle 12c+
aarthy-dk Feb 9, 2026
0e6be25
fix(sql server): make Dupe Rows test case sensitive
aarthy-dk Feb 9, 2026
911f81a
fix: discrepancies in Weekly Record Count test
aarthy-dk Feb 10, 2026
4b993b5
feat: add support for SAP HANA
aarthy-dk Feb 13, 2026
edd17bb
Merge branch 'aarthy/sap' into 'enterprise'
Mar 10, 2026
84e8eaf
misc: upgrade libraries
aarthy-dk Mar 5, 2026
3c55edb
Merge branch 'aarthy/security' into 'enterprise'
Mar 11, 2026
24e6299
fix(edit-monitors): bugs in form - validate required fields
aarthy-dk Mar 9, 2026
351d54d
fix(data catalog): remove test suite links for catalog role
aarthy-dk Mar 5, 2026
c1691b8
ci: bump base image to v12
Mar 12, 2026
e103b08
ci: handle hdbcli manylinux-only wheels on Alpine
aarthy-dk Mar 11, 2026
cab56a0
ci: register hdbcli dist-info so pip resolves transitive dep
aarthy-dk Mar 12, 2026
9b2ef2c
misc: upgrade fastapi
aarthy-dk Mar 12, 2026
7ec7f5e
Merge branch 'ci/TG-997-hdbcli-alpine-workaround' into 'enterprise'
Mar 12, 2026
0e47ec3
Merge remote-tracking branch 'origin/enterprise' into aarthy/required…
Mar 12, 2026
7925cc2
ci: clear pip cache in Docker images to fix Trivy false positive
aarthy-dk Mar 12, 2026
a59ddbe
Merge branch 'ci/clear-pip-cache-trivy' into 'enterprise'
Mar 12, 2026
86f2ac1
Merge remote-tracking branch 'origin/enterprise' into aarthy/required…
Mar 12, 2026
db9b626
ci: bump base image to v13
Mar 12, 2026
41549a4
Merge remote-tracking branch 'origin/enterprise' into aarthy/required…
Mar 12, 2026
094a15b
refactor(flavor): make FlavorService stateless with explicit params
rboni-dk Mar 16, 2026
d940b6f
Merge branch 'aarthy/required-fields' into 'enterprise'
Mar 17, 2026
c550e89
Merge remote-tracking branch 'origin/enterprise' into refactor/flavor…
Mar 17, 2026
3808819
Merge branch 'refactor/flavor-service-stateless' into 'enterprise'
Mar 17, 2026
8e68494
fix: update doc links
aarthy-dk Mar 18, 2026
97891e5
Merge branch 'docs' into 'enterprise'
Mar 18, 2026
8b39bbb
fix(monitors): generate freshness monitors when profiling data alread…
aarthy-dk Mar 20, 2026
4d760b6
Merge branch 'fix/TG-1003-missing-freshness-monitors' into 'enterprise'
Mar 20, 2026
f82e34c
refactor: introduce database_session context manager
rboni-dk Mar 20, 2026
a0be715
feat: add safe_rerun to prevent data loss on Streamlit rerun
rboni-dk Mar 20, 2026
e43a3ab
refactor: replace st.rerun with safe_rerun in UI code
rboni-dk Mar 20, 2026
de20ee0
fix: use database_session() context manager for schedule query
rboni-dk Mar 20, 2026
292f1ab
refactor: track writes via after_flush, clear cache in safe_rerun
rboni-dk Mar 20, 2026
ff7917f
refactor: remove cache-clearing side effects from model mutations
rboni-dk Mar 20, 2026
df62103
refactor: remove redundant cache clears from view callsites
rboni-dk Mar 20, 2026
671e702
fix: always clear cache in safe_rerun
rboni-dk Mar 20, 2026
adc7883
fix(TG-1005): correct Daily_Record_Ct operator and Email_Format looku…
rboni-dk Mar 21, 2026
5b2d69d
fix(TG-1005): make MSSQL calendar gap lookups consistent with other f…
rboni-dk Mar 23, 2026
b1100e0
fix(TG-1005): fix cross-flavor test type bugs found by validation suite
rboni-dk Mar 23, 2026
5732304
refactor: remove explicit commits from model mutations
rboni-dk Mar 20, 2026
8122a06
Merge branch 'feat/TG-1005-source-data-validation' into 'enterprise'
Mar 24, 2026
3559ae0
Merge remote-tracking branch 'origin/enterprise' into feat/TG-1004-sa…
Mar 24, 2026
690fcfb
Merge branch 'feat/TG-1004-safe-rerun' into 'enterprise'
Mar 24, 2026
3b18a69
fix(ui): render portals/tooltips on top of Streamlit dialogs
luis-dk Mar 24, 2026
9e7ce46
fix(ui): add support for caption in select options
luis-dk Mar 24, 2026
24ef437
Merge branch 'ui-bug-fixes-151' into 'enterprise'
Mar 24, 2026
8e5e7b2
feat(mcp): sanitize errors at tool/resource/prompt boundary
rboni-dk Mar 25, 2026
44aaa3f
Merge branch 'feat/TG-1008-mcp-error-sanitizing' into 'enterprise'
Mar 25, 2026
8bdaa83
feat: pii masking, xde, hash fingerprints
aarthy-dk Mar 18, 2026
31c0361
fix(monitor): use excluded days from schedule if active
aarthy-dk Mar 24, 2026
36aa858
fix: remove summary from edit table group dialog
aarthy-dk Mar 24, 2026
a326e67
fix(data catalog): add help text
aarthy-dk Mar 24, 2026
0146128
fix: updates to pii masking and xdes
aarthy-dk Mar 24, 2026
4cdb3f2
fix: scheduler shutdown race — check _stopping before blocking on _re…
rboni-dk Mar 25, 2026
2ffaf19
Merge branch 'feat/TG-999-pii-masking' into 'enterprise'
Mar 25, 2026
9a0daf0
Merge remote-tracking branch 'origin/enterprise' into fix/scheduler-s…
Mar 25, 2026
285e70b
Merge branch 'fix/scheduler-shutdown-race' into 'enterprise'
Mar 25, 2026
592ce63
fix: database urls detected as emails
aarthy-dk Mar 25, 2026
39c82cf
fix(data catalog): improve flag styling
aarthy-dk Mar 25, 2026
8d3667b
fix: monitor generation fails to find test suite
aarthy-dk Mar 25, 2026
6c82a06
fix: edge case in column history dialog
aarthy-dk Mar 25, 2026
7f105d5
fix(schedules dialog): bug in pausing/deleting
aarthy-dk Mar 26, 2026
2ead7ac
fix(run tests): hide button in dialog after clicking link
aarthy-dk Mar 26, 2026
62e1b4c
fix(table group): remove stepper from edit dialog
aarthy-dk Mar 26, 2026
ffb41a9
feat(test suites): add search filter
aarthy-dk Mar 26, 2026
3c496c6
Merge branch 'qa-fixes' into 'enterprise'
Mar 26, 2026
1019526
fix: truncate timestamps to date in Daily_Record_Ct measure formula
rboni-dk Mar 25, 2026
25636ca
fix: cast timestamps to date in Daily_Record_Ct source data lookup (D…
rboni-dk Mar 25, 2026
d3d3be1
Merge branch 'feat/TG-1005-daily-record-ct-date-truncation' into 'ent…
aarthy-dk Mar 26, 2026
f5747aa
fix(ui): portals were closing when a nested portal opened
luis-dk Mar 26, 2026
a37b106
Merge branch 'fix-inoperable-croninput' into 'enterprise'
aarthy-dk Mar 27, 2026
9151f51
ci: disable pip cache in dockerfiles
aarthy-dk Mar 26, 2026
2512c9e
fix(emails): move app links to left
aarthy-dk Mar 26, 2026
14f5e8c
feat(wizards): make steppers clickable
aarthy-dk Mar 26, 2026
8d461c3
fix: update error text to be consistent
aarthy-dk Mar 27, 2026
04dbd1b
fix(test definitions): handle empty suite
aarthy-dk Mar 27, 2026
b94c506
fix(scorecard): error on adding notification
aarthy-dk Mar 27, 2026
1dfbfd9
fix(copy/move tests): unique key constraints
aarthy-dk Mar 27, 2026
e15d087
Merge branch 'qa-fixes' into 'enterprise'
Mar 27, 2026
7a1333e
ci: bump base image to v14
Mar 27, 2026
104e0b8
fix(data catalog): prefix icons disappear after saving
aarthy-dk Mar 27, 2026
66f244e
fix: missing imports
aarthy-dk Mar 27, 2026
71c974d
security: upgrade PyJWT library
aarthy-dk Mar 27, 2026
62201e6
Merge branch 'qa-fixes' into 'enterprise'
Mar 27, 2026
1a08184
release: 5.0.2 -> 5.9.4
aarthy-dk Mar 28, 2026
1 change: 1 addition & 0 deletions .dockerignore
@@ -17,3 +17,4 @@ venv/
.ruff_cache/
deploy
!deploy/install_*.sh
+!deploy/generate_third_party_notices.py
18 changes: 9 additions & 9 deletions README.md
@@ -1,13 +1,13 @@
# DataOps Data Quality TestGen
-![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)
+![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/testgen/what-is-testgen/) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)

*<p style="text-align: center;">DataOps Data Quality TestGen, or "TestGen" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.</p>*

## Documentation

[DataOps TestGen Overview](https://datakitchen.io/dataops-testgen-product/)

-[DataOps TestGen Documentation](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)
+[DataOps TestGen Documentation](https://docs.datakitchen.io/testgen/what-is-testgen/)


## Features
@@ -68,7 +68,7 @@ Once the installation completes, verify that you can login to the UI with the UR

### Optional: Run the TestGen demo setup

-The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.
+The [Data Observability quickstart](https://docs.datakitchen.io/tutorials/quickstart-demo/) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.

```shell
python3 dk-installer.py tg run-demo
@@ -110,7 +110,7 @@ Within the virtual environment, install the TestGen package with pip.
pip install dataops-testgen
```

-Verify that the [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) works.
+Verify that the [_testgen_ command line](https://docs.datakitchen.io/testgen/cli-reference/) works.
```shell
testgen --help
```
@@ -165,7 +165,7 @@ Verify that you can login to the UI with the `TESTGEN_USERNAME` and `TESTGEN_PAS

### Optional: Run the TestGen demo setup

-The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.
+The [Data Observability quickstart](https://docs.datakitchen.io/tutorials/quickstart-demo/) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.

```shell
testgen quick-start
@@ -187,7 +187,7 @@ python3 dk-installer.py tg delete-demo

### Upgrade to latest version

-New releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-release-notes/a/h1_1691719522). Use the following command to upgrade to the latest released version.
+New releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/testgen/release-notes/). Use the following command to upgrade to the latest released version.

```shell
python3 dk-installer.py tg upgrade
@@ -203,7 +203,7 @@ python3 dk-installer.py tg delete

### Access the _testgen_ CLI

-The [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.
+The [_testgen_ command line](https://docs.datakitchen.io/testgen/cli-reference/) can be accessed within the running container.

```shell
docker compose exec engine bash
@@ -226,13 +226,13 @@ docker compose up -d
## What Next?

### Getting started guide
-We recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview).
+We recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/tutorials/quickstart-demo/).

### Support
For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel.

### Connect to your database
-Follow [these instructions](https://docs.datakitchen.io/articles/dataops-testgen-help/connect-your-database) to improve the quality of data in your database.
+Follow [these instructions](https://docs.datakitchen.io/testgen/connect-your-database/) to improve the quality of data in your database.

### Community
Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.
278 changes: 278 additions & 0 deletions deploy/generate_third_party_notices.py
@@ -0,0 +1,278 @@
#!/usr/bin/env python3
"""Generate THIRD-PARTY-NOTICES from installed Python packages.

Runs pip-licenses to collect metadata, filters out dev/internal packages,
and outputs a formatted notices file with summary table and per-package details.

Usage:
    python generate_third_party_notices.py [--output PATH]
"""

import argparse
import json
import re
import subprocess
import sys
from datetime import date
from pathlib import Path

# Packages installed temporarily during Docker build — never in pyproject.toml.
_BUILD_ONLY = {"pip-licenses", "prettytable"}

# Internal DK packages not discoverable from pyproject.toml structure.
_EXTRA_INTERNAL = {"requests-extensions", "requests_extensions"}

# Packages whose license is reported as UNKNOWN by pip-licenses (keys are normalized).
LICENSE_OVERRIDES = {
    "google-crc32c": "Apache-2.0",
    "streamlit-camera-input-live": "MIT",
    "streamlit-embedcode": "MIT",
    "streamlit-keyup": "MIT",
    "streamlit-toggle-switch": "MIT",
    "streamlit-vertical-slider": "MIT",
    "streamlit-faker": "Apache-2.0",
}


def _normalize(name: str) -> str:
    """Normalize package name per PEP 503 (lowercase, hyphens/underscores/dots → hyphen)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _parse_pkg_name(requirement: str) -> str:
    """Extract normalized package name from a PEP 508 requirement string."""
    raw = re.split(r"[><=!~\[;@\s]", requirement, maxsplit=1)[0].strip()
    return _normalize(raw)


def _load_pyproject(path: Path) -> dict:
    if sys.version_info >= (3, 11):
        import tomllib
    else:
        import tomli as tomllib  # type: ignore[no-redef]
    with open(path, "rb") as f:
        return tomllib.load(f)


def _find_pyprojects(repo_root: Path) -> list[Path]:
    """Return pyproject.toml paths for root, submodule, and plugins."""
    candidates = [repo_root / "pyproject.toml", repo_root / "testgen" / "pyproject.toml"]
    for plugins_dir in [repo_root / "plugins", repo_root / "testgen" / "plugins"]:
        if plugins_dir.is_dir():
            candidates.extend(sorted(plugins_dir.glob("*/pyproject.toml")))
    return [p for p in candidates if p.exists()]


def _resolve_transitive(names: set[str]) -> set[str]:
    """Expand a set of normalized package names to include all their transitive dependencies."""
    from importlib.metadata import requires, PackageNotFoundError

    resolved: set[str] = set()
    queue = list(names)
    while queue:
        name = queue.pop()
        norm = _normalize(name)
        if norm in resolved:
            continue
        resolved.add(norm)
        try:
            reqs = requires(name) or []
        except PackageNotFoundError:
            try:
                reqs = requires(norm) or []
            except PackageNotFoundError:
                continue
        for req in reqs:
            if "; extra ==" in req or "; " in req:
                continue
            dep_name = _parse_pkg_name(req)
            if dep_name and dep_name not in resolved:
                queue.append(dep_name)
    return resolved


def _build_exclude_sets(repo_root: Path) -> tuple[set[str], set[str]]:
    """Read pyproject.toml files to build dev-only and internal package sets."""
    dev_direct: set[str] = set(_BUILD_ONLY)
    internal: set[str] = set(_EXTRA_INTERNAL)

    for pyproject_path in _find_pyprojects(repo_root):
        data = _load_pyproject(pyproject_path)

        project_name = data.get("project", {}).get("name")
        if project_name:
            internal.add(project_name)

        for deps in data.get("project", {}).get("optional-dependencies", {}).values():
            for dep in deps:
                dev_direct.add(_parse_pkg_name(dep))

    # Expand dev deps transitively, then subtract anything reachable from the main
    # package. This keeps shared deps (e.g. requests, urllib3) in the runtime set.
    dev_all = _resolve_transitive(dev_direct)
    runtime_all = _resolve_transitive(internal)
    dev_only = dev_all - runtime_all
    return dev_only, internal


def _find_repo_root() -> Path:
    """Walk up from this script to find the repo root (contains pyproject.toml with 'testgen' subdir)."""
    # Script lives at <root>/testgen/deploy/ or is called from repo root
    script_dir = Path(__file__).resolve().parent
    for candidate in [script_dir.parent.parent, script_dir.parent, Path.cwd()]:
        if (candidate / "pyproject.toml").exists() and (candidate / "testgen" / "pyproject.toml").exists():
            return candidate
    # Fallback: just use empty sets (Docker build context may not have root pyproject.toml)
    return script_dir


def normalize_license(name: str, lic: str) -> str:
    if _normalize(name) in LICENSE_OVERRIDES:
        return LICENSE_OVERRIDES[_normalize(name)]
    if not lic or lic == "UNKNOWN":
        return "UNKNOWN"
    if "Apache" in lic and len(lic) > 50:
        return "Apache-2.0"
    return lic


def extract_copyright(license_text: str) -> str | None:
    if not license_text:
        return None
    lines: list[str] = []
    seen: set[str] = set()
    for line in license_text.split("\n"):
        stripped = line.strip()
        if re.match(r"(?i)copyright\s", stripped) and stripped not in seen:
            lines.append(stripped)
            seen.add(stripped)
    return "\n".join(lines) if lines else None


def get_packages() -> list[dict]:
    result = subprocess.run(
        [
            sys.executable, "-m", "piplicenses",
            "--format=json",
            "--with-urls",
            "--with-license-file",
            "--with-notice-file",
            "--no-license-path",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def generate(packages: list[dict], dev_only: set[str], internal: set[str]) -> str:
    runtime = [
        pkg for pkg in packages
        if _normalize(pkg["Name"]) not in internal and _normalize(pkg["Name"]) not in dev_only
    ]
    runtime.sort(key=lambda p: p["Name"].lower())

    lines: list[str] = []

    # Header
    lines.append("THIRD-PARTY SOFTWARE NOTICES AND INFORMATION")
    lines.append("=" * 60)
    lines.append("")
    lines.append("DataOps TestGen Enterprise")
    lines.append(f"Copyright (c) {date.today().year} DataKitchen, Inc.")
    lines.append("")
    lines.append("This product includes software developed by third parties.")
    lines.append("The following sets forth attribution notices for third-party")
    lines.append("software that may be contained in portions of this product.")
    lines.append("")
    lines.append(f"Generated: {date.today().isoformat()}")
    lines.append(f"Runtime dependencies: {len(runtime)}")
    lines.append("")
    lines.append("")

    # Summary table
    lines.append("-" * 60)
    lines.append("SUMMARY")
    lines.append("-" * 60)
    lines.append("")
    lines.append(f"{'Package':<40s} {'Version':<16s} {'License'}")
    lines.append(f"{'-' * 40} {'-' * 16} {'-' * 30}")
    for pkg in runtime:
        lic = normalize_license(pkg["Name"], pkg["License"])
        lines.append(f"{pkg['Name']:<40s} {pkg['Version']:<16s} {lic}")

    lines.append("")
    lines.append("")

    # Detailed notices
    lines.append("-" * 60)
    lines.append("DETAILED NOTICES")
    lines.append("-" * 60)

    for pkg in runtime:
        name = pkg["Name"]
        version = pkg["Version"]
        lic = normalize_license(name, pkg["License"])
        url = pkg.get("URL", "")
        license_text = pkg.get("LicenseText", "")
        notice_text = pkg.get("NoticeText", "")

        lines.append("")
        lines.append("=" * 60)
        lines.append(f"{name} {version}")
        lines.append(f"License: {lic}")
        if url and url != "UNKNOWN":
            lines.append(f"URL: {url}")
        lines.append("=" * 60)

        copyright_line = extract_copyright(license_text)
        if copyright_line:
            lines.append("")
            lines.append(copyright_line)

        if notice_text and notice_text.strip() and notice_text.strip() != "UNKNOWN":
            lines.append("")
            lines.append("NOTICE:")
            lines.append(notice_text.strip())

        if license_text and license_text.strip() and license_text.strip() != "UNKNOWN":
            text = license_text.strip()
            # Abbreviate long Apache 2.0 boilerplate to the standard short form
            if len(text) > 3000 and "apache" in text.lower():
                lines.append("")
                lines.append("Licensed under the Apache License, Version 2.0.")
                lines.append("You may obtain a copy of the License at")
                lines.append("")
                lines.append(" http://www.apache.org/licenses/LICENSE-2.0")
                lines.append("")
                lines.append("Unless required by applicable law or agreed to in writing,")
                lines.append("software distributed under the License is distributed on an")
                lines.append('"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND.')
            else:
                lines.append("")
                lines.append(text)

    lines.append("")
    return "\n".join(lines)


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate THIRD-PARTY-NOTICES")
    parser.add_argument("--output", default=None, help="Output file path (default: stdout)")
    args = parser.parse_args()

    repo_root = _find_repo_root()
    dev_only, internal = _build_exclude_sets(repo_root)
    packages = get_packages()
    content = generate(packages, dev_only, internal)

    if args.output:
        with open(args.output, "w") as f:
            f.write(content)
    else:
        print(content)


if __name__ == "__main__":
    main()
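
Review note on name handling in `deploy/generate_third_party_notices.py`: the exclusion sets only work if every package name is compared in the same normalized form. The sketch below copies the two regex helpers from the file above (renamed without the leading underscore) and runs them on a few requirement strings. The sample strings and the `example.com` URL are illustrative assumptions, not taken from any pyproject.toml in this PR.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 style: collapse runs of '-', '_' and '.' to one hyphen, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pkg_name(requirement: str) -> str:
    """Keep the text before the first version, extras, marker, URL, or whitespace delimiter."""
    raw = re.split(r"[><=!~\[;@\s]", requirement, maxsplit=1)[0].strip()
    return normalize(raw)


# Hypothetical requirement strings, chosen to hit each delimiter class.
samples = [
    "Requests_Extensions",
    "requests[socks]>=2.31; python_version >= '3.9'",
    "zope.interface~=6.0",
    "pip @ https://example.com/pip.whl",
]

for requirement in samples:
    print(f"{requirement!r} -> {parse_pkg_name(requirement)!r}")
```

One thing the sketch makes visible: a requirement string with leading whitespace would split on that whitespace first and yield an empty name, which is presumably why `_resolve_transitive` checks `if dep_name` before queueing.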
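
Review note on the dev-only computation in `_build_exclude_sets`: the comment there says shared dependencies stay in the runtime set because the runtime closure is subtracted from the dev closure. A minimal sketch of that set arithmetic follows, assuming a hypothetical in-memory dependency graph in place of `importlib.metadata.requires`. All package names in `GRAPH` are made up for illustration.

```python
def resolve_transitive(roots: set[str], graph: dict[str, list[str]]) -> set[str]:
    """Return roots plus everything reachable from them in the dependency graph."""
    resolved: set[str] = set()
    queue = list(roots)
    while queue:
        name = queue.pop()
        if name in resolved:
            continue
        resolved.add(name)
        # Queue direct dependencies that have not been visited yet.
        queue.extend(dep for dep in graph.get(name, []) if dep not in resolved)
    return resolved


# Hypothetical graph: "app" is the internal package, "lint-tool" is a dev extra.
GRAPH = {
    "app": ["shared-http"],
    "shared-http": ["shared-tls"],
    "lint-tool": ["lint-plugin", "shared-http"],
}

dev_all = resolve_transitive({"lint-tool"}, GRAPH)
runtime_all = resolve_transitive({"app"}, GRAPH)

# Anything the runtime closure also needs is kept out of the exclusion set.
dev_only = dev_all - runtime_all
print(sorted(dev_only))
```

Here `shared-http` and `shared-tls` are reachable from both roots, so only `lint-tool` and `lint-plugin` end up excluded from the notices file.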