Compare Oracle PL/SQL package bodies with converted PostgreSQL or OceanBase SQL code.
The tool matches package files by filename, extracts PACKAGE BODY content, splits it into FUNCTION and PROCEDURE units, then reports word-level changes in CSV files.
- Batch compare two directories of SQL files.
- Extract regular and
EDITIONABLE PACKAGE BODYdefinitions. - Split top-level and nested functions/procedures, including no-parameter subprograms such as
PROCEDURE P ISandFUNCTION F RETURN NUMBER IS. - Match functions and procedures by name, nesting depth, and parent unit.
- Output summary and function/procedure-level detail CSV files.
- Generate normalized text files for manual review.
- Provide two comparison schemes:
- Semantic normalization: ignores common equivalent Oracle to PostgreSQL/OceanBase syntax changes.
- Whitespace normalization: removes comments and whitespace while preserving syntax changes.
- Python 3.8+
- No third-party Python dependencies
python3 sql_diff.py --old ./examples/old --new ./examples/new --output ./outputRequired arguments:
--old: directory containing original Oracle PL/SQL files--new: directory containing converted SQL files
Optional arguments:
--output: output directory, defaults tooutput
The output directory contains:
output/
├── diff_summary.csv
├── diff_detail.csv
├── normalized/
├── old_body/
└── new_body/
diff_summary.csv contains file-level statistics for both schemes.
diff_detail.csv contains function/procedure-level statistics. The 方案 column identifies whether a row belongs to semantic normalization or whitespace normalization.
Both schemes compare code at the FUNCTION/PROCEDURE level:
- Extract
PACKAGE BODYcontent from each matched file. - Split package bodies into independent top-level and nested functions/procedures.
- Match old and new units by function/procedure name, nesting depth, and parent unit.
- Normalize each matched unit.
- Run word-level diff on the normalized token sequences.
- Aggregate changed word counts and changed function/procedure counts.
Nested subprograms are counted as separate units. Parent units are compared using only their own lines, so nested child functions/procedures are not double-counted in the parent word totals.
Semantic normalization removes or rewrites common equivalent Oracle to PostgreSQL/OceanBase syntax differences before comparing tokens. It is intended to highlight business logic changes instead of migration syntax noise.
It currently treats these common conversions as equivalent:
NUMBERtoNUMERICDEFAULTto:=NVLtoCOALESCESYSDATEtoCURRENT_DATEROWNUMtoROW_NUMBER- trailing
AStoIS - removes
DECLARE, function names afterEND, and selected Oracle-only hints such asDETERMINISTIC
It also removes comments and blank lines, uppercases tokens, and tokenizes the normalized code before running word-level diff.
Whitespace normalization only removes comments and whitespace, uppercases tokens, and keeps syntax differences. It counts all word-level changes, including equivalent syntax conversions such as NVL to COALESCE.
This project uses rule-based parsing instead of a full SQL parser. It is intended for migration review metrics, not as proof of semantic equivalence. Complex PL/SQL formatting, dynamic SQL, or unusual package structures may require manual review.
Run the test suite:
python3 -m unittest discover -s testsRun a syntax check:
python3 -m py_compile sql_diff.py