__Breaking Changes__
- Extract has been refactored into three scripts: extract-schema, extract-data and extract-script
New feature:
- Support for Jinja templating everywhere
- area property is now ignored in YAML files
- Support for Amazon Redshift and Snowflake
- Quickstart documentation upgraded
- single command setup and run using starlake.sh / starlake.cmd
- Updated quickstart with docker use
- Infer schema now recognizes dates as dates, not timestamps
New feature:
- Domain & Jobs delivery in the REST API
Bug Fix:
- Support dynamic values for comet metadata through the REST API.
New feature:
- Add Server mode
Bug Fix:
- Extensions may be defined at the domain level
Bug Fix:
- Use Spark Project Jetty shaded class to remove extra jetty dependency in Starlake server
New feature:
- Added "serve --port 7070" to start starlake in server mode and wait for requests
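For illustration, server mode from the entry above could be started with the single-command script mentioned earlier in these notes (the flag is taken verbatim from the entry; check the CLI help for any additional options):

```
./starlake.sh serve --port 7070
```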
New feature:
- Support any source to any sink using kafkaload, including sinks and sources that are not Kafka. This was made possible at the cost of a breaking change
- Support table and column remarks extraction on DB2 iSeries databases
CI:
- Remove support for the GitHub registry
- Remove Scala 2.11 support
New feature:
- Support JINJA in autojob
- Support external views defined using JINJA
- File Splitter allows splitting a file based on the first column or on a position in the line.
New feature:
- Add ACL Graph generation
Bug Fix:
- Improve GraphViz Generation
Bug Fix:
- Generate final name in Graphviz diagram
New feature:
- Improve CLI doc generation. Extra documentation can be added in the docs/merge/cli folder
- Prepare to deprecate the xml tag in the metadata section.
Bug Fix:
- Code improvement: JDBC is handled as a generic sink
- Add extra parentheses in BQ queries only for SELECT and WITH statements
New feature:
- Reduce assembly size
- Update to sbt 1.7.1
- Add interactive mode for transform with csv, json and table output formats
- Improve FS Sink handling
Bug Fix:
- Support empty env files
Bug Fix:
- Keep backward compatibility with Scala 2.11
New feature:
- Handle Mixed XSD / YML ingestion & validation
- Support JSON / XML descriptions in XLS files
- Support arrays in XLS files
Bug Fix:
- Support file system sink options in autojob
New feature:
- Enhance XLS support for escaping characters
- Support HTTP Stream Source
- Support XSD Validation
- Transform jobs now report on the number of affected rows.
Bug Fix:
- Fix regression in the return value of an autojob
New feature:
- Support extra dsv options in conf file
- Support any option stored in metadata.options as a reader option.
- Support VSCode Development
New feature:
- Upgrade Kafka libraries
- Simplify removal of comments in autojobs SQL statements.
New feature:
- Deprecate usage of schema and schemaRefs in domains and dataset in autojobs. Prefer table and tableRefs
Bug Fix:
- fix regression on Merge mode without Timestamp option
Bug Fix:
- Xls2Yml - Get a correct sheet name based on the schema name field
New feature:
- Improve XLS support for long names
- Avoid rate limit errors (HTTP 429) on some cloud providers by setting COMET_GROUPED_MAX.
Bug Fix: reorder transformations on attributes as follows:
- rename columns
- run script fields
- apply transformations (privacy: "sql: ...")
- remove ignored fields
- remove the input filename column
Bug Fix:
- Handle field relaxation when in Append Mode and table does not exist.
Bug Fix:
- Make fields in rejected table optional
New feature:
- Roll back support for kafka.properties files; it is redundant since a server-options properties file already exists.
New feature:
- Improve XLS support for metadata
New feature:
- Autoload kafka.properties file from metadata directory.
New feature:
- Parallel copy of files when loading and archiving
- Support renaming of domains and schemas in XLS
- Fix release process
New feature:
- import step can be limited to one or more domains
New feature:
- Update Kafka / BigQuery libraries
- Add new preset env vars
- Allow renaming of domains and schemas
New feature:
- Vars in assertions are now substituted at load time
- Support SQL statement in privacy phase
- Support parameterized semantic types
- Add support for generic sink
- Allow use of custom deserializer on Kafka source
New feature:
- Drop Java 1.8 prerequisite for compilation
- Support custom database name for Hive compatible metastore
- Support custom dataset name in BQ
New feature:
- Drop support for Spark 2.3.X
- Allow table renaming on write
- Any Spark supported input is now allowed
- Env vars in env.yml files
New feature:
- Generate DDL from YML files with support for BigQuery, Snowflake, Synapse and Postgres #51 / #56
- Improve XLS handling: Add support for presql / postsql, tags, primary and foreign keys #59
- Add optional application of row & column level security
- Databricks Support
- Significant reduction of memory consumption
- Support application.conf file in metadata folder (COMET_METADATA_FS and COMET_ROOT must still be passed as env variables)
Bug Fix:
- Include env var and option when running presql in ingestion mode #58
New feature:
- Support merging dataset with updated schema
- Support publishing to github packages
- Reduce number of dependencies
- Allow Audit sink name configuration from environment variable
- Dropped support for elasticsearch 6
Bug Fix:
- Support timestamps as long values in XML & JSON files
New feature:
- Support XML Schema inference
- Support the ability to reject the whole file on error
- Improve error reporting
- Support engine on task SQL (query pushdown to BigQuery)
- Support last(n) partition on merge
- Added new env var to control partitioning: COMET_SPARK_SQL_SOURCES_PARTITION_OVERWRITE_MODE
- Added env vars to control BigQuery materialization on pushdown queries: COMET_SPARK_BIGQUERY_MATERIALIZATION_PROJECT and COMET_SPARK_BIGQUERY_MATERIALIZATION_DATASET (default: materialization)
- Added env var to control the BigQuery read data format: COMET_SPARK_BIGQUERY_READ_DATA_FORMAT (default: AVRO)
- When COMET_MERGE_OPTIMIZE_PARTITION_WRITE is set and dynamic partitioning is active, only write partitions containing new, updated or deleted records for BQ (handled by Spark by default for FS).
- Add VALIDATE_ON_LOAD (comet-validate-on-load) property to raise an exception if one of the domain/job YML files is invalid (default: false)
- Add custom file extensions property in Domain import: default-file-extensions and env var COMET_DEFAULT_FILE_EXTENSIONS
Bug Fix:
- Loading empty files when the schema contains script fields
- Applying default value for an attribute when value in the input data is null
- Transformation job with BQ engine fails when no views block is defined
- XLS2YML : remove non-breaking spaces from Excel file cells to avoid parsing errors
- Fix merge using timestamp option
- Json ingestion fails with complex array of objects
- Remove duplicates on incoming when existingDF does not exist or is empty
- Parse Sink options correctly
- Handle extreme cases where the audit lock raises an exception on creation
- Handle files without extension in the landing zone
- Store audit log with batch priority on BigQuery
Bug Fix:
- Handle Jackson bug
New feature:
- Add ability to ignore some fields (only top level fields supported)
- BREAKING CHANGE: Handle multiple schemas during extraction. Update your extract configurations before migrating to this version.
- Improve InferSchemaJob
- Include primary keys & foreign keys in JDBC2Yml
Bug Fix:
- Handle rename in JSON / XML files
- Handle timestamp fields in JSON / XML files
- Do not partition rejected files
- Add COMET_CSV_OUTPUT_EXT env var to customize filename extension after ingestion when CSV is active.
New feature:
- Use the same variable for Lock timeout
- Improve logging when locking file fails
- File sink, while still the default, is now controlled by the sink tag in the YAML file. The sink-to-file option is removed and used for testing purposes only.
- Allow custom topic name for comet_offsets
- Add ability to coalesce(int) to kafka offloading feature
- Attributes may now be declared as primary and/or foreign keys, even though no check is made.
- Export schema and relations (PK / FK) as dot (Graphviz) files.
- Support saving comet offsets to filesystem instead of kafka using the new setting comet-offsets-mode = "STREAM"
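A hedged configuration sketch for the offsets setting in the last entry above (HOCON, as used in the project's configuration files; only the comet-offsets-mode key and its "STREAM" value are taken from the entry):

```hocon
# Persist comet offsets to the filesystem instead of Kafka
comet-offsets-mode = "STREAM"
```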
Bug Fix:
- Invalid YAML files now produce an error at startup instead of a warning.
- Version skipped
New feature:
- Export all tables in JDBC2Yml generation
- Include table & column names when meeting unknown column type in JDBC source schema
- Better logging on forced conversion in JDBC2Yml
- Compute Hive Statistics on Table & Partitions
- DataGrip support with implementation of substitution for ${} in addition to {{}}
- Improve logging
- Add column type during database extraction
- The name attribute inside a job file should reflect the filename. This attribute will soon be deprecated
- Allow Templating on jobs. Useful to generate Airflow / Oozie Dags from job.comet.yml/job.sql code
- Switch from readthedocs to docusaurus
- Add local and bigquery samples
- Custom var pattern through sql-pattern-parameter in reference.conf
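The custom var pattern from the last entry is declared in reference.conf; a minimal sketch, where the regex value is purely illustrative and should be checked against the documentation:

```hocon
# Illustrative value only: override the pattern used to detect ${} variables in SQL
sql-pattern-parameter = "\\$\\{%s\\}"
```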
Bug Fix:
- Avoid computing statistics on struct fields
- Make database-extractor optional in application.conf
New feature:
- Parameterize with Domain & Schema metadata in JDBC2Yml generation
Bug Fix:
New feature:
- Auto compile with scala 2.11 for Spark 2 and with scala 2.12 for Spark 3. [457]
- Performance optimization when using Privacy Rules. [459]
- Rejected area and audit logs support can have their own write format (default-rejected-write-format and default-audit-write-format properties)
- Deep JSON & XML files are now validated against the schema
- Privacy is applied on deep JSON & XML inputs [461]
- Domains & Jobs may be defined in subdirectories, allowing better organization of metadata files [462]
- Substitute variables through CLI & env files in views, assertions, presql, main sql and post sql requests [462]
- Semantic type Date supports dates with MMM month representation [463]
- Split reference.conf into multiple files. [460]
- Support kafka Source & Sink through Spark Streaming [460]
- Add an alternative way of applying privacy on XML files. [466]
- Generate Excel files from YML files
- Generate YML file from Database Schema
Bug Fix:
- Make Jackson lib provided. [457]
- Support Spark 2.3 by not using Dataframe.isEmpty [457]
- comet_input_file_name missing when ingesting Position files [466]
- Apply postsql queries on the accepted DataFrame [466]
- Check that scripted fields are defined at the end of the schema in the YML file [#384]
New feature:
- Allow sink options to be defined in YML instead of Spark Submit. [#450] [#454]
Bug Fix:
- Parse dates with yyyyMM format correctly [#451]
- Fix error when saving a csv with an empty DataFrame [#451]
- Keep column description in BQ tables when using Overwrite mode [#453]
Bug Fix:
- Support correctly merge mode in BQ [#449]
- Fix for sinking XML to BQ [#448]
New feature:
- Kafka Support improved
New feature:
- Optionally sink to file using property sink-to-file = ${?COMET_SINK_TO_FILE}
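The entry above quotes the property verbatim; in HOCON it would read as follows (the env-var substitution syntax is standard HOCON):

```hocon
# Optionally sink to file; value can be overridden by the COMET_SINK_TO_FILE env var
sink-to-file = ${?COMET_SINK_TO_FILE}
```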
Bug Fix:
- Sink name was ignored and always considered as None
New feature:
- YML files are now renamed with the suffix .comet.yml
- Comet Schema is now published on SchemaStore. This allows Intellisense in VSCode & Intellij
- Assertions may now be executed as part of the Load and transform processes
- Shared Assertions UDF may be defined and stored in COMET_ROOT/metadata/assertions
- Views may also be defined and shared in COMET_ROOT/metadata/views.
- Views are accessible in the load and transform processes.
- Domains may now be prefixed by the "load" tag. Defining a domain without the "load" tag is now deprecated
- AutoJobs may now be prefixed by the "transform" tag. Defining an autojob without the "transform" tag is now deprecated
Breaking Changes:
- N.A.
Bug Fix:
- Use Spark Application Id for JobID information to make auditing easier
New feature:
- Expose a REST API to generate a Yaml Schema from an Excel file. [#387]
- Support ingesting multiline complex JSON. [#391]
- Support nested fields when generating schema for BigQuery tables. [#391]
- Enhancements on Spark to BigQuery schema. [#395]
- Support merging a part of a BigQuery Table, rather than all the Table. [#397]
- Enable setting BigQuery intermediate format when sinking using ${?COMET_INTERMEDIATE_BQ_FORMAT}. [#398] [#400]
- Enhancement on Merging mode: do not depend on parquet files when using BigQuery tables.
Dependencies:
- Update sbt to 1.4.4 [#385]
- Update scopt to 4.0.0 [#390]