Release notes

0.6.1

Breaking Changes:

  • Extract has been refactored into three separate scripts: extract-schema, extract-data and extract-script

0.6.0

New feature:

  • Support for Jinja templating everywhere
  • The area property is now ignored in YAML files
  • Support for Amazon Redshift and Snowflake
  • Quickstart documentation upgraded
  • Single-command setup and run using starlake.sh / starlake.cmd
  • Quickstart updated to use Docker
  • Infer schema now recognizes dates as date instead of timestamp

0.5.2

New feature:

  • Domain & Jobs delivery in the REST API

0.5.1

Bug Fix:

  • Support dynamic values for comet metadata through the REST API.

0.5.0

New feature:

  • Add server mode

Bug Fix:

  • Extensions may be defined at the domain level

0.4.2

Bug Fix:

  • Use Spark Project Jetty shaded class to remove extra jetty dependency in Starlake server

0.4.1

New feature:

  • Added "serve --port 7070" to start starlake in server mode and wait for requests

0.4.0

New feature:

  • Support any source to any sink using kafkaload, including sources and sinks that are not Kafka. This comes at the cost of a breaking change
  • Support table and column remarks extraction on DB2 iSeries databases

CI:

  • Remove support for the GitHub registry
  • Remove Scala 2.11 support

0.3.26

New feature:

  • Support Jinja in autojob
  • Support external views defined using Jinja
  • File Splitter allows splitting a file based on the first column or on a position in the line.
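The File Splitter entry can be illustrated with a minimal Python sketch. It assumes that the routing key for each line is either its first delimited column or a fixed character range. `split_lines` and its parameters are hypothetical names chosen for illustration. They are not Starlake's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def split_lines(
    lines: Iterable[str],
    separator: Optional[str] = None,
    position: Optional[Tuple[int, int]] = None,
) -> Dict[str, List[str]]:
    """Group lines into buckets keyed by a routing value (hypothetical sketch).

    The key is the first column when `separator` is given, or the
    substring line[start:end] when `position` = (start, end) is given.
    """
    if (separator is None) == (position is None):
        raise ValueError("provide exactly one of separator or position")
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if separator is not None:
            key = line.split(separator, 1)[0]  # first column
        else:
            start, end = position
            key = line[start:end]  # fixed-position slice
        buckets[key].append(line)
    return dict(buckets)


records = ["H;2024;header", "D;alice;42", "D;bob;7"]
print(split_lines(records, separator=";"))
# {'H': ['H;2024;header'], 'D': ['D;alice;42', 'D;bob;7']}
```

Each bucket could then be written to its own output file. With `position=(0, 2)`, lines such as `01ABC` and `01GHI` would land in the same `01` bucket.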

0.3.25

New feature:

  • Add ACL Graph generation

0.3.24

Bug Fix:

  • Improve GraphViz Generation

0.3.23

Bug Fix:

  • Generate final name in Graphviz diagram

0.3.22

New feature:

  • Improve CLI doc generation. Extra documentation can be added in the docs/merge/cli folder
  • Prepare to deprecate the xml tag in the metadata section.

Bug Fix:

  • Code improvement: JDBC is handled as a generic sink
  • Add extra parentheses in BQ queries only for SELECT and WITH requests
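Here is a minimal sketch of the parenthesization rule. It assumes the purpose is to let a query be embedded as a subquery, and that only the first keyword decides (case-insensitive, after leading whitespace). `wrap_bq_query` is a hypothetical helper, and leading SQL comments are not handled.

```python
import re

# First keyword must be SELECT or WITH as a whole word, ignoring case.
_WRAPPABLE = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)


def wrap_bq_query(sql: str) -> str:
    """Wrap SELECT/WITH queries in parentheses; leave other statements unchanged."""
    if _WRAPPABLE.match(sql):
        return f"({sql.strip()})"
    return sql
```

Statements such as `DELETE` or `MERGE` are returned untouched. Wrapping those would produce invalid SQL.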

0.3.21

New feature:

  • Reduce assembly size
  • Update to sbt 1.7.1
  • Add interactive mode for transform with csv, json and table output formats
  • Improve FS Sink handling

Bug Fix:

  • Support empty env files

0.3.20

Bug Fix:

  • Keep backward compatibility with Scala 2.11

0.3.19

New feature:

  • Handle Mixed XSD / YML ingestion & validation
  • Support JSON / XML descriptions in XLS files
  • Support arrays in XLS files

Bug Fix:

  • Support file system sink options in autojob

0.3.18

New feature:

  • Enhance XLS support for escape characters
  • Support HTTP Stream Source
  • Support XSD Validation
  • Transform jobs now report on the number of affected rows.

Bug Fix:

  • Fix regression in the return value of an autojob

0.3.17

New feature:

  • Support extra DSV options in conf file
  • Support any option stored in metadata.options as an option for the reader.
  • Support VSCode Development

0.3.16

New feature:

  • Upgrade Kafka libraries
  • Simplify removal of comments in autojobs SQL statements.
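Removing comments from SQL can be sketched as a small scanner. It drops `--` line comments and `/* ... */` block comments, and it leaves text inside single- or double-quoted literals untouched. This is a hypothetical illustration under those assumptions. The notes do not describe how Starlake actually performs the removal, and backslash escapes inside strings are not handled.

```python
def strip_sql_comments(sql: str) -> str:
    """Remove -- and /* */ comments that appear outside quoted literals."""
    out = []
    i, n = 0, len(sql)
    quote = None  # quote character while inside a string literal
    while i < n:
        ch = sql[i]
        if quote:
            out.append(ch)
            if ch == quote:  # '' escapes work: close then reopen
                quote = None
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
            i += 1
        elif sql.startswith("--", i):
            # Skip to the end of the line but keep the newline itself.
            end = sql.find("\n", i)
            i = n if end == -1 else end
        elif sql.startswith("/*", i):
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```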

0.3.15

New feature:

  • Deprecate the use of schema and schemaRefs in domains, and of dataset in autojobs. Prefer table and tableRefs instead

Bug Fix:

  • Fix regression in Merge mode without the Timestamp option
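Merge mode can be sketched as an upsert keyed on one or more columns. This minimal Python sketch makes two assumptions. Without a timestamp column, an incoming row replaces any existing row with the same key. With a timestamp column, the row with the latest timestamp wins. `merge_rows` and its parameters are hypothetical, and Starlake runs merges on Spark or BigQuery rather than on Python dicts.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def merge_rows(
    existing: List[Row],
    incoming: List[Row],
    key: Sequence[str],
    timestamp: Optional[str] = None,
) -> List[Row]:
    """Upsert `incoming` into `existing` by `key` (hypothetical sketch)."""

    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[k] for k in key)

    merged: Dict[Tuple[Any, ...], Row] = {key_of(r): r for r in existing}
    for row in incoming:
        k = key_of(row)
        current = merged.get(k)
        if current is None or timestamp is None:
            # New key, or no timestamp option: the incoming row wins.
            merged[k] = row
        elif row[timestamp] >= current[timestamp]:
            # Timestamp option: keep the most recent version of the row.
            merged[k] = row
    return list(merged.values())
```

Running both paths on the same data is a quick way to catch the kind of regression described above. An older incoming row only overwrites an existing one when no timestamp column is configured.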

0.3.14

Bug Fix:

  • Xls2Yml - Get a correct sheet name based on the schema name field

0.3.13

New feature:

  • Improve XLS support for long names
  • Handle rate limit exceeded by setting COMET_GROUPED_MAX to avoid HTTP 429 on some cloud providers.

0.3.12

Bug Fix: reorder transformations on attributes as follows:

  • rename columns
  • run script fields
  • apply transformations (privacy: "sql: ...")
  • remove ignored fields
  • remove input filename column
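The order above can be sketched as a per-row pipeline. In this minimal Python illustration, privacy transformations are plain Python callables standing in for the `sql: ...` expressions Starlake uses. The input filename column is assumed to be `comet_input_file_name`, the name mentioned in the 0.1.35 notes. `apply_attribute_pipeline` and its parameters are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Mapping

Row = Dict[str, Any]


def apply_attribute_pipeline(
    row: Mapping[str, Any],
    renames: Mapping[str, str],
    script_fields: Mapping[str, Callable[[Row], Any]],
    privacy: Mapping[str, Callable[[Any], Any]],
    ignored: Iterable[str],
    input_filename_column: str = "comet_input_file_name",
) -> Row:
    # 1. Rename columns.
    out: Row = {renames.get(k, k): v for k, v in row.items()}
    # 2. Compute script fields; they see renamed, not-yet-anonymized values.
    for name, fn in script_fields.items():
        out[name] = fn(out)
    # 3. Apply privacy transformations.
    for name, fn in privacy.items():
        if name in out:
            out[name] = fn(out[name])
    # 4. Remove ignored fields.
    for name in ignored:
        out.pop(name, None)
    # 5. Remove the input filename column last, so earlier steps can use it.
    out.pop(input_filename_column, None)
    return out
```

Script fields run before privacy, so a script field derived from `name` sees the clear-text value. Dropping the input filename column last means script fields can still read it.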

0.3.11

Bug Fix:

  • Handle field relaxation when in Append mode and the table does not exist.

0.3.9 / 0.3.10 / 0.3.11

Bug Fix:

  • Make fields in rejected table optional

0.3.8

New feature:

  • Roll back support for kafka.properties files. It is redundant since server-options properties already exist.

0.3.7

New feature:

  • Improve XLS support for metadata

0.3.6

New feature:

  • Autoload kafka.properties file from the metadata directory.

0.3.5

New feature:

  • Parallel copy of files when loading and archiving
  • Support renaming of domains and schemas in XLS

0.3.3 / 0.3.4

  • Fix release process

0.3.2

New feature:

  • The import step can be limited to one or more domains

0.3.1

New feature:

  • Update Kafka / BigQuery libraries
  • Add new preset env vars
  • Allow renaming of domains and schemas

0.3.0

New feature:

  • Vars in assertions are now substituted at load time
  • Support SQL statement in privacy phase
  • Support parameterized semantic types
  • Add support for generic sink
  • Allow use of custom deserializer on Kafka source

0.2.10

New feature:

  • Drop Java 1.8 prerequisite for compilation
  • Support custom database name for Hive compatible metastore
  • Support custom dataset name in BQ

0.2.9

New feature:

  • Drop support for Spark 2.3.X
  • Allow table renaming on write
  • Any Spark supported input is now allowed
  • Env vars in env.yml files

0.2.8

New feature:

  • Generate DDL from YML files with support for BigQuery, Snowflake, Synapse and Postgres #51 / #56
  • Improve XLS handling: Add support for presql / postsql, tags, primary and foreign keys #59
  • Add optional application of row & column level security
  • Databricks Support
  • Significant reduction in memory consumption
  • Support application.conf file in metadata folder (COMET_METADATA_FS and COMET_ROOT must still be passed as env variables)

Bug Fix:

  • Include env var and option when running presql in ingestion mode #58

0.2.7

New feature:

  • Support merging dataset with updated schema
  • Support publishing to GitHub Packages
  • Reduce number of dependencies
  • Allow Audit sink name configuration from environment variable
  • Dropped support for elasticsearch 6

Bug Fix:

  • Support timestamps as long in XML & JSON files

0.2.6

New feature:

  • Support XML Schema inference
  • Support the ability to reject the whole file on error
  • Improve error reporting
  • Support engine on task SQL (query pushdown to BigQuery)
  • Support last(n) partition on merge
  • Added new env var to control partitioning COMET_SPARK_SQL_SOURCES_PARTITION_OVERWRITE_MODE
  • Added env vars to control BigQuery materialization on pushdown queries COMET_SPARK_BIGQUERY_MATERIALIZATION_PROJECT, COMET_SPARK_BIGQUERY_MATERIALIZATION_DATASET (defaults to materialization)
  • Added env var to control BigQuery read data format COMET_SPARK_BIGQUERY_READ_DATA_FORMAT (defaults to AVRO)
  • When COMET_MERGE_OPTIMIZE_PARTITION_WRITE is set and dynamic partitioning is active, only write partitions containing new records or records to be deleted or updated for BQ (handled by Spark by default for FS).
  • Add VALIDATE_ON_LOAD (comet-validate-on-load) property to raise an exception if any domain/job YML file is invalid. Defaults to false
  • Add custom file extensions property in Domain import default-file-extensions and env var COMET_DEFAULT_FILE_EXTENSIONS

Bug Fix:

  • Loading empty files when the schema contains script fields
  • Applying default value for an attribute when value in the input data is null
  • Transformation job with BQ engine fails when no views block is defined
  • XLS2YML : remove non-breaking spaces from Excel file cells to avoid parsing errors
  • Fix merge using timestamp option
  • JSON ingestion fails with complex array of objects
  • Remove duplicates on incoming when existingDF does not exist or is empty
  • Parse Sink options correctly
  • Handle extreme cases where audit lock raises an exception on creation
  • Handle files without extension in the landing zone
  • Store audit log with batch priority on BigQuery
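The COMET_MERGE_OPTIMIZE_PARTITION_WRITE behaviour rewrites only the partitions a merge actually touches. Here is a minimal Python sketch of how that set could be computed. It assumes each record carries its partition value in one column and that every incoming row (insert, update, or delete marker) is matched to the existing data on a single key. `partitions_to_rewrite` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, Set


def partitions_to_rewrite(
    incoming: Iterable[Dict[str, Any]],
    existing_by_key: Dict[Any, Dict[str, Any]],
    key: str,
    partition_column: str,
) -> Set[Any]:
    """Return the partitions affected by incoming inserts, updates, or deletes.

    A changed record may move between partitions, so both the partition of
    the incoming row and that of the existing row it replaces are included.
    """
    touched: Set[Any] = set()
    for row in incoming:
        touched.add(row[partition_column])
        previous = existing_by_key.get(row[key])
        if previous is not None:
            # The old version must be removed from its original partition.
            touched.add(previous[partition_column])
    return touched
```

Every partition outside the returned set can be left as-is, which is where the write savings come from.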

0.2.4 / 0.2.5

Bug Fix:

0.2.3

New feature:

  • Add ability to ignore some fields (only top level fields supported)
  • BREAKING CHANGE: Handle multiple schemas during extraction. Update your extract configurations before migrating to this version.
  • Improve InferSchemaJob
  • Include primary keys & foreign keys in JDBC2Yml

Bug Fix:

  • Handle rename in JSON / XML files
  • Handle timestamp fields in JSON / XML files
  • Do not partition rejected files
  • Add COMET_CSV_OUTPUT_EXT env var to customize filename extension after ingestion when CSV is active.

0.2.2

New feature:

  • Use the same variable for Lock timeout
  • Improve logging when locking file fails
  • File sink, while still the default, is now controlled by the sink tag in the YAML file. The sink-to-file option is removed from regular use and kept for testing purposes only.
  • Allow custom topic name for comet_offsets
  • Add coalesce(int) support to the Kafka offloading feature
  • Attributes may now be declared as primary and/or foreign keys, even though no check is made.
  • Export schema and relations(PK / FK) as dot (graphviz) files.
  • Support saving comet offsets to the filesystem instead of Kafka using the new setting comet-offsets-mode = "STREAM"

Bug Fix:

  • Invalid YAML files now produce an error at startup instead of a warning.

0.2.1

  • Version skipped

0.2.0

New feature:

  • Export all tables in JDBC2Yml generation
  • Include table & column names when encountering an unknown column type in JDBC source schema
  • Better logging on forced conversion in JDBC2Yml
  • Compute Hive Statistics on Table & Partitions
  • DataGrip support with implementation of substitution for ${} in addition to {{}}
  • Improve logging
  • Add column type during database extraction
  • The name attribute inside a job file should reflect the filename. This attribute will soon be deprecated
  • Allow Templating on jobs. Useful to generate Airflow / Oozie Dags from job.comet.yml/job.sql code
  • Switch from readthedocs to docusaurus
  • Add local and bigquery samples
  • Custom var pattern through sql-pattern-parameter in reference.conf
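Substituting both `${var}` and `{{var}}` can be sketched with one regular expression. Starlake makes the pattern configurable through sql-pattern-parameter, but this sketch hardcodes both forms. It also assumes that unknown variables are left in place rather than raising an error. `substitute` is a hypothetical helper.

```python
import re
from typing import Mapping

# Matches ${name} or {{name}}, allowing spaces inside the braces.
_VAR = re.compile(r"\$\{\s*(\w+)\s*\}|\{\{\s*(\w+)\s*\}\}")


def substitute(sql: str, variables: Mapping[str, str]) -> str:
    """Replace ${var} and {{var}} with their values; keep unknown names intact."""

    def repl(m: "re.Match[str]") -> str:
        name = m.group(1) or m.group(2)  # whichever alternative matched
        return variables.get(name, m.group(0))

    return _VAR.sub(repl, sql)
```

Supporting `${}` lets a tool such as DataGrip recognize the placeholders, while `{{}}` keeps existing templates working.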

Bug Fix:

  • Avoid computing statistics on struct fields
  • Make database-extractor optional in application.conf

0.1.36

New feature:

  • Parameterize with Domain & Schema metadata in JDBC2Yml generation

Bug Fix:

0.1.35

New feature:

  • Auto compile with scala 2.11 for Spark 2 and with scala 2.12 for Spark 3. [457]
  • Performance optimization when using Privacy Rules. [459]
  • Rejected area and audit logs can have their own write format (default-rejected-write-format and default-audit-write-format properties)
  • Deep JSON & XML files are now validated against the schema
  • Privacy is applied on deep JSON & XML inputs [461]
  • Domains & Jobs may be defined in subdirectories, allowing better metadata file organization [462]
  • Substitute variables through CLI & env files in views, assertions, presql, main sql and post sql requests [462]
  • Semantic type Date supports dates with MMM month representation [463]
  • Split reference.conf into multiple files. [460]
  • Support kafka Source & Sink through Spark Streaming [460]
  • Add an alternative way of applying privacy on XML files. [466]
  • Generate Excel files from YML files
  • Generate YML file from Database Schema

Bug Fix:

  • Make Jackson lib provided. [457]
  • Support Spark 2.3 by not using Dataframe.isEmpty [457]
  • comet_input_file_name missing when ingesting Position files [466]
  • Apply postsql queries on the accepted DataFrame [466]
  • Check that scripted fields are defined at the end of the schema in the YML file [#384]

0.1.34

New feature:

  • Allow sink options to be defined in YML instead of Spark Submit. [#450] [#454]

Bug Fix:

  • Parse dates with yyyyMM format correctly [#451]
  • Fix error when saving a csv with an empty DataFrame [#451]
  • Keep column description in BQ tables when using Overwrite mode [#453]

0.1.29

Bug Fix:

  • Correctly support merge mode in BQ [#449]
  • Fix for sinking XML to BQ [#448]

0.1.27

New feature:

  • Kafka Support improved

0.1.26

New feature:

  • Optionally sink to file using property sink-to-file = ${?COMET_SINK_TO_FILE}
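`${?COMET_SINK_TO_FILE}` is a HOCON optional substitution. If the environment variable is defined, its value is used. If it is not, the assignment is skipped and any earlier value for the key is kept. The minimal Python sketch below mimics that rule. `resolve_optional` is a hypothetical helper, and real resolution is performed by the Typesafe Config library.

```python
import os
from typing import Mapping, Optional


def resolve_optional(
    default: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Mimic `key = default` followed by `key = ${?ENV_VAR}` in HOCON.

    If ENV_VAR is set, its value wins; otherwise the earlier value is kept
    (None here stands for "key not set").
    """
    return environ.get(env_var, default)
```

For example, `resolve_optional("false", "COMET_SINK_TO_FILE", {"COMET_SINK_TO_FILE": "true"})` yields `"true"`. With an empty environment it yields `"false"`.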

Bug Fix:

  • Sink name was ignored and always considered as None

0.1.23

New feature:

  • YML files are now renamed with the suffix .comet.yml
  • Comet Schema is now published on SchemaStore, enabling IntelliSense in VSCode & IntelliJ
  • Assertions may now be executed as part of the Load and transform processes
  • Shared Assertions UDF may be defined and stored in COMET_ROOT/metadata/assertions
  • Views may also be defined and shared in COMET_ROOT/metadata/views.
  • Views are accessible in the load and transform processes.
  • Domains may now be prefixed by the "load" tag. Defining a domain without the "load" tag is now deprecated
  • AutoJob may now be prefixed by the "transform" tag. Defining an autojob without the "transform" tag is now deprecated

Breaking Changes:

  • N.A.

Bug Fix:

  • Use Spark Application Id for JobID information to make auditing easier

0.1.22

New feature:

  • Expose a REST API to generate a Yaml Schema from an Excel file. [#387]
  • Support ingesting multiline complex JSON. [#391]
  • Support nested fields when generating schema for BigQuery tables. [#391]
  • Enhancements on Spark to BigQuery schema. [#395]
  • Support merging a part of a BigQuery Table, rather than all the Table. [#397]
  • Enable setting BigQuery intermediate format when sinking using ${?COMET_INTERMEDIATE_BQ_FORMAT}. [#398] [#400]
  • Enhancement on Merging mode: do not depend on parquet files when using BigQuery tables.

Dependencies:

  • Update sbt to 1.4.4 [#385]
  • Update scopt to 4.0.0 [#390]