Releases: tensorflow/transform
TensorFlow Transform 1.1.0
Major Features and Improvements
- Improved resource usage for tft.vocabulary when top_k is set, by removing
stages performing repetitive sorting.
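Setting top_k on tft.vocabulary keeps only the most frequent entries. The plain-Python sketch below illustrates that selection, not TFT's Beam pipeline. It picks the k best entries with a partial selection (heapq.nsmallest) instead of fully sorting the whole vocabulary. The helper name top_k_vocabulary is hypothetical, and breaking ties alphabetically is an assumption of this sketch, not documented TFT behavior.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def top_k_vocabulary(tokens: Iterable[str], top_k: int) -> List[Tuple[str, int]]:
    """Return the top_k most frequent tokens with their counts.

    Hypothetical helper: ties are broken alphabetically, which is an
    assumption of this sketch rather than documented TFT behavior.
    """
    counts = Counter(tokens)
    # nsmallest with a (-count, token) key selects the k highest-count entries
    # without sorting the entire vocabulary first.
    return heapq.nsmallest(top_k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    words = "the cat sat on the mat the cat ran".split()
    print(top_k_vocabulary(words, 2))  # [('the', 3), ('cat', 2)]
```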
Bug Fixes and Other Changes
- Support invoking Keras models inside the preprocessing_fn using
tft.make_and_track_object when force_tf_compat_v1=False with TF2 behaviors
enabled.
- Fix an issue when computing the metadata for a function with automatic
control dependencies added, where dependencies on inputs that should not be
evaluated were being retained.
- Census TFT example: wrapped table initialization with a tf.init_scope() in
order to avoid reinitializing the table for each batch of data.
- Stopped depending on six.
- Depends on protobuf>=3.13,<4.
- Depends on tensorflow-metadata>=1.1.0,<1.2.0.
- Depends on tfx-bsl>=1.1.0,<1.2.0.
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Transform 1.0.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.29,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6.
- Depends on tensorflow-metadata>=1.0.0,<1.1.0.
- Depends on tfx-bsl>=1.0.0,<1.1.0.
Breaking Changes
- tft.ptransform_analyzer has been moved under tft.experimental. The order of
args in the API has also been changed.
- tft_beam.PTransformAnalyzer has been moved under tft_beam.experimental.
- The default value of the drop_unused_features parameter to
TFTransformOutput.transform_raw_features is now True.
Deprecations
- N/A
TensorFlow Transform 0.30.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Removed the dataset_schema module; most methods in it have been deprecated
since version 0.14.
- Fix a bug where having an analyzer operate on the output of tft.vocabulary
would cause it to evaluate incorrectly when force_tf_compat_v1=False with TF2
behaviors enabled.
- Depends on tensorflow-metadata>=0.30.0,<0.31.0.
- Depends on tfx-bsl>=0.30.0,<0.31.0.
Breaking Changes
- DatasetMetadata no longer accepts a dict as its input schema; schema is now
expected to be a Schema proto.
- The TF 1.15-specific APIs apply_saved_model and
apply_function_with_checkpoint were removed from the tft namespace. They are
still available under the pretrained_models module.
- tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache,
tft.AnalyzeAndTransformDataset and tft.TransformDataset will use the native TF2
implementation of tf.transform unless TF2 behaviors are explicitly disabled.
The previous behaviour can still be obtained by setting
tft.Context.force_tf_compat_v1=True.
Deprecations
- N/A
TensorFlow Transform 0.29.0
Major Features and Improvements
- tft.AnalyzeAndTransformDataset and tft.TransformDataset can now output
pyarrow.RecordBatches. This is controlled by the parameter
output_record_batches, which is set to False by default.
Bug Fixes and Other Changes
- Added tft.make_and_track_object to load and track tf.Trackable objects
created inside the preprocessing_fn (for example, tf.hub models). This API
should only be used when force_tf_compat_v1=False and TF2 behavior is enabled.
- The decode method of the available coders (tft.coders.CsvCoder and
tft.coders.ExampleProtoCoder) has been removed. It was deprecated in the 0.25
release. Canned TFXIO implementations should be used to read and decode data
instead.
- Previously deprecated APIs were removed: tft.uniques (replaced by
tft.vocabulary), tft.string_to_int (replaced by
tft.compute_and_apply_vocabulary), tft.apply_vocab (replaced by
tft.apply_vocabulary), and tft.apply_function (identity function).
- Removed the always_return_num_quantiles arg of tft.quantiles and
tft.bucketize, which was deprecated in version 0.26.
- Added support for the count_params method to the TransformFeaturesLayer.
This allows calling the Keras Model's summary() method if the model is using
the TransformFeaturesLayer.
- Depends on absl-py>=0.9,<0.13.
- Depends on tensorflow-metadata>=0.29.0,<0.30.0.
- Depends on tfx-bsl>=0.29.0,<0.30.0.
Breaking Changes
- Existing caches (for all analyzers) are automatically invalidated.
Deprecations
- N/A
TensorFlow Transform 0.28.0
Major Features and Improvements
- Large vocabularies are now computed faster due to partially parallelizing
VocabularyOrderAndWrite.
Bug Fixes and Other Changes
- Generic tf.SparseTensor input support has been added to tft.scale_to_0_1,
tft.scale_to_z_score, tft.scale_by_min_max, tft.min, tft.max, tft.mean,
tft.var, tft.sum, tft.size and tft.word_count.
- Optimize the SavedModel written out by tf.Transform when using native TF2
to speed up loading it.
- Added tft_beam.PTransformAnalyzer as a base PTransform class for
tft.ptransform_analyzer users who wish to have access to a base temporary
directory.
- Fix an issue where >2D SparseTensors may be incorrectly represented in
instance_dicts format.
- Added support for out-of-vocabulary keys for per_key mappers.
- Added tft.get_num_buckets_for_transformed_feature, which provides the
number of buckets for a transformed feature if it is a direct output of
tft.bucketize, tft.apply_buckets, tft.compute_and_apply_vocabulary or
tft.apply_vocabulary.
- Depends on apache-beam[gcp]>=2.28,<3.
- Depends on numpy>=1.16,<1.20.
- Depends on tensorflow-metadata>=0.28.0,<0.29.0.
- Depends on tfx-bsl>=0.28.1,<0.29.0.
Breaking changes
- Autograph is disabled when the preprocessing_fn is traced using tf.function
while force_tf_compat_v1=False and TF2 behavior is enabled.
Deprecations
- N/A
TensorFlow Transform 0.27.0
Major Features and Improvements
- Added the QuantilesCombiner.compact method, which moves some of the work
done by tft.quantiles from the non-parallelizable to the parallelizable stage
of the computation.
Bug Fixes and Other Changes
- Strip only newlines instead of all whitespace in the
TFTransformOutput.vocabulary_by_name method.
- Switch analyzers that output asset files to return an eager tensor
containing the asset file path instead of a tf.saved_model.Asset object when
force_tf_compat_v1=False. If this file is then used to initialize a table,
this ensures the input to the tf.lookup.TextFileInitializer is the file path,
as the initializer handles wrapping it in a tf.saved_model.Asset object.
- Added tft.annotate_asset for annotating asset files with a string key that
can be used to retrieve them in tft.TFTransformOutput.
- Depends on apache-beam[gcp]>=2.27,<3.
- Depends on pyarrow>=1,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<2.5.
- Depends on tensorflow-metadata>=0.27.0,<0.28.0.
- Depends on tfx-bsl>=0.27.0,<0.28.0.
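Why stripping only newlines matters: a text vocabulary file stores one entry per line, and an entry may legitimately begin or end with spaces. The plain-Python sketch below contrasts the two stripping strategies. The reader function read_vocabulary is hypothetical and is not TFT's implementation.

```python
import io
from typing import List


def read_vocabulary(f: io.TextIOBase, strip_all_whitespace: bool = False) -> List[str]:
    """Read a newline-delimited vocabulary, one entry per line.

    Hypothetical helper. With strip_all_whitespace=False only the trailing
    newline is removed, so entries with leading/trailing spaces stay distinct.
    """
    entries = []
    for line in f:
        entries.append(line.strip() if strip_all_whitespace else line.rstrip("\n"))
    return entries


if __name__ == "__main__":
    vocab_text = "hello\n hello\nhello \n"
    # Stripping all whitespace collapses three distinct entries into one token.
    print(read_vocabulary(io.StringIO(vocab_text), strip_all_whitespace=True))
    # Stripping only newlines preserves them: ['hello', ' hello', 'hello ']
    print(read_vocabulary(io.StringIO(vocab_text)))
```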
Breaking changes
- N/A
Deprecations
- Parameter use_tfxio in the initializer of Context is removed (it was
deprecated in 0.24.0).
TensorFlow Transform 0.26.0
Major Features and Improvements
- Initial support added for >2D SparseTensors as inputs and outputs of the
preprocessing_fn. Note that mappers and analyzers may not support them yet,
and output >2D SparseTensors will have an unknown dense shape.
Bug Fixes and Other Changes
- Switched to calling tables and initializers within tf.init_scope when the
preprocessing_fn is traced using tf.function, to avoid re-initializing them on
every invocation of the traced tf.function.
- Switched to a (notably) faster and more accurate implementation of the
tft.quantiles analyzer.
- Fix an issue where graphs become non-hermetic if a TF2 transform_fn is
loaded in a TF1 Graph context, by making sure all assets are added to the
ASSET_FILEPATHS collection.
- Depends on apache-beam[gcp]>=2.25,!=2.26.*,<3.
- Depends on pyarrow>=0.17,<0.18.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4.
- Depends on tensorflow-metadata>=0.26.0,<0.27.0.
- Depends on tfx-bsl>=0.26.0,<0.27.0.
Breaking changes
- Existing tft.quantiles, tft.min and tft.max caches are invalidated.
Deprecations
- Parameter always_return_num_quantiles of tft.quantiles and tft.bucketize is
now deprecated. Both now always generate the requested number of buckets.
Setting always_return_num_quantiles will have no effect, and it will be
removed in the next version.
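To picture what "always generate the requested number of buckets" means: num_buckets buckets need num_buckets - 1 boundaries, and that count stays fixed even when the data contains many repeated values, in which case boundaries simply coincide. The sketch below uses exact, index-based quantiles on an in-memory list plus bisect. TFT itself computes approximate quantiles in Beam, and the names quantile_boundaries and bucketize_values are hypothetical. Because this sketch uses bisect_right, a value equal to a boundary falls into the upper bucket, which is a choice of the sketch rather than a statement about tft.bucketize.

```python
import bisect
from typing import List, Sequence


def quantile_boundaries(values: Sequence[float], num_buckets: int) -> List[float]:
    """Return exactly num_buckets - 1 boundaries using simple index-based quantiles."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must be non-empty")
    n = len(data)
    # Boundary i sits at the i/num_buckets quantile. Duplicate boundaries are
    # kept, so the boundary count never shrinks for heavily repeated values.
    return [data[min(n - 1, (i * n) // num_buckets)] for i in range(1, num_buckets)]


def bucketize_values(values: Sequence[float], boundaries: List[float]) -> List[int]:
    """Map each value to a bucket index in [0, len(boundaries)]."""
    return [bisect.bisect_right(boundaries, v) for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    bounds = quantile_boundaries(data, 4)
    print(bounds)                          # [3, 5, 7]
    print(bucketize_values(data, bounds))  # [0, 0, 1, 1, 2, 2, 3, 3]
    print(quantile_boundaries([0, 0, 0, 0, 0, 1], 3))  # [0, 0]
```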
TensorFlow Transform 0.25.0
Major Features and Improvements
- Updated the "Getting Started" guide and examples to demonstrate support for
both the "instance dict" and the "TFXIO" formats. Users are encouraged to start
using the "TFXIO" format, especially in cases where pre-canned TFXIO
implementations are available, as it offers better performance.
- From this release, TFT will also host nightly packages on
https://pypi-nightly.tensorflow.org. To install the nightly package, use the
following command:
  pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-transform
  Note: These nightly packages are unstable and breakages are likely to happen.
  Depending on the complexity involved, it can often take a week or more for
  fixed wheels to become available on the PyPI cloud service. You can always
  use the stable version of TFT available on PyPI by running the command
  pip install tensorflow-transform.
Bug Fixes and Other Changes
- TFTransformOutput.transform_raw_features and TransformFeaturesLayer can be
used when a transform fn is exported as a TF2 SavedModel and imported in graph
mode.
- Utility methods in tft.inspect_preprocessing_fn now take an optional
parameter force_tf_compat_v1. If this is False, the preprocessing_fn is traced
using tf.function in TF 2.x when TF 2 behaviors are enabled.
- Switching to a wrapper for collections.namedtuple to ensure compatibility
with PySpark, which modifies classes produced by the factory.
- Caching has been disabled for tft.tukey_h_params, tft.tukey_location and
tft.tukey_scale due to the cached accumulator being non-deterministic.
- Track variables created within the preprocessing_fn in the native TF 2
implementation.
- TFTransformOutput.transform_raw_features returns a wrapped Python dict that
overrides pop to return None instead of raising a KeyError when called with a
key not found in the dictionary. This is done in preparation for switching the
default value of drop_unused_features to True.
- Vocabularies written in tfrecord_gzip format no longer filter out entries
that are empty or that include a newline character.
- Depends on apache-beam[gcp]>=2.25,<3.
- Depends on tensorflow-metadata>=0.25,<0.26.
- Depends on tfx-bsl>=0.25,<0.26.
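The pop behavior described for the dict returned by TFTransformOutput.transform_raw_features can be pictured with a small dict subclass. This is a hypothetical re-creation for illustration only (the class name PopNoneDict is made up), not the wrapper TFT ships. Code that pops a key that may be absent gets None back instead of a KeyError, while an explicit default is still honored.

```python
from typing import Any

# Sentinel used to detect whether the caller passed an explicit default.
_MISSING = object()


class PopNoneDict(dict):
    """dict whose pop() returns None for missing keys instead of raising KeyError.

    Hypothetical illustration of the behavior described for the dict returned
    by TFTransformOutput.transform_raw_features; not TFT's actual class.
    """

    def pop(self, key: Any, default: Any = _MISSING) -> Any:
        if default is _MISSING:
            # No explicit default given: fall back to None rather than raising.
            return super().pop(key, None)
        return super().pop(key, default)


if __name__ == "__main__":
    features = PopNoneDict({"age": 42})
    print(features.pop("label"))  # None, no KeyError
    print(features.pop("age"))    # 42
```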
Breaking changes
- N/A
Deprecations
- The decode method of the available coders (tft.coders.CsvCoder and
tft.coders.ExampleProtoCoder) has been deprecated and removed. Canned TFXIO
implementations should be used to read and decode data instead.
TensorFlow Transform 0.24.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.24,<3.
- Depends on tfx-bsl>=0.24.1,<0.25.
Breaking changes
- N/A
Deprecations
- N/A
TensorFlow Transform 0.24.0
Major Features and Improvements
- Added a native TF 2 implementation of Transform's Beam APIs:
tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache, tft.AnalyzeAndTransformDataset
and tft.TransformDataset. The default behavior will continue to use
TensorFlow's compat.v1 APIs. This can be overridden by setting
tft.Context.force_tf_compat_v1=False. The default behavior for TF 2 users will
be switched to the new native implementation in a future release.
Bug Fixes and Other Changes
- Added a small fanout to analyzers' CombineGlobally for improved performance.
- Depends on absl-py>=0.9,<0.11.
- Depends on protobuf>=3.9.2,<4.
- Depends on tensorflow-metadata>=0.24,<0.25.
- Depends on tfx-bsl>=0.24,<0.25.
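A fanout on a global combine adds an intermediate stage: inputs are spread over a few partial accumulators that can be combined in parallel, and only those partial results are merged on a single worker. The Beam Python SDK exposes this as CombineGlobally(...).with_fanout(n); these notes do not say how TFT wires it in. The plain-Python sketch below simulates the two-stage shape and is not Beam code. The name combine_with_fanout is hypothetical, and the combine function is assumed to be associative and commutative.

```python
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def combine_with_fanout(values: Iterable[T], combine: Callable[[T, T], T], fanout: int) -> T:
    """Combine values in two stages, mimicking a fanned-out global combine.

    Stage 1: values are assigned round-robin to `fanout` shards, each reduced
    independently (in Beam these partial combines could run on different workers).
    Stage 2: the partial results are merged into one final result.
    """
    if fanout < 1:
        raise ValueError("fanout must be >= 1")
    shards: List[List[T]] = [[] for _ in range(fanout)]
    for i, v in enumerate(values):
        shards[i % fanout].append(v)
    partials = [reduce(combine, shard) for shard in shards if shard]
    if not partials:
        raise ValueError("values must be non-empty")
    return reduce(combine, partials)


if __name__ == "__main__":
    print(combine_with_fanout(range(1, 101), lambda a, b: a + b, fanout=4))  # 5050
    print(combine_with_fanout([3, 9, 2], max, fanout=8))  # 9
```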
Breaking changes
- N/A
Deprecations
- Deprecating Py3.5 support.
- Parameter use_tfxio in the initializer of Context is deprecated. TFT Beam
APIs now accept both "instance dicts" and "TFXIO" input formats. Setting it
will have no effect, and it will be removed in the next version.