Align master #7856
Open
vkarampudi wants to merge 161 commits into
…traint files to resolve pip installer conflicts
…ds on Python 3.13
…-build-isolation to fix build-isolation errors on Python 3.13
…che-beam wheels and transitive protobuf v6 conflict
…ps and using --no-build-isolation
…GHTLY and GIT_MASTER
…un on NIGHTLY and GIT_MASTER" This reverts commit 2a8d8a5.
…e-beam's setup script under --no-build-isolation
…build-isolation on Python 3.10
…ython 3.13 wheels
…els for Python 3.12/3.13
This reverts commit 1952f03.
… single RUN layer
…elease version constraints
dbea6fd to
5505684
…instead of master
…installed Google and TFX packages
…sts to resolve security scanner vulnerabilities
rwitcher
reviewed
Jun 15, 2026
  python-version: ['3.10', '3.11', '3.12', '3.13']
  which-tests: ["not e2e", "e2e"]
- dependency-selector: ["NIGHTLY", "DEFAULT"]
+ dependency-selector: ["DEFAULT"]
Contributor
Why was NIGHTLY removed here?
Contributor
Author
The NIGHTLY jobs run against tf 2.22.0.dev and fail with incompatibility issues. Since we are targeting tf 2.21 for this release, the DEFAULT workflows are sufficient. The NIGHTLY jobs were also blocking the rest of the workflows from building.
  | 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
      os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
      file_name_suffix='.gz',
+     num_shards=self._get_num_shards(self._beam_pipeline_args),
Contributor
How is "num_shards" used? It's not clear to me why this was added.
Contributor
Author
num_shards was added to configure the output sharding strategy dynamically based on the runner environment:
- Avoiding Local Runner Bugs: When executing the pipeline locally (e.g., using DirectRunner or PrismRunner), Apache Beam's dynamic sharding (num_shards=0) combined with local loopback worker setups can trigger file rename/cleanup bugs in Beam's FileBasedSink. Forcing num_shards=1 for local runs avoids these issues by ensuring a single deterministic file is written and renamed.
- Preserving Distributed Scaling: When running on distributed runners like Google Cloud Dataflow, _get_num_shards returns 0. This allows Apache Beam to dynamically scale the number of output shards based on the volume of data being processed, preventing write bottlenecks.
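The runner-based choice described above could be sketched roughly as follows. The body of _get_num_shards is not shown in this conversation, so everything here is an assumption made for illustration: the standalone function name get_num_shards, the _LOCAL_RUNNERS set, reading the runner from a --runner flag, and treating a missing flag as a local run.

```python
import argparse
from typing import List, Optional

# Assumed set of local runner names (lowercased). The comment above only
# names DirectRunner and PrismRunner as examples; the real list may differ.
_LOCAL_RUNNERS = frozenset({"directrunner", "prismrunner"})


def get_num_shards(beam_pipeline_args: Optional[List[str]]) -> int:
    """Hypothetical stand-in for _get_num_shards.

    Returns 1 for local runners (single output file) and 0 otherwise,
    which leaves the shard count up to the runner.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--runner", default=None)
    # parse_known_args ignores every flag other than --runner.
    known, _unused = parser.parse_known_args(beam_pipeline_args or [])
    # Assumption: no --runner flag means the pipeline runs locally.
    runner = (known.runner or "DirectRunner").lower()
    return 1 if runner in _LOCAL_RUNNERS else 0


if __name__ == "__main__":
    print(get_num_shards(["--runner=DataflowRunner", "--project=my-project"]))
    print(get_num_shards(["--runner", "PrismRunner"]))
    print(get_num_shards(None))
```

Deriving the value from the pipeline args keeps the write step itself unchanged across environments; only the argument passed as num_shards varies.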
No description provided.