Conversation
Pull request overview
This pull request aims to add support for ZIP compression format to the pipeline's compression task. However, the implementation uses the compress/flate package (raw DEFLATE algorithm) rather than the archive/zip package (ZIP file format), which are different formats. The PR also includes test files and test data for validating the new compression format.
Changes:
- Added a "zip" format handler in formats.go using compress/flate for DEFLATE compression/decompression
- Changed the snappy writer from NewWriter to NewBufferedWriter (unrelated to ZIP support)
- Added a birds.txt test data file containing 124 bird names
- Added zip_compress_test.yaml and zip_decompress_test.yaml test pipeline configurations
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| internal/pkg/pipeline/task/compress/formats.go | Adds "zip" format handler using compress/flate package and modifies snappy writer implementation |
| test/pipelines/birds.txt | Provides test data file with list of bird names for compression/decompression testing |
| test/pipelines/zip_compress_test.yaml | Test pipeline configuration to compress birds.txt into compressed_birds.zip |
| test/pipelines/zip_decompress_test.yaml | Test pipeline configuration to decompress compressed_birds.zip back to text |
test/pipelines/zip_unpack_test.yaml (outdated)
```yaml
    format: zip
    action: decompress
  - name: store_compressed_file
    type: file
```
There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.
Suggested change (remove the trailing whitespace):

```diff
-    type: file 
+    type: file
```
```go
`zip`: {
	NewReader: func(r io.Reader) (io.ReadCloser, error) {
		return flate.NewReader(r), nil
	},
	NewWriter: func(w io.Writer) io.WriteCloser {
		writer, _ := flate.NewWriter(w, flate.DefaultCompression)
		return writer
	}},
```
The implementation uses compress/flate (DEFLATE) instead of archive/zip (ZIP format). These are different compression formats:
- DEFLATE is the raw compression algorithm
- ZIP is a file archive format that uses DEFLATE but includes file metadata, directory structure, and file headers
The PR description mentions "ZIP file compression" but the implementation only provides raw DEFLATE compression/decompression. If the intent is to support ZIP files (with .zip extension), you should use the archive/zip package instead. If raw DEFLATE compression is the goal, the format name should be changed from "zip" to "deflate" to avoid confusion.
Note that archive/zip requires a different API pattern (working with zip.Writer and zip.Reader which handle multiple files), so the implementation would need adjustment to work with the streaming pattern used in this codebase.
```go
		return flate.NewReader(r), nil
	},
	NewWriter: func(w io.Writer) io.WriteCloser {
		writer, _ := flate.NewWriter(w, flate.DefaultCompression)
```
The error returned by flate.NewWriter is being silently ignored with the blank identifier _. This error should be handled properly to maintain consistency with other format handlers and to catch potential issues. While flate.NewWriter only returns an error for invalid compression levels, proper error handling is important for code robustness.
Consider handling the error appropriately. If the error should be propagated, the NewWriter function signature would need to be modified to return an error, similar to how NewReader returns an error.
Suggested change:

```diff
-writer, _ := flate.NewWriter(w, flate.DefaultCompression)
+writer, err := flate.NewWriter(w, flate.DefaultCompression)
+if err != nil {
+	panic("compress: failed to create flate writer with default compression: " + err.Error())
+}
```
```diff
 },
 NewWriter: func(w io.Writer) io.WriteCloser {
-	return snappy.NewWriter(w)
+	return snappy.NewBufferedWriter(w)
```
This change from snappy.NewWriter(w) to snappy.NewBufferedWriter(w) appears to be unrelated to adding ZIP support. While NewBufferedWriter may provide better performance by buffering writes, this change should either:
- Be mentioned in the PR description as an additional improvement
- Be separated into its own PR
- Be reverted if it was changed unintentionally
Please clarify if this change is intentional and document the reason for it.
Suggested change:

```diff
-return snappy.NewBufferedWriter(w)
+return snappy.NewWriter(w)
```
```yaml
tasks:
  - name: pull_names_from_s3
    type: file
    path: test/pipelines/compressed_birds.zip
```
The test attempts to decompress compressed_birds.zip which doesn't exist in the repository. The test file zip_compress_test.yaml generates this file, but the decompression test assumes it already exists at test time.
This creates a dependency where zip_compress_test.yaml must be executed before zip_decompress_test.yaml. Consider:
- Adding a comment, similar to the one in decompress_test.yaml, stating "please execute zip_compress_test.yaml before zip_decompress_test.yaml"
- Or including a pre-compressed compressed_birds.zip file in the repository to make the test self-contained
```yaml
  - name: compress_file
    type: compress
    format: zip
    action: decompress
  - name: store_compressed_file
```
The task is named "compress_file" but it's performing decompression (action: decompress). This is inconsistent and misleading. Consider renaming it to "decompress_file" to accurately reflect what the task does.
Similarly, the final task is named "store_compressed_file" but it's storing the decompressed result. Consider renaming it to "store_decompressed_file" or "store_uncompressed_file" for clarity.
Suggested change:

```diff
-  - name: compress_file
+  - name: decompress_file
     type: compress
     format: zip
     action: decompress
-  - name: store_compressed_file
+  - name: store_decompressed_file
```
test/pipelines/zip_pack_test.yaml (outdated)
```yaml
    type: compress
    format: zip
  - name: store_compressed_file
    type: file
```
There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.
Suggested change (remove the trailing whitespace):

```diff
-    type: file 
+    type: file
```
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
```go
w, _ := zipWriter.Create(z.FileName)
w.Write(b)

zipWriter.Close()
```
Multiple error handling issues: (1) The error from zipWriter.Create is being silently ignored with the blank identifier. (2) The error from w.Write is being ignored. (3) The error from zipWriter.Close is being ignored. All of these errors should be checked and handled appropriately, similar to how the compress task handles errors.
Suggested change:

```diff
-w, _ := zipWriter.Create(z.FileName)
-w.Write(b)
-zipWriter.Close()
+w, err := zipWriter.Create(z.FileName)
+if err != nil {
+	log.Fatal(err)
+}
+if _, err := w.Write(b); err != nil {
+	log.Fatal(err)
+}
+if err := zipWriter.Close(); err != nil {
+	log.Fatal(err)
+}
```
```go
}

if obj.Action != actionCompress && obj.Action != actionDecompress {
	return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action)
```
The error message mentions 'compress' or 'decompress' but the actual valid action values are 'pack' and 'unpack' as defined in the constants at lines 13-14. This inconsistency between the error message and the actual values will confuse users.
Suggested change:

```diff
-return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action)
+return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action)
```
```go
	log.Fatal("file name is required to create tar archive")
}

tw := tar.NewWriter(bytes.NewBuffer(b))
```
The tar.NewWriter is incorrectly initialized with bytes.NewBuffer(b), which creates a new buffer containing a copy of the input data. This should be an empty buffer like bytes.NewBuffer(nil) or new(bytes.Buffer). The current implementation would prepend the raw input data before the tar archive data, resulting in a corrupted output.
```go
_, err := tw.Write(b)
if err != nil {
	log.Fatal(err)
}

err = tw.Close()
if err != nil {
	log.Fatal(err)
}
```
Using log.Fatal will terminate the entire program. These errors should be returned instead for proper pipeline error handling.
```go
	log.Fatal(err)
}

t.SendData(t.Record.Context, b, t.OutputChan)
```
After writing to the tar archive, this sends the original input data (b) instead of the tar archive buffer. This should send the tar writer's buffer contents (e.g., tarBuf.Bytes()) instead. The current implementation would send the uncompressed data rather than the tar archive.
```go
type archiver interface {
	Read(b []byte)
	Write(b []byte)
}
```
The archiver interface methods Read and Write don't return errors, which prevents proper error handling. These methods should return error values so that errors can be propagated back through the Run method to the caller. Compare with the compress task where compress/decompress methods return errors.
```go
switch c.Action {
case actionCompress:
	archiv.Write(r.Data)
case actionDecompress:
	archiv.Read(r.Data)
}
```
The Read and Write calls don't capture or handle any errors because the archiver interface methods don't return errors. If an error occurs in Read or Write, it will either terminate the program via log.Fatal or be silently ignored, but cannot be properly handled by the Run method. This needs to be addressed by updating the archiver interface to return errors.
```go
if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF {
	log.Fatal(err)
```
Using log.Fatal will terminate the entire program. This should return an error instead so the pipeline can handle it according to the FailOnError configuration. The Read and Write methods should return errors.
| if t.FileName == "" { | ||
| log.Fatal("file name is required to create tar archive") | ||
| } | ||
|
|
This error check should not use log.Fatal as it terminates the entire program. The validation should occur in UnmarshalYAML during configuration parsing, not during runtime execution. This check is redundant since validation already happens in archive.go at line 55-58.
| if t.FileName == "" { | |
| log.Fatal("file name is required to create tar archive") | |
| } |
```go
"gopkg.in/yaml.v3"

"github.com/patterninc/caterpillar/internal/pkg/pipeline/task"
"github.com/patterninc/caterpillar/internal/pkg/pipeline/task/archive"
```
The PR description states "Support for ZIP file compression and decompression" and mentions only ZIP/DEFLATE format, but the code actually implements support for both ZIP and TAR archive formats. The description should be updated to accurately reflect that both archive formats are being added, or the TAR implementation should be removed if it's out of scope for this PR.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 16 comments.
```go
func (t *tarArchive) Read(b []byte) {
	r := tar.NewReader(bytes.NewReader(b))

	for {
		header, err := r.Next()
		if err != nil || err != io.EOF {
			break
		}

		// check the file type is regular file
		if header.Typeflag == tar.TypeReg {
			buf := make([]byte, header.Size)
			if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF {
				log.Fatal(err)
			}
```
The archive task has no protection against ZIP bomb attacks or excessively large archives. When unpacking, the code allocates memory based on the file size reported in the archive header without any validation. A malicious archive could claim extremely large file sizes, causing out-of-memory errors or system crashes. Consider adding limits on individual file sizes and total archive size, or at least document the risk more explicitly with recommended mitigations.
Suggested change:

```go
const (
	// maxTarEntrySize limits the size of a single file extracted from a tar archive.
	// This helps prevent excessive memory allocation from a single malicious entry.
	maxTarEntrySize int64 = 100 * 1024 * 1024 // 100 MiB
	// maxTarTotalSize limits the total size of all files extracted from a tar archive.
	// This helps mitigate "zip bomb" style attacks using many smaller entries.
	maxTarTotalSize int64 = 1 * 1024 * 1024 * 1024 // 1 GiB
	// maxAllocSize is the largest value that can safely be converted to int for allocations.
	maxAllocSize int64 = int64(^uint(0) >> 1)
)

func (t *tarArchive) Read(b []byte) {
	r := tar.NewReader(bytes.NewReader(b))

	var totalSize int64
	for {
		header, err := r.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}

		// check the file type is regular file
		if header.Typeflag == tar.TypeReg {
			// Validate header size to prevent excessive memory allocation.
			if header.Size < 0 {
				log.Fatal("tar entry has negative size")
			}
			if header.Size > maxTarEntrySize {
				log.Fatalf("tar entry too large: %d bytes (limit %d)", header.Size, maxTarEntrySize)
			}
			if totalSize+header.Size > maxTarTotalSize {
				log.Fatalf("tar archive too large: %d bytes extracted would exceed limit %d", totalSize+header.Size, maxTarTotalSize)
			}
			if header.Size > maxAllocSize {
				log.Fatalf("tar entry size %d exceeds maximum allocatable size", header.Size)
			}
			buf := make([]byte, header.Size)
			if _, err := io.ReadFull(r, buf); err != nil {
				log.Fatal(err)
			}
			totalSize += header.Size
```
```go
}

if _, err := tw.Write(b); err != nil {
	log.Fatal(err)
```
Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
```go
_, err = rc.Read(buf)
if err != nil && !errors.Is(err, io.EOF) {
	log.Fatal(err)
```
Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
```go
package archive

import (
	"fmt"

	"github.com/patterninc/caterpillar/internal/pkg/pipeline/record"
	"github.com/patterninc/caterpillar/internal/pkg/pipeline/task"
)

type actionType string

const (
	actionPack   actionType = `pack`
	actionUnpack actionType = `unpack`
)

const (
	defaultFormat = `zip`
	defaultAction = `pack`
)

type archiver interface {
	Read(b []byte)
	Write(b []byte)
}

type core struct {
	task.Base `yaml:",inline" json:",inline"`
	Format    string     `yaml:"format,omitempty" json:"format,omitempty"`
	Action    actionType `yaml:"action,omitempty" json:"action,omitempty"`
	FileName  string     `yaml:"file_name,omitempty" json:"file_name,omitempty"`
}

func New() (task.Task, error) {
	return &core{
		Format: defaultFormat,
		Action: defaultAction,
	}, nil
}

func (c *core) UnmarshalYAML(unmarshal func(interface{}) error) error {
	type raw core
	obj := raw{
		Format: defaultFormat,
		Action: defaultAction,
	}
	if err := unmarshal(&obj); err != nil {
		return err
	}

	if obj.Action != actionPack && obj.Action != actionUnpack {
		return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action)
	}

	if obj.Action == actionPack {
		if obj.FileName == "" {
			return fmt.Errorf("file_name must be specified when action is 'pack'")
		}
	}

	*c = core(obj)

	return nil
}

func (c *core) Run(input <-chan *record.Record, output chan<- *record.Record) (err error) {

	if input == nil {
		return task.ErrNilInput
	}

	for {
		r, ok := c.GetRecord(input)
		if !ok {
			break
		}

		if len(r.Data) == 0 {
			continue
		}

		var archiv archiver

		switch c.Format {
		case "tar":
			archiv = &tarArchive{
				Base:       &c.Base,
				FileName:   c.FileName,
				Record:     r,
				OutputChan: output,
			}
		case "zip":
			archiv = &zipArchive{
				Base:       &c.Base,
				FileName:   c.FileName,
				Record:     r,
				OutputChan: output,
			}
		default:
			return fmt.Errorf("unsupported format: %s", c.Format)
		}

		switch c.Action {
		case actionPack:
			archiv.Write(r.Data)
		case actionUnpack:
			archiv.Read(r.Data)
		}
	}
	return nil
}
```
The new archive task implementation lacks unit tests. Given that the repository has comprehensive test coverage for other components (e.g., internal/pkg/pipeline/dag_test.go), unit tests should be added to verify the pack and unpack operations for both ZIP and TAR formats, error handling paths, and edge cases such as empty archives, corrupted archives, and files with various sizes.
birds_file.txt (outdated)
```text
Albatross
Acorn Woodpecker
American Kestrel
Anna's Hummingbird
Bald Eagle
Baltimore Oriole
Barn Swallow
Belted Kingfisher
Bicolored Antbird
Black Capped Chickadee
Black Skimmer
Blue Jay
Bluebird
Bobolink
Bohemian Waxwing
Brown Creeper
Brown Pelican
Burrowing Owl
California Condor
California Quail
Canada Goose
Cardinal
Caspian Tern
Cedar Waxwing
Chestnut Sided Warbler
Chimney Swift
Chipping Sparrow
Clark's Nutcracker
Clay Colored Sparrow
Cliff Swallow
Columbiformes
Common Eider
Common Goldeneye
Common Grackle
Common Loon
Common Merganser
Common Raven
Common Tern
Common Yellowthroat
Coopers Hawk
Cory's Shearwater
Crested Flycatcher
Curve Billed Thrasher
Dark Eyed Junco
Dickcissel
Dovekie
Downy Woodpecker
Drab Seedeater
Dunnock
Eastern Bluebird
Eastern Meadowlark
Eastern Phoebe
Eastern Screech Owl
Eastern Towhee
Eastern Wood Pewee
Eared Grebe
Egyptian Plover
Elanus leucurus
Evening Grosbeak
Eared Quetzal
Eurasian Wigeon
European Starling
Fabulous Flamingo
Ferruginous Hawk
Fiscal Flycatcher
Flammulated Owl
Flatbill
Flesh Footed Shearwater
Florida Jay
Fringilla coelebs
Fulmar
Gadwall
Gambel's Quail
Gannet
Garden Warbler
Gnatcatcher
Godwit
Golden Eagle
Golden Winged Warbler
Goldeneye
Goldfinch
Goosander
Goshawk
Grace's Warbler
Grasshopper Sparrow
Gray Catbird
Great Black Backed Gull
Great Blue Heron
Great Crested Flycatcher
Great Horned Owl
Great Kiskadee
Great Spotted Woodpecker
Great Tit
Grebe
Greenbul
Green Heron
Green Tailed Towhee
Green Winged Teal
Greenlet
Grey Kingbird
Grey Owl
Grosbeaks
Grouse
Gull
Hairy Woodpecker
Hammond's Flycatcher
Harris Hawk
Harris Sparrow
Hawaiian Creeper
Hawaiian Goose
Hawfinch
Heathland Francolin
Herring Gull
Hoary Puffleg
Hooded Merganser
Hooded Oriole
Hooded Warbler
Hoopoe
Horned Auk
Horned Grebe
Horned Lark
House Finch
House Sparrow
House Wren
```

(No newline at end of file.)
This file appears to be a duplicate of test/pipelines/birds.txt and is located at the repository root, which is likely unintentional. Test data files should be placed in the test/pipelines directory along with the related test YAML files. This file should either be removed or moved to the appropriate test directory.
Suggested change (replace the duplicated list, shown above, with a stub):

```text
# This file previously contained duplicate bird test data.
# The authoritative test data file now lives at: test/pipelines/birds.txt
# This stub is kept only to avoid reintroducing the duplicate by accident.
```
e2370c0
Description
This pull request introduces a new archive pipeline task, enabling packing and unpacking of files in ZIP and TAR formats as part of data processing pipelines. It also adds documentation, updates file handling behavior to support multi-file workflows, and provides several new test pipelines to demonstrate and validate the new functionality.

New Archive Task:
- Adds an archive task supporting both packing and unpacking of files using ZIP and TAR formats, with configurable options for format and action. This includes core logic (archive.go), format-specific implementations (zip.go, tar.go), and a comprehensive README for usage and configuration. [1] [2] [3] [4]

Integration with Pipeline System:
- Registers the archive task type in the pipeline task registry, making it available for use in pipeline configurations. [1] [2]

File Task Enhancements for Archive Support:
- Updates the file task to set and use context values for file paths, enabling correct file naming and directory structure when reading and writing files extracted from or packed into archives. This includes ensuring directories are created as needed. [1] [2] [3] [4] [5]

Compression Task Adjustment:

Test Pipelines and Data:
Types of changes
Checklist