Skip to content

[FLUSS-2686] Add COS filesystem support#2836

Open
XuQianJin-Stars wants to merge 10 commits into
apache:mainfrom
XuQianJin-Stars:feature/cos-filesystem-integration
Open

[FLUSS-2686] Add COS filesystem support#2836
XuQianJin-Stars wants to merge 10 commits into
apache:mainfrom
XuQianJin-Stars:feature/cos-filesystem-integration

Conversation

@XuQianJin-Stars
Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #2686

This PR adds support for Tencent Cloud Object Storage (COS) as a remote filesystem for Fluss. COS is one of the most widely used cloud storage services in China and this integration enables users to store Fluss remote data on COS, similar to the existing OSS, OBS, S3, and GCS filesystem integrations.

Brief change log

  • fluss-fs-cos/pom.xml: Add module configuration with dependencies on hadoop-cos (3.3.5), cos_api (5.6.139), and fluss-fs-hadoop for Hadoop filesystem abstraction.
  • COSFileSystem: Extends HadoopFileSystem to wrap Hadoop's CosNFileSystem, with lazy-initialized security token provider support for obtaining temporary credentials.
  • COSFileSystemPlugin: Implements FileSystemPlugin SPI for the cosn scheme. Handles Hadoop configuration translation from Fluss config (prefix fs.cosn.*), supports three credential modes: static secret key, custom credentials provider, and dynamic security token via COSSecurityTokenReceiver.
  • COSSecurityTokenProvider: Generates temporary security tokens using Tencent Cloud STS (Security Token Service) based on existing secretId/secretKey configuration.
  • COSSecurityTokenReceiver: Implements SecurityTokenReceiver SPI to receive and apply security tokens, configuring DynamicTemporaryCOSCredentialsProvider as the Hadoop credentials provider.
  • DynamicTemporaryCOSCredentialsProvider: Implements COS SDK's COSCredentialsProvider interface, providing BasicSessionCredentials from tokens received via COSSecurityTokenReceiver.
  • fluss-filesystems/pom.xml: Register fluss-fs-cos as a sub-module.

Tests

  • COSFileSystemBehaviorITCase: Integration test for basic COS filesystem behavior (create, read, write, delete) using static credentials (secretId/secretKey).
  • COSWithTokenFileSystemBehaviorITCase: Integration test for COS filesystem behavior using dynamic security tokens obtained via STS.
  • COSWithTokenFileSystemBehaviorBaseITCase: Base class for token-based filesystem tests, handling filesystem initialization with both static credentials and security tokens.
  • COSTestCredentials: Test utility that reads COS credentials and endpoint from environment variables (COSN_SECRET_ID, COSN_SECRET_KEY, COSN_ENDPOINT, COSN_BUCKET).

Note: These are IT (integration tests) that require actual COS credentials and bucket access. They are skipped automatically when credentials are not available via Assumptions.assumeTrue.

API and Format

No existing API or storage format changes. This PR only adds a new filesystem plugin that registers via SPI (META-INF/services/org.apache.fluss.fs.FileSystemPlugin).

New configuration keys introduced (all following standard Hadoop COS conventions):

  • fs.cosn.endpoint — COS region endpoint (e.g., ap-guangzhou)
  • fs.cosn.userinfo.secretId — Tencent Cloud secret ID
  • fs.cosn.userinfo.secretKey — Tencent Cloud secret key
  • fs.cosn.credentials.provider — Custom credentials provider class

Documentation

This PR introduces a new feature. Documentation for COS filesystem configuration should be added to the Fluss documentation site in a follow-up. Usage example:

remote.data.dir: cosn://bucket-name/fluss-data
fs.cosn.endpoint: ap-guangzhou
fs.cosn.userinfo.secretId: <your-secret-id>
fs.cosn.userinfo.secretKey: <your-secret-key>

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch 2 times, most recently from 8f2358a to 186ef5d Compare March 11, 2026 02:15
@XuQianJin-Stars XuQianJin-Stars changed the title [FLUSS-2686] Add COS filesystem support (fluss-fs-cos module) [FLUSS-2686] Add COS filesystem support Mar 11, 2026
@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from 186ef5d to e1151e8 Compare March 11, 2026 05:14
@luoyuxia luoyuxia requested a review from Copilot March 11, 2026 14:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Fluss filesystem plugin module to support Tencent Cloud Object Storage (COS) via Hadoop’s CosNFileSystem, including SPI registration and integration tests for COS access.

Changes:

  • Introduces new fluss-fs-cos module with COS filesystem implementation, token receiver/provider, and SPI service registrations.
  • Adds COS integration tests (static credentials + token-based flow) gated by environment-provided credentials.
  • Updates parent filesystem build to include the COS module and adjusts test-coverage exclusions.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
fluss-test-coverage/pom.xml Excludes COS filesystem packages from coverage aggregation (consistent with other FS plugins).
fluss-filesystems/pom.xml Registers fluss-fs-cos as a submodule.
fluss-filesystems/fluss-fs-cos/pom.xml Adds COS module dependencies, shading, and multi-release JAXB handling.
fluss-filesystems/fluss-fs-cos/src/main/java/org/apache/fluss/fs/cos/COSFileSystem.java Wraps Hadoop FS and adds Fluss security-token acquisition hook.
fluss-filesystems/fluss-fs-cos/src/main/java/org/apache/fluss/fs/cos/COSFileSystemPlugin.java Implements FileSystemPlugin for cosn and translates Fluss config to Hadoop config.
fluss-filesystems/fluss-fs-cos/src/main/java/org/apache/fluss/fs/cos/token/COSSecurityTokenProvider.java Provides a Fluss ObtainedSecurityToken intended for COS credential propagation.
fluss-filesystems/fluss-fs-cos/src/main/java/org/apache/fluss/fs/cos/token/COSSecurityTokenReceiver.java Receives Fluss tokens and configures Hadoop COS credential provider + extra config.
fluss-filesystems/fluss-fs-cos/src/main/java/org/apache/fluss/fs/cos/token/DynamicTemporaryCOSCredentialsProvider.java Supplies COS SDK credentials dynamically from received tokens.
fluss-filesystems/fluss-fs-cos/src/main/resources/META-INF/services/org.apache.fluss.fs.FileSystemPlugin SPI registration of COSFileSystemPlugin.
fluss-filesystems/fluss-fs-cos/src/main/resources/META-INF/services/org.apache.fluss.fs.token.SecurityTokenReceiver SPI registration of COSSecurityTokenReceiver.
fluss-filesystems/fluss-fs-cos/src/main/resources/META-INF/licenses/LICENSE.jaxb Bundled CDDL license text for JAXB dependency.
fluss-filesystems/fluss-fs-cos/src/main/resources/META-INF/NOTICE Declares bundled dependencies and licensing notice for the module.
fluss-filesystems/fluss-fs-cos/src/test/java/org/apache/fluss/fs/cos/COSFileSystemBehaviorITCase.java COS behavior IT using static secretId/secretKey.
fluss-filesystems/fluss-fs-cos/src/test/java/org/apache/fluss/fs/cos/COSWithTokenFileSystemBehaviorBaseITCase.java Base for token-based COS behavior tests; initializes FS with static creds first.
fluss-filesystems/fluss-fs-cos/src/test/java/org/apache/fluss/fs/cos/COSWithTokenFileSystemBehaviorITCase.java Token-based COS behavior IT wiring receiver + re-init.
fluss-filesystems/fluss-fs-cos/src/test/java/org/apache/fluss/fs/cos/COSTestCredentials.java Reads COS IT credentials from environment variables and gates tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +47 to +62
public ObtainedSecurityToken obtainSecurityToken(String scheme) {
// For COS, we directly use the configured secret id and secret key as the token.
// If STS temporary credentials are needed in the future, this can be extended
// to call Tencent Cloud STS API to get temporary credentials.
Map<String, String> additionInfo = new HashMap<>();
// we need to put endpoint as addition info
if (endpoint != null) {
additionInfo.put(ENDPOINT_KEY, endpoint);
}

Credentials credentials = new Credentials(secretId, secretKey, null);
byte[] tokenBytes = CredentialsJsonSerde.toJson(credentials);

// token does not expire when using static credentials
return new ObtainedSecurityToken(scheme, tokenBytes, Long.MAX_VALUE, additionInfo);
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This provider serializes secretId/secretKey with a null security token and sets Long.MAX_VALUE expiry, but the receiver converts tokens into BasicSessionCredentials (session-based) which generally expects a non-null session token. Either (a) actually obtain STS temporary credentials (accessKey/secretKey/sessionToken + real expiry) and serialize them, or (b) treat the propagated token as static credentials end-to-end (adjust receiver/provider to use non-session COS credentials) and rename/comment accordingly to avoid implying STS behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +51
LOG.debug("Providing session credentials");
return new BasicSessionCredentials(
credentials.getCOSAccessKeyId(),
credentials.getCOSSecretKey(),
((BasicSessionCredentials) credentials).getSessionToken());
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditionally casts credentials to BasicSessionCredentials, which will throw ClassCastException if the receiver ever stores a different COSCredentials implementation (e.g., static/basic credentials). Consider removing the cast by storing the session token separately, or branching with instanceof and returning the appropriate credential type based on what was received.

Suggested change
LOG.debug("Providing session credentials");
return new BasicSessionCredentials(
credentials.getCOSAccessKeyId(),
credentials.getCOSSecretKey(),
((BasicSessionCredentials) credentials).getSessionToken());
if (credentials instanceof BasicSessionCredentials) {
BasicSessionCredentials sessionCredentials = (BasicSessionCredentials) credentials;
LOG.debug("Providing session credentials");
return new BasicSessionCredentials(
sessionCredentials.getCOSAccessKeyId(),
sessionCredentials.getCOSSecretKey(),
sessionCredentials.getSessionToken());
} else {
LOG.debug("Providing non-session COS credentials");
return credentials;
}

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +102
LOG.info(
"Session credentials updated successfully with access key: {}.",
credentials.getCOSAccessKeyId());
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid logging credential identifiers (even access key IDs) at INFO level, as they are typically treated as sensitive and can leak via centralized logs. Consider removing the access key from the message, masking it (e.g., last 4 chars), and/or lowering the log level to DEBUG.

Suggested change
LOG.info(
"Session credentials updated successfully with access key: {}.",
credentials.getCOSAccessKeyId());
LOG.info("Session credentials updated successfully.");

Copilot uses AI. Check for mistakes.
// then, set addition info
if (additionInfos == null) {
// if addition info is null, it also means we have not received any token,
throw new RuntimeException("Credentials is not ready.");
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception message is grammatically incorrect and not very actionable. Consider using a more specific message that indicates the real cause (e.g., token/credentials not yet received via onNewTokensObtained, so the receiver cannot update Hadoop config) and—if applicable—include what to configure/call first.

Suggested change
throw new RuntimeException("Credentials is not ready.");
throw new RuntimeException(
"Cannot update Hadoop configuration: COS credentials and additional "
+ "information have not been received yet. Ensure "
+ "onNewTokensObtained(...) has been called with valid tokens "
+ "before invoking COSSecurityTokenReceiver.updateHadoopConfig().");

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +63
String providers = hadoopConfig.get(CREDENTIALS_PROVIDER, "");

if (!providers.contains(credentialsProviderName)) {
if (providers.isEmpty()) {
LOG.debug("Setting provider");
providers = credentialsProviderName;
} else {
providers = credentialsProviderName + "," + providers;
LOG.debug("Prepending provider, new providers value: {}", providers);
}
hadoopConfig.set(CREDENTIALS_PROVIDER, providers);
} else {
LOG.debug("Provider already exists");
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using String#contains on a comma-separated provider list can yield false positives (e.g., partial class-name matches). Consider parsing the list as comma-separated tokens (trimmed) and checking for exact equality before prepending.

Copilot uses AI. Check for mistakes.
private static final Logger LOG = LoggerFactory.getLogger(COSSecurityTokenReceiver.class);

static volatile COSCredentials credentials;
static volatile Map<String, String> additionInfos;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name additionInfos is awkward/unclear; additionalInfos (or additionalInfo if singular) is more standard and improves readability throughout this class.

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +66
// then, set addition info
if (additionInfos == null) {
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name additionInfos is awkward/unclear; additionalInfos (or additionalInfo if singular) is more standard and improves readability throughout this class.

Copilot uses AI. Check for mistakes.
// if addition info is null, it also means we have not received any token,
throw new RuntimeException("Credentials is not ready.");
} else {
for (Map.Entry<String, String> entry : additionInfos.entrySet()) {
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name additionInfos is awkward/unclear; additionalInfos (or additionalInfo if singular) is more standard and improves readability throughout this class.

Copilot uses AI. Check for mistakes.
Comment on lines +119 to +137
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifestEntries>
<!-- jaxb-api is packaged as an optional dependency that is only accessible on Java 11 -->
<Multi-Release>true</Multi-Release>
</manifestEntries>
</archive>
</configuration>
</plugin>
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same maven-jar-plugin is declared twice. This is easy to misread and can lead to configuration being overridden unexpectedly depending on Maven’s model merging. Consider merging the jar and test-jar executions into a single maven-jar-plugin declaration so the Multi-Release manifest configuration applies consistently.

Copilot uses AI. Check for mistakes.
Comment thread fluss-filesystems/fluss-fs-cos/pom.xml Outdated
Comment on lines +240 to +250
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same maven-jar-plugin is declared twice. This is easy to misread and can lead to configuration being overridden unexpectedly depending on Maven’s model merging. Consider merging the jar and test-jar executions into a single maven-jar-plugin declaration so the Multi-Release manifest configuration applies consistently.

Copilot uses AI. Check for mistakes.
@XuQianJin-Stars
Copy link
Copy Markdown
Contributor Author

hi @wuchong @leekeiabstraction @luoyuxia Hi, i already updated the pr. Please help review when you got some time.

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the PR. Left some comments.

</dependency>

<dependency>
<groupId>org.apache.fluss</groupId>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in test scope?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in test scope?

The test scope for fluss-test-utils is already defined in the parent pom's dependencyManagement section (line 336 in root pom.xml), so child modules inherit it automatically without needing to declare test explicitly. This is consistent with how other filesystem modules (oss, s3, gs, azure, obs, hdfs) declare this dependency.

/** A provider to provide Tencent Cloud COS security token. */
public class COSSecurityTokenProvider {

private static final String SECRET_ID = "fs.cosn.userinfo.secretId";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string constant seems to be declared in a few classes. Consider sharing single pub static final?

* An implementation of the {@link FileSystemBehaviorTestSuite} for the COS file system with Hadoop
* cos sdk.
*/
class COSFileSystemBehaviorITCase extends FileSystemBehaviorTestSuite {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the IT test run successful with creds set? Would be good to indicate in pr description

Copy link
Copy Markdown
Contributor Author

@XuQianJin-Stars XuQianJin-Stars Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the IT test run successful with creds set? Would be good to indicate in pr description

Yes, the IT tests have been run successfully with COS credentials configured.

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks consistent to other filesystem support. Left comments on unrelated changes being included in this PR (looks like it crept back in), can approve once this is addresed.

I think we should have someone else who has access to COS to test this as well as I do not have access to COS to manually test. cc @luoyuxia

| `client.lookup.max-retries` | `2147483647` | Integer | Setting a value greater than zero will cause the client to resend any lookup request that fails with a potentially transient error. |
| `client.scanner.remote-log.prefetch-num` | `4` | Integer | The number of remote log segments to keep in local temp file for LogScanner, which download from remote storage. The default setting is 4. |
| `client.scanner.io.tmpdir` | `/var/folders/bp/v2l48kz51mx86d743qv0zhzh0000gn/T//fluss` | String | Local directory that is used by client for storing the data files (like kv snapshot, log segment files) to read temporarily |
| `client.scanner.io.tmpdir` | `/var/folders/7r/lwdsh9ms4gn0fnxs8c6fcfpm0000gn/T//fluss` | String | Local directory that is used by client for storing the data files (like kv snapshot, log segment files) to read temporarily |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like unrelated changes crept back into this file

Comment on lines +241 to +244
| `remote.data.dir` | `none` | String | The directory used for storing the kv snapshot data files and remote log for log tiered storage in a Fluss supported filesystem. When upgrading to `remote.data.dirs`, please ensure this value is placed as the first entry in the new configuration.For new clusters, it is recommended to use `remote.data.dirs` instead. If `remote.data.dirs` is configured, this value will be ignored. |
| `remote.data.dirs` | `[]` | ArrayList | A comma-separated list of directories in Fluss supported filesystems for storing the kv snapshot data files and remote log files of tables/partitions. If configured, when a new table or a new partition is created, one of the directories from this list will be selected according to the strategy specified by `remote.data.dirs.strategy` (`ROUND_ROBIN` by default). If not configured, the system uses `remote.data.dir` as the sole remote data directory for all data. |
| `remote.data.dirs.strategy` | `ROUND_ROBIN` | RemoteDataDirStrategy | The strategy for selecting the remote data directory from `remote.data.dirs`. The candidate strategies are: [ROUND_ROBIN, WEIGHTED_ROUND_ROBIN], the default strategy is ROUND_ROBIN. ROUND_ROBIN: this strategy employs a round-robin approach to select one from the available remote directories. WEIGHTED_ROUND_ROBIN: this strategy selects one of the available remote directories based on the weights configured in `remote.data.dirs.weights`. |
| `remote.data.dirs.weights` | `[]` | ArrayList | The weights of the remote data directories. This is a list of weights corresponding to the `remote.data.dirs` in the same order. When `remote.data.dirs.strategy` is set to `WEIGHTED_ROUND_ROBIN`, this must be configured, and its size must be equal to `remote.data.dirs`; otherwise, it will be ignored. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

| `remote.data.dirs.weights` | `[]` | ArrayList | The weights of the remote data directories. This is a list of weights corresponding to the `remote.data.dirs` in the same order. When `remote.data.dirs.strategy` is set to `WEIGHTED_ROUND_ROBIN`, this must be configured, and its size must be equal to `remote.data.dirs`; otherwise, it will be ignored. |
| `remote.fs.write-buffer-size` | `4 kb` | MemorySize | The default size of the write buffer for writing the local files to remote file systems. |
| `remote.log.task-interval-duration` | `1 min` | Duration | Interval at which remote log manager runs the scheduled tasks like copy segments, clean up remote log segments, delete local log segments etc. If the value is set to 0, it means that the remote log storage is disabled. |
| `remote.log.task-max-upload-segments` | `5` | Integer | The maximum number of log segments to upload to remote storage per tiering task execution. This limits the upload batch size to prevent overwhelming the remote storage when there is a large backlog of segments to upload. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

| `table.auto-partition.key` | `none` | String | This configuration defines the time-based partition key to be used for auto-partitioning when a table is partitioned with multiple keys. Auto-partitioning utilizes a time-based partition key to handle partitions automatically, including creating new ones and removing outdated ones, by comparing the time value of the partition with the current system time. In the case of a table using multiple partition keys (such as a composite partitioning strategy), this feature determines which key should serve as the primary time dimension for making auto-partitioning decisions.And If the table has only one partition key, this config is not necessary. Otherwise, it must be specified. |
| `table.auto-partition.time-unit` | `DAY` | AutoPartitionTimeUnit | The time granularity for auto created partitions. The default value is `DAY`. Valid values are `HOUR`, `DAY`, `MONTH`, `QUARTER`, `YEAR`. If the value is `HOUR`, the partition format for auto created is yyyyMMddHH. If the value is `DAY`, the partition format for auto created is yyyyMMdd. If the value is `MONTH`, the partition format for auto created is yyyyMM. If the value is `QUARTER`, the partition format for auto created is yyyyQ. If the value is `YEAR`, the partition format for auto created is yyyy. |
| `table.auto-partition.time-zone` | `Europe/Paris` | String | The time zone for auto partitions, which is by default the same as the system time zone. |
| `table.auto-partition.time-zone` | `Asia/Shanghai` | String | The time zone for auto partitions, which is by default the same as the system time zone. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from f3902d1 to 147fa96 Compare March 23, 2026 02:11
@XuQianJin-Stars
Copy link
Copy Markdown
Contributor Author

XuQianJin-Stars commented Mar 23, 2026

Looks consistent to other filesystem support. Left comments on unrelated changes being included in this PR (looks like it crept back in), can approve once this is addresed.

I think we should have someone else who has access to COS to test this as well as I do not have access to COS to manually test. cc @luoyuxia

hi @leekeiabstraction @luoyuxia Thanks for the review!

I've tested the COS filesystem support with a real COS bucket (region: ap-guangzhou). Here are the test details:

Test classes executed:

  • COSFileSystemBehaviorITCase — Tests COS filesystem with SecretId/SecretKey authentication
  • COSWithTokenFileSystemBehaviorITCase — Tests COS filesystem with security token authentication

Both classes inherit from FileSystemBehaviorTestSuite, which includes the following 19 test cases (per class, 38 total):

# Test Method Description
1 testPathAndScheme Verify URI path and scheme correctness
2 testFileExists Check existing file detection
3 testFileDoesNotExist Check non-existing file detection
4 testExistingFileDeletion Delete an existing file
5 testExistingFileRecursiveDeletion Recursively delete an existing file
6 testNotExistingFileDeletion Delete a non-existing file
7 testNotExistingFileRecursiveDeletion Recursively delete a non-existing file
8 testExistingEmptyDirectoryDeletion Delete an existing empty directory
9 testExistingEmptyDirectoryRecursiveDeletion Recursively delete an existing empty directory
10 testExistingNonEmptyDirectoryDeletion Delete a non-empty directory
11 testExistingNonEmptyDirectoryRecursiveDeletion Recursively delete a non-empty directory
12 testExistingNonEmptyDirectoryWithSubDirRecursiveDeletion Recursively delete a non-empty directory with subdirectories
13 testMkdirsReturnsTrueWhenCreatingDirectory Create directory returns true
14 testMkdirsCreatesParentDirectories Create parent directories
15 testMkdirsReturnsTrueForExistingDirectory Mkdirs on existing directory returns true
16 testMkdirsFailsForExistingFile Mkdirs fails when path is an existing file
17 testMkdirsFailsWithExistingParentFile Mkdirs fails when parent is a file
18 testSimpleFileWriteAndRead Write and read file content
19 testDirectoryListing List directory contents

Result: All 38 tests passed ✅ (19 × 2 test classes, 0 failures, 0 errors)

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from dd8858e to 4eab96a Compare March 28, 2026 15:21
@XuQianJin-Stars
Copy link
Copy Markdown
Contributor Author

hi @leekeiabstraction @luoyuxia Hi, i already updated the pr. Please help review when you got some time.

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for addressing PR comments. Left a couple more questions

public static final String SECRET_KEY = "fs.cosn.userinfo.secretKey";
public static final String CREDENTIALS_PROVIDER = "fs.cosn.credentials.provider";

public static final String ENDPOINT_KEY = "fs.cosn.bucket.endpoint_suffix";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that this is different to what's in the PR description fs.cosn.endpoint. Which is the configuration that we are exposing? fs.cosn.endpoint or fs.cosn.bucket.endpoint_suffix?

Comment on lines +47 to +49
public static final String SECRET_ID = "fs.cosn.userinfo.secretId";
public static final String SECRET_KEY = "fs.cosn.userinfo.secretKey";
public static final String CREDENTIALS_PROVIDER = "fs.cosn.credentials.provider";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like region is a required property, but we are not defining it here?

Ref: https://hadoop.apache.org/docs/stable/hadoop-cos/cloud-storage/index.html

fs.cosn.userinfo.region is an required property for Hadoop-COS. The reason is that Hadoop-COS must know the region of the using bucket in order to accurately construct a URL to access it.


public ObtainedSecurityToken obtainSecurityToken(String scheme) {
// For COS, we directly use the configured secret id and secret key as the token.
// If STS temporary credentials are needed in the future, this can be extended
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update PR description to be consistent with behaviour here

COSSecurityTokenProvider: Generates temporary security tokens using Tencent Cloud STS (Security Token Service) based on existing secretId/secretKey configuration.

Please also create issue to track implementation of using STS to generate token/

Actually, why not implement as part of this PR? I'd like to understand the reasoning.

@XuQianJin-Stars
Copy link
Copy Markdown
Contributor Author

TY for addressing PR comments. Left a couple more questions

Thanks for the review! I've addressed all three previous comments in the latest push:

  1. STS Implementation: COSSecurityTokenProvider now calls the Tencent Cloud STS GetFederationToken API to obtain real temporary credentials (tmpSecretId, tmpSecretKey, sessionToken), consistent with the S3/OSS/OBS implementations. Added tencentcloud-sdk-java-sts:3.1.678 dependency.
  2. REGION constant: Added public static final String REGION = "fs.cosn.userinfo.region" to COSFileSystemPlugin, and it's now used in COSSecurityTokenProvider for STS API calls and passed through additionInfos.
  3. Endpoint key consistency: The code uses fs.cosn.bucket.endpoint_suffix which is the standard Hadoop-COS configuration key. Updated PR description to match.

Also updated the COSSecurityTokenReceiver to always use BasicSessionCredentials (since STS always returns temporary credentials with a session token), updated the NOTICE file, and added test coverage for the REGION config.
Please let me know about the additional questions — happy to address them!

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from 3fee0c6 to 872cb9f Compare April 15, 2026 03:23
Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the updates. Added further comments.

// set credential provider
if (hadoopConfig.get(SECRET_ID) == null) {
String credentialsProvider = hadoopConfig.get(CREDENTIALS_PROVIDER);
if (credentialsProvider != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have IT for this branch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have IT for this branch?

Good catch — I've added COSWithCredentialsProviderFileSystemBehaviorITCase in the latest commit to cover this CREDENTIALS_PROVIDER branch.
It reuses org.apache.hadoop.fs.cosn.auth.EnvironmentVariableCredentialsProvider shipped with hadoop-cos, which reads COSN_SECRET_ID / COSN_SECRET_KEY from the environment — exactly the same variables already required by COSTestCredentials. So no extra CI configuration is needed: whenever the existing COS ITs are enabled, this new one runs too. PTAL.

Comment thread fluss-filesystems/fluss-fs-cos/pom.xml Outdated
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-cos</artifactId>
<version>${fs.cosn.sdk.version}</version>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be better named as hadoop cos version. It is not the sdk version.

private static final String FEDERATION_TOKEN_NAME = "fluss-cos-federation";

/** Default duration seconds for temporary credentials, 1800s = 30min. */
private static final long DEFAULT_DURATION_SECONDS = 1800L;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious on why picking lower default than other implementation like OSS/OBS

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from 09a902b to ab2f9ef Compare April 22, 2026 09:13
Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for addressing comments. Left one more question on one of the test case. Also, are the CI test failures related to changes within this PR?


/** Access to credentials to access COS buckets during integration tests. */
public class COSTestCredentials {
@Nullable private static final String ENDPOINT = System.getenv("COSN_ENDPOINT");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be endpoint suffix?

@simonsssu
Copy link
Copy Markdown

@leekeiabstraction Hi lee, I have tested this manually.

@luoyuxia
Copy link
Copy Markdown
Contributor

Thanks for adding COS support. I reviewed the implementation and found a few issues that should be addressed before merge.

  1. [major] STS tokens are too broadly scoped

COSSecurityTokenProvider currently uses this default policy:

"action": ["name/cos:*"],
"resource": ["*"]

That means the temporary credentials distributed to Fluss clients can access any COS resource allowed by the permanent server-side secret. This is broader than the Fluss remote storage path and could expose non-Fluss data if the permanent credential has wide permissions.

Can we scope the token policy to the configured remote.data.dir bucket/prefix, or require an explicit user-provided token policy instead of defaulting to resource: "*"?

  1. [major] The shaded plugin currently contains two COS SDK variants

hadoop-cos:3.3.5 brings in com.qcloud:cos_api-bundle:5.6.69, while this module also directly depends on com.qcloud:cos_api:5.6.139. During package, Maven Shade reports 902 overlapping com.qcloud.cos.* classes/resources between the two jars.

That makes the actual runtime COS SDK version depend on shade ordering and can produce subtle incompatibilities. Please keep only one COS SDK variant, for example by excluding cos_api-bundle from hadoop-cos if the direct cos_api dependency is required, or by dropping the direct cos_api dependency if the bundled one is sufficient.

  1. [minor] COS documentation uses the wrong environment provider class/env vars

The docs show:

fs.cosn.credentials.provider: org.apache.hadoop.fs.cosn.auth.EnvironmentVariableCredentialProvider

but the actual hadoop-cos class is org.apache.hadoop.fs.cosn.auth.EnvironmentVariableCredentialsProvider (plural Credentials). Also, the provider reads COSN_SECRET_ID and COSN_SECRET_KEY, while the docs say COS_SECRET_ID and COS_SECRET_KEY. Users following the docs would fail authentication.

  1. [minor] Token lifetime in docs does not match the code

The docs say the temporary credential is valid for 30 minutes, but COSSecurityTokenProvider.DEFAULT_DURATION_SECONDS is 3600L (1 hour). Please align the docs or the code.

  1. [nit] cos.md has trailing whitespace

git diff --check reports trailing whitespace in website/docs/maintenance/filesystems/cos.md lines 14 and 17.

Validation I ran:

./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests compile
./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests test-compile
./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests package
git diff --check

The Maven commands passed; package emitted the duplicate COS SDK warnings mentioned above, and git diff --check failed on the trailing whitespace.

@XuQianJin-Stars
Copy link
Copy Markdown
Contributor Author

Thanks for adding COS support. I reviewed the implementation and found a few issues that should be addressed before merge.

  1. [major] STS tokens are too broadly scoped

COSSecurityTokenProvider currently uses this default policy:

"action": ["name/cos:*"],
"resource": ["*"]

That means the temporary credentials distributed to Fluss clients can access any COS resource allowed by the permanent server-side secret. This is broader than the Fluss remote storage path and could expose non-Fluss data if the permanent credential has wide permissions.

Can we scope the token policy to the configured remote.data.dir bucket/prefix, or require an explicit user-provided token policy instead of defaulting to resource: "*"?

  1. [major] The shaded plugin currently contains two COS SDK variants

hadoop-cos:3.3.5 brings in com.qcloud:cos_api-bundle:5.6.69, while this module also directly depends on com.qcloud:cos_api:5.6.139. During package, Maven Shade reports 902 overlapping com.qcloud.cos.* classes/resources between the two jars.

That makes the actual runtime COS SDK version depend on shade ordering and can produce subtle incompatibilities. Please keep only one COS SDK variant, for example by excluding cos_api-bundle from hadoop-cos if the direct cos_api dependency is required, or by dropping the direct cos_api dependency if the bundled one is sufficient.

  1. [minor] COS documentation uses the wrong environment provider class/env vars

The docs show:

fs.cosn.credentials.provider: org.apache.hadoop.fs.cosn.auth.EnvironmentVariableCredentialProvider

but the actual hadoop-cos class is org.apache.hadoop.fs.cosn.auth.EnvironmentVariableCredentialsProvider (plural Credentials). Also, the provider reads COSN_SECRET_ID and COSN_SECRET_KEY, while the docs say COS_SECRET_ID and COS_SECRET_KEY. Users following the docs would fail authentication.

  1. [minor] Token lifetime in docs does not match the code

The docs say the temporary credential is valid for 30 minutes, but COSSecurityTokenProvider.DEFAULT_DURATION_SECONDS is 3600L (1 hour). Please align the docs or the code.

  1. [nit] cos.md has trailing whitespace

git diff --check reports trailing whitespace in website/docs/maintenance/filesystems/cos.md lines 14 and 17.

Validation I ran:

./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests compile
./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests test-compile
./mvnw -pl fluss-filesystems/fluss-fs-cos -am -DskipTests package
git diff --check

The Maven commands passed; package emitted the duplicate COS SDK warnings mentioned above, and git diff --check failed on the trailing whitespace.

Thanks for the thorough review! The PR has been updated in 80c5435 to address all five points:

  1. STS policy scopingCOSSecurityTokenProvider now derives a default policy from remote.data.dir, restricting name/cos:* to the target bucket (and prefix). Users can override it via fs.cosn.security.token.policy.
  2. Duplicate COS SDK — excluded com.qcloud:cos_api-bundle from hadoop-cos so only the directly declared cos_api ends up on the classpath. NOTICE updated accordingly. The shade duplicate-class warnings are gone.
  3. Docs credential provider — fixed to EnvironmentVariableCredentialsProvider and the correct env vars COSN_SECRET_ID / COSN_SECRET_KEY.
  4. Token lifetime — docs now say 1 hour to match DEFAULT_DURATION_SECONDS = 3600L.
  5. Trailing whitespace — cleaned up in website/docs/maintenance/filesystems/cos.md; git diff --check is clean.
    PTAL when you have time.

This adds a new fluss-fs-cos module that integrates Tencent Cloud COS
(Cloud Object Storage) as a remote filesystem for Fluss.

Key changes:
- Add fluss-fs-cos module with hadoop-cos and cos_api dependencies
- Implement COSFileSystem extending HadoopFileSystem (scheme: cosn)
- Implement COSFileSystemPlugin as FileSystemPlugin SPI
- Add security token support (COSSecurityTokenProvider/Receiver)
- Add DynamicTemporaryCOSCredentialsProvider for temporary credentials
- Add integration tests for COS filesystem behavior
- Register module in fluss-filesystems parent pom
- Rename Maven property fs.cosn.sdk.version to fs.hadoop.cos.version to accurately reflect that it is the hadoop-cos version rather than the Tencent Cloud COS SDK version.

- Align COSSecurityTokenProvider default token duration with OSS/OBS (30min -> 1h).

- Add COSWithCredentialsProviderFileSystemBehaviorITCase to cover the CREDENTIALS_PROVIDER branch in COSFileSystemPlugin, reusing hadoop-cos's EnvironmentVariableCredentialsProvider.
…S SDK, fix docs

1. STS tokens scoped to remote.data.dir bucket: COSSecurityTokenProvider now derives a default policy that limits name/cos:* to the bucket (and prefix) parsed from fsUri. Users can override via fs.cosn.security.token.policy.

2. Drop duplicate COS SDK: exclude com.qcloud:cos_api-bundle from hadoop-cos so only the directly declared cos_api remains on the classpath, eliminating 902 overlapping com.qcloud.cos.* classes during shade. NOTICE updated accordingly.

3. Docs: use the correct hadoop-cos provider class EnvironmentVariableCredentialsProvider and the correct env vars COSN_SECRET_ID / COSN_SECRET_KEY.

4. Docs: align temporary credential lifetime with the code (1 hour, matches DEFAULT_DURATION_SECONDS=3600L).

5. Docs: remove trailing whitespace in website/docs/maintenance/filesystems/cos.md.
@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/cos-filesystem-integration branch from 80c5435 to 765310f Compare May 22, 2026 08:32
Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[File system] Add Tencent Cloud COS integration

5 participants