-
Notifications
You must be signed in to change notification settings - Fork 848
Fix origin_server_auth URL encoding for mixed-encoding URLs #12802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Added unit tests and autest to demonstrate the isUriEncoded() bug in the origin_server_auth plugin (s3_auth). Bug: When a URL has mixed encoding (some characters encoded, some not), isUriEncoded() incorrectly returns true upon finding the first %XX sequence, assuming the entire string is already encoded. This causes canonicalEncode() to skip re-encoding unencoded characters like parentheses. Example: Client sends: /app/(channel)/%5B%5Bparts%5D%5D/page.js isUriEncoded(): Returns true (finds %5B) canonicalEncode(): Returns as-is (thinks already encoded) Signature for: /app/(channel)/%5B%5Bparts%5D%5D/page.js <- WRONG S3 expects: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js This results in S3 returning 403 SignatureDoesNotMatch. Unit tests marked with [!mayfail] to document expected-to-fail behavior. Autest passes with mock server but would fail with real S3.
Fixed the isUriEncoded() and canonicalEncode() functions to properly handle URLs with mixed encoding (some characters encoded, some not). Bug: URLs like /app/(channel)/%5B%5Bparts%5D%5D/page.js would cause S3 signature mismatch because: - isUriEncoded() found %5B and returned true (incorrectly) - canonicalEncode() returned the string as-is - Signature was calculated for partially-encoded path - S3 expected signature for fully-encoded path Fix: 1. isUriEncoded() now checks the ENTIRE string and returns false if ANY character that should be encoded is found unencoded. 2. canonicalEncode() now always decodes first, then re-encodes using AWS canonical rules. This ensures consistent output regardless of input. Added uriDecode() helper function to decode percent-encoded characters. Updated unit tests to verify the correct behavior. This fixes YTSATS-4835.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug in the origin_server_auth plugin where mixed URL encoding (some characters encoded, others not) caused AWS SigV4 signature mismatches with S3/GCP.
Changes:
- Added
uriDecode()function to decode percent-encoded URLs - Modified
isUriEncoded()to check if ALL characters requiring encoding are encoded (not just if ANY encoded character exists) - Modified
canonicalEncode()to decode-then-reencode partially-encoded URLs for consistent canonical output
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| plugins/origin_server_auth/aws_auth_v4.cc | Added uriDecode() function and fixed isUriEncoded()/canonicalEncode() to handle mixed encoding |
| plugins/origin_server_auth/unit_tests/test_aws_auth_v4.h | Exposed uriDecode() and canonicalEncode() functions for unit testing |
| plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc | Added comprehensive unit tests for mixed encoding scenarios and S3 safe characters |
| tests/gold_tests/pluginTest/origin_server_auth/s3_url_encoding.test.py | Added integration test demonstrating the bug and verifying the fix |
| tests/gold_tests/pluginTest/origin_server_auth/rules/s3_url_encoding.test_input | Added test configuration with fake AWS credentials |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/gold_tests/pluginTest/origin_server_auth/s3_url_encoding.test.py
Outdated
Show resolved
Hide resolved
AWS SigV4 requires uppercase hex digits in percent-encoding (e.g., %2F not %2f). URLs with lowercase hex were incorrectly passing through isUriEncoded() without normalization, causing potential signature mismatch with S3. Changes: - isUriEncoded() now returns false for lowercase hex digits, triggering canonicalEncode() to normalize via decode/re-encode - Added comprehensive tests for lowercase hex handling - Improved documentation comments Fixes issue identified by Copilot review.
Simple URIs like /path/foo.jpg with only unreserved characters were unnecessarily being decoded and re-encoded. Now isUriEncoded() returns true for strings that are already in canonical form (no reserved chars needing encoding), avoiding the unnecessary processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Cast char to unsigned char before calling isxdigit/isalnum/islower to avoid undefined behavior with signed char - Update comments to past tense since the bug is now fixed - Fix bounds check order in isUriEncoded to check length before access
The function now returns true for strings that are already in canonical form, which includes both: - Properly percent-encoded strings with uppercase hex - Strings with only unreserved characters (no encoding needed) The old name 'isUriEncoded' was misleading since it returned true for strings that have NO encoding at all.
How do you know if it's partial/mixed encoding? The input might be unencoded and the expectation could be encoding |
Summary
Fixed the
isUriEncoded()andcanonicalEncode()functions in the origin_server_auth plugin to properly handle URLs with mixed encoding (some characters encoded, some not).Bug Description
When a URL has mixed encoding (e.g.,
/app/(channel)/%5B%5Bparts%5D%5D/page.jswhere parentheses are NOT encoded but brackets ARE encoded), the AWS v4 signature calculation was incorrect:isUriEncoded()found%5Band returnedtrue, incorrectly assuming the entire string was fully encodedcanonicalEncode()returned the string as-isAWS SigV4 Canonical URI Encoding
Per the AWS SigV4 spec:
This means characters like
(,),!,*,'must be percent-encoded in the canonical URI for signature calculation, even though they are listed as "safe" for S3 object key names.(%28))%29)!%21)*%2A)'%27)Fix
isUriEncoded(): Now checks the ENTIRE string and returnsfalseif ANY character that should be encoded is found unencodedcanonicalEncode(): For partially-encoded strings, decodes first then re-encodes to ensure consistent canonical outputuriDecode()helper functionTesting
!,*,',(,))References