Conversation
| not (has_access_key and has_secret_key) | ||
| region, | ||
| has_access_key, | ||
| has_secret_key, |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: avoid logging values that are directly derived from secret environment variables, even as booleans, or at least avoid logging them in a way that explicitly reveals per-secret presence. We should still preserve useful diagnostics but base them on non-sensitive or more generic information.
Best concrete fix here:
- Stop logging the individual
has_access_key,has_secret_key, andhas_session_tokenbooleans, since they are directly derived from secret env vars. - Keep logging non-sensitive values like
regionand whether the code will fall back to instance profile/IRSA. - Optionally compute a higher-level, non-secret-focused flag (e.g.,
credential_source = "env"vs"instance_profile") that doesn’t expose per-secret presence, but for minimal impact we can just keep thewill_use_instance_profileflag.
Implementation details in services/chatbot/src/chatbot/aws_credentials.py:
- In
_get_base_session, keep the local variableshas_access_key,has_secret_key, andhas_session_token(since they’re used to decidewill_use_instance_profile), but change thelogger.infocall on lines 45–53 so it no longer logs these three flags. - Replace the current log format string and arguments with a simpler message that only includes
regionandwill_use_instance_profile(derived from the existing expressionnot (has_access_key and has_secret_key)), which does not come directly from a single secret env var.
No new methods or imports are required; this is a straightforward change to the logging statement.
| @@ -43,12 +43,8 @@ | ||
| has_secret_key = bool(os.getenv("AWS_SECRET_ACCESS_KEY")) | ||
| has_session_token = bool(os.getenv("AWS_SESSION_TOKEN")) | ||
| logger.info( | ||
| "[BASE_SESSION] Creating boto3 session - region: %s, has_access_key: %s, " | ||
| "has_secret_key: %s, has_session_token: %s, will_use_instance_profile: %s", | ||
| "[BASE_SESSION] Creating boto3 session - region: %s, will_use_instance_profile: %s", | ||
| region, | ||
| has_access_key, | ||
| has_secret_key, | ||
| has_session_token, | ||
| not (has_access_key and has_secret_key), | ||
| ) | ||
| # Use None for empty strings so boto3 falls back to instance profile/IRSA |
| logger.debug( | ||
| "Cached new credentials - expires_at: %s", | ||
| credentials["expiry_time"] | ||
| "Cached new credentials - expires_at: %s", credentials["expiry_time"] |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General fix approach: avoid logging values derived from objects that contain secrets, or ensure that only clearly non‑sensitive fields are logged and that taint no longer flows from the secret-bearing structure into the logger. A clean way is to separate sensitive and non‑sensitive data into different structures, and/or copy out safe fields into local variables before logging.
Best fix for this code: in _set_cached_credentials, instead of logging credentials["expiry_time"] directly from the tainted credentials dict, read the expiry time into a local variable and log that variable. This both clarifies intent (we are only logging a numeric expiry timestamp) and can satisfy static analysis by breaking the taint flow through the credentials dict. Functionality remains unchanged: we still cache the same data and log the same value.
Concrete change (file services/chatbot/src/chatbot/aws_credentials.py):
- In
_set_cached_credentials, introduce a local variableexpiry_time = credentials["expiry_time"]. - Use this local variable both to set
_credentials_cache["expiration"]and in thelogger.debugcall, instead of indexing thecredentialsdict in the log expression.
No new imports or helper methods are required.
| @@ -182,10 +182,11 @@ | ||
| def _set_cached_credentials(credentials: dict) -> None: | ||
| """Cache the credentials.""" | ||
| with _credentials_cache["lock"]: | ||
| expiry_time = credentials["expiry_time"] | ||
| _credentials_cache["credentials"] = credentials | ||
| _credentials_cache["expiration"] = credentials["expiry_time"] | ||
| _credentials_cache["expiration"] = expiry_time | ||
| logger.debug( | ||
| "Cached new credentials - expires_at: %s", credentials["expiry_time"] | ||
| "Cached new credentials - expires_at: %s", expiry_time | ||
| ) | ||
|
|
||
|
|
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: Remove or further anonymize any logging that includes credential material (even prefixes) that CodeQL considers sensitive. We should keep logging high-level status (e.g., “assume role succeeded”) without including access_key details.
Best single fix without changing functionality: In _get_credentials, change the logger.info call on lines 238–243 so it no longer formats or logs credentials["access_key"]. Instead, log non-sensitive details, such as the configured role ARN or a generic success message. This preserves the success-logging behavior while eliminating the sensitive sink that CodeQL flags. We do not need new imports or helper methods; a simple change of the log message and arguments is sufficient.
Concretely:
- In
services/chatbot/src/chatbot/aws_credentials.py, replace thelogger.infoblock starting at line 238:- From:
"[AWS_CREDS] Assume role succeeded - access_key_prefix: %s", credentials["access_key"][:8] + "..." if ... - To: a message like
"[AWS_CREDS] Assume role succeeded"or one that only referencesConfig.AWS_ASSUME_ROLE_ARN(which is configuration, not a secret) and does not accesscredentials.
- From:
No other parts of the file need to change for this specific CodeQL finding.
| @@ -235,12 +235,7 @@ | ||
| logger.info("[AWS_CREDS] No cached credentials, will call assume role now") | ||
| credentials = _assume_role() | ||
| _set_cached_credentials(credentials) | ||
| logger.info( | ||
| "[AWS_CREDS] Assume role succeeded - access_key_prefix: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| ) | ||
| logger.info("[AWS_CREDS] Assume role succeeded") | ||
| return { | ||
| "access_key": credentials["access_key"], | ||
| "secret_key": credentials["secret_key"], |
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General fix: stop logging any value derived from AWS credential material (access key IDs, secret keys, tokens), or replace such logs with non-sensitive metadata (e.g., booleans like has_access_key, opaque internal IDs, or static messages). Do not change how credentials are obtained or used; only change what is written to logs.
Best concrete fix here: in all the logging calls that currently include an access_key_prefix derived from credentials["access_key"] (or cached variants), replace that argument with a non-sensitive indicator such as has_access_key = bool(credentials.get("access_key")). This preserves useful debug information (whether a key is present) while avoiding exposure of any part of the credential. The signatures and return values of functions remain unchanged.
Specific locations in services/chatbot/src/chatbot/aws_credentials.py:
-
Around lines 129–139 in
_assume_role():- Replace the
access_key_prefixargument (credentials["AccessKeyId"][:8] + "..." ...) with a booleancredentials.get("AccessKeyId") is not Noneorbool(credentials.get("AccessKeyId")). - Update the format string label to something like
has_access_key: %s.
- Replace the
-
Around lines 221–226 in the cached-credentials branch of
get_aws_credentials():- Replace
cached["access_key"][:8] + "..." if cached.get("access_key") else "(none)"withbool(cached.get("access_key")). - Update the message text to
has_access_key: %s.
- Replace
-
Around lines 238–243 after
_assume_role()inget_aws_credentials():- Replace
credentials["access_key"][:8] + "..." if credentials.get("access_key") else "(none)"withbool(credentials.get("access_key")). - Update the message text accordingly.
- Replace
-
Around lines 311–318 in
get_bedrock_runtime_client():- Replace
credentials["access_key"][:8] + "..." if credentials.get("access_key") else "(none)"withbool(credentials.get("access_key")). - Update the log message label.
- Replace
No new imports or helper methods are necessary; we only adjust log messages and their arguments.
| @@ -128,14 +128,12 @@ | ||
|
|
||
| logger.info( | ||
| "[ASSUME_ROLE] SUCCESS - role_arn: %s, session_name: %s, expires_at: %s, " | ||
| "assumed_role_id: %s, access_key_prefix: %s", | ||
| "assumed_role_id: %s, has_access_key: %s", | ||
| role_arn, | ||
| session_name, | ||
| credentials["Expiration"], | ||
| response.get("AssumedRoleUser", {}).get("AssumedRoleId", "unknown"), | ||
| credentials["AccessKeyId"][:8] + "..." | ||
| if credentials.get("AccessKeyId") | ||
| else "(none)", | ||
| bool(credentials.get("AccessKeyId")), | ||
| ) | ||
|
|
||
| return { | ||
| @@ -219,10 +213,8 @@ | ||
| cached = _get_cached_credentials() | ||
| if cached: | ||
| logger.info( | ||
| "[AWS_CREDS] Using CACHED assume role credentials - access_key_prefix: %s", | ||
| cached["access_key"][:8] + "..." | ||
| if cached.get("access_key") | ||
| else "(none)", | ||
| "[AWS_CREDS] Using CACHED assume role credentials - has_access_key: %s", | ||
| bool(cached.get("access_key")), | ||
| ) | ||
| return { | ||
| "access_key": cached["access_key"], | ||
| @@ -236,10 +228,8 @@ | ||
| credentials = _assume_role() | ||
| _set_cached_credentials(credentials) | ||
| logger.info( | ||
| "[AWS_CREDS] Assume role succeeded - access_key_prefix: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| "[AWS_CREDS] Assume role succeeded - has_access_key: %s", | ||
| bool(credentials.get("access_key")), | ||
| ) | ||
| return { | ||
| "access_key": credentials["access_key"], | ||
| @@ -309,10 +299,8 @@ | ||
| ) | ||
|
|
||
| logger.info( | ||
| "[BEDROCK_CLIENT] Using credentials - access_key_prefix: %s, has_token: %s, region: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| "[BEDROCK_CLIENT] Using credentials - has_access_key: %s, has_token: %s, region: %s", | ||
| bool(credentials.get("access_key")), | ||
| bool(credentials.get("token")), | ||
| region, | ||
| ) |
| else "(none)", | ||
| bool(credentials.get("token")), | ||
| region | ||
| region, |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: Ensure that no log statement can include sensitive values such as SecretAccessKey, and explicitly separate non‑sensitive fields from any container that holds secrets so that static analysis does not consider them tainted. We do not need to change how credentials are obtained or used—only how we derive and log the region value.
Best concrete fix here: In get_bedrock_client_with_explicit_credentials() (around lines 300–318), avoid logging the region value that is directly derived from the tainted credentials dict. Instead, compute a separate, clearly non‑tainted effective_region variable that uses only region environment variables and/or the configuration values that do not depend on secrets, and use that for both the log message and the client creation. Functionally, this does not change behavior if we preserve the same resolution order for region; it only makes the dataflow clearer and guarantees that the logged value cannot contain secrets.
Specifically:
- Replace the current
region = (...)assignment (lines 305–309) with a neweffective_regioncomputed without usingcredentials, e.g.:- Prefer
os.getenv("AWS_REGION")orAWS_DEFAULT_REGION. - Optionally, fall back to
Config.AWS_REGION(if appropriate), but do not read aregionvalue from thecredentialsdict for logging.
- Prefer
- Use
effective_regioninstead ofregionin:- The
logger.infocall at lines 311–318. - The
region_nameargument in theboto3.clientcall at line 326.
- The
- This keeps the same observable behavior as long as
credentials["region"]is redundant with the env/config region, which appears to be the case in this helper; if not, the safest adjustment is to still deriveeffective_regionfrom the same expression but copy it into a new local variable before logging (ensuring no mutation and making the taint split explicit).
No new imports or helper methods are necessary; changes are localized to services/chatbot/src/chatbot/aws_credentials.py in the get_bedrock_client_with_explicit_credentials function.
| @@ -302,10 +302,10 @@ | ||
| "[BEDROCK_CLIENT] Creating bedrock-runtime client with explicit credentials" | ||
| ) | ||
| credentials = get_aws_credentials() | ||
| region = ( | ||
| credentials.get("region") | ||
| or os.getenv("AWS_REGION") | ||
| effective_region = ( | ||
| os.getenv("AWS_REGION") | ||
| or os.getenv("AWS_DEFAULT_REGION") | ||
| or credentials.get("region") | ||
| ) | ||
|
|
||
| logger.info( | ||
| @@ -314,7 +313,7 @@ | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| bool(credentials.get("token")), | ||
| region, | ||
| effective_region, | ||
| ) | ||
|
|
||
| # Create client with explicit credentials, bypassing any default chain or token discovery | ||
| @@ -323,7 +322,7 @@ | ||
| aws_access_key_id=credentials["access_key"], | ||
| aws_secret_access_key=credentials["secret_key"], | ||
| aws_session_token=credentials.get("token"), | ||
| region_name=region, | ||
| region_name=effective_region, | ||
| # Disable any retries on auth errors to fail fast | ||
| config=BotocoreConfig( | ||
| signature_version="v4", # Force SigV4, not bearer token |
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: Remove or further sanitize logging of any data derived from AWS credentials and secrets. Specifically, avoid including access key prefixes or any details derived from access_key, secret_key, token, or related environment variables in log messages. Keep high-level status/diagnostic messages but without credential material.
Best concrete fix in this file:
- In
_assume_role(), change the successlogger.infoso it no longer logsaccess_key_prefix. Keep non-sensitive data likerole_arn,session_name,expires_at, andassumed_role_id. - In
get_aws_credentials()(the section around lines 214–249):- When using cached credentials, remove
access_key_prefixfrom thelogger.infomessage or replace it with a generic placeholder. - After assuming and caching credentials, likewise remove
access_key_prefixfrom the success log message.
- When using cached credentials, remove
- In
get_bedrock_credentials_kwargs()(around lines 377–384), adjust thelogger.infomessage to omitaccess_key_prefixand log only non-sensitive booleans likehas_tokenand theregion.
We do not need new methods or imports; we only simplify the format strings and arguments in these logging calls. Functionality of credential retrieval and use remains unchanged.
| @@ -128,14 +128,11 @@ | ||
|
|
||
| logger.info( | ||
| "[ASSUME_ROLE] SUCCESS - role_arn: %s, session_name: %s, expires_at: %s, " | ||
| "assumed_role_id: %s, access_key_prefix: %s", | ||
| "assumed_role_id: %s", | ||
| role_arn, | ||
| session_name, | ||
| credentials["Expiration"], | ||
| response.get("AssumedRoleUser", {}).get("AssumedRoleId", "unknown"), | ||
| credentials["AccessKeyId"][:8] + "..." | ||
| if credentials.get("AccessKeyId") | ||
| else "(none)", | ||
| ) | ||
|
|
||
| return { | ||
| @@ -218,12 +211,7 @@ | ||
| # Try to use cached credentials | ||
| cached = _get_cached_credentials() | ||
| if cached: | ||
| logger.info( | ||
| "[AWS_CREDS] Using CACHED assume role credentials - access_key_prefix: %s", | ||
| cached["access_key"][:8] + "..." | ||
| if cached.get("access_key") | ||
| else "(none)", | ||
| ) | ||
| logger.info("[AWS_CREDS] Using CACHED assume role credentials") | ||
| return { | ||
| "access_key": cached["access_key"], | ||
| "secret_key": cached["secret_key"], | ||
| @@ -235,12 +223,7 @@ | ||
| logger.info("[AWS_CREDS] No cached credentials, will call assume role now") | ||
| credentials = _assume_role() | ||
| _set_cached_credentials(credentials) | ||
| logger.info( | ||
| "[AWS_CREDS] Assume role succeeded - access_key_prefix: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| ) | ||
| logger.info("[AWS_CREDS] Assume role succeeded") | ||
| return { | ||
| "access_key": credentials["access_key"], | ||
| "secret_key": credentials["secret_key"], | ||
| @@ -375,10 +358,7 @@ | ||
| kwargs["aws_session_token"] = credentials["token"] | ||
|
|
||
| logger.info( | ||
| "[BEDROCK_KWARGS] Using explicit credentials - access_key_prefix: %s, has_token: %s, region: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| "[BEDROCK_KWARGS] Using explicit credentials - has_token: %s, region: %s", | ||
| bool(credentials.get("token")), | ||
| region, | ||
| ) |
| else "(none)", | ||
| bool(credentials.get("token")), | ||
| region | ||
| region, |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: ensure that no values derived from or associated with secret_key (or other secrets) are logged in clear text. For values that are operationally useful but taint-connected, either (a) log a sanitized/abstracted representation, or (b) avoid logging them entirely.
Best concrete fix here:
- Keep logging non-sensitive booleans (
has_token) and high-level status messages. - Continue to log only a truncated prefix of the access key, which does not expose the full secret.
- Remove or sanitize logging of
regionin the sensitive context that CodeQL flags, because it is taint-connected to secrets. - This change leaves core functionality (credential retrieval and use) untouched; only log contents are slightly reduced.
Specific changes in services/chatbot/src/chatbot/aws_credentials.py:
- In
get_bedrock_credentials_kwargs(around lines 377–384), update thelogger.infocall:- Remove
regionfrom the format string and from the arguments. - Optionally, if you still want some location info, you could log a constant like
"region: <redacted>", but to keep the patch minimal we just drop it.
- Remove
No new imports or helper methods are needed; we only adjust the existing logging statement.
| @@ -375,12 +375,11 @@ | ||
| kwargs["aws_session_token"] = credentials["token"] | ||
|
|
||
| logger.info( | ||
| "[BEDROCK_KWARGS] Using explicit credentials - access_key_prefix: %s, has_token: %s, region: %s", | ||
| "[BEDROCK_KWARGS] Using explicit credentials - access_key_prefix: %s, has_token: %s", | ||
| credentials["access_key"][:8] + "..." | ||
| if credentials.get("access_key") | ||
| else "(none)", | ||
| bool(credentials.get("token")), | ||
| region, | ||
| ) | ||
|
|
||
| return kwargs |
| }, | ||
| } | ||
|
|
||
| cur, err := collection.Find(ctx, filter, options) |
Check failure
Code scanning / CodeQL
Database query built from user-controlled sources High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
In general, to fix this kind of problem you should avoid letting raw user input directly control a database query operator such as $regex. For text search, the usual approach is to either (a) escape all regex metacharacters so that the input is treated as a literal string, or (b) constrain and validate the regex to a safe subset. With the MongoDB Go driver, you can construct the filter using bson.M but first run the input through a helper that escapes regex special characters.
For this codebase, the least invasive fix that preserves existing functionality (“filter posts by title, case‑insensitive”) is to escape regex metacharacters in title before using it in the filter. That way, a user searching for foo.*bar will match titles containing the literal text foo.*bar, rather than arbitrary patterns, and attempts to craft pathological regexes will be neutralized. We can add a small helper function escapeRegex in services/community/api/models/post.go that prefixes all regex special characters with a backslash, and then call it inside FindPostsByTitle when building filter. To implement this, we need to import Go’s standard regexp package (no third‑party dependencies) and define the helper function near the other model‑level helpers.
Concretely:
-
In
services/community/api/models/post.go, addregexpto the import list. -
In the same file, define:
func escapeRegex(input string) string { // Escape all regex metacharacters so the input is treated literally. return regexp.QuoteMeta(input) }
-
Update
FindPostsByTitleso that the$regexvalue isescapeRegex(title)instead oftitle.
This keeps the API and behavior compatible (still a case‑insensitive contains‑style match, but with literal interpretation of user input) and removes the direct use of untrusted input as a regex.
| @@ -19,6 +19,7 @@ | ||
| "errors" | ||
| "html" | ||
| "log" | ||
| "regexp" | ||
| "strings" | ||
| "time" | ||
|
|
||
| @@ -28,6 +29,12 @@ | ||
| "go.mongodb.org/mongo-driver/mongo/options" | ||
| ) | ||
|
|
||
| // escapeRegex escapes all regular expression metacharacters in the input | ||
| // so that user-provided strings are treated as literals in MongoDB $regex queries. | ||
| func escapeRegex(input string) string { | ||
| return regexp.QuoteMeta(input) | ||
| } | ||
|
|
||
| // Post Field | ||
| type Post struct { | ||
| ID string `gorm:"primary_key;auto_increment" json:"id"` | ||
| @@ -171,9 +178,10 @@ | ||
| ctx := context.Background() | ||
| collection := client.Database("crapi").Collection("post") | ||
|
|
||
| escapedTitle := escapeRegex(title) | ||
| filter := bson.M{ | ||
| "title": bson.M{ | ||
| "$regex": title, | ||
| "$regex": escapedTitle, | ||
| "$options": "i", | ||
| }, | ||
| } |
|
|
||
| postsResponse.Posts = postList | ||
|
|
||
| count, err1 := collection.CountDocuments(context.Background(), filter) |
Check failure
Code scanning / CodeQL
Database query built from user-controlled sources High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: Avoid using raw user input as a regular expression. Either (a) treat the input as a literal string by escaping any regex metacharacters, or (b) validate and restrict the allowed pattern syntax and length. The safest minimal change here is to escape the user-provided title so it is interpreted as a literal substring search rather than an arbitrary regex, while keeping the existing $regex-based search and pagination behavior intact.
Best fix in this code: In FindPostsByTitle, before building the filter, transform title using regexp.QuoteMeta. This turns any regex metacharacters into literals, preventing malicious patterns from affecting the regex engine. Then use this escaped string as the $regex value. To do that we need to add an import for Go’s standard regexp package in services/community/api/models/post.go and update the filter construction so it uses escapedTitle instead of title. All other logic (sorting, limit/offset, counting) stays the same.
Concretely:
- In
services/community/api/models/post.go, addregexpto the import list. - In
FindPostsByTitle, right before thefilter := bson.M{...}block, computeescapedTitle := regexp.QuoteMeta(title). - Change the
$regexvalue fromtitletoescapedTitle.
No changes are needed in services/community/api/controllers/post_controller.go.
| @@ -19,6 +19,7 @@ | ||
| "errors" | ||
| "html" | ||
| "log" | ||
| "regexp" | ||
| "strings" | ||
| "time" | ||
|
|
||
| @@ -171,9 +172,11 @@ | ||
| ctx := context.Background() | ||
| collection := client.Database("crapi").Collection("post") | ||
|
|
||
| escapedTitle := regexp.QuoteMeta(title) | ||
|
|
||
| filter := bson.M{ | ||
| "title": bson.M{ | ||
| "$regex": title, | ||
| "$regex": escapedTitle, | ||
| "$options": "i", | ||
| }, | ||
| } |
Test Results93 tests 93 ✅ 2s ⏱️ Results for commit ea83854. ♻️ This comment has been updated with latest results. |
Description
Web changes for mcp.