From bfcb33d64858d9da43ed4ef1a1b0c05239b15f19 Mon Sep 17 00:00:00 2001 From: Stuart Corbishley Date: Fri, 29 May 2026 17:31:53 +0200 Subject: [PATCH 1/2] Reduce credential-resolution error logging noise (#4814) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Credential-resolution failures were logged with Logger.error from two sites (Resolver and RunChannel), and three of five conditions were logged twice. Because Sentry's LoggerHandler captures :error-level logs, each one became a Sentry event — mostly for user-actionable or transient conditions nobody can act on from an ops seat. Centralise logging in Lightning.Credentials.Resolver (the single run-credential-resolution layer) via a log_resolution_error/2 helper, and remove the duplicate Logger.error calls from RunChannel. Per condition: - project_not_found -> error (genuine invariant violation) - environment_not_configured -> warning (user-actionable config) - environment_mismatch -> warning (user-actionable config) - temporary_failure -> info (transient provider 429/503) - reauthorization_required -> info (user/IdP state, already audited) Only project_not_found now reaches Sentry. Also fix the user-facing "OAuth" casing in the two channel error replies, and add a logging guideline documenting the level policy and single-log-site rule. --- .claude/guidelines/logging.md | 30 ++++ CLAUDE.md | 1 + lib/lightning/credentials/resolver.ex | 35 ++++- lib/lightning_web/channels/run_channel.ex | 15 +- priv/gettext/en/LC_MESSAGES/errors.po | 2 +- test/lightning/credentials/resolver_test.exs | 141 +++++++++++++++++- .../channels/run_channel_test.exs | 2 +- 7 files changed, 199 insertions(+), 27 deletions(-) create mode 100644 .claude/guidelines/logging.md diff --git a/.claude/guidelines/logging.md b/.claude/guidelines/logging.md new file mode 100644 index 00000000000..5e37beaf19e --- /dev/null +++ b/.claude/guidelines/logging.md @@ -0,0 +1,30 @@ +# Logging - Logger Levels and Sentry Noise + +This guideline covers how to choose `Logger` levels in the Lightning Phoenix app so that real bugs stay visible and Sentry stays signal-rich. + +## Level Policy + +- **`Logger.info` / `Logger.warning`** - expected, user-actionable, or transient conditions. Examples: a user must re-authorise a credential, a project is misconfigured, an upstream provider returned 429/503. +- **`Logger.error`** - reserve for genuine application faults and invariant violations (the things you actually want to be paged about). + +## Why It Matters + +Sentry's `LoggerHandler` is configured with `level: :error` and `capture_log_messages: true` in `lib/lightning/application.ex`. That means **every `Logger.error` becomes a Sentry error event** - including bare log messages, not just exceptions. + +Using `:error` for non-faults creates Sentry noise that buries real bugs. Downgrading to `info`/`warning` keeps the line visible in the logs (important for self-hosted / logs-only operators - the plain-text formatter in `config/config.exs` includes `:run_id`) without paging anyone. + +## Single Log Site + +Log a given condition at exactly **one** site, not at every layer it propagates through. + +For credential resolution that site is `Lightning.Credentials.Resolver` (`lib/lightning/credentials/resolver.ex`) - the run-credential-resolution layer - **not** `LightningWeb.Channels.RunChannel`, which only formats the error and replies. + +## Worked Example: Credential Resolution + +| Reason | Level | Rationale | +| --- | --- | --- | +| `project_not_found` | `error` | Genuine invariant violation | +| `environment_not_configured` | `warning` | User-actionable config | +| `environment_mismatch` | `warning` | User-actionable config | +| `reauthorization_required` | `info` | User / IdP state | +| `temporary_failure` | `info` | Transient provider error | diff --git a/CLAUDE.md b/CLAUDE.md index fea632d0f8a..59b9934e443 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -210,4 +210,5 @@ Detailed guidelines in `.claude/guidelines/`: - `e2e-testing.md` - Playwright E2E testing - `yex-guidelines.md` - Critical Yex (Yjs/Elixir) usage rules - `toast-notifications.md` - Notification patterns +- `logging.md` - Logger level conventions and Sentry noise (info/warning vs error) - `ui-patterns.md` - Button variants, disabled states, Tailwind conventions \ No newline at end of file diff --git a/lib/lightning/credentials/resolver.ex b/lib/lightning/credentials/resolver.ex index a1df6ffdac3..080bdafd459 100644 --- a/lib/lightning/credentials/resolver.ex +++ b/lib/lightning/credentials/resolver.ex @@ -95,6 +95,7 @@ defmodule Lightning.Credentials.Resolver do {:ok, ResolvedCredential.from(credential, body)} {:error, reason} -> + log_resolution_error(reason) {:error, {reason, credential}} end end @@ -105,14 +106,11 @@ defmodule Lightning.Credentials.Resolver do {:ok, ResolvedCredential.from(credential, body)} {:error, :environment_not_found} -> - Logger.error( - "Credential environment does not match project environment", - project_env: project_env - ) - + log_resolution_error(:environment_mismatch, project_env: project_env) {:error, {:environment_mismatch, credential}} {:error, reason} -> + log_resolution_error(reason) {:error, {reason, credential}} end end @@ -147,11 +145,11 @@ defmodule Lightning.Credentials.Resolver do {:ok, env} %Project{env: nil} -> - Logger.error("Project has no environment configured") + log_resolution_error(:environment_not_configured) {:error, :environment_not_configured} nil -> - Logger.error("Project not found for run") + log_resolution_error(:project_not_found) {:error, :project_not_found} end end @@ -216,4 +214,27 @@ defmodule Lightning.Credentials.Resolver do nil end end + + defp log_resolution_error(reason, metadata \\ []) + + defp log_resolution_error(:project_not_found, meta), + do: Logger.error("Project not found for run", meta) + + defp log_resolution_error(:environment_not_configured, meta), + do: Logger.warning("Project has no environment configured", meta) + + defp log_resolution_error(:environment_mismatch, meta), + do: + Logger.warning( + "Credential environment does not match project environment", + meta + ) + + defp log_resolution_error(:reauthorization_required, meta), + do: Logger.info("OAuth refresh token has expired", meta) + + defp log_resolution_error(:temporary_failure, meta), + do: Logger.info("Could not reach the OAuth provider", meta) + + defp log_resolution_error(_reason, _meta), do: :ok end diff --git a/lib/lightning_web/channels/run_channel.ex b/lib/lightning_web/channels/run_channel.ex index 566600dceea..d36d2a133b9 100644 --- a/lib/lightning_web/channels/run_channel.ex +++ b/lib/lightning_web/channels/run_channel.ex @@ -486,8 +486,6 @@ defmodule LightningWeb.RunChannel do _project_id, _run_id ) do - Logger.error("Project has no environment configured") - error = LightningWeb.ErrorFormatter.format(:environment_not_configured, %{ project: socket.assigns.project_id @@ -503,8 +501,6 @@ defmodule LightningWeb.RunChannel do _project_id, _run_id ) do - Logger.error("Project not found for run") - error = LightningWeb.ErrorFormatter.format(:project_not_found, %{}) {:reply, {:error, error}, socket} end @@ -519,11 +515,6 @@ defmodule LightningWeb.RunChannel do project_env = Lightning.Projects.get_project!(socket.assigns.project_id).env || "unknown" - Logger.error( - "Credential environment does not match project environment", - project_env: project_env - ) - error = LightningWeb.ErrorFormatter.format( {:environment_mismatch, credential}, @@ -540,8 +531,6 @@ defmodule LightningWeb.RunChannel do _project_id, _run_id ) do - Logger.error("OAuth refresh token has expired") - error = LightningWeb.ErrorFormatter.format(reason, %{ project: socket.assigns.project_id @@ -557,9 +546,7 @@ defmodule LightningWeb.RunChannel do _project_id, _run_id ) do - Logger.error("Could not reach the oauth provider") - - {:reply, {:error, "Could not reach the oauth provider. Try again later"}, + {:reply, {:error, "Could not reach the OAuth provider. Try again later"}, socket} end end diff --git a/priv/gettext/en/LC_MESSAGES/errors.po b/priv/gettext/en/LC_MESSAGES/errors.po index 3dd2984ab19..8fed363a2c0 100644 --- a/priv/gettext/en/LC_MESSAGES/errors.po +++ b/priv/gettext/en/LC_MESSAGES/errors.po @@ -72,7 +72,7 @@ msgstr "Please change your email" msgid "oauth_reauth_required" msgstr "" -"Oauth token has expired. Reauthorize with your external system:" +"OAuth token has expired. Reauthorize with your external system:" " 1. Go to %{credentials_url}" " 2. Find %{credential_name}" " 3. Click \"Edit\" and then \"Reauthorize\"" diff --git a/test/lightning/credentials/resolver_test.exs b/test/lightning/credentials/resolver_test.exs index b5eff96fd3e..fe2af9fc885 100644 --- a/test/lightning/credentials/resolver_test.exs +++ b/test/lightning/credentials/resolver_test.exs @@ -3,7 +3,10 @@ defmodule ResolverTest do alias Lightning.Credentials.Resolver + require Logger + import Lightning.Factories + import ExUnit.CaptureLog describe "resolve_credential/1 with regular credential" do test "returns ResolvedCredential with credential body" do @@ -248,10 +251,14 @@ defmodule ResolverTest do credential = Repo.preload(credential, :oauth_client) - assert {:error, {:reauthorization_required, credential}} = - Resolver.resolve_credential(credential) + {result, log} = + capture_info_log(fn -> Resolver.resolve_credential(credential) end) + assert {:error, {:reauthorization_required, credential}} = result assert credential.name == "Test Googlesheets Credential" + + assert log =~ "[info]" + assert log =~ "OAuth refresh token has expired" end test "when refresh fails with rate limit returns temporary_failure error", %{ @@ -282,8 +289,13 @@ defmodule ResolverTest do credential = Repo.preload(credential, :oauth_client) - assert {:error, {:temporary_failure, _credential}} = - Resolver.resolve_credential(credential) + {result, log} = + capture_info_log(fn -> Resolver.resolve_credential(credential) end) + + assert {:error, {:temporary_failure, _credential}} = result + + assert log =~ "[info]" + assert log =~ "Could not reach the OAuth provider" end test "when refresh fails with other error returns generic error", %{ @@ -725,5 +737,126 @@ defmodule ResolverTest do assert {:error, :not_found} = Resolver.resolve_credential(run, other_credential.id) end + + test "logs environment_mismatch at warning level when credential lacks the project environment body", + %{user: user} do + project = + insert(:project, env: "staging", project_users: [%{user: user}]) + + credential = + insert(:credential, user: user, name: "Mismatch Credential") + |> with_body(%{name: "main", body: %{"key" => "value"}}) + + %{jobs: [job]} = + workflow = + build(:workflow, project: project) + |> with_job(%{ + project_credential: %{credential: credential, project: project} + }) + |> insert() + + dataclip = insert(:dataclip) + + %{runs: [run]} = + insert(:workorder, workflow: workflow) + |> with_run(%{dataclip: dataclip, starting_job: job}) + + {result, log} = + with_log(fn -> Resolver.resolve_credential(run, credential.id) end) + + assert {:error, {:environment_mismatch, _credential}} = result + + assert log =~ "[warning]" + assert log =~ "Credential environment does not match project environment" + end + + test "logs environment_not_configured at warning level for a non-root project with no env", + %{user: user} do + parent = insert(:project, env: "main") + + project = + insert(:project, + parent_id: parent.id, + env: nil, + project_users: [%{user: user}] + ) + + credential = + insert(:credential, user: user, name: "Unconfigured Env Credential") + |> with_body(%{name: "main", body: %{"key" => "value"}}) + + %{jobs: [job]} = + workflow = + build(:workflow, project: project) + |> with_job(%{ + project_credential: %{credential: credential, project: project} + }) + |> insert() + + dataclip = insert(:dataclip) + + %{runs: [run]} = + insert(:workorder, workflow: workflow) + |> with_run(%{dataclip: dataclip, starting_job: job}) + + {result, log} = + with_log(fn -> Resolver.resolve_credential(run, credential.id) end) + + assert {:error, {:environment_not_configured, nil}} = result + + assert log =~ "[warning]" + assert log =~ "Project has no environment configured" + end + + test "logs project_not_found at error level when the run's project is missing" do + # No project_users / project_credentials so the project can be removed + # without tripping restrict FKs; project lookup is what we exercise. + project = insert(:project) + + %{jobs: [job]} = + workflow = + build(:workflow, project: project) + |> with_job() + |> insert() + + dataclip = insert(:dataclip) + + %{runs: [run]} = + insert(:workorder, workflow: workflow) + |> with_run(%{dataclip: dataclip, starting_job: job}) + + # Remove the project so the run's in-memory struct resolves to a + # missing project (workflow/workorder/run cascade-delete, but the + # struct still drives get_project_for_run/1 -> nil). + Repo.delete_all( + from(p in Lightning.Projects.Project, where: p.id == ^project.id) + ) + + fake_credential_id = Ecto.UUID.generate() + + {result, log} = + with_log(fn -> Resolver.resolve_credential(run, fake_credential_id) end) + + assert {:error, {:project_not_found, nil}} = result + + assert log =~ "[error]" + assert log =~ "Project not found for run" + end + end + + # The test logger level is :warning (see config/test.exs), which gates + # :info messages at the primary :logger level before any capture handler + # sees them. Per-process levels (Logger.put_process_level/2) can only + # restrict below the primary level, not lift above it, so the primary level + # must be lowered for the duration of the capture and restored afterwards. + defp capture_info_log(fun) do + previous_level = Logger.level() + Logger.configure(level: :info) + + try do + with_log([level: :info], fun) + after + Logger.configure(level: previous_level) + end end end diff --git a/test/lightning_web/channels/run_channel_test.exs b/test/lightning_web/channels/run_channel_test.exs index 5288b0b3cae..a2c77eb1018 100644 --- a/test/lightning_web/channels/run_channel_test.exs +++ b/test/lightning_web/channels/run_channel_test.exs @@ -654,7 +654,7 @@ defmodule LightningWeb.RunChannelTest do assert_reply ref, :error, - "Could not reach the oauth provider. Try again later" + "Could not reach the OAuth provider. Try again later" end end From 5bdf6be97458945c13ba76476758d03bc506c180 Mon Sep 17 00:00:00 2001 From: Stuart Corbishley Date: Fri, 29 May 2026 17:32:31 +0200 Subject: [PATCH 2/2] Update changelog for #4814 --- CHANGELOG.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 535eb8d88f3..fc91eb59110 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -19,6 +19,12 @@ and this project adheres to ### Changed +- Stop reporting expected credential-resolution failures (OAuth re-auth needed, + misconfigured project environment, transient provider errors) to Sentry. These + are now logged once, in `Lightning.Credentials.Resolver`, at `info`/`warning` + instead of `error`; only a genuinely missing project still logs at `error`. + [#4814](https://github.com/OpenFn/lightning/issues/4814) + ### Fixed - Channel join crashes when multiple users open the same workflow concurrently