feat(observability): Add usage metrics, per-user tracking, dashboard, and local deploy script by AWS-fpenland · Pull Request #39 · ASUCICREPO/PDF_Accessibility

AWS-fpenland · 2026-03-02T18:10:37Z

Summary

Adds comprehensive observability to both PDF-to-PDF and PDF-to-HTML pipelines, including custom CloudWatch metrics, per-user usage attribution, a dedicated monitoring dashboard, and a local deployment script.

What's included

Observability (`metrics_helper.py`, `usage_metrics_stack.py`)

Shared Python metrics library emitting to the PDFAccessibility CloudWatch namespace
Tracks: pages processed, file size, Adobe API calls/Document Transactions, Bedrock invocations/tokens, processing duration, errors, and estimated cost
MetricsContext context manager for automatic duration and error tracking
Graceful degradation — metrics failures are caught and logged without interrupting processing
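
A minimal sketch of how a context manager such as MetricsContext could combine automatic duration and error tracking with graceful degradation. The class body, the injected `emit` callable, the metric names `ProcessingDuration` and `Errors`, and the `pdf2pdf` service value are assumptions for illustration; the actual `metrics_helper.py` may be structured differently.

```python
import logging
import time
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical emitter signature: (metric_name, value, unit, dimensions).
Emitter = Callable[[str, float, str, Dict[str, str]], None]


class MetricsContext:
    """Times a block of work and counts errors; never raises on its own."""

    def __init__(self, service: str, user_id: str, emit: Emitter) -> None:
        self.dimensions = {"Service": service, "UserId": user_id}
        self._emit = emit
        self._start = 0.0

    def _safe_emit(self, name: str, value: float, unit: str) -> None:
        # Graceful degradation: a metrics failure is logged, not re-raised.
        try:
            self._emit(name, value, unit, self.dimensions)
        except Exception:
            logger.warning("Failed to emit metric %s", name, exc_info=True)

    def __enter__(self) -> "MetricsContext":
        self._start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        elapsed = time.monotonic() - self._start
        self._safe_emit("ProcessingDuration", elapsed, "Seconds")
        if exc_type is not None:
            self._safe_emit("Errors", 1, "Count")
        return False  # the caller's own exception is never swallowed


recorded: List[Tuple] = []


def record(name, value, unit, dimensions):
    """Stand-in emitter that keeps metrics in memory."""
    recorded.append((name, value, unit, dimensions))


def broken_emit(name, value, unit, dimensions):
    """Stand-in emitter that simulates a CloudWatch outage."""
    raise RuntimeError("CloudWatch unavailable")


with MetricsContext("pdf2pdf", "anonymous", record):
    pass  # processing work would go here

with MetricsContext("pdf2pdf", "anonymous", broken_emit):
    pass  # still completes even though every emit call fails
```

In a deployed function the emitter would presumably wrap a CloudWatch `put_metric_data` call against the PDFAccessibility namespace; injecting it as a callable keeps the sketch runnable without AWS credentials.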

Per-user attribution (`s3_object_tagger/`, splitter + pdf2html Lambdas)

S3 object tagging with UserId from Cognito metadata on upload
All metrics include Service + UserId dimensions for per-user breakdown
Falls back to anonymous for direct uploads without Cognito
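
The attribution rule above can be sketched as a pure function. The `user-sub` metadata key and the `UserId` tag come from the commit notes in this PR; the function names and the case-insensitive lookup are assumptions, not the code in the splitter or pdf2html Lambdas.

```python
from typing import Dict, List, Mapping

ANONYMOUS = "anonymous"


def resolve_user_id(metadata: Mapping[str, str]) -> str:
    """Return the uploader's id from S3 object metadata, or 'anonymous'."""
    for key, value in metadata.items():
        # Case-insensitive match is a defensive assumption, not confirmed.
        if key.lower() == "user-sub" and value.strip():
            return value.strip()
    return ANONYMOUS


def build_tag_set(metadata: Mapping[str, str]) -> List[Dict[str, str]]:
    """Build a TagSet list in the shape S3 object tagging expects."""
    return [{"Key": "UserId", "Value": resolve_user_id(metadata)}]


print(build_tag_set({"user-sub": "abc-123"}))  # authenticated UI upload
print(build_tag_set({}))                       # direct upload, no Cognito
```

Keeping the lookup separate from the S3 call makes the fallback easy to test without a bucket.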

CloudWatch dashboard (`PDFAccessibilityUsageMetrics` stack)

Aggregate pages/files processed via SUM(SEARCH(...))
Per-user usage tables via Log Insights structured queries
Bedrock invocation and token usage graphs
Adobe Document Transaction tracking (quota visibility)
Processing performance and error monitoring

Instrumented components

| Component | Metrics added |
| --- | --- |
| PDF Splitter Lambda | PagesProcessed, FileSize, structured logs |
| Adobe AutoTag (ECS) | AdobeAPICalls, AdobeDocTransactions |
| Alt Text Generator (ECS) | BedrockInvocations, token usage |
| Title Generator Lambda | BedrockInvocations, token usage |
| PDF-to-HTML Lambda | PagesProcessed, Bedrock usage, EstimatedCost |

Deployment tooling

deploy-local.sh — deploy both pipelines from local repo without CodeBuild/GitHub
CORS policy added to buildspec-unified.yml for S3 bucket
cdk.json updated to use python3 explicitly

CDK changes (`app.py`)

Lambda Layer for metrics helper attached to all Python Lambdas
cloudwatch:PutMetricData added to ECS task role
s3:GetObjectTagging/s3:PutObjectTagging for splitter Lambda
USER_ID env var passed to ECS tasks via Step Functions
Bucket and log group names exposed for dashboard stack

Documentation

docs/OBSERVABILITY.md — full metrics reference, dimensions, per-user tracking flow, cost estimation, dashboard guide
README updated with observability section

What's NOT changed

No changes to deploy.sh IAM policies
No changes to pre/post remediation accessibility checkers
No changes to alt-text error handling or retry logic
All existing Bedrock model configurations preserved

- Add detailed observability analysis document
- Create UsageMetricsDashboard CDK stack with:
  * Pages processed tracking
  * Bedrock token usage metrics
  * Adobe API call tracking
  * Processing duration monitoring
  * Error tracking
  * Cost estimation per file/user
- Add metrics_helper.py utility for CloudWatch metrics emission
- Add integration guide for implementing metrics in existing code
- Support per-user usage tracking via S3 object tagging
- Include cost calculation formulas for both solutions

- Add metrics layer to all Lambda functions
- Update split_pdf Lambda with page tracking and file size metrics
- Update Adobe autotag script with API call tracking
- Update PDF2HTML Lambda with Bedrock token tracking and cost estimation
- Add user attribution via S3 object tags throughout pipeline
- Deploy UsageMetricsDashboard as part of main stack
- All metrics automatically tracked with MetricsContext for duration/errors

…gregation

- Add metrics_helper.py to Docker container
- Update dashboard to use SEARCH expression for aggregating metrics across dimensions
- Fix import path in autotag.py for Docker environment
- Resolves Adobe API metrics not being tracked in ECS tasks

…gging

- Create S3 tagger Lambda to convert metadata to tags
- Extract user-sub from Cognito UI uploads (metadata)
- Apply UserId tag for consistent metrics tracking
- Support both authenticated UI and direct S3 uploads
- Fallback to 'anonymous' for untagged uploads
- Add tagger to both PDF-to-PDF and PDF-to-HTML stacks
- Document per-user tracking architecture and usage

- Remove separate S3 tagger Lambda (caused S3 notification conflicts)
- Add tagging logic directly to split_pdf and pdf2html Lambdas
- Convert metadata to tags at start of processing
- Avoids overlapping S3 event notification rules
- Maintains same user attribution functionality

- Log Insights queries failed (logs are plain text, not JSON)
- Use CloudWatch Metrics with UserId dimension instead
- Add Adobe API calls widget
- Metrics already working and more efficient
- Shows per-user breakdown with legend
- See docs/METRICS_STATUS.md for full analysis

- Rewrite dashboard: remove broken/duplicate widgets, keep 6 focused sections
- Per-user widgets use same SEARCH as totals (without SUM wrapper)
- Remove FileName from all metric dimensions (adobe, bedrock, cost)
- Pass USER_ID env var from Step Function Map state to ECS tasks
- Add user_id to chunk metadata so Map state can access it
- Sync metrics_helper.py to docker_autotag

- Emit JSON log line {event, userId, fileName, pageCount, service} from both Lambdas
- Log Insights table: files & pages aggregated by userId
- Log Insights table: recent processing activity with details
- Replaces graph widgets with table format for per-user section
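
A rough sketch of the structured log line and a matching per-user query. The five field names are taken from this commit; the event value `file_processed`, the helper name, the sample arguments, and the exact query text are assumptions, since the commit does not show them.

```python
import json

# Hypothetical event value; the commit lists field names but not values.
EVENT_NAME = "file_processed"


def usage_log_line(user_id: str, file_name: str, page_count: int, service: str) -> str:
    """Serialize one usage event as a single-line JSON string."""
    return json.dumps(
        {
            "event": EVENT_NAME,
            "userId": user_id,
            "fileName": file_name,
            "pageCount": page_count,
            "service": service,
        }
    )


# A Logs Insights query of roughly this shape could back the per-user table.
PER_USER_QUERY = (
    f'filter event = "{EVENT_NAME}" '
    "| stats count(*) as files, sum(pageCount) as pages by userId"
)

# Printing to stdout is what lands the line in the function's log group.
print(usage_log_line("anonymous", "report.pdf", 12, "pdf2pdf"))
```

Emitting the event as one JSON object per line is what lets the log query address `userId` and `pageCount` as fields rather than parsing free text.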

- SEARCH('{PDFAccessibility,Service,Operation}') matched ZERO metrics because all AdobeAPICalls have 3 dims (FileName or UserId added)
- Use SEARCH('{PDFAccessibility} MetricName=...') to match any dim set
- Add AdobeDocTransactions metric per Adobe licensing model: AutoTag = 10 doc transactions/page, ExtractPDF = ceil(pages/5)
- Pass page_count from PdfReader to track_adobe_api_call
- Add Document Transactions widget + quota info to dashboard
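
The Document Transaction arithmetic stated in this commit can be written out as below. The two rates are the ones quoted above; the function name and the validation are illustrative assumptions, not the signature of `track_adobe_api_call`.

```python
import math

AUTOTAG_TRANSACTIONS_PER_PAGE = 10   # AutoTag = 10 doc transactions/page
EXTRACT_PAGES_PER_TRANSACTION = 5    # ExtractPDF = ceil(pages/5)


def adobe_doc_transactions(operation: str, page_count: int) -> int:
    """Estimate Document Transactions consumed by one Adobe API call."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if operation == "AutoTag":
        return AUTOTAG_TRANSACTIONS_PER_PAGE * page_count
    if operation == "ExtractPDF":
        return math.ceil(page_count / EXTRACT_PAGES_PER_TRANSACTION)
    raise ValueError(f"unknown operation: {operation}")


# A 12-page chunk: 120 transactions for AutoTag, 3 for ExtractPDF.
print(adobe_doc_transactions("AutoTag", 12))
print(adobe_doc_transactions("ExtractPDF", 12))
```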

- pdf2html is a DockerImageFunction
- Lambda layers don't work
- Copy metrics_helper.py into pdf2html Docker build context
- Add COPY to Dockerfile so it's included in the image
- Remove /opt/python path hack (file is now in /var/task)
- Add cloudwatch:PutMetricData permission to Lambda role
- Always include pdf2html log group in dashboard queries

- Separate --init (first-time) from update (default) flows
- --init creates secrets, BDA project, S3 bucket, ECR repo
- Updates reuse existing BDA project from CloudFormation params
- Always sync metrics_helper.py to Docker build contexts
- pdf2html: build/push Docker + force Lambda image update
- pdf2pdf: CDK handles Docker via DockerImageAsset automatically
- Support --pdf2pdf, --pdf2html, --all, --profile, --region flags
- No more duplicate BDA projects on every run

- 'with MetricsContext(...):' had no indented body (line 190)
- All code after it was at same indent level, not inside the with
- Python raised SyntaxError on import, Lambda couldn't start at all
- Replace with explicit __enter__/__exit__ to avoid re-indenting 250 lines
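
To illustrate the failure and the workaround: a `with` header without an indented body fails at compile time, and a context manager can instead be driven through explicit `__enter__`/`__exit__` calls. The commit does not show where those calls were placed, so the helper below is only one possible arrangement; it keeps `with` semantics on errors, which itself requires a `try` block. `Recorder` and `run_explicit` are hypothetical names.

```python
import sys


class Recorder:
    """Tiny stand-in for a metrics context manager."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit-error" if exc_type else "exit-ok")
        return False


def fails_to_compile(source: str) -> bool:
    """Return True if the source cannot be compiled at all."""
    try:
        compile(source, "<lambda_handler>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return True
    return False


def run_explicit(ctx, work):
    """Equivalent of 'with ctx: return work()' using explicit calls."""
    ctx.__enter__()
    try:
        result = work()
    except BaseException:
        # Hand the exception to __exit__; re-raise unless it is suppressed.
        if not ctx.__exit__(*sys.exc_info()):
            raise
        return None
    ctx.__exit__(None, None, None)
    return result


BROKEN = "with ctx:\nresult = 1\n"  # body at the same indent as the header
print(fails_to_compile(BROKEN))

rec = Recorder()
print(run_explicit(rec, lambda: 42), rec.events)
```

Calling `__exit__` only at the end of the function, without the `try`, would skip duration and error reporting whenever the body raises.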

…pace

- SEARCH('{PDFAccessibility}') matches only metrics with ZERO dimensions
- Must specify dimension names: '{PDFAccessibility,Service,Operation,UserId}'
- Verified with get-metric-data: exact dims returns data, namespace-only returns empty
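
A small sketch of building the dashboard expression with an explicit dimension schema. The namespace, dimension names, and the AdobeAPICalls metric come from this PR; the helper name, the `Sum` statistic, and the 300-second period are assumptions rather than values read from `usage_metrics_stack.py`.

```python
from typing import Sequence


def search_expression(
    namespace: str,
    dimensions: Sequence[str],
    metric_name: str,
    stat: str = "Sum",
    period: int = 300,
    aggregate: bool = True,
) -> str:
    """Build a CloudWatch metric math SEARCH expression string."""
    # The schema must name every dimension the metric was published with.
    schema = ",".join([namespace, *dimensions])
    inner = f"SEARCH('{{{schema}}} MetricName=\"{metric_name}\"', '{stat}', {period})"
    # SUM(...) collapses all matched series into one; without it, each
    # matched series (for example one per user) is returned separately.
    return f"SUM({inner})" if aggregate else inner


dims = ["Service", "Operation", "UserId"]
print(search_expression("PDFAccessibility", dims, "AdobeAPICalls"))
print(search_expression("PDFAccessibility", dims, "AdobeAPICalls", aggregate=False))
```

The `aggregate=False` form corresponds to the per-user widgets described earlier in the commit history, which reuse the same SEARCH without the SUM wrapper.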

The lambda/add_title/venv/ directory (558 files including pip, pymupdf, and binary .so files) was accidentally committed despite being listed in .gitignore. Remove from git tracking while preserving the .gitignore entry to prevent recurrence.

Start from main's app.py and surgically add only observability features: metrics layer, CloudWatch PutMetricData permission, USER_ID env var, S3 tagging permissions, log group exports, and UsageMetrics stack. Preserves main's VPC endpoints, zstd compression, scoped IAM policies, and naming conventions.

- Add interactive solution selection when no flags given
- Prompt for Adobe credentials when secret missing
- Create ECR repo before Docker push (was --init only)
- Create BDA project automatically if none exists
- Create S3 bucket and CORS for pdf2html if missing
- Use pip3 to match python3 interpreter
- Add CDK bootstrap before every deploy
- Add retry logic for CDK deploy
- Add --app flag for pdf2html CDK deploy
- Add Docker push retry with ECR login refresh
- Remove --init flag (resources created on demand)

- Pass user_id to Adobe API tracking calls in autotag container
- Add structured JSON logs to autotag container for log query widgets
- Add Bedrock metrics tracking to title-generator and alt-text-generator
- Add @aws-sdk/client-cloudwatch dependency to alt-text container
- Fix dashboard Bedrock widgets to query PDFAccessibility namespace
- Include all log groups (JS container, pdf2html) in dashboard queries

Remove structured log from autotag container since it processes chunks not files, causing duplicate entries with _chunk_1 suffix. The splitter already emits the correct file-level event. Move pdf2html structured log outside usage_data.json dependency with pypdf fallback so it always emits even without usage data.

Separate the structured log into its own try/except block so it fires even if metrics tracking (estimate_cost etc) throws. The previous code had both in the same try block, so any exception in cost estimation would skip the dashboard log entirely.
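
The isolation described here can be sketched as two independent try/except blocks. The function name, its parameters, and the `file_processed` event value are hypothetical; only the idea that the dashboard log must still fire when metrics tracking throws is taken from the commit.

```python
import json
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def finish_file(
    usage: Dict,
    track_metrics: Callable[[Dict], None],
    log_line: Callable[[str], None] = print,
) -> None:
    """Run metrics tracking and the dashboard log as independent steps."""
    # Block 1: metrics and cost estimation. A failure is logged and dropped.
    try:
        track_metrics(usage)
    except Exception:
        logger.warning("Metrics tracking failed", exc_info=True)

    # Block 2: the structured log that feeds the dashboard. It runs whether
    # or not block 1 succeeded, because it has its own try/except.
    try:
        log_line(json.dumps({"event": "file_processed", **usage}))
    except Exception:
        logger.warning("Structured log failed", exc_info=True)


def failing_cost_estimate(usage: Dict) -> None:
    """Stand-in for a cost estimation step that throws."""
    raise RuntimeError("estimate_cost failed")


finish_file({"userId": "anonymous", "pageCount": 3}, failing_cost_estimate)
```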

Revert deploy.sh IAM policies to main's scoped versions. The wildcard Resource:* on all policy statements was a security regression. Main's policies already include cloudwatch:PutMetricData and PutDashboard which is all our observability features need.

Restore MODEL_ID_ALT_TEXT/MODEL_ID_LINK_ALT_TEXT constants, modifyPDF throw-on-error, success/failure counting with all-failed exit guard, progress logging, and sleep(2000). Keep observability additions: CloudWatch metrics tracking for Bedrock invocations and token usage.

AWS-fpenland added 30 commits February 5, 2026 15:49

feat(deploy): add local deployment script without CodeBuild/GitHub dependency

31757cc

docs(observability): add executive summary of observability enhancements

1daab01

docs(observability): add quick reference guide for metrics and monitoring

990951a

docs(observability): add deployment summary for integrated solution

65cd762

docs: add guide for updating existing deployments from new environment

226eb5e

fix(cdk): expose bucket attribute from PDFAccessibility stack

9d3c054

- Add self.bucket = bucket to make bucket accessible to UsageMetricsDashboard
- Fixes AttributeError when deploying observability stack

fix(observability): correct Lambda layer directory structure for metrics helper

3550e7a

- Move metrics_helper.py to python/ subdirectory for proper Lambda layer structure
- Lambda layers require python/ directory for Python packages to be importable

fix(docker): copy metrics_helper into docker_autotag directory

740f48b

- Docker build context cannot access parent directories
- Copy metrics_helper.py directly into docker_autotag/
- Simplify Dockerfile to use COPY . /app

fix(dashboard): use SEARCH expressions for metric aggregation

7905c4c

fix(metrics): remove FileName dimension for cleaner aggregation

104de2f

- Remove FileName from all metric dimensions
- Aggregate at Service/UserId level only
- Reduces metric streams from per-file to per-user
- Fix undefined files_metric in dashboard
- Simplifies dashboard queries

fix(dashboard): remove undefined pages_metric reference

21a4cea

fix(dashboard): use simple metric queries instead of SEARCH expressions

c874832

- Direct metric queries with Service dimension
- Sum for pages, SampleCount for files
- Avoids SEARCH expression array division errors

fix(dashboard): use SUM(SEARCH()) to aggregate metrics across all users

f9b67ec

- Metrics have Service+UserId dimensions
- Use SUM() to aggregate SEARCH results into single value
- Fixes empty dashboard widgets

fix(iam): add CloudWatch PutMetricData permission to ECS task role

6f3bc6a

- ECS tasks were getting AccessDenied when emitting metrics
- Add cloudwatch:PutMetricData to ecs_task_role policy
- Enables Adobe API call metrics from ECS containers

fix(dashboard): use actual log group names instead of hardcoded values

d75dde9

- Lambda function names have generated suffixes
- Expose log group names from main stack
- Pass actual names to dashboard
- Fixes 'log group does not exist' error in per-user widgets

fix(deploy): auto-detect existing deployment region

dd47156

- Profile region (us-west-2) differs from deployment region (us-east-1)
- Script now checks if Pdf2HtmlStack exists and uses its region
- Prevents pushing to wrong ECR region

AWS-fpenland added 22 commits February 12, 2026 15:23

chore(gitignore): Remove existing-stack.json

b216ecd

File contained a real AWS account ID and is not needed for upstream contribution. Added to .gitignore to prevent accidental re-commit.

fix(metrics): Sync stale metrics_helper copy

b118c08

lambda/shared/metrics_helper.py was missing page_count param in track_adobe_api_call and still included FileName dimension. Now matches the other three copies.

style(autotag): Remove duplicate import

2366392

fix(lambda): Replace bare except with Exception

3e20370

Bare except catches KeyboardInterrupt and SystemExit which makes debugging harder. Both clauses are in tag retrieval fallback paths added by the dev branch.
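
A short illustration of the difference, using hypothetical helper names rather than the Lambda's actual tag retrieval code: the bare clause turns an interrupt into a silent fallback, while `except Exception` lets it propagate.

```python
ANONYMOUS = "anonymous"


def user_id_bare(fetch_tag):
    """Fallback with a bare except: also traps KeyboardInterrupt/SystemExit."""
    try:
        return fetch_tag()
    except:  # noqa: E722 (deliberately bare, to show the problem)
        return ANONYMOUS


def user_id_scoped(fetch_tag):
    """Fallback limited to ordinary errors; interrupts still propagate."""
    try:
        return fetch_tag()
    except Exception:
        return ANONYMOUS


def failing_lookup():
    """Stand-in for a tag lookup that fails with an ordinary error."""
    raise RuntimeError("tag lookup failed")


def interrupted():
    """Stand-in for a lookup interrupted by the operator."""
    raise KeyboardInterrupt


print(user_id_scoped(failing_lookup))  # ordinary failure: falls back
print(user_id_bare(interrupted))       # interrupt is hidden by the bare except
```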

refactor: Restore upstream directory structure

4b5c714

Rename directories back to match main branch naming to minimize diff for upstream PR. All path references in app.py, deploy-local.sh, .gitignore, and docs updated accordingly. Observability features preserved.

docs: Consolidate observability documentation

bffcc18

Replace 7 separate observability docs with single OBSERVABILITY.md. Restore IAM_PERMISSIONS.md, MANUAL_DEPLOYMENT.md, and CONFIGURING_LIMITS.md from main branch. Remove hardcoded WSL paths.

docs(readme): Add observability section

e1e42a0

refactor: Match upstream file naming

973c561

Rename autotag.py, alt-text.js, and myapp.py to match main branch names. Update Dockerfiles, app.py handler, and docs references accordingly.

fix(docker): Restore multi-stage Dockerfile

452dfb8

Use main's optimized multi-stage build for the alt-text-generator container (node:22-slim, separate builder/production stages, smaller final image).

chore(gitignore): Rebase on upstream main

9d86f13

Start from main's .gitignore, add only the existing-stack.json exclusion.

refactor(autotag): Rebase on upstream main

a88991b

Start from main's Dockerfile and adobe_autotag_processor.py, add only metrics imports and tracking calls to autotag and extract_api functions.

fix(cdk): Use python3 in cdk.json app command

adea1ed

adding agent markdown

1da516e

fix(iam): Restore scoped Bedrock/Logs in pdf2html-stack

06032fd

Restore main's scoped Bedrock model ARNs, BDA project permissions, and log group ARN. Keep new observability additions: s3:GetObjectTagging, s3:PutObjectTagging, and cloudwatch:PutMetricData.

revert: Restore pre/post checker files to main

37e5040

These files had typo regressions (remidiation, accessability) from rebasing. Our observability work does not modify these Lambdas.

AWS-fpenland marked this pull request as ready for review March 2, 2026 18:15

AWS-fpenland wants to merge 52 commits into ASUCICREPO:main from AWS-fpenland:dev
