
[ContentUnderstanding] Add ContentRange samples for document, video, and audio. #45679

Open
changjian-wang wants to merge 2 commits into main from changjian-wang/add-contentrange-samples-doc-audio-video

@changjian-wang
Member

This pull request introduces a new ContentRange value type for specifying content ranges in analysis requests, and updates both the SDK and the sample code to support it. Users can restrict analysis to specific document pages or to time ranges in audio/video content, and the documentation and samples now demonstrate both cases.

ContentRange value type introduction and integration

  • Added the new ContentRange class in models/_content_range.py, providing methods to construct ranges for document pages and audio/video time intervals, combine multiple ranges, and convert to string. This enables precise specification of content segments to analyze.
  • Integrated ContentRange into the SDK: updated begin_analyze_binary and its async variant to accept either a string or a ContentRange object for the content_range parameter, converting it to a string as needed. Documentation for these methods was updated to reflect the new parameter type and usage.
  • Exported ContentRange in models/_patch.py for public access.

Sample code updates

  • Updated sample_analyze_binary_async.py to demonstrate analyzing specific pages and combined page ranges with the new ContentRange class, including example output for these scenarios.
  • Updated sample_analyze_url_async.py to show how to restrict analysis to a single page with ContentRange, and changed the document URL to a more complex sample. The sample also imports timedelta for potential time-range usage.

Asset metadata update

  • Updated the asset tag in assets.json to reflect the new version, ensuring asset tracking aligns with these changes.

Changjian Wang added 2 commits March 12, 2026 18:36
- Implemented ContentRange functionality in sample scripts for analyzing binary documents and URLs.
- Added examples for analyzing specific pages and combined page ranges in `sample_analyze_binary.py`.
- Enhanced `sample_analyze_url.py` with ContentRange examples for documents, videos, and audio, including time-based ranges.
- Created unit tests for ContentRange functionality, covering various scenarios and edge cases.
- Updated existing tests to validate ContentRange behavior in document and media analysis.
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new ContentRange value type for the Azure AI Content Understanding SDK, enabling users to specify content ranges (document pages or audio/video time intervals) when analyzing content. The class provides factory methods for constructing ranges and is integrated into the begin_analyze_binary API.

Changes:

  • Added ContentRange class with factory methods (page, pages, pages_from, time_range, time_range_from, combine) and exported it in the models namespace.
  • Updated begin_analyze_binary (sync and async) to accept ContentRange objects in addition to raw strings for the content_range parameter.
  • Added comprehensive sample code and tests demonstrating ContentRange usage for document, video, and audio analysis scenarios.
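The factory methods listed above map onto simple string formats, judging by the docstring examples quoted later in this review ("5", "1-3", "9-", "0-5000", "5000-", "1-3,5,9-"). The sketch below reproduces those formats under the assumption that time offsets are whole milliseconds; it is an illustration, not the PR's implementation, and it omits whatever argument validation the real class performs:

```python
from datetime import timedelta


class ContentRange:
    """Sketch of a range value type; formats follow the PR's docstring examples."""

    def __init__(self, value: str) -> None:
        self._value = value

    @classmethod
    def page(cls, page_number: int) -> "ContentRange":
        # A single page, e.g. "5".
        return cls(str(page_number))

    @classmethod
    def pages(cls, start: int, end: int) -> "ContentRange":
        # An inclusive page span, e.g. "1-3".
        return cls(f"{start}-{end}")

    @classmethod
    def pages_from(cls, start: int) -> "ContentRange":
        # An open-ended span running to the last page, e.g. "9-".
        return cls(f"{start}-")

    @staticmethod
    def _to_ms(offset: timedelta) -> int:
        # Assumption: offsets are whole milliseconds (sub-millisecond parts are floored).
        return offset // timedelta(milliseconds=1)

    @classmethod
    def time_range(cls, start: timedelta, end: timedelta) -> "ContentRange":
        # A closed time span in milliseconds, e.g. "0-5000".
        return cls(f"{cls._to_ms(start)}-{cls._to_ms(end)}")

    @classmethod
    def time_range_from(cls, start: timedelta) -> "ContentRange":
        # An open-ended time span, e.g. "5000-".
        return cls(f"{cls._to_ms(start)}-")

    @classmethod
    def combine(cls, *ranges: "ContentRange") -> "ContentRange":
        # Segments are joined with commas, e.g. "1-3,5,9-".
        return cls(",".join(str(r) for r in ranges))

    def __str__(self) -> str:
        return self._value


combined = ContentRange.combine(
    ContentRange.pages(1, 3), ContentRange.page(5), ContentRange.pages_from(9)
)
print(combined)  # 1-3,5,9-
print(ContentRange.time_range(timedelta(0), timedelta(seconds=5)))  # 0-5000
```

Taking timedelta rather than raw integers keeps the unit explicit at the call site, so callers cannot accidentally pass seconds where milliseconds are expected.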

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

  • models/_content_range.py: New ContentRange class with factory methods, equality, hashing, and string conversion
  • models/_patch.py: Exports ContentRange in __all__
  • _patch.py: Updates sync begin_analyze_binary to accept ContentRange and convert it to a string
  • aio/_patch.py: Updates async begin_analyze_binary to accept ContentRange and convert it to a string
  • tests/test_content_range.py: Unit tests for ContentRange construction, validation, equality, and integration with AnalysisInput
  • tests/samples/test_sample_analyze_url.py: Integration tests for ContentRange with document, video, and audio URL analysis
  • tests/samples/test_sample_analyze_binary.py: Integration tests for ContentRange with binary document analysis
  • samples/sample_analyze_url.py: Sync sample showing ContentRange usage for URL-based analysis
  • samples/async_samples/sample_analyze_url_async.py: Async sample showing ContentRange usage for URL-based analysis
  • samples/sample_analyze_binary.py: Sync sample showing ContentRange usage for binary analysis
  • samples/async_samples/sample_analyze_binary_async.py: Async sample showing ContentRange usage for async binary analysis
  • assets.json: Updated asset tag for new test recordings

tests_dir = os.path.dirname(os.path.dirname(__file__))
file_path = os.path.join(tests_dir, "test_data", "mixed_financial_docs.pdf")
if not os.path.exists(file_path):
    file_path = os.path.join(tests_dir, "test_data", "sample_invoice.pdf")
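The test fixture above prefers mixed_financial_docs.pdf and falls back to sample_invoice.pdf. If more fallbacks were ever needed, the pattern generalizes to a small helper; first_existing_path is a hypothetical name, not part of this PR. Unlike the original snippet, which assigns the fallback path without checking it, this sketch returns None when no candidate exists:

```python
import os
from typing import Iterable, Optional


def first_existing_path(candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the first candidate path that exists, else None."""
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None
```

In the test above, it would be called with the two test_data paths in order of preference, and the caller can then fail or skip explicitly on None.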
Comment on lines +23 to +40
range = ContentRange.page(5)  # "5"
range = ContentRange.pages(1, 3)  # "1-3"
range = ContentRange.pages_from(9)  # "9-"

# Audio/video time ranges
range = ContentRange.time_range(
    timedelta(0), timedelta(seconds=5))  # "0-5000"
range = ContentRange.time_range_from(
    timedelta(seconds=5))  # "5000-"

# Combine multiple ranges
range = ContentRange.combine(
    ContentRange.pages(1, 3),
    ContentRange.page(5),
    ContentRange.pages_from(9))  # "1-3,5,9-"

# Or construct from a raw string
range = ContentRange("1-3,5,9-")
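Because a ContentRange can also be built from a raw string, it helps to spell out the grammar these examples imply: comma-separated segments, each a single number ("5"), a closed span ("1-3"), or an open-ended span ("9-"). The parser below is a hypothetical illustration of that grammar, not a function the SDK provides; it cannot tell whether the numbers denote pages or milliseconds:

```python
from typing import List, Optional, Tuple

# (start, end); end is None for an open-ended segment such as "9-".
Segment = Tuple[int, Optional[int]]


def parse_content_range(value: str) -> List[Segment]:
    """Hypothetical parser: split "1-3,5,9-" into [(1, 3), (5, 5), (9, None)]."""
    segments: List[Segment] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in {value!r}")
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError for a missing start such as "-5"
        if not sep:
            # Single value, e.g. "5".
            segments.append((start, start))
        elif end_text == "":
            # Open-ended span, e.g. "9-".
            segments.append((start, None))
        else:
            end = int(end_text)
            if end < start:
                raise ValueError(f"end before start in segment {part!r}")
            segments.append((start, end))
    return segments


print(parse_content_range("1-3,5,9-"))  # [(1, 3), (5, 5), (9, None)]
```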
Comment on lines +50 to +51
if value is None:
    raise ValueError("value cannot be None.")
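The guard above rejects None at construction time. The file summary also mentions equality and hashing; a common way to pair those with such a guard is to compare and hash on the wrapped string, so equal ranges can serve as set members or dict keys. A minimal sketch of that pattern, using the hypothetical class name RangeValue rather than the PR's actual code:

```python
class RangeValue:
    """Hypothetical value type wrapping a range string (not the PR's class)."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        # Reject None up front, mirroring the guard quoted in the review.
        if value is None:
            raise ValueError("value cannot be None.")
        self._value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, RangeValue):
            return NotImplemented
        return self._value == other._value

    def __hash__(self) -> int:
        # Must agree with __eq__, so it is derived from the same string.
        return hash(self._value)

    def __str__(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return f"RangeValue({self._value!r})"


print(RangeValue("1-3") == RangeValue("1-3"))  # True
print(len({RangeValue("1-3"), RangeValue("1-3")}))  # 1
```

Returning NotImplemented for foreign types lets Python fall back to its default comparison instead of treating a raw string as equal to a RangeValue.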
"""ContentRange value type for specifying content ranges on AnalysisInput."""

from datetime import timedelta
from typing import Optional

2 participants