This document provides detailed information about all the methods available in the Nutrient DWS Python Client.
The main client for interacting with the Nutrient DWS Processor API.
NutrientClient(api_key: str | Callable[[], Awaitable[str] | str], base_url: str | None = None, timeout: int | None = None)
Parameters:
api_key (required): Your API key string or async function returning a token
base_url (optional): Custom API base URL (defaults to https://api.nutrient.io)
timeout (optional): Request timeout in milliseconds
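The `api_key` type in the signature accepts a plain string, a synchronous callable, or an asynchronous callable. As a minimal sketch of how such a value can be resolved to a token (the helper `resolve_api_key` and both provider functions are hypothetical illustrations, not the client's actual implementation):

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Union

# Mirrors the documented parameter type: str | Callable[[], Awaitable[str] | str]
ApiKey = Union[str, Callable[[], Union[Awaitable[str], str]]]


async def resolve_api_key(api_key: ApiKey) -> str:
    """Hypothetical helper: return a token from a string or a sync/async provider."""
    if isinstance(api_key, str):
        return api_key
    value = api_key()  # may be a plain string or an awaitable
    if inspect.isawaitable(value):
        value = await value
    return value


def static_provider() -> str:
    # Stands in for reading a token from configuration
    return 'token-from-sync-provider'


async def async_provider() -> str:
    await asyncio.sleep(0)  # stands in for a network call
    return 'token-from-async-provider'
```

Checking for an awaitable after calling the provider lets one code path serve both kinds of callable.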
Provide your API key directly:
from nutrient_dws import NutrientClient
client = NutrientClient(api_key='your_api_key')
Or use an async token provider to fetch tokens from a secure source:
import httpx
from nutrient_dws import NutrientClient
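async def get_token():
    # httpx needs an absolute URL; the host below is a placeholder for your own backend
    async with httpx.AsyncClient() as http_client:
        response = await http_client.get('https://your-backend.example/api/get-nutrient-token')
        response.raise_for_status()
        data = response.json()
        return data['token']
client = NutrientClient(api_key=get_token)
Gets account information for the current API key.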
Returns: AccountInfo - Account information dictionary
account_info = await client.get_account_info()
# Access subscription information
print(account_info['subscriptionType'])
Creates a new authentication token.
Parameters:
params: CreateAuthTokenParameters - Parameters for creating the token
Returns: CreateAuthTokenResponse - The created token information
token = await client.create_token({
'expirationTime': 3600
})
print(token['id'])
# Store the token for future use
token_id = token['id']
token_value = token['accessToken']
Deletes an authentication token.
Parameters:
id: str - ID of the token to delete
Returns: None
await client.delete_token('token-id-123')
# Example in a token management function
async def revoke_user_token(token_id: str) -> bool:
    try:
        await client.delete_token(token_id)
        print(f'Token {token_id} successfully revoked')
        return True
    except Exception as error:
        print(f'Failed to revoke token: {error}')
        return False
Signs a PDF document.
Parameters:
file: FileInput - The PDF file to sign
data: CreateDigitalSignature | None - Signature data (optional)
options: SignRequestOptions | None - Additional options (image, graphicImage) (optional)
Returns: BufferOutput - The signed PDF file output
result = await client.sign('document.pdf', {
'signatureType': 'cms',
'flatten': False,
'cadesLevel': 'b-lt'
})
# Access the signed PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('signed-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Uses AI to redact sensitive information in a document.
Parameters:
file: FileInput - The PDF file to redact
criteria: str - AI redaction criteria
redaction_state: Literal['stage', 'apply'] - Whether to stage or apply redactions (default: 'stage')
pages: PageRange | None - Optional pages to redact
options: RedactOptions | None - Optional redaction options
Returns: BufferOutput - The redacted document
# Stage redactions
result = await client.create_redactions_ai(
'document.pdf',
'Remove all emails'
)
# Apply redactions immediately
result = await client.create_redactions_ai(
'document.pdf',
'Remove all PII',
'apply'
)
# Redact only specific pages
result = await client.create_redactions_ai(
'document.pdf',
'Remove all emails',
'stage',
{'start': 0, 'end': 4} # Pages 0, 1, 2, 3, 4
)
# Redact only the last 3 pages
result = await client.create_redactions_ai(
'document.pdf',
'Remove all PII',
'stage',
{'start': -3, 'end': -1} # Last three pages
)
# Access the redacted PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('redacted-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Performs OCR (Optical Character Recognition) on a document.
Parameters:
file: FileInput - The input file to perform OCR on
language: OcrLanguage | list[OcrLanguage] - The language(s) to use for OCR
Returns: BufferOutput - The OCR result
result = await client.ocr('scanned-document.pdf', 'english')
# Access the OCR-processed PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('ocr-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Adds a text watermark to a document.
Parameters:
file: FileInput - The input file to watermark
text: str - The watermark text
options: dict[str, Any] | None - Watermark options (optional)
Returns: BufferOutput - The watermarked document
result = await client.watermark_text('document.pdf', 'CONFIDENTIAL', {
'opacity': 0.5,
'fontSize': 24
})
# Access the watermarked PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('watermarked-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Adds an image watermark to a document.
Parameters:
file: FileInput - The input file to watermark
image: FileInput - The watermark image
options: ImageWatermarkActionOptions | None - Watermark options (optional)
Returns: BufferOutput - The watermarked document
result = await client.watermark_image('document.pdf', 'watermark.jpg', {
'opacity': 0.5,
'width': {'value': 50, 'unit': "%"},
'height': {'value': 50, 'unit': "%"}
})
# Access the watermarked PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('image-watermarked-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Converts a document to a different format.
Parameters:
file: FileInput - The input file to convert
target_format: OutputFormat - The target format to convert to
Returns: BufferOutput | ContentOutput | JsonContentOutput - The specific output type based on the target format
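Because the output type depends on the target format, it can help to pick the file extension from the returned `mimeType` rather than hard-coding it. The following is a minimal sketch, assuming a result dictionary with the `buffer` and `mimeType` keys used in the examples throughout this document. The helper `save_output` and the `EXTENSIONS` table are hypothetical, and only `application/pdf` and `image/png` are MIME types confirmed by this reference; the other entries are assumptions.

```python
from pathlib import Path

# Hypothetical mapping from MIME type to file extension.
# Only 'application/pdf' and 'image/png' appear in this reference; the rest are assumed.
EXTENSIONS = {
    'application/pdf': '.pdf',
    'image/png': '.png',
    'image/jpeg': '.jpg',
    'image/webp': '.webp',
}


def save_output(result: dict, stem: str, directory: str = '.') -> Path:
    """Write result['buffer'] to <directory>/<stem><ext>, choosing ext from result['mimeType']."""
    extension = EXTENSIONS.get(result['mimeType'], '.bin')  # unknown types fall back to .bin
    path = Path(directory) / f'{stem}{extension}'
    path.write_bytes(result['buffer'])
    return path
```

With a helper like this, `save_output(pdf_result, 'converted-document')` would write `converted-document.pdf` for a PDF result.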
# Convert DOCX to PDF
pdf_result = await client.convert('document.docx', 'pdf')
# Supports formats: pdf, pdfa, pdfua, docx, xlsx, pptx, png, jpeg, jpg, webp, html, markdown
# Access the PDF buffer
pdf_buffer = pdf_result['buffer']
print(pdf_result['mimeType']) # 'application/pdf'
# Save the PDF
with open('converted-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
# Convert PDF to image
image_result = await client.convert('document.pdf', 'png')
# Access the PNG buffer
png_buffer = image_result['buffer']
print(image_result['mimeType']) # 'image/png'
# Save the image
with open('document-page.png', 'wb') as f:
    f.write(png_buffer)
Merges multiple documents into one.
Parameters:
files: list[FileInput] - The files to merge
Returns: BufferOutput - The merged document
result = await client.merge([
'doc1.pdf',
'doc2.pdf',
'doc3.pdf'
])
# Access the merged PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('merged-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Extracts text content from a document.
Parameters:
file: FileInput - The file to extract text from
pages: PageRange | None - Optional page range to extract text from
Returns: JsonContentOutput - The extracted text data
result = await client.extract_text('document.pdf')
# Extract text from specific pages
result = await client.extract_text('document.pdf', {'start': 0, 'end': 2}) # Pages 0, 1, 2
# Extract text from the last page
result = await client.extract_text('document.pdf', {'start': -1, 'end': -1}) # Last page only
# Extract text from the second-to-last page to the end
result = await client.extract_text('document.pdf', {'start': -2}) # Second-to-last and last page
# Access the extracted text content
text_content = result['data']['pages'][0]['plainText']
# Process the extracted text
word_count = len(text_content.split())
print(f'Document contains {word_count} words')
# Search for specific content
if 'confidential' in text_content:
    print('Document contains confidential information')
Extracts table content from a document.
Parameters:
file: FileInput - The file to extract tables from
pages: PageRange | None - Optional page range to extract tables from
Returns: JsonContentOutput - The extracted table data
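Joining cell text with commas breaks as soon as a cell itself contains a comma or a quote. As a hedged sketch of a more robust conversion, the helper below uses the standard `csv` module. It assumes each table exposes `rows`, `columns`, and `cells` with `rowIndex`, `columnIndex`, and `text`, which are the field names used in this section's example. `table_to_csv` and `sample_table` are hypothetical and not part of the client.

```python
import csv
import io


def table_to_csv(table: dict) -> str:
    """Render a table dictionary as CSV text, quoting cells where needed."""
    # Index cells by position once instead of scanning the list for every cell
    by_position = {
        (cell['rowIndex'], cell['columnIndex']): cell['text']
        for cell in table['cells']
    }
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for i in range(len(table['rows'])):
        writer.writerow(
            [by_position.get((i, j), '') for j in range(len(table['columns']))]
        )
    return buffer.getvalue()


# Made-up data in the assumed shape; a real table would come from extract_table()
sample_table = {
    'rows': [{}, {}],
    'columns': [{}, {}],
    'cells': [
        {'rowIndex': 0, 'columnIndex': 0, 'text': 'Item'},
        {'rowIndex': 0, 'columnIndex': 1, 'text': 'Price'},
        {'rowIndex': 1, 'columnIndex': 0, 'text': 'Widget, large'},
    ],
}
```

Here the missing cell at row 1, column 1 becomes an empty field, and the cell containing a comma is quoted.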
result = await client.extract_table('document.pdf')
# Extract tables from specific pages
result = await client.extract_table('document.pdf', {'start': 0, 'end': 2}) # Pages 0, 1, 2
# Extract tables from the last page
result = await client.extract_table('document.pdf', {'start': -1, 'end': -1}) # Last page only
# Extract tables from the second-to-last page to the end
result = await client.extract_table('document.pdf', {'start': -2}) # Second-to-last and last page
# Access the extracted tables
tables = result['data']['pages'][0]['tables']
# Process the first table if available
if tables:
    first_table = tables[0]
    # Get table dimensions
    print(f"Table has {len(first_table['rows'])} rows and {len(first_table['columns'])} columns")
    # Access table cells
    for i in range(len(first_table['rows'])):
        for j in range(len(first_table['columns'])):
            cell = next((cell for cell in first_table['cells']
                         if cell['rowIndex'] == i and cell['columnIndex'] == j), None)
            cell_content = cell['text'] if cell else ''
            print(f"Cell [{i}][{j}]: {cell_content}")
    # Convert table to CSV
    csv_content = ''
    for i in range(len(first_table['rows'])):
        row_data = []
        for j in range(len(first_table['columns'])):
            cell = next((cell for cell in first_table['cells']
                         if cell['rowIndex'] == i and cell['columnIndex'] == j), None)
            row_data.append(cell['text'] if cell else '')
        csv_content += ','.join(row_data) + '\n'
    print(csv_content)
Extracts key value pair content from a document.
Parameters:
file: FileInput - The file to extract KVPs from
pages: PageRange | None - Optional page range to extract KVPs from
Returns: JsonContentOutput - The extracted KVPs data
result = await client.extract_key_value_pairs('document.pdf')
# Extract KVPs from specific pages
result = await client.extract_key_value_pairs('document.pdf', {'start': 0, 'end': 2}) # Pages 0, 1, 2
# Extract KVPs from the last page
result = await client.extract_key_value_pairs('document.pdf', {'start': -1, 'end': -1}) # Last page only
# Extract KVPs from the second-to-last page to the end
result = await client.extract_key_value_pairs('document.pdf', {'start': -2}) # Second-to-last and last page
# Access the extracted key-value pairs
kvps = result['data']['pages'][0]['keyValuePairs']
# Process the key-value pairs
if kvps:
    # Iterate through all key-value pairs
    for index, kvp in enumerate(kvps):
        print(f'KVP {index + 1}:')
        print(f' Key: {kvp["key"]}')
        print(f' Value: {kvp["value"]}')
        print(f' Confidence: {kvp["confidence"]}')
    # Create a dictionary from the key-value pairs
    dictionary = {}
    for kvp in kvps:
        dictionary[kvp['key']] = kvp['value']
    # Look up specific values
    print(f'Invoice Number: {dictionary.get("Invoice Number")}')
    print(f'Date: {dictionary.get("Date")}')
    print(f'Total Amount: {dictionary.get("Total")}')
Flattens annotations in a PDF document.
Parameters:
file: FileInput - The PDF file to flatten
annotation_ids: list[str | int] | None - Optional specific annotation IDs to flatten
Returns: BufferOutput - The flattened document
# Flatten all annotations
result = await client.flatten('annotated-document.pdf')
# Flatten specific annotations by ID
result = await client.flatten('annotated-document.pdf', ['annotation1', 'annotation2'])
Password protects a PDF document.
Parameters:
file: FileInput - The file to protect
user_password: str - Password required to open the document
owner_password: str - Password required to modify the document
permissions: list[PDFUserPermission] | None - Optional array of permissions granted when opened with user password
Returns: BufferOutput - The password-protected document
result = await client.password_protect('document.pdf', 'user123', 'owner456')
# Or with specific permissions:
result = await client.password_protect('document.pdf', 'user123', 'owner456',
['printing', 'extract_accessibility'])
# Access the password-protected PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('protected-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
Sets metadata for a PDF document.
Parameters:
file: FileInput - The PDF file to modify
metadata: Metadata - The metadata to set (title and/or author)
Returns: BufferOutput - The document with updated metadata
result = await client.set_metadata('document.pdf', {
'title': 'My Document',
'author': 'John Doe'
})
Sets page labels for a PDF document.
Parameters:
file: FileInput - The PDF file to modify
labels: list[Label] - Array of label objects with pages and label properties
Returns: BufferOutput - The document with updated page labels
result = await client.set_page_labels('document.pdf', [
{'pages': [0, 1, 2], 'label': 'Cover'},
{'pages': [3, 4, 5], 'label': 'Chapter 1'}
])
Applies Instant JSON to a document.
Parameters:
file: FileInput - The PDF file to modify
instant_json_file: FileInput - The Instant JSON file to apply
Returns: BufferOutput - The modified document
result = await client.apply_instant_json('document.pdf', 'annotations.json')
Applies XFDF to a document.
Parameters:
file: FileInput - The PDF file to modify
xfdf_file: FileInput - The XFDF file to apply
options: ApplyXfdfActionOptions | None - Optional settings for applying XFDF
Returns: BufferOutput - The modified document
result = await client.apply_xfdf('document.pdf', 'annotations.xfdf')
# Or with options:
result = await client.apply_xfdf(
'document.pdf', 'annotations.xfdf',
{'ignorePageRotation': True, 'richTextEnabled': False}
)
Creates redaction annotations based on a preset pattern.
Parameters:
file: FileInput - The PDF file to create redactions in
preset: SearchPreset - The preset pattern to search for (e.g., 'email-address', 'social-security-number')
redaction_state: Literal['stage', 'apply'] - Whether to stage or apply redactions (default: 'stage')
pages: PageRange | None - Optional page range to create redactions in
preset_options: CreateRedactionsStrategyOptionsPreset | None - Optional settings for the preset strategy
options: BaseCreateRedactionsOptions | None - Optional settings for creating redactions
Returns: BufferOutput - The document with redaction annotations
result = await client.create_redactions_preset('document.pdf', 'email-address')
# With specific pages
result = await client.create_redactions_preset(
'document.pdf',
'email-address',
'stage',
{'start': 0, 'end': 4} # Pages 0, 1, 2, 3, 4
)
# With the last 3 pages
result = await client.create_redactions_preset(
'document.pdf',
'email-address',
'stage',
{'start': -3, 'end': -1} # Last three pages
)
Creates redaction annotations based on a regular expression.
Parameters:
file: FileInput - The PDF file to create redactions in
regex: str - The regular expression to search for
redaction_state: Literal['stage', 'apply'] - Whether to stage or apply redactions (default: 'stage')
pages: PageRange | None - Optional page range to create redactions in
regex_options: CreateRedactionsStrategyOptionsRegex | None - Optional settings for the regex strategy
options: BaseCreateRedactionsOptions | None - Optional settings for creating redactions
Returns: BufferOutput - The document with redaction annotations
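Before sending a pattern to the service, it can be worth checking it locally against sample text. The sketch below uses Python's built-in `re` module on made-up text. The service's regex dialect may differ from Python's, so treat this as a sanity check rather than a guarantee. It also shows a common slip: in a raw string, a doubled backslash matches a literal backslash instead of forming `\s` or `\d`.

```python
import re

# Pattern for "Account:" followed by optional whitespace and 8 to 12 digits
ACCOUNT_PATTERN = r'Account:\s*\d{8,12}'

# Made-up sample text for local testing
sample_text = 'Account: 1234567890 was opened. Account:12 is too short.'

matches = re.findall(ACCOUNT_PATTERN, sample_text)

# In a raw string, '\\s' is a literal backslash followed by 's', so this finds nothing here
double_escaped_matches = re.findall(r'Account:\\s*\\d{8,12}', sample_text)
```

Only the first account number is long enough to match, and the double-escaped variant matches nothing in this text.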
result = await client.create_redactions_regex('document.pdf', r'Account:\s*\d{8,12}')
# With specific pages
result = await client.create_redactions_regex(
'document.pdf',
r'Account:\s*\d{8,12}',
'stage',
{'start': 0, 'end': 4} # Pages 0, 1, 2, 3, 4
)
# With the last 3 pages
result = await client.create_redactions_regex(
'document.pdf',
r'Account:\s*\d{8,12}',
'stage',
{'start': -3, 'end': -1} # Last three pages
)
Creates redaction annotations based on text.
Parameters:
file: FileInput - The PDF file to create redactions in
text: str - The text to search for
redaction_state: Literal['stage', 'apply'] - Whether to stage or apply redactions (default: 'stage')
pages: PageRange | None - Optional page range to create redactions in
text_options: CreateRedactionsStrategyOptionsText | None - Optional settings for the text strategy
options: BaseCreateRedactionsOptions | None - Optional settings for creating redactions
Returns: BufferOutput - The document with redaction annotations
result = await client.create_redactions_text('document.pdf', 'email@example.com')
# With specific pages and options
result = await client.create_redactions_text(
'document.pdf',
'email@example.com',
'stage',
{'start': 0, 'end': 4}, # Pages 0, 1, 2, 3, 4
{'caseSensitive': False, 'includeAnnotations': True}
)
# Create redactions on the last 3 pages
result = await client.create_redactions_text(
'document.pdf',
'email@example.com',
'stage',
{'start': -3, 'end': -1} # Last three pages
)
Applies redaction annotations in a document.
Parameters:
file: FileInput - The PDF file with redaction annotations to apply
Returns: BufferOutput - The document with applied redactions
# Stage redactions with one of the create_redactions_* methods:
staged_result = await client.create_redactions_text(
'document.pdf',
'email@example.com',
'stage'
)
result = await client.apply_redactions(staged_result['buffer'])
Rotates pages in a document.
Parameters:
file: FileInput - The PDF file to rotate
angle: Literal[90, 180, 270] - Rotation angle (90, 180, or 270 degrees)
pages: PageRange | None - Optional page range to rotate
Returns: BufferOutput - The entire document with specified pages rotated
result = await client.rotate('document.pdf', 90)
# Rotate specific pages:
result = await client.rotate('document.pdf', 90, {'start': 1, 'end': 3}) # Pages 1, 2, 3
# Rotate the last page:
result = await client.rotate('document.pdf', 90, {'start': -1, 'end': -1}) # Last page only
# Rotate from page 2 to the second-to-last page:
result = await client.rotate('document.pdf', 90, {'start': 2, 'end': -2})
Adds blank pages to a document.
Parameters:
file: FileInput - The PDF file to add pages to
count: int - The number of blank pages to add (default: 1)
index: int | None - Optional index where to add the blank pages (0-based). If not provided, pages are added at the end.
Returns: BufferOutput - The document with added pages
# Add 2 blank pages at the end
result = await client.add_page('document.pdf', 2)
# Add 1 blank page after the first page (at index 1)
result = await client.add_page('document.pdf', 1, 1)
Optimizes a PDF document for size reduction.
Parameters:
file: FileInput - The PDF file to optimize
options: OptimizePdf | None - Optimization options
Returns: BufferOutput - The optimized document
result = await client.optimize('large-document.pdf', {
'grayscaleImages': True,
'mrcCompression': True,
'imageOptimizationQuality': 2
})
Splits a PDF document into multiple parts based on page ranges.
Parameters:
file: FileInput - The PDF file to split
page_ranges: list[PageRange] - Array of page ranges to extract
Returns: list[BufferOutput] - An array of PDF documents, one for each page range
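The page ranges used throughout this reference take `start` and `end` keys, and negative values count from the end of the document. The sketch below shows one way to expand such a range into concrete 0-based indices for a known page count. `PageRangeDict` and `resolve_page_range` are hypothetical names, not part of the client, and the defaults for missing keys (`start` = 0, `end` = -1) are an assumption.

```python
from typing import TypedDict


class PageRangeDict(TypedDict, total=False):
    """Hypothetical stand-in for the client's PageRange type."""
    start: int
    end: int


def resolve_page_range(page_range: PageRangeDict, page_count: int) -> list[int]:
    """Expand a start/end range (inclusive, negatives from the end) into page indices."""
    start = page_range.get('start', 0)  # assumed default: first page
    end = page_range.get('end', -1)     # assumed default: last page
    if start < 0:
        start += page_count
    if end < 0:
        end += page_count
    if not 0 <= start <= end < page_count:
        raise ValueError(f'range {page_range} is invalid for {page_count} pages')
    return list(range(start, end + 1))
```

For a 10-page document, `{'start': 3, 'end': -3}` expands to pages 3 through 7, which is the "middle pages" case in the split example.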
results = await client.split('document.pdf', [
{'start': 0, 'end': 2}, # Pages 0, 1, 2
{'start': 3, 'end': 5} # Pages 3, 4, 5
])
# Split using negative indices
results = await client.split('document.pdf', [
{'start': 0, 'end': 2}, # First three pages
{'start': 3, 'end': -3}, # Middle pages
{'start': -2, 'end': -1} # Last two pages
])
# Process each resulting PDF
for i, result in enumerate(results):
    # Access the PDF buffer
    pdf_buffer = result['buffer']
    # Get the MIME type of the output
    print(result['mimeType']) # 'application/pdf'
    # Save the buffer to a file
    with open(f'split-part-{i}.pdf', 'wb') as f:
        f.write(pdf_buffer)
Creates a new PDF containing only the specified pages in the order provided.
Parameters:
file: FileInput - The PDF file to extract pages from
page_indices: list[int] - Array of page indices to include in the new PDF (0-based). Negative indices count from the end of the document (e.g., -1 is the last page).
Returns: BufferOutput - A new document with only the specified pages
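To make the negative-index rule concrete, the sketch below converts a list of page indices into absolute 0-based positions for a document of known length, preserving order and duplicates. `resolve_page_indices` is a hypothetical helper for illustration. The client is not shown to expose such a function, and the out-of-range check is an assumption about sensible behavior.

```python
def resolve_page_indices(page_indices: list[int], page_count: int) -> list[int]:
    """Map possibly negative page indices to absolute 0-based indices."""
    resolved = []
    for index in page_indices:
        # Negative indices count from the end: -1 is the last page
        absolute = index + page_count if index < 0 else index
        if not 0 <= absolute < page_count:
            raise IndexError(f'page index {index} is out of range for {page_count} pages')
        resolved.append(absolute)
    return resolved
```

For a 5-page document, `[0, -1]` resolves to the first and last pages, `[0, 4]`, and `[-1, -2, -3]` resolves to `[4, 3, 2]`.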
# Create a new PDF with only the first and third pages
result = await client.duplicate_pages('document.pdf', [0, 2])
# Create a new PDF with pages in a different order
result = await client.duplicate_pages('document.pdf', [2, 0, 1])
# Create a new PDF with duplicated pages
result = await client.duplicate_pages('document.pdf', [0, 0, 1, 1, 0])
# Create a new PDF with the first and last pages
result = await client.duplicate_pages('document.pdf', [0, -1])
# Create a new PDF with the last three pages in reverse order
result = await client.duplicate_pages('document.pdf', [-1, -2, -3])
# Access the PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('duplicated-pages.pdf', 'wb') as f:
    f.write(pdf_buffer)
Deletes pages from a PDF document.
Parameters:
file: FileInput - The PDF file to modify
page_indices: list[int] - Array of page indices to delete (0-based). Negative indices count from the end of the document (e.g., -1 is the last page).
Returns: BufferOutput - The document with deleted pages
# Delete second and fourth pages
result = await client.delete_pages('document.pdf', [1, 3])
# Delete the last page
result = await client.delete_pages('document.pdf', [-1])
# Delete the first page and the last two pages
result = await client.delete_pages('document.pdf', [0, -1, -2])
# Access the modified PDF buffer
pdf_buffer = result['buffer']
# Get the MIME type of the output
print(result['mimeType']) # 'application/pdf'
# Save the buffer to a file
with open('modified-document.pdf', 'wb') as f:
    f.write(pdf_buffer)
The workflow builder provides a fluent interface for chaining multiple operations. See WORKFLOW.md for detailed information about workflow methods, including:
workflow() - Create a new workflow builder
add_file_part() - Add file parts to the workflow
add_html_part() - Add HTML content
apply_action() - Apply processing actions
output_pdf(), output_image(), output_json() - Set output formats
execute() - Execute the workflow
All methods can raise the following exceptions:
ValidationError - Invalid input parameters
AuthenticationError - Authentication failed
APIError - API returned an error
NetworkError - Network request failed
NutrientError - Base error class
from nutrient_dws import (
NutrientError,
ValidationError,
APIError,
AuthenticationError,
NetworkError
)
try:
    result = await client.convert('file.docx', 'pdf')
except ValidationError as error:
    print(f'Invalid input: {error.message} - Details: {error.details}')
except AuthenticationError as error:
    print(f'Auth error: {error.message} - Status: {error.status_code}')
except APIError as error:
    print(f'API error: {error.message} - Status: {error.status_code} - Details: {error.details}')
except NetworkError as error:
    print(f'Network error: {error.message} - Details: {error.details}')
except NutrientError as error:
    # Base error class: catches any other client error
    print(f'Nutrient error: {error}')
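Network failures are often transient, so callers may want to retry them while letting validation and authentication errors surface immediately. The following is a minimal sketch of retrying with exponential backoff. The `NetworkError` class here is a local stand-in so the example runs without the library installed (in real code, import it from `nutrient_dws`), and `with_retries` is a hypothetical helper, not part of the client.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar('T')


class NetworkError(Exception):
    """Local stand-in for nutrient_dws.NetworkError (assumption for this sketch)."""


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run an async operation, retrying on NetworkError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except NetworkError:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last network error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError('attempts must be at least 1')
```

A call might then look like `await with_retries(lambda: client.convert('file.docx', 'pdf'))`. Other exception types are deliberately not caught, so they propagate on the first failure.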