| title | Invoice OCR Python |
|---|---|
| category | 622b805aaec68102ea7fcbc2 |
| slug | python-invoice-ocr |
| parentDoc | 609808f773b0b90051d839de |
The Python OCR SDK supports the Invoice API.
The sample below shows how to extract data from an invoice using the OCR SDK.

#
# Install the Python client library by running:
# pip install mindee
#
from mindee import Client, PredictResponse, product
# Init a new client
mindee_client = Client(api_key="my-api-key")
# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")
# Parse the loaded file
result: PredictResponse = mindee_client.parse(
    product.InvoiceV4,
    input_doc,
)
# Print a summary of the API result
print(result.document)
# Print the document-level summary
# print(result.document.inference.prediction)

You can also call this product asynchronously:

#
# Install the Python client library by running:
# pip install mindee
#
from mindee import Client, product, AsyncPredictResponse
# Init a new client
mindee_client = Client(api_key="my-api-key")
# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")
# Enqueue the loaded file and parse it
result: AsyncPredictResponse = mindee_client.enqueue_and_parse(
    product.InvoiceV4,
    input_doc,
)
# Print a brief summary of the parsed data
print(result.document)

Output (RST):

########
Document
########
:Mindee ID: 744748d5-9051-461c-b70c-bbf81f5ff943
:Filename: default_sample.jpg
Inference
#########
:Product: mindee/invoices v4.11
:Rotation applied: Yes
Prediction
==========
:Locale: en-CA; en; CA; CAD;
:Invoice Number: 14
:Purchase Order Number: AD29094
:Reference Numbers: AD29094
:Purchase Date: 2018-09-25
:Due Date:
:Payment Date:
:Total Net: 2145.00
:Total Amount: 2608.20
:Total Tax: 193.20
:Taxes:
+---------------+--------+----------+---------------+
| Base          | Code   | Rate (%) | Amount        |
+===============+========+==========+===============+
| 2145.00       |        | 8.00     | 193.20        |
+---------------+--------+----------+---------------+
:Supplier Payment Details:
:Supplier Name: TURNPIKE DESIGNS
:Supplier Company Registrations:
:Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
:Supplier Phone Number: 4165551212
:Supplier Website:
:Supplier Email: j_coi@example.com
:Customer Name: JIRO DOI
:Customer Company Registrations:
:Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
:Customer ID:
:Shipping Address:
:Billing Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
:Document Type: INVOICE
:Document Type Extended: INVOICE
:Purchase Subcategory:
:Purchase Category: miscellaneous
:Line Items:
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| Description                          | Product code | Quantity | Tax Amount | Tax Rate (%) | Total Amount | Unit of measure | Unit Price |
+======================================+==============+==========+============+==============+==============+=================+============+
| Platinum web hosting package Down... |              | 1.00     |            |              | 65.00        |                 | 65.00      |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| 2 page website design Includes ba... |              | 3.00     |            |              | 2100.00      |                 | 2100.00    |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| Mobile designs Includes responsiv... |              | 1.00     |            |              | 250.00       | 1               | 250.00     |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
Page Predictions
================
Page 0
------
:Locale: en-CA; en; CA; CAD;
:Invoice Number: 14
:Purchase Order Number: AD29094
:Reference Numbers: AD29094
:Purchase Date: 2018-09-25
:Due Date:
:Payment Date:
:Total Net: 2145.00
:Total Amount: 2608.20
:Total Tax: 193.20
:Taxes:
+---------------+--------+----------+---------------+
| Base          | Code   | Rate (%) | Amount        |
+===============+========+==========+===============+
| 2145.00       |        | 8.00     | 193.20        |
+---------------+--------+----------+---------------+
:Supplier Payment Details:
:Supplier Name: TURNPIKE DESIGNS
:Supplier Company Registrations:
:Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
:Supplier Phone Number: 4165551212
:Supplier Website:
:Supplier Email: j_coi@example.com
:Customer Name: JIRO DOI
:Customer Company Registrations:
:Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
:Customer ID:
:Shipping Address:
:Billing Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
:Document Type: INVOICE
:Document Type Extended: INVOICE
:Purchase Subcategory:
:Purchase Category: miscellaneous
:Line Items:
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| Description                          | Product code | Quantity | Tax Amount | Tax Rate (%) | Total Amount | Unit of measure | Unit Price |
+======================================+==============+==========+============+==============+==============+=================+============+
| Platinum web hosting package Down... |              | 1.00     |            |              | 65.00        |                 | 65.00      |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| 2 page website design Includes ba... |              | 3.00     |            |              | 2100.00      |                 | 2100.00    |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+
| Mobile designs Includes responsiv... |              | 1.00     |            |              | 250.00       | 1               | 250.00     |
+--------------------------------------+--------------+----------+------------+--------------+--------------+-----------------+------------+

These fields are generic and used in several products.
Each prediction object contains a set of fields that inherit from the generic BaseField class.
A typical BaseField object will have the following attributes:
- value (`Union[float, str]`): corresponds to the field value. Can be `None` if no value was extracted.
- confidence (`float`): the confidence score of the field prediction.
- bounding_box (`[Point, Point, Point, Point]`): contains exactly 4 relative vertex coordinates (points) of a right rectangle containing the field in the document.
- polygon (`List[Point]`): contains the relative vertex coordinates (`Point`) of a polygon containing the field in the image.
- page_id (`int`): the ID of the page, always `None` when at document-level.
- reconstructed (`bool`): indicates whether an object was reconstructed (not extracted as the API gave it).

Note: A `Point` simply refers to a List of two numbers (`[float, float]`).
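Because the coordinates are relative, they usually need to be scaled by the image size before drawing or cropping. The sketch below assumes each point is an `[x, y]` pair expressed as a fraction of the page width and height; `to_pixels` and the sample box are hypothetical and not part of the SDK.

```python
from typing import List, Tuple

Point = List[float]  # [x, y], assumed to be fractions of the page width and height


def to_pixels(polygon: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale relative vertices to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in polygon]


# Hypothetical bounding box, not taken from a real API response.
bounding_box = [[0.25, 0.1], [0.75, 0.1], [0.75, 0.2], [0.25, 0.2]]
pixel_box = to_pixels(bounding_box, width=1000, height=2000)
print(pixel_box)
```

The same helper works for `polygon`, since it is also a list of points.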
Aside from the previous attributes, all basic fields have access to a custom `__str__` method that can be used to print their value as a string.
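One common use of these attributes is flagging fields for manual review. The following is a minimal sketch, assuming confidence scores range from 0 to 1; the `Field` class is a hypothetical stand-in for a base field (it is not the SDK class), and the confidence scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Field:
    """Hypothetical stand-in exposing two of the BaseField attributes."""

    value: Optional[Union[float, str]]
    confidence: float

    def __str__(self) -> str:
        # Print an empty string rather than "None" when nothing was extracted.
        return "" if self.value is None else str(self.value)


def needs_review(fields: Dict[str, Field], threshold: float = 0.8) -> List[str]:
    """Return the names of fields that are empty or below the threshold."""
    return [
        name
        for name, field in fields.items()
        if field.value is None or field.confidence < threshold
    ]


# Values come from the sample output; the confidence scores are invented.
fields = {
    "invoice_number": Field("14", 0.99),
    "due_date": Field(None, 0.0),
    "total_net": Field(2145.00, 0.55),
}
print(needs_review(fields))
```

The threshold of 0.8 is arbitrary; a suitable value depends on how costly a wrong extraction is for the use case.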
Aside from the basic BaseField attributes, the address field AddressField also implements the following:
- street_number (`str`): String representation of the street number. Can be `None`.
- street_name (`str`): Name of the street. Can be `None`.
- po_box (`str`): String representation of the PO Box number. Can be `None`.
- address_complement (`str`): Address complement. Can be `None`.
- city (`str`): City name. Can be `None`.
- postal_code (`str`): String representation of the postal code. Can be `None`.
- state (`str`): State name. Can be `None`.
- country (`str`): Country name. Can be `None`.
Note: The value field of an AddressField should be a concatenation of the rest of the values.
The amount field AmountField only has one constraint: its value is an Optional[float].
The classification field ClassificationField does not implement all the basic BaseField attributes. It only implements value, confidence and page_id.
Note: a classification field's `value` is always a `str`.
Aside from the basic BaseField attributes, the company registration field CompanyRegistrationField also implements the following:
- type (`str`): the type of company.
Aside from the basic BaseField attributes, the date field DateField also implements the following:
- date_object (`Date`): an accessible representation of the value as a Python object. Can be `None`.
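To illustrate what such a conversion involves, the sketch below parses a date string into a `datetime.date`. It assumes the value is formatted as `YYYY-MM-DD`, as the purchase date in the sample output is; `parse_date_value` is a hypothetical helper, not an SDK function.

```python
from datetime import date
from typing import Optional


def parse_date_value(value: Optional[str]) -> Optional[date]:
    """Return a date for an ISO 8601 string, or None when nothing was extracted."""
    if not value:
        return None
    return date.fromisoformat(value)


purchase_date = parse_date_value("2018-09-25")  # purchase date from the sample output
due_date = parse_date_value(None)  # the due date is empty in the sample output
print(purchase_date, due_date)
```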
The locale field LocaleField only implements the value, confidence and page_id base BaseField attributes, but it comes with its own:
- language (`str`): ISO 639-1 language code (e.g.: `en` for English). Can be `None`.
- country (`str`): ISO 3166-1 alpha-2 or ISO 3166-1 alpha-3 code for countries (e.g.: `GBR` or `GB` for "Great Britain"). Can be `None`.
- currency (`str`): ISO 4217 code for currencies (e.g.: `USD` for "US Dollars"). Can be `None`.
Aside from the basic BaseField attributes, the payment details field PaymentDetailsField also implements the following:
- account_number (`str`): number of an account, expressed as a string. Can be `None`.
- iban (`str`): International Bank Account Number. Can be `None`.
- routing_number (`str`): routing number of an account. Can be `None`.
- swift (`str`): the account holder's bank's SWIFT Business Identifier Code (BIC). Can be `None`.
The text field StringField only has one constraint: its value is an Optional[str].
Aside from the basic BaseField attributes, the tax field TaxField also implements the following:
- rate (`float`): the tax rate applied to an item expressed as a percentage. Can be `None`.
- code (`str`): tax code (or equivalent, depending on the origin of the document). Can be `None`.
- basis (`float`): base amount used for the tax. Can be `None`.
- value (`float`): the value of the tax. Can be `None`.

Note: currently `TaxField` is not used on its own, and is accessed through a parent `Taxes` object, a list-like structure.
The `Taxes` field represents a list-like collection of `TaxField` objects. As it is the representation of several objects, it has access to a custom `__str__` method that can render a `TaxField` object as a table line.
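Since `rate`, `basis` and `value` are extracted independently, it can be useful to check that they agree with each other. The sketch below is one possible check, assuming the tax value should equal `basis * rate / 100`; `tax_matches` is a hypothetical helper, not an SDK function.

```python
import math
from typing import Optional


def tax_matches(
    basis: Optional[float],
    rate: Optional[float],
    value: Optional[float],
    tolerance: float = 0.01,
) -> Optional[bool]:
    """Compare basis * rate / 100 with the extracted tax value.

    Returns None when any of the three inputs is missing.
    """
    if basis is None or rate is None or value is None:
        return None
    return math.isclose(basis * rate / 100, value, abs_tol=tolerance)


# Values copied from the Taxes table in the sample output.
# 2145.00 * 8.00 / 100 = 171.60, which differs from 193.20, so this prints False.
print(tax_matches(2145.00, 8.00, 193.20))
```

A `False` result does not say which of the three values is wrong, only that the row deserves a closer look.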
The following fields are specific to this product and are not used in any other product.
List of all the line items present on the invoice.
An InvoiceV4LineItem implements the following attributes:
- description (`str`): The item description.
- product_code (`str`): The product code of the item.
- quantity (`float`): The item quantity.
- tax_amount (`float`): The item tax amount.
- tax_rate (`float`): The item tax rate in percentage.
- total_amount (`float`): The item total amount.
- unit_measure (`str`): The item unit of measure.
- unit_price (`float`): The item unit price.
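As an example of working with these attributes, the sketch below adds up the line totals. It assumes an attribute may be `None` when nothing was extracted, as with the generic fields; `LineItem` is a hypothetical stand-in that mirrors two of the attributes above, not the SDK class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineItem:
    """Hypothetical stand-in mirroring two InvoiceV4LineItem attributes."""

    description: Optional[str]
    total_amount: Optional[float]


def sum_line_totals(items: List[LineItem]) -> float:
    """Add up total_amount, skipping items where nothing was extracted."""
    return sum(item.total_amount for item in items if item.total_amount is not None)


# Line totals from the sample output, with shortened descriptions.
items = [
    LineItem("Platinum web hosting package", 65.00),
    LineItem("2 page website design", 2100.00),
    LineItem("Mobile designs", 250.00),
]
line_total = sum_line_totals(items)
print(line_total)
```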
The following fields are extracted for Invoice V4:
billing_address (AddressField): The customer billing address.
print(result.document.inference.prediction.billing_address.value)

category (ClassificationField): The purchase category. Possible values include:
- 'toll'
- 'food'
- 'parking'
- 'transport'
- 'accommodation'
- 'telecom'
- 'miscellaneous'
- 'software'
- 'shopping'
- 'energy'
print(result.document.inference.prediction.category.value)

customer_address (AddressField): The address of the customer.
print(result.document.inference.prediction.customer_address.value)

customer_company_registrations (List[CompanyRegistrationField]): List of company registration numbers associated with the customer.
for customer_company_registrations_elem in result.document.inference.prediction.customer_company_registrations:
    print(customer_company_registrations_elem.value)

customer_id (StringField): The customer account number or identifier from the supplier.
print(result.document.inference.prediction.customer_id.value)

customer_name (StringField): The name of the customer or client.
print(result.document.inference.prediction.customer_name.value)

date (DateField): The date the purchase was made.
print(result.document.inference.prediction.date.value)

document_type (ClassificationField): Document type: INVOICE or CREDIT NOTE. Possible values include:
- 'INVOICE'
- 'CREDIT NOTE'
print(result.document.inference.prediction.document_type.value)

document_type_extended (ClassificationField): Document type extended. Possible values include:
- 'CREDIT NOTE'
- 'INVOICE'
- 'OTHER'
- 'OTHER_FINANCIAL'
- 'PAYSLIP'
- 'PURCHASE ORDER'
- 'QUOTE'
- 'RECEIPT'
- 'STATEMENT'
print(result.document.inference.prediction.document_type_extended.value)

due_date (DateField): The date on which the payment is due.
print(result.document.inference.prediction.due_date.value)

invoice_number (StringField): The invoice number or identifier.
print(result.document.inference.prediction.invoice_number.value)

line_items (List[InvoiceV4LineItem]): List of all the line items present on the invoice.
for line_items_elem in result.document.inference.prediction.line_items:
    print(line_items_elem)

locale (LocaleField): The locale of the document.
print(result.document.inference.prediction.locale.value)

payment_date (DateField): The date on which the payment is due or was fulfilled.
print(result.document.inference.prediction.payment_date.value)

po_number (StringField): The purchase order number.
print(result.document.inference.prediction.po_number.value)

reference_numbers (List[StringField]): List of all reference numbers on the invoice, including the purchase order number.
for reference_numbers_elem in result.document.inference.prediction.reference_numbers:
    print(reference_numbers_elem.value)

shipping_address (AddressField): Customer's delivery address.
print(result.document.inference.prediction.shipping_address.value)

subcategory (ClassificationField): The purchase subcategory for transport, food and shopping. Possible values include:
- 'plane'
- 'taxi'
- 'train'
- 'restaurant'
- 'shopping'
- 'other'
- 'groceries'
- 'cultural'
- 'electronics'
- 'office_supplies'
- 'micromobility'
- 'car_rental'
- 'public'
- 'delivery'
- None
print(result.document.inference.prediction.subcategory.value)

supplier_address (AddressField): The address of the supplier or merchant.
print(result.document.inference.prediction.supplier_address.value)

supplier_company_registrations (List[CompanyRegistrationField]): List of company registration numbers associated with the supplier.
for supplier_company_registrations_elem in result.document.inference.prediction.supplier_company_registrations:
    print(supplier_company_registrations_elem.value)

supplier_email (StringField): The email address of the supplier or merchant.
print(result.document.inference.prediction.supplier_email.value)

supplier_name (StringField): The name of the supplier or merchant.
print(result.document.inference.prediction.supplier_name.value)

supplier_payment_details (List[PaymentDetailsField]): List of payment details associated with the supplier of the invoice.
for supplier_payment_details_elem in result.document.inference.prediction.supplier_payment_details:
    print(supplier_payment_details_elem.value)

supplier_phone_number (StringField): The phone number of the supplier or merchant.
print(result.document.inference.prediction.supplier_phone_number.value)

supplier_website (StringField): The website URL of the supplier or merchant.
print(result.document.inference.prediction.supplier_website.value)

taxes (List[TaxField]): List of taxes. Each item contains the detail of the tax.
for taxes_elem in result.document.inference.prediction.taxes:
    print(taxes_elem.polygon)

total_amount (AmountField): The total amount of the invoice: includes taxes, tips, fees, and other charges.
print(result.document.inference.prediction.total_amount.value)

total_net (AmountField): The net amount of the invoice: does not include taxes, fees, and discounts.
print(result.document.inference.prediction.total_net.value)

total_tax (AmountField): The total tax: the sum of all the taxes for this invoice.
print(result.document.inference.prediction.total_tax.value)
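
The three totals can be cross-checked against each other once extracted. The sketch below is one way to do so, under the assumption that the invoice carries no tips, fees, or discounts, so that net plus tax should equal the total amount; `totals_reconcile` is a hypothetical helper, not an SDK function.

```python
import math
from typing import Optional


def totals_reconcile(
    total_net: Optional[float],
    total_tax: Optional[float],
    total_amount: Optional[float],
    tolerance: float = 0.01,
) -> Optional[bool]:
    """Check net + tax against the total; None when a value is missing."""
    if total_net is None or total_tax is None or total_amount is None:
        return None
    return math.isclose(total_net + total_tax, total_amount, abs_tol=tolerance)


# Totals from the sample output: 2145.00 + 193.20 = 2338.20, not 2608.20,
# so this prints False.
print(totals_reconcile(2145.00, 193.20, 2608.20))
```

When the check fails, the mismatch may come from an extraction error or from charges that the assumption above leaves out, so the result is best treated as a prompt for review rather than a verdict.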