| title | International ID OCR Python |
|---|---|
| category | 622b805aaec68102ea7fcbc2 |
| slug | python-international-id-ocr |
| parentDoc | 609808f773b0b90051d839de |
The Python OCR SDK supports the International ID API.
The sample below illustrates how to extract the data you need with the OCR SDK.

from mindee import Client, product, AsyncPredictResponse
# Init a new client
mindee_client = Client(api_key="my-api-key")
# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")
# Enqueue the file and parse it asynchronously
result: AsyncPredictResponse = mindee_client.enqueue_and_parse(
    product.InternationalIdV2,
    input_doc,
)
# Print a brief summary of the parsed data
print(result.document)

Output (RST):
########
Document
########
:Mindee ID: cfa20a58-20cf-43b6-8cec-9505fa69d1c2
:Filename: default_sample.jpg
Inference
#########
:Product: mindee/international_id v2.0
:Rotation applied: No
Prediction
==========
:Document Type: IDENTIFICATION_CARD
:Document Number: 12345678A
:Surnames: MUESTRA
           MUESTRA
:Given Names: CARMEN
:Sex: F
:Birth Date: 1980-01-01
:Birth Place: CAMPO DE CRIPTANA CIUDAD REAL ESPANA
:Nationality: ESP
:Personal Number: BAB1834284<44282767Q0
:Country of Issue: ESP
:State of Issue: MADRID
:Issue Date:
:Expiration Date: 2030-01-01
:Address: C/REAL N13, 1 DCHA COLLADO VILLALBA MADRID MADRID MADRID
:MRZ Line 1: IDESPBAB1834284<44282767Q0<<<<
:MRZ Line 2: 8001010F1301017ESP<<<<<<<<<<<3
:MRZ Line 3: MUESTRA<MUESTRA<<CARMEN<<<<<<<

These fields are generic and used in several products.
Each prediction object contains a set of fields that inherit from the generic BaseField class.
A typical BaseField object will have the following attributes:
- value (Union[float, str]): corresponds to the field value. Can be None if no value was extracted.
- confidence (float): the confidence score of the field prediction.
- bounding_box ([Point, Point, Point, Point]): contains exactly 4 relative vertex coordinates (points) of a right rectangle containing the field in the document.
- polygon (List[Point]): contains the relative vertex coordinates (Point) of a polygon containing the field in the image.
- page_id (int): the ID of the page, always None when at document-level.
- reconstructed (bool): indicates whether an object was reconstructed (not extracted as the API gave it).
Note: A Point simply refers to a List of two numbers ([float, float]).
Aside from the previous attributes, all basic fields have access to a custom __str__ method that can be used to print their value as a string.
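Because bounding_box and polygon hold relative coordinates, drawing them on an image requires scaling them to pixels first. The following is a minimal sketch of that conversion, assuming each coordinate is a fraction of the page width or height between 0 and 1. The to_pixels helper and the sample values are hypothetical and are not part of the SDK.

```python
from typing import List, Tuple

# A Point is a pair of relative coordinates: [x, y].
# Assumption: each value is a fraction of the page width or height (0.0 to 1.0).
Point = List[float]


def to_pixels(polygon: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale relative (x, y) vertices to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in polygon]


# Hypothetical bounding box, not taken from a real API response.
sample_box = [[0.25, 0.5], [0.75, 0.5], [0.75, 0.75], [0.25, 0.75]]
pixel_box = to_pixels(sample_box, width=1000, height=800)
print(pixel_box)
```

With a real response, the list passed in would come from a field's bounding_box or polygon attribute, and the width and height from the source image.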
The classification field ClassificationField does not implement all the basic BaseField attributes. It only implements value, confidence and page_id.
Note: a classification field's value is always a str.
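Since a classification value is a plain string, application code may want to map it onto a closed set of constants. The sketch below does this with an Enum built from the document_type values listed further down this page. The DocumentType class and parse_document_type helper are illustrative names, not SDK objects.

```python
from enum import Enum
from typing import Optional


class DocumentType(Enum):
    """Document types listed for the International ID V2 document_type field."""

    IDENTIFICATION_CARD = "IDENTIFICATION_CARD"
    PASSPORT = "PASSPORT"
    DRIVER_LICENSE = "DRIVER_LICENSE"
    VISA = "VISA"
    RESIDENCY_CARD = "RESIDENCY_CARD"
    VOTER_REGISTRATION = "VOTER_REGISTRATION"


def parse_document_type(value: str) -> Optional[DocumentType]:
    """Return the matching enum member, or None if the value is not recognized."""
    try:
        return DocumentType(value)
    except ValueError:
        return None


print(parse_document_type("PASSPORT"))
print(parse_document_type("LIBRARY_CARD"))
```

Returning None for unknown strings, rather than raising, keeps the caller working if the API ever returns a value missing from this list.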
Aside from the basic BaseField attributes, the date field DateField also implements the following:
- date_object (Date): an accessible representation of the value as a Python object. Can be None.
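The date_object attribute already provides a parsed date, so the following is only a sketch of the equivalent handling for a raw value string. It assumes the value uses the YYYY-MM-DD form seen in the sample output, and that an empty or missing value (such as the sample's Issue Date) should yield None. Both helper names are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def days_until(value: Optional[str], today: date) -> Optional[int]:
    """Number of days from `today` to the given date, or None if unparseable."""
    parsed = parse_iso_date(value)
    if parsed is None:
        return None
    return (parsed - today).days


print(parse_iso_date("1980-01-01"))
print(days_until("2030-01-01", date(2029, 12, 31)))
print(parse_iso_date(None))
```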
The text field StringField only has one constraint: its value is an Optional[str].
The following fields are extracted for International ID V2:
address (StringField): The physical address of the document holder.
print(result.document.inference.prediction.address.value)

birth_date (DateField): The date of birth of the document holder.
print(result.document.inference.prediction.birth_date.value)

birth_place (StringField): The place of birth of the document holder.
print(result.document.inference.prediction.birth_place.value)

country_of_issue (StringField): The country where the document was issued.
print(result.document.inference.prediction.country_of_issue.value)

document_number (StringField): The unique identifier assigned to the document.
print(result.document.inference.prediction.document_number.value)

document_type (ClassificationField): The type of personal identification document.
- 'IDENTIFICATION_CARD'
- 'PASSPORT'
- 'DRIVER_LICENSE'
- 'VISA'
- 'RESIDENCY_CARD'
- 'VOTER_REGISTRATION'
print(result.document.inference.prediction.document_type.value)

expiry_date (DateField): The date when the document becomes invalid.
print(result.document.inference.prediction.expiry_date.value)

given_names (List[StringField]): The list of the document holder's given names.
for given_names_elem in result.document.inference.prediction.given_names:
    print(given_names_elem.value)

issue_date (DateField): The date when the document was issued.
print(result.document.inference.prediction.issue_date.value)

mrz_line1 (StringField): The Machine Readable Zone, first line.
print(result.document.inference.prediction.mrz_line1.value)

mrz_line2 (StringField): The Machine Readable Zone, second line.
print(result.document.inference.prediction.mrz_line2.value)

mrz_line3 (StringField): The Machine Readable Zone, third line.
print(result.document.inference.prediction.mrz_line3.value)

nationality (StringField): The country of citizenship of the document holder.
print(result.document.inference.prediction.nationality.value)

personal_number (StringField): The unique identifier assigned to the document holder.
print(result.document.inference.prediction.personal_number.value)

sex (StringField): The biological sex of the document holder.
print(result.document.inference.prediction.sex.value)

state_of_issue (StringField): The state or territory where the document was issued.
print(result.document.inference.prediction.state_of_issue.value)

surnames (List[StringField]): The list of the document holder's family names.
for surnames_elem in result.document.inference.prediction.surnames:
    print(surnames_elem.value)
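List fields such as given_names and surnames often need to be combined, and any value may be None when nothing was extracted. The sketch below shows one way to handle both cases. It uses SimpleNamespace stand-ins that mirror the sample output above so that it runs without an API key; with a real response, result.document.inference.prediction would take their place. The join_values helper is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Iterable


def join_values(fields: Iterable, separator: str = " ") -> str:
    """Join the non-empty .value attributes of a list of field-like objects."""
    return separator.join(field.value for field in fields if field.value)


# Stand-in for a prediction object, filled with values from the sample output.
# These are not real SDK classes; they only expose a .value attribute.
prediction = SimpleNamespace(
    given_names=[SimpleNamespace(value="CARMEN")],
    surnames=[SimpleNamespace(value="MUESTRA"), SimpleNamespace(value="MUESTRA")],
    issue_date=SimpleNamespace(value=None),
)

full_name = f"{join_values(prediction.given_names)} {join_values(prediction.surnames)}"
print(full_name)

# Fall back to a placeholder when a field was not extracted.
issue_date = prediction.issue_date.value or "not extracted"
print(issue_date)
```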