This repository contains a list of large language models tailored for the Portuguese language.
Criteria:
- Multilingual models are only included if they have been fine-tuned for Portuguese;
- Only models that can answer queries or questions are considered (not models that only complete sentences);
- Models with fewer than 1 billion parameters are not listed.
Models are grouped into "families", usually when they share a name and have been developed by the same team.
Pull requests are welcome.
To add models to our list, create directories under data/models/ containing metadata.toml files
following this template:
Template
name = "Your Model Name"
url = "https://example.com/your-model"
release_date = "2025-12-31"
license = "The Model's License 123"
language_varieties = [ "pt-PT", "pt-BR" ]
size = "1.8B"
base_model = "Name of the Base Model"
model_id = "my_model_1.8b"
[[training_data]]
name = "Dataset 1"
url = ""
[[training_data]]
name = "Dataset 2"
url = "https://example.com/dataset2"
[[origin]]
name = "Person or Institution 1"
url = "https://example.com/institution1"
[[origin]]
name = "Person or Institution 2"
url = ""
[weight_availability]
available_now = false
[public_api_availability]
available_now = true
url = "https://example.com/your-model-api"
[online_chat_availability]
available_now = false
planned = true
[knowledge_cutoff]
date = "2024-12"
type = "possibly_later"
Note that most fields are required, and a blank string should be used when the information is not available.
For models that are grouped into families,
the name field is mandatory at all levels,
and all the other fields must be defined
only once per model and cannot be overridden
(i.e. if a family already defines size,
its subdirectories must not redefine size).
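The family rule can be pictured as a merge that refuses duplicates. The following is only a sketch under assumptions: merge_levels is a hypothetical helper, and letting the model's name replace the family's name in the merged result is a choice made for this example, not something the repository's script is known to do.

```python
def merge_levels(family: dict, model: dict) -> dict:
    """Combine a family's fields with the fields of one of its models.

    'name' may appear at both levels; any other key defined in both
    places is treated as an error.
    """
    clashes = sorted(key for key in model if key in family and key != "name")
    if clashes:
        raise ValueError("fields redefined in subdirectory: " + ", ".join(clashes))
    # Assumption for this sketch: the model-level name wins in the merged view.
    return {**family, **model}


family = {"name": "Example Family", "license": "proprietary"}
model = {"name": "Example Family 7B", "size": "7B"}
print(merge_levels(family, model))
```

Calling merge_levels with a model that repeats the family's size (or any field other than name) raises ValueError instead of silently overriding the value.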
- name: The model name (required).
- url: The address of a webpage that is relevant to the model (such as a page about the model on the website of the company or institution that developed it, or a published paper, or a Hugging Face link if nothing else is available). If there are no relevant URLs, leave the field blank.
- release_date: When the model was released. Accepted formats: YYYY-MM-DD ("2025-12-31"), YYYY-MM ("2025-12"), YYYY ("2025"), future (for unreleased models), or a blank string (when the information was not found).
- license: The model license. If it is a proprietary model, write the string proprietary. If the license is unknown, leave the field blank.
- language_varieties: The list of language varieties the model was trained to handle. If unknown, write an empty list [ ]. Currently, the possible values are pt-BR (Brazilian Portuguese), pt-PT (European Portuguese) and gl-ES (Galician).
- size: The number of parameters in the model. It must be a number followed by B (billions) or T (trillions), e.g. "2.5B", or a blank string if not found.
- base_model: Name of the base model, in case it is the result of a fine-tuning process. If this information is not available, leave it blank.
- model_id: ID of the model, as used by model repositories and API servers.
- training_data: List of datasets used to train/fine-tune the model. Write one [[training_data]] block for each dataset, with its name= (required) and url= (required even if blank). If the training data is not known, remove the [[training_data]] blocks and write the line training_data = [ ] before all sections.
- origin: List of people or institutions responsible for developing the model. Write one [[origin]] block for each item, with its name= (required) and url= (required even if blank). If the origin is not known, remove the [[origin]] blocks and write the line origin = [ ] before all sections.
- weight_availability: Public availability of the model's weights. If the subfield available_now is false, then no url should be given, and the optional subfield planned = true can be added in case the weights are expected to be released in the future. If the subfield available_now is true, then the url is mandatory and should point where the weights can be found.
- public_api_availability: Availability of an API where the model can be accessed programmatically. If the subfield available_now is false, then no url should be given, and the optional subfield planned = true can be added in case an API is expected to be released in the future. If the subfield available_now is true, then the url is mandatory and should point to where the API can be accessed.
- online_chat_availability: Availability of an online chat where the model can be accessed by users. If the subfield available_now is false, then no url should be given, and the optional subfield planned = true can be added in case an online chat is expected to be released in the future. If the subfield available_now is true, then the url is mandatory and should point to where the chat can be accessed.
- knowledge_cutoff: The limit date of the model's knowledge. The subfield date accepts the following formats: YYYY-MM-DD ("2025-12-31"), YYYY-MM ("2025-12"), YYYY ("2025"), or a blank string (when the information was not found). The subfield type must be one of exact, possibly_earlier and possibly_later, or blank if the date is also blank.
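The format rules for release_date, size, the three availability sections and knowledge_cutoff can be expressed as small checks. The sketch below is one possible reading of those rules, not the validation performed by main.py; all function names are hypothetical, and the date pattern only checks the shape of the string (it does not reject impossible dates such as month 13).

```python
import re

# Shape of YYYY, YYYY-MM or YYYY-MM-DD (calendar validity is not checked).
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
# A number, optionally with decimals, followed by B or T.
SIZE_RE = re.compile(r"\d+(\.\d+)?[BT]")
CUTOFF_TYPES = ("exact", "possibly_earlier", "possibly_later")


def valid_release_date(value: str) -> bool:
    """Accept a dated string, the word 'future', or a blank string."""
    return value in ("", "future") or DATE_RE.fullmatch(value) is not None


def valid_size(value: str) -> bool:
    """Accept strings such as '2.5B' or '1T', or a blank string."""
    return value == "" or SIZE_RE.fullmatch(value) is not None


def availability_errors(section: dict) -> list[str]:
    """Check the url rule shared by the three *_availability sections."""
    errors = []
    if section.get("available_now"):
        if not section.get("url"):
            errors.append("url is mandatory when available_now is true")
    elif "url" in section:
        errors.append("url must be omitted when available_now is false")
    return errors


def valid_knowledge_cutoff(date: str, kind: str) -> bool:
    """Check the date/type pair of knowledge_cutoff.

    Assumption for this sketch: a blank date requires a blank type,
    and a non-blank date requires one of the three named types.
    """
    if date == "":
        return kind == ""
    return DATE_RE.fullmatch(date) is not None and kind in CUTOFF_TYPES


print(valid_release_date("2025-12"), valid_size("2.5B"))
print(availability_errors({"available_now": True}))
print(valid_knowledge_cutoff("2024-12", "possibly_later"))
```

Returning a list of messages from availability_errors, rather than a boolean, makes it easier to report every problem in a file at once.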
In the metadata.toml files, all the values must be written in English, except for proper names (of people, institutions, datasets, etc.).
The values of the fields license, base_model, and the subfield name
of the fields training_data and origin can be translated into the other
supported languages by creating an i18n-<LANG>.toml file (e.g. i18n-pt.toml)
in the same directory, following this format:
"Original string 1" = "Translated string 1"
"Original string 2" = "Translated string 2"
Install uv, then run uv sync to install the dependencies.
To check the integrity of the data and generate an HTML page, run:
uv run python main.py data/ html/
If there are errors, the script will print them and halt. Otherwise, HTML files will be generated in the html/ directory, one per language (currently, English and Portuguese).