Skip to content

feat: rewrite toString and other oveloaded methods#1401

Merged
lfoppiano merged 7 commits intofeature/code-formattingfrom
feature/rewrite-overloaded
Apr 21, 2026
Merged

feat: rewrite toString and other oveloaded methods#1401
lfoppiano merged 7 commits intofeature/code-formattingfrom
feature/rewrite-overloaded

Conversation

@lfoppiano
Copy link
Copy Markdown
Member

Rewrite all the overloaded methods using standard libraries.
We ignore the toString that are actually parsed from other methods /utilities.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a broad refactor/release-prep update that (a) standardizes many toString/equals/hashCode implementations using Apache Commons builders, (b) refactors training entrypoints and adds a centralized trainer registry, and (c) updates service/UI, Crossref handling, docs, Docker/CI, and shipped model configs for the 0.9.x line.

Changes:

  • Standardize object toString/equals/hashCode implementations across core data/layout/tokenization/counters classes.
  • Refactor trainer wiring (new TrainerRegistry, shared trainAndEvaluate helpers, CLI options for custom params/model paths, progress bar behavior).
  • Update REST/service behavior (BibTeX negotiation tweaks, BibTeX key generation), web UI health/version indicators, Crossref integration, docs, Dockerfiles, and CI workflows.

Reviewed changes

Copilot reviewed 128 out of 195 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
mkdocs.yml Reorganizes MkDocs navigation and updates GitHub repo URL.
grobid-trainer/src/main/java/org/grobid/trainer/evaluation/EndToEndEvaluation.java Adds progress bar auto/simple modes and builder-based configuration.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerRunner.java Refactors CLI: dynamic model list, new Wapiti params and -modelPath.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerRegistry.java New central mapping of model names to trainer factories.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerFactory.java Removes debug prints.
grobid-trainer/src/main/java/org/grobid/trainer/TableTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/ShorttextTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/SegmentationTrainer.java Switches flavor-aware main() to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/ReferenceSegmenterTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/PatentParserTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/NameHeaderTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/NameCitationTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/MonographTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/HeaderTrainer.java Switches flavor-aware main() to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/FundingAcknowledgementTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/FulltextTrainer.java Switches flavor-aware main() to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/FigureTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/DateTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/CitationTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/ChemicalEntityTrainer.java Switches main() to shared trainAndEvaluate and updates signature.
grobid-trainer/src/main/java/org/grobid/trainer/AffiliationAddressTrainer.java Switches main() to shared trainAndEvaluate.
grobid-trainer/src/main/java/org/grobid/trainer/AbstractTrainer.java Adds custom training/eval model paths, parameter overrides, shared main() helpers, custom tagger loading.
grobid-trainer/doc/eLife_984.results.* Removes an outdated evaluation results artifact from repo.
grobid-service/src/main/resources/web/resources/css/style.css Adds health-indicator styling and positions header container.
grobid-service/src/main/resources/web/grobid/grobid.js Adds version fetch + periodic health polling and updates UI navigation behavior.
grobid-service/src/main/java/org/grobid/service/util/BibTexMediaType.java Adds MEDIA_TYPE_QS to prefer XML when Accept is missing.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessTraining.java Switches training model selection to TrainerRegistry.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessString.java Changes BibTeX key generation for citation endpoints.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessFiles.java Changes BibTeX key generation for file-based endpoints.
grobid-service/src/main/java/org/grobid/service/GrobidRestService.java Applies qs media type for BibTeX endpoints to improve content negotiation defaults.
grobid-service/scripts/run-service.sh Adds arch detection and adjusts native library path selection.
grobid-home/scripts/run_evaluation.sh Improves evaluation invocation flags and Python env library path handling.
grobid-home/models/table-BidLSTM_CRF/preprocessor.json Updates DL model preprocessor vocab mapping.
grobid-home/models/table-BidLSTM_CRF/config.json Updates DL model config (char vocab size).
grobid-home/models/reference-segmenter-BidLSTM_ChainCRF_FEATURES/config.json Adjusts DL training config (batch size).
grobid-home/models/reference-segmenter-BidLSTM_CRF_FEATURES/config.json Adjusts DL training config (batch size).
grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/preprocessor.json Updates DL model preprocessor vocab mapping.
grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/config.json Updates DL model config (char vocab size, removes ELMo flag).
grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/preprocessor.json Updates DL model preprocessor vocab mapping.
grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/config.json Updates DL model config (char vocab size).
grobid-home/models/header-article-light-ref-BidLSTM_ChainCRF_FEATURES/config.json Removes obsolete ELMo flag.
grobid-home/models/header-article-light-ref-BidLSTM_CRF_FEATURES/config.json Removes obsolete ELMo flag.
grobid-home/models/header-article-light-BidLSTM_ChainCRF_FEATURES/config.json Removes obsolete ELMo flag.
grobid-home/models/header-article-light-BidLSTM_CRF_FEATURES/config.json Removes obsolete ELMo flag.
grobid-home/models/date-BidLSTM_CRF_FEATURES/config.json Updates DL model config (char vocab size).
grobid-home/models/citation-BidLSTM_ChainCRF_FEATURES/config.json Updates DL model config (seq length, batch size, vocab size).
grobid-home/models/citation-BidLSTM_CRF_FEATURES/config.json Updates DL model config (seq length, batch size, vocab size).
grobid-home/config/grobid.yaml Extends Crossref consolidation settings (rate limiting, post-validation).
grobid-core/src/test/java/org/grobid/core/engines/AuthorParserTest.java Strengthens tests for et al stripping and adds whitespace case.
grobid-core/src/test/java/org/grobid/core/engines/AuthorParserReDoSTest.java Refactors ReDoS test to test regex directly without model loading.
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestTask.java Updates Crossref response handling to backoff/disable token and parse new headers.
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestListener.java Adds extra response fields and modernizes toString().
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequest.java Updates UA/version, token gating, reads new Crossref headers, handles 429.
grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CounterImpl.java Uses Commons builders for equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CntManagerImpl.java Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/Triple.java Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/Pair.java Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/GrobidProperties.java Adds Crossref min-request-interval and post-validation accessors.
grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java Adds Crossref config fields for interval and post-validation.
grobid-core/src/main/java/org/grobid/core/utilities/ElementCounter.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/tokenization/TaggingTokenCluster.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/tokenization/LabeledTokensContainer.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/lexicon/Lexicon.java Fixes URL destination/token offset alignment and clarifies comment.
grobid-core/src/main/java/org/grobid/core/layout/PDFAnnotation.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/layout/LayoutToken.java Modernizes hashCode().
grobid-core/src/main/java/org/grobid/core/layout/GraphicObject.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/layout/BoundingBox.java Modernizes equals/hashCode.
grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPoolClassifier.java Sets env for legacy Keras 2 and imports tf_keras.
grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPool.java Sets env for legacy Keras 2 and imports tf_keras.
grobid-core/src/main/java/org/grobid/core/engines/tagging/WapitiTagger.java Adds constructor to load a model from a File.
grobid-core/src/main/java/org/grobid/core/engines/tagging/TaggerFactory.java Adds getTaggerFromPath for explicit model file loading.
grobid-core/src/main/java/org/grobid/core/engines/citations/LabeledReferenceResult.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/engines/AuthorParser.java Avoids repeated trim() and ensures consistent regex matching input.
grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java Adds status and source attributes on <biblStruct>.
grobid-core/src/main/java/org/grobid/core/document/DocumentPointer.java Modernizes toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/document/DocumentPiece.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/document/DocumentNode.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Person.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/PatentItem.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Keyword.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Funding.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Funder.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Date.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/BibDataSet.java Modernizes toString().
grobid-core/src/main/java/org/grobid/core/data/Affiliation.java Modernizes toString().
gradle.properties Bumps project version to 0.9.1-SNAPSHOT.
doc/index.md Restructures docs landing page into clearer sections.
doc/getting_started.md Adds upgrade tip and updates links/wording and Docker tags.
doc/benchmarks/flavors/article_light/benchmaking-pmc.md Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-plos.md Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-elife.md Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-biorxiv.md Updates benchmark numbers and formatting.
doc/benchmarks/Benchmarking.md Fixes link paths.
doc/Upgrading.md Adds new upgrade guide (notably 0.9.0).
doc/Training-the-models-of-Grobid.md Updates build command and adds extensive FAQ section.
doc/References.md Updates repo links to new GitHub org.
doc/Principles.md Fixes benchmarking links/paths.
doc/Introduction.md Updates badges/links and Crossref references.
doc/Install-Grobid.md Updates versions, URLs, and build command.
doc/Grobid-service.md Updates versions, health/version docs, and adds clients/output format notes.
doc/Grobid-java-library.md Updates version numbers and examples.
doc/Grobid-docker.md Updates Docker tags/commands and links.
doc/End-to-end-evaluation.md Updates dataset hosting info/links and benchmarking links.
doc/Deep-Learning-models.md Updates DL stack guidance (DeLFT/TF/Python/ARM notes).
doc/Coordinates-in-PDF.md Adds FAQ section and improves wording.
doc/Configuration.md Fixes internal links to local docs.
build.gradle Updates plugin/deps, adds Spotless, adds git revision expansion and progressbar system prop, adjusts release branching rule.
Readme.md Updates badges/links, runtime requirements, citation snippet, and upgrade pointer.
Dockerfile.evaluation Updates TF/DeLFT/JEP/Python versions and LD_LIBRARY_PATH handling.
Dockerfile.delft Major rebuild for JDK21 + Python 3.11 + TF 2.17 + multi-arch handling.
Dockerfile.crf Updates native lib handling and LD_LIBRARY_PATH for multi-arch.
CITATION.cff Updates repository URL, version fields, and citation metadata.
CHANGELOG.md Adds 0.9.0 release notes.
.gitignore Ignores Python cache artifacts.
.github/workflows/ci-build-unstable.yml Adds Spotless check (non-blocking), fetches tags, and passes computed version into Docker builds; triggers HF reboot.
.github/workflows/ci-build-tag-custom.yml Retags Docker images digest-preservingly with Buildx imagetools and safer auth handling.
.github/workflows/ci-build-manual-onnx.yml Fetches tags, computes version, passes build args.
.github/workflows/ci-build-manual-full.yml Fetches tags, computes version, passes build args, clarifies summary.
.github/workflows/ci-build-manual-eval.yml Fetches tags, computes version, passes build args.
.github/workflows/ci-build-manual-crf.yml Splits amd64/arm64 builds, merges manifest, computes version, improves summary.
.editorconfig Harmonizes formatting defaults (final newline, trimming, indent rules).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 62 to +132
@@ -108,12 +94,48 @@ public static void main(String[] args) {
} else if (args[i].equals("-i")) {
incremental = true;

} else if (args[i].equals("-epsilon")) {
if (i + 1 == args.length) {
throw new IllegalStateException("Missing epsilon value. ");
}
try {
epsilon = Double.parseDouble(args[i + 1]);
} catch (Exception e) {
throw new IllegalStateException("Invalid epsilon value: " + args[i + 1]);
}

} else if (args[i].equals("-w")) {
if (i + 1 == args.length) {
throw new IllegalStateException("Missing window value. ");
}
try {
window = Integer.parseInt(args[i + 1]);
} catch (Exception e) {
throw new IllegalStateException("Invalid window value: " + args[i + 1]);
}

} else if (args[i].equals("-maxIter")) {
if (i + 1 == args.length) {
throw new IllegalStateException("Missing nbMaxIterations value. ");
}
try {
nbMaxIterations = Integer.parseInt(args[i + 1]);
} catch (Exception e) {
throw new IllegalStateException("Invalid nbMaxIterations value: " + args[i + 1]);
}

} else if (args[i].equals("-modelPath")) {
if (i + 1 == args.length) {
throw new IllegalStateException("Missing model output path value. ");
}
outputModelFilePath = args[i + 1];

Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument parsing consumes the value at args[i+1] for flags like -gH/-s/-n/-epsilon/-w/-maxIter/-modelPath but does not advance the loop index. This means the value token is processed again on the next iteration and can be misinterpreted if it matches another flag (or starts with '-'), leading to incorrect parsing. Consider incrementing i after consuming a flag value (or switching to a dedicated CLI parser) so each option/value pair is handled atomically.

Copilot uses AI. Check for mistakes.
Comment on lines 553 to +557
protected GenericTagger getTagger() {
if (tagger == null) {
tagger = TaggerFactory.getTagger(model);
tagger = (evaluationModelPath != null)
? TaggerFactory.getTaggerFromPath(evaluationModelPath, GrobidProperties.getGrobidEngine(model))
: TaggerFactory.getTagger(model);
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When evaluationModelPath is set, getTagger() routes to TaggerFactory.getTaggerFromPath(...), which currently only supports the Wapiti engine and throws for DeLFT/others. This makes the new -modelPath option fail at runtime for any model configured with a non-Wapiti engine. Either validate earlier (and fail with a clear message) or extend TaggerFactory.getTaggerFromPath to support the other engines used by trainers.

Copilot uses AI. Check for mistakes.
Comment on lines 278 to 283
response = Response.status(Status.NO_CONTENT).build();
} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
response = Response.status(Status.OK)
.entity(biblioItem.toBibTeX("-1", config))
.entity(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config))
.header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8")
.build();
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching the BibTeX key from a guaranteed-unique placeholder ("-1") to biblioItem.generateBibTeXKey() can produce duplicate keys across requests (and even within a single response), because the key is derived from surname/year/title and has no disambiguation. This can result in invalid/ambiguous BibTeX output. Consider keeping a deterministic per-response disambiguation (e.g., append a counter when a key repeats) or preserve the previous unique key behavior for single-citation endpoints.

Copilot uses AI. Check for mistakes.
Comment on lines 325 to 330
response = Response.status(Status.NO_CONTENT).build();
} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
StringBuilder responseContent = new StringBuilder();
int n = 0;
for(BiblioItem biblioItem : biblioItems) {
responseContent.append(biblioItem.toBibTeX(""+n, config));
responseContent.append(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config));
responseContent.append("\n");
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For citation lists, replacing the per-item numeric key ("0", "1", ...) with generateBibTeXKey() can yield duplicate BibTeX entry keys within the same response, which breaks BibTeX consumers. Consider tracking used keys within the loop and appending a suffix/counter when collisions occur (or keep the previous index-based keying for list responses).

Copilot uses AI. Check for mistakes.
Comment on lines 157 to 162
response = Response.status(Response.Status.NO_CONTENT).build();
} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
response = Response.status(Response.Status.OK)
.entity(result.toBibTeX("-1"))
.entity(result.toBibTeX(result.generateBibTeXKey()))
.header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8")
.build();
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the BibTeX key "-1" with result.generateBibTeXKey() removes the previous guaranteed-unique placeholder and can cause key collisions across outputs (and makes output less predictable for callers). If the intent is to provide stable keys, consider adding collision-avoidance (e.g., suffixing) or documenting the new behavior, because BibTeX requires unique entry keys within a file.

Copilot uses AI. Check for mistakes.
Comment on lines 775 to 781
} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
StringBuilder result = new StringBuilder();
GrobidAnalysisConfig config = new GrobidAnalysisConfig.GrobidAnalysisConfigBuilder().includeRawCitations(includeRawCitations).build();
int p = 0;
for (BibDataSet res : bibDataSetList) {
result.append(res.getResBib().toBibTeX(Integer.toString(p), config));
BiblioItem bib = res.getResBib();
result.append(bib.toBibTeX(bib.generateBibTeXKey(), config));
result.append("\n");
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For references lists, using bib.generateBibTeXKey() for every entry can generate duplicate keys within the same response, because the key derivation does not include any disambiguation. This can make the returned BibTeX invalid for downstream tools. Consider de-duplicating keys in the loop (append a counter/suffix on collision) or keep the previous index-based keys for list responses.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +19
ARCH=$(uname -m)
if [ "$ARCH" = "aarch64" ]; then
LIB_DIR="../grobid-home/lib/lin_arm-64"
else
LIB_DIR="../grobid-home/lib/lin-64"
fi

java -Djava.library.path="$LIB_DIR" -jar build/libs/grobid-service-*-onejar.jar server $config || exit $?
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run-service.sh used to include the grobid-home JEP native library directory in -Djava.library.path, but now only points to the architecture-specific lib directory. If the JEP JNI library is still expected under "$LIB_DIR/jep" in non-Docker installs, this change will break Deep Learning (JEP) initialization. Consider adding the JEP subdirectory back (and/or incorporating the active Python env's JEP library path) so both Wapiti and JEP natives are discoverable.

Copilot uses AI. Check for mistakes.
Comment on lines 192 to 198
echo "DRY: would write log to ${log_dst}"
else
# execute gradlew from the directory where the script was invoked (START_PWD)
(cd "${START_PWD}" && "${cmd[@]}") 2>&1 | tee "${log_dst}"
(cd "${START_PWD}" && "${cmd[@]}") | tee "${log_dst}"
exit_code=${PIPESTATUS[0]}

if [ $exit_code -ne 0 ]; then
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evaluation script no longer redirects stderr into the tee pipeline (it removed 2>&1). This means stack traces and error diagnostics may not be captured in the log file, making failures harder to debug and potentially hiding important warnings. Consider restoring stderr redirection (or redirecting stderr to a separate log) while still keeping the console settings that avoid progress-bar/TTY issues.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.append("status", status)
.append("interval", interval)
.append("limitIterations", limitIterations)
.append("results", results)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response.toString() now appends the entire results list, which can create very large log lines (and potentially expensive nested toString() calls). Previously this was summarized via results.size(). Consider logging only a count (and maybe a small sample) instead of the full list, and include errorMessage/errorException if the intent is diagnostics.

Suggested change
.append("results", results)
.append("resultsCount", CollectionUtils.isEmpty(results) ? 0 : results.size())
.append("errorMessage", errorMessage)
.append("errorException", errorException)

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +183
.append("classCounters", classCounters)
.append("strCnts", strCnts)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CntManagerImpl.toString() now prints classCounters/strCnts maps directly, which will emit CounterImpl@<hash> object identities rather than the actual counter values (and can become very large). The previous implementation converted counters to Map<String, Long> and included metrics; consider switching this to append getAllCounters() (or a dedicated representation) so the output remains value-focused and stable.

Suggested change
.append("classCounters", classCounters)
.append("strCnts", strCnts)
.append("counters", getAllCounters())
.append("metrics", metrics)

Copilot uses AI. Check for mistakes.
Signed-off-by: Luca Foppiano <luca@foppiano.org>
@lfoppiano lfoppiano force-pushed the feature/rewrite-overloaded branch from 4d93160 to b84bff0 Compare April 19, 2026 06:38
@lfoppiano lfoppiano requested a review from Copilot April 21, 2026 09:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread grobid-core/src/main/java/org/grobid/core/layout/Block.java
Comment on lines 68 to 71
@Override
public String toString() {
return "BiblioItem{" +
"submission_date='" + submission_date + '\'' +
", download_date='" + download_date + '\'' +
", server_date='" + server_date + '\'' +
", languageUtilities=" + languageUtilities +
", item=" + item +
", parentItem=" + parentItem +
", ISBN13='" + ISBN13 + '\'' +
", ISBN10='" + ISBN10 + '\'' +
", title='" + title + '\'' +
", publisher='" + publisher + '\'' +
", nbPages=" + nbPages +
", edition='" + edition + '\'' +
", language='" + language + '\'' +
", subtitle='" + subtitle + '\'' +
", publication_date='" + publication_date + '\'' +
", normalized_publication_date=" + normalized_publication_date +
", editors='" + editors + '\'' +
", publisher_website='" + publisher_website + '\'' +
", serie='" + serie + '\'' +
", ISSN='" + ISSN + '\'' +
", ISSNe='" + ISSNe + '\'' +
", volume='" + volume + '\'' +
", number='" + number + '\'' +
", month='" + month + '\'' +
", support_type='" + support_type + '\'' +
", version='" + version + '\'' +
", smallImageURL='" + smallImageURL + '\'' +
", largeImageURL='" + largeImageURL + '\'' +
", publisherPlace='" + publisherPlace + '\'' +
", review='" + review + '\'' +
", keywords=" + keywords +
", subjects=" + subjects +
", categories='" + categories + '\'' +
", type='" + type + '\'' +
", typeDescription='" + typeDescription + '\'' +
", book_type='" + book_type + '\'' +
", DOI='" + doi + '\'' +
", arXivId='" + arXivId + '\'' +
", PMID='" + PMID + '\'' +
", PMCID='" + PMCID + '\'' +
", PII='" + PII + '\'' +
", HALId='" + halId + '\'' +
", ark='" + ark + '\'' +
", istexId='" + istexId + '\'' +
", inDOI='" + inDOI + '\'' +
", abstract_='" + abstract_ + '\'' +
", authors='" + authors + '\'' +
", firstAuthorSurname='" + firstAuthorSurname + '\'' +
", location='" + location + '\'' +
", bookTitle='" + bookTitle + '\'' +
", serieTitle='" + serieTitle + '\'' +
", pageRange='" + pageRange + '\'' +
", journal='" + journal + '\'' +
", volumeBlock='" + volumeBlock + '\'' +
", institution='" + institution + '\'' +
", note='" + note + '\'' +
", affiliation='" + affiliation + '\'' +
", address='" + address + '\'' +
", country='" + country + '\'' +
", town='" + town + '\'' +
", email='" + email + '\'' +
", pubnum='" + pubnum + '\'' +
", keyword='" + keyword + '\'' +
", phone='" + phone + '\'' +
", degree='" + degree + '\'' +
", web='" + web + '\'' +
", issue='" + issue + '\'' +
", journal_abbrev='" + journal_abbrev + '\'' +
", event='" + event + '\'' +
", abstractHeader='" + abstractHeader + '\'' +
", day='" + day + '\'' +
", locationPublisher='" + locationPublisher + '\'' +
", dedication='" + dedication + '\'' +
", submission='" + submission + '\'' +
", english_title='" + english_title + '\'' +
", url='" + url + '\'' +
", oaUrl='" + oaUrl + '\'' +
", uri='" + uri + '\'' +
", confidence='" + confidence + '\'' +
", conf=" + conf +
", e_year='" + e_year + '\'' +
", e_month='" + e_month + '\'' +
", e_day='" + e_day + '\'' +
", s_year='" + s_year + '\'' +
", s_month='" + s_month + '\'' +
", s_day='" + s_day + '\'' +
", d_year='" + d_year + '\'' +
", d_month='" + d_month + '\'' +
", d_day='" + d_day + '\'' +
", a_year='" + a_year + '\'' +
", a_month='" + a_month + '\'' +
", a_day='" + a_day + '\'' +
", authorList=" + authorList +
", editorList=" + editorList +
", affiliationList=" + affiliationList +
", addressList=" + addressList +
", emailList=" + emailList +
", webList=" + webList +
", phoneList=" + phoneList +
", markers=" + markers +
", fullAuthors=" + fullAuthors +
", fullEditors=" + fullEditors +
", fullAffiliations=" + fullAffiliations +
", reference='" + reference + '\'' +
", copyright='" + copyright + '\'' +
", funding='" + funding + '\'' +
", affiliationAddressBlock='" + affiliationAddressBlock + '\'' +
", articleTitle='" + articleTitle + '\'' +
", beginPage=" + beginPage +
", endPage=" + endPage +
", year='" + year + '\'' +
", authorString='" + authorString + '\'' +
", path='" + path + '\'' +
", collaboration='" + collaboration + '\'' +
", postProcessEditors=" + postProcessEditors +
", crossrefError=" + crossrefError +
", normalized_submission_date=" + normalized_submission_date +
", normalized_download_date=" + normalized_download_date +
", originalAffiliation='" + originalAffiliation + '\'' +
", originalAbstract='" + originalAbstract + '\'' +
", originalTitle='" + originalTitle + '\'' +
", originalAuthors='" + originalAuthors + '\'' +
", originalEditors='" + originalEditors + '\'' +
", originalAddress='" + originalAddress + '\'' +
", originalNote='" + originalNote + '\'' +
", originalKeyword='" + originalKeyword + '\'' +
", originalVolumeBlock='" + originalVolumeBlock + '\'' +
", originalJournal='" + originalJournal + '\'' +
", workingGroup='" + workingGroup + '\'' +
", documentType='" + documentType + '\'' +
'}';
return ReflectionToStringBuilder.toString(this, ToStringStyle.SHORT_PREFIX_STYLE);
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to ReflectionToStringBuilder.toString(this, ...) will include all non-static fields (e.g., labeledTokens, working token copies, authorEmailAssigner, etc.). This can create extremely large/expensive log strings and may unintentionally dump internal objects/PII when BiblioItem is logged or concatenated into error messages. Prefer an explicit ToStringBuilder that whitelists a small set of stable, safe fields (or keep the previous curated list).

Copilot uses AI. Check for mistakes.
Base automatically changed from feature/add-code-cleaning to feature/code-formatting April 21, 2026 10:10
@lfoppiano lfoppiano merged commit c702a5b into feature/code-formatting Apr 21, 2026
0 of 4 checks passed
@lfoppiano lfoppiano deleted the feature/rewrite-overloaded branch April 21, 2026 10:15
@lfoppiano lfoppiano mentioned this pull request Apr 21, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants