feat: rewrite toString and other oveloaded methods by lfoppiano · Pull Request #1401 · grobidOrg/grobid

lfoppiano · 2026-03-22T08:53:58Z

Rewrite all the overloaded methods using standard libraries.
We ignore the toString that are actually parsed from other methods /utilities.

Copilot

Pull request overview

This PR is a broad refactor/release-prep update that (a) standardizes many toString/equals/hashCode implementations using Apache Commons builders, (b) refactors training entrypoints and adds a centralized trainer registry, and (c) updates service/UI, Crossref handling, docs, Docker/CI, and shipped model configs for the 0.9.x line.

Changes:

Standardize object toString/equals/hashCode implementations across core data/layout/tokenization/counters classes.
Refactor trainer wiring (new TrainerRegistry, shared trainAndEvaluate helpers, CLI options for custom params/model paths, progress bar behavior).
Update REST/service behavior (BibTeX negotiation tweaks, BibTeX key generation), web UI health/version indicators, Crossref integration, docs, Dockerfiles, and CI workflows.

Reviewed changes

Copilot reviewed 128 out of 195 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
mkdocs.yml	Reorganizes MkDocs navigation and updates GitHub repo URL.
grobid-trainer/src/main/java/org/grobid/trainer/evaluation/EndToEndEvaluation.java	Adds progress bar auto/simple modes and builder-based configuration.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerRunner.java	Refactors CLI: dynamic model list, new Wapiti params and `-modelPath`.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerRegistry.java	New central mapping of model names to trainer factories.
grobid-trainer/src/main/java/org/grobid/trainer/TrainerFactory.java	Removes debug prints.
grobid-trainer/src/main/java/org/grobid/trainer/TableTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/ShorttextTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/SegmentationTrainer.java	Switches flavor-aware `main()` to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/ReferenceSegmenterTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/PatentParserTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/NameHeaderTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/NameCitationTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/MonographTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/HeaderTrainer.java	Switches flavor-aware `main()` to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/FundingAcknowledgementTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/FulltextTrainer.java	Switches flavor-aware `main()` to shared helper.
grobid-trainer/src/main/java/org/grobid/trainer/FigureTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/DateTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/CitationTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/ChemicalEntityTrainer.java	Switches `main()` to shared `trainAndEvaluate` and updates signature.
grobid-trainer/src/main/java/org/grobid/trainer/AffiliationAddressTrainer.java	Switches `main()` to shared `trainAndEvaluate`.
grobid-trainer/src/main/java/org/grobid/trainer/AbstractTrainer.java	Adds custom training/eval model paths, parameter overrides, shared `main()` helpers, custom tagger loading.
grobid-trainer/doc/eLife_984.results.*	Removes an outdated evaluation results artifact from repo.
grobid-service/src/main/resources/web/resources/css/style.css	Adds health-indicator styling and positions header container.
grobid-service/src/main/resources/web/grobid/grobid.js	Adds version fetch + periodic health polling and updates UI navigation behavior.
grobid-service/src/main/java/org/grobid/service/util/BibTexMediaType.java	Adds `MEDIA_TYPE_QS` to prefer XML when Accept is missing.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessTraining.java	Switches training model selection to `TrainerRegistry`.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessString.java	Changes BibTeX key generation for citation endpoints.
grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessFiles.java	Changes BibTeX key generation for file-based endpoints.
grobid-service/src/main/java/org/grobid/service/GrobidRestService.java	Applies `qs` media type for BibTeX endpoints to improve content negotiation defaults.
grobid-service/scripts/run-service.sh	Adds arch detection and adjusts native library path selection.
grobid-home/scripts/run_evaluation.sh	Improves evaluation invocation flags and Python env library path handling.
grobid-home/models/table-BidLSTM_CRF/preprocessor.json	Updates DL model preprocessor vocab mapping.
grobid-home/models/table-BidLSTM_CRF/config.json	Updates DL model config (char vocab size).
grobid-home/models/reference-segmenter-BidLSTM_ChainCRF_FEATURES/config.json	Adjusts DL training config (batch size).
grobid-home/models/reference-segmenter-BidLSTM_CRF_FEATURES/config.json	Adjusts DL training config (batch size).
grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/preprocessor.json	Updates DL model preprocessor vocab mapping.
grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/config.json	Updates DL model config (char vocab size, removes ELMo flag).
grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/preprocessor.json	Updates DL model preprocessor vocab mapping.
grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/config.json	Updates DL model config (char vocab size).
grobid-home/models/header-article-light-ref-BidLSTM_ChainCRF_FEATURES/config.json	Removes obsolete ELMo flag.
grobid-home/models/header-article-light-ref-BidLSTM_CRF_FEATURES/config.json	Removes obsolete ELMo flag.
grobid-home/models/header-article-light-BidLSTM_ChainCRF_FEATURES/config.json	Removes obsolete ELMo flag.
grobid-home/models/header-article-light-BidLSTM_CRF_FEATURES/config.json	Removes obsolete ELMo flag.
grobid-home/models/date-BidLSTM_CRF_FEATURES/config.json	Updates DL model config (char vocab size).
grobid-home/models/citation-BidLSTM_ChainCRF_FEATURES/config.json	Updates DL model config (seq length, batch size, vocab size).
grobid-home/models/citation-BidLSTM_CRF_FEATURES/config.json	Updates DL model config (seq length, batch size, vocab size).
grobid-home/config/grobid.yaml	Extends Crossref consolidation settings (rate limiting, post-validation).
grobid-core/src/test/java/org/grobid/core/engines/AuthorParserTest.java	Strengthens tests for `et al` stripping and adds whitespace case.
grobid-core/src/test/java/org/grobid/core/engines/AuthorParserReDoSTest.java	Refactors ReDoS test to test regex directly without model loading.
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestTask.java	Updates Crossref response handling to backoff/disable token and parse new headers.
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestListener.java	Adds extra response fields and modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequest.java	Updates UA/version, token gating, reads new Crossref headers, handles 429.
grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CounterImpl.java	Uses Commons builders for equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CntManagerImpl.java	Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/Triple.java	Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/Pair.java	Uses Commons builders for toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/utilities/GrobidProperties.java	Adds Crossref min-request-interval and post-validation accessors.
grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java	Adds Crossref config fields for interval and post-validation.
grobid-core/src/main/java/org/grobid/core/utilities/ElementCounter.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/tokenization/TaggingTokenCluster.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/tokenization/LabeledTokensContainer.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/lexicon/Lexicon.java	Fixes URL destination/token offset alignment and clarifies comment.
grobid-core/src/main/java/org/grobid/core/layout/PDFAnnotation.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/layout/LayoutToken.java	Modernizes `hashCode()`.
grobid-core/src/main/java/org/grobid/core/layout/GraphicObject.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/layout/BoundingBox.java	Modernizes equals/hashCode.
grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPoolClassifier.java	Sets env for legacy Keras 2 and imports `tf_keras`.
grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPool.java	Sets env for legacy Keras 2 and imports `tf_keras`.
grobid-core/src/main/java/org/grobid/core/engines/tagging/WapitiTagger.java	Adds constructor to load a model from a `File`.
grobid-core/src/main/java/org/grobid/core/engines/tagging/TaggerFactory.java	Adds `getTaggerFromPath` for explicit model file loading.
grobid-core/src/main/java/org/grobid/core/engines/citations/LabeledReferenceResult.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/engines/AuthorParser.java	Avoids repeated `trim()` and ensures consistent regex matching input.
grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java	Adds `status` and `source` attributes on `<biblStruct>`.
grobid-core/src/main/java/org/grobid/core/document/DocumentPointer.java	Modernizes toString/equals/hashCode.
grobid-core/src/main/java/org/grobid/core/document/DocumentPiece.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/document/DocumentNode.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Person.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/PatentItem.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Keyword.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Funding.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Funder.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Date.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/BibDataSet.java	Modernizes `toString()`.
grobid-core/src/main/java/org/grobid/core/data/Affiliation.java	Modernizes `toString()`.
gradle.properties	Bumps project version to `0.9.1-SNAPSHOT`.
doc/index.md	Restructures docs landing page into clearer sections.
doc/getting_started.md	Adds upgrade tip and updates links/wording and Docker tags.
doc/benchmarks/flavors/article_light/benchmaking-pmc.md	Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-plos.md	Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-elife.md	Updates benchmark numbers and formatting.
doc/benchmarks/flavors/article_light/benchmaking-biorxiv.md	Updates benchmark numbers and formatting.
doc/benchmarks/Benchmarking.md	Fixes link paths.
doc/Upgrading.md	Adds new upgrade guide (notably 0.9.0).
doc/Training-the-models-of-Grobid.md	Updates build command and adds extensive FAQ section.
doc/References.md	Updates repo links to new GitHub org.
doc/Principles.md	Fixes benchmarking links/paths.
doc/Introduction.md	Updates badges/links and Crossref references.
doc/Install-Grobid.md	Updates versions, URLs, and build command.
doc/Grobid-service.md	Updates versions, health/version docs, and adds clients/output format notes.
doc/Grobid-java-library.md	Updates version numbers and examples.
doc/Grobid-docker.md	Updates Docker tags/commands and links.
doc/End-to-end-evaluation.md	Updates dataset hosting info/links and benchmarking links.
doc/Deep-Learning-models.md	Updates DL stack guidance (DeLFT/TF/Python/ARM notes).
doc/Coordinates-in-PDF.md	Adds FAQ section and improves wording.
doc/Configuration.md	Fixes internal links to local docs.
build.gradle	Updates plugin/deps, adds Spotless, adds git revision expansion and progressbar system prop, adjusts release branching rule.
Readme.md	Updates badges/links, runtime requirements, citation snippet, and upgrade pointer.
Dockerfile.evaluation	Updates TF/DeLFT/JEP/Python versions and LD_LIBRARY_PATH handling.
Dockerfile.delft	Major rebuild for JDK21 + Python 3.11 + TF 2.17 + multi-arch handling.
Dockerfile.crf	Updates native lib handling and LD_LIBRARY_PATH for multi-arch.
CITATION.cff	Updates repository URL, version fields, and citation metadata.
CHANGELOG.md	Adds 0.9.0 release notes.
.gitignore	Ignores Python cache artifacts.
.github/workflows/ci-build-unstable.yml	Adds Spotless check (non-blocking), fetches tags, and passes computed version into Docker builds; triggers HF reboot.
.github/workflows/ci-build-tag-custom.yml	Retags Docker images digest-preservingly with Buildx imagetools and safer auth handling.
.github/workflows/ci-build-manual-onnx.yml	Fetches tags, computes version, passes build args.
.github/workflows/ci-build-manual-full.yml	Fetches tags, computes version, passes build args, clarifies summary.
.github/workflows/ci-build-manual-eval.yml	Fetches tags, computes version, passes build args.
.github/workflows/ci-build-manual-crf.yml	Splits amd64/arm64 builds, merges manifest, computes version, improves summary.
.editorconfig	Harmonizes formatting defaults (final newline, trimming, indent rules).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-18T16:55:36Z

@@ -108,12 +94,48 @@ public static void main(String[] args) {
            } else if (args[i].equals("-i")) {
                incremental = true;

+            } else if (args[i].equals("-epsilon")) {
+                if (i + 1 == args.length) {
+                    throw new IllegalStateException("Missing epsilon value. ");
+                }
+                try {
+                    epsilon = Double.parseDouble(args[i + 1]);
+                } catch (Exception e) {
+                    throw new IllegalStateException("Invalid epsilon value: " + args[i + 1]);
+                }
+
+            } else if (args[i].equals("-w")) {
+                if (i + 1 == args.length) {
+                    throw new IllegalStateException("Missing window value. ");
+                }
+                try {
+                    window = Integer.parseInt(args[i + 1]);
+                } catch (Exception e) {
+                    throw new IllegalStateException("Invalid window value: " + args[i + 1]);
+                }
+
+            } else if (args[i].equals("-maxIter")) {
+                if (i + 1 == args.length) {
+                    throw new IllegalStateException("Missing nbMaxIterations value. ");
+                }
+                try {
+                    nbMaxIterations = Integer.parseInt(args[i + 1]);
+                } catch (Exception e) {
+                    throw new IllegalStateException("Invalid nbMaxIterations value: " + args[i + 1]);
+                }
+
+            } else if (args[i].equals("-modelPath")) {
+                if (i + 1 == args.length) {
+                    throw new IllegalStateException("Missing model output path value. ");
+                }
+                outputModelFilePath = args[i + 1];
+


Argument parsing consumes the value at args[i+1] for flags like -gH/-s/-n/-epsilon/-w/-maxIter/-modelPath but does not advance the loop index. This means the value token is processed again on the next iteration and can be misinterpreted if it matches another flag (or starts with '-'), leading to incorrect parsing. Consider incrementing i after consuming a flag value (or switching to a dedicated CLI parser) so each option/value pair is handled atomically.

Copilot · 2026-04-18T16:55:36Z

    protected GenericTagger getTagger() {
        if (tagger == null) {
-            tagger = TaggerFactory.getTagger(model);
+            tagger = (evaluationModelPath != null)
+                ? TaggerFactory.getTaggerFromPath(evaluationModelPath, GrobidProperties.getGrobidEngine(model))
+                : TaggerFactory.getTagger(model);


When evaluationModelPath is set, getTagger() routes to TaggerFactory.getTaggerFromPath(...), which currently only supports the Wapiti engine and throws for DeLFT/others. This makes the new -modelPath option fail at runtime for any model configured with a non-Wapiti engine. Either validate earlier (and fail with a clear message) or extend TaggerFactory.getTaggerFromPath to support the other engines used by trainers.

Copilot · 2026-04-18T16:55:37Z

 				response = Response.status(Status.NO_CONTENT).build();
 			} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
 				response = Response.status(Status.OK)
-							.entity(biblioItem.toBibTeX("-1", config))
+							.entity(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config))
 							.header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8")
 							.build();


Switching the BibTeX key from a guaranteed-unique placeholder ("-1") to biblioItem.generateBibTeXKey() can produce duplicate keys across requests (and even within a single response), because the key is derived from surname/year/title and has no disambiguation. This can result in invalid/ambiguous BibTeX output. Consider keeping a deterministic per-response disambiguation (e.g., append a counter when a key repeats) or preserve the previous unique key behavior for single-citation endpoints.

Copilot · 2026-04-18T16:55:37Z

 				response = Response.status(Status.NO_CONTENT).build();
 			} else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
 				StringBuilder responseContent = new StringBuilder();
-				int n = 0;
 				for(BiblioItem biblioItem : biblioItems) {
-					responseContent.append(biblioItem.toBibTeX(""+n, config));
+					responseContent.append(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config));
 					responseContent.append("\n");


For citation lists, replacing the per-item numeric key ("0", "1", ...) with generateBibTeXKey() can yield duplicate BibTeX entry keys within the same response, which breaks BibTeX consumers. Consider tracking used keys within the loop and appending a suffix/counter when collisions occur (or keep the previous index-based keying for list responses).

Copilot · 2026-04-18T16:55:37Z

                response = Response.status(Response.Status.NO_CONTENT).build();
            } else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
                response = Response.status(Response.Status.OK)
-                    .entity(result.toBibTeX("-1"))
+                    .entity(result.toBibTeX(result.generateBibTeXKey()))
                    .header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8")
                    .build();


Replacing the BibTeX key "-1" with result.generateBibTeXKey() removes the previous guaranteed-unique placeholder and can cause key collisions across outputs (and makes output less predictable for callers). If the intent is to provide stable keys, consider adding collision-avoidance (e.g., suffixing) or documenting the new behavior, because BibTeX requires unique entry keys within a file.

Copilot · 2026-04-18T16:55:37Z

            } else if (expectedResponseType == ExpectedResponseType.BIBTEX) {
                StringBuilder result = new StringBuilder();
                GrobidAnalysisConfig config = new GrobidAnalysisConfig.GrobidAnalysisConfigBuilder().includeRawCitations(includeRawCitations).build();
-                int p = 0;
                for (BibDataSet res : bibDataSetList) {
-                    result.append(res.getResBib().toBibTeX(Integer.toString(p), config));
+                    BiblioItem bib = res.getResBib();
+                    result.append(bib.toBibTeX(bib.generateBibTeXKey(), config));
                    result.append("\n");


For references lists, using bib.generateBibTeXKey() for every entry can generate duplicate keys within the same response, because the key derivation does not include any disambiguation. This can make the returned BibTeX invalid for downstream tools. Consider de-duplicating keys in the loop (append a counter/suffix on collision) or keep the previous index-based keys for list responses.

Copilot · 2026-04-18T16:55:37Z

+ARCH=$(uname -m)
+if [ "$ARCH" = "aarch64" ]; then
+    LIB_DIR="../grobid-home/lib/lin_arm-64"
+else
+    LIB_DIR="../grobid-home/lib/lin-64"
+fi
+
+java -Djava.library.path="$LIB_DIR" -jar build/libs/grobid-service-*-onejar.jar server $config || exit $?


run-service.sh used to include the grobid-home JEP native library directory in -Djava.library.path, but now only points to the architecture-specific lib directory. If the JEP JNI library is still expected under "$LIB_DIR/jep" in non-Docker installs, this change will break Deep Learning (JEP) initialization. Consider adding the JEP subdirectory back (and/or incorporating the active Python env's JEP library path) so both Wapiti and JEP natives are discoverable.

Copilot · 2026-04-18T16:55:38Z

    echo "DRY: would write log to ${log_dst}"
  else
    # execute gradlew from the directory where the script was invoked (START_PWD)
-    (cd "${START_PWD}" && "${cmd[@]}") 2>&1 | tee "${log_dst}"
+    (cd "${START_PWD}" && "${cmd[@]}") | tee "${log_dst}"
    exit_code=${PIPESTATUS[0]}

    if [ $exit_code -ne 0 ]; then


The evaluation script no longer redirects stderr into the tee pipeline (it removed 2>&1). This means stack traces and error diagnostics may not be captured in the log file, making failures harder to debug and potentially hiding important warnings. Consider restoring stderr redirection (or redirecting stderr to a separate log) while still keeping the console settings that avoid progress-bar/TTY issues.

… eclipse-formatter.xml

Copilot

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-18T17:22:14Z

+                    .append("status", status)
+                    .append("interval", interval)
+                    .append("limitIterations", limitIterations)
+                    .append("results", results)


Response.toString() now appends the entire results list, which can create very large log lines (and potentially expensive nested toString() calls). Previously this was summarized via results.size(). Consider logging only a count (and maybe a small sample) instead of the full list, and include errorMessage/errorException if the intent is diagnostics.

Suggested change

.append("results", results)

.append("resultsCount", CollectionUtils.isEmpty(results) ? 0 : results.size())

.append("errorMessage", errorMessage)

.append("errorException", errorException)

Copilot · 2026-04-18T17:22:14Z

+                .append("classCounters", classCounters)
+                .append("strCnts", strCnts)


CntManagerImpl.toString() now prints classCounters/strCnts maps directly, which will emit CounterImpl@<hash> object identities rather than the actual counter values (and can become very large). The previous implementation converted counters to Map<String, Long> and included metrics; consider switching this to append getAllCounters() (or a dedicated representation) so the output remains value-focused and stable.

Suggested change

.append("classCounters", classCounters)

.append("strCnts", strCnts)

.append("counters", getAllCounters())

.append("metrics", metrics)

Signed-off-by: Luca Foppiano <luca@foppiano.org>

Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-21T09:41:34Z

    @Override
    public String toString() {
-        return "BiblioItem{" +
-                "submission_date='" + submission_date + '\'' +
-                ", download_date='" + download_date + '\'' +
-                ", server_date='" + server_date + '\'' +
-                ", languageUtilities=" + languageUtilities +
-                ", item=" + item +
-                ", parentItem=" + parentItem +
-                ", ISBN13='" + ISBN13 + '\'' +
-                ", ISBN10='" + ISBN10 + '\'' +
-                ", title='" + title + '\'' +
-                ", publisher='" + publisher + '\'' +
-                ", nbPages=" + nbPages +
-                ", edition='" + edition + '\'' +
-                ", language='" + language + '\'' +
-                ", subtitle='" + subtitle + '\'' +
-                ", publication_date='" + publication_date + '\'' +
-                ", normalized_publication_date=" + normalized_publication_date +
-                ", editors='" + editors + '\'' +
-                ", publisher_website='" + publisher_website + '\'' +
-                ", serie='" + serie + '\'' +
-                ", ISSN='" + ISSN + '\'' +
-                ", ISSNe='" + ISSNe + '\'' +
-                ", volume='" + volume + '\'' +
-                ", number='" + number + '\'' +
-                ", month='" + month + '\'' +
-                ", support_type='" + support_type + '\'' +
-                ", version='" + version + '\'' +
-                ", smallImageURL='" + smallImageURL + '\'' +
-                ", largeImageURL='" + largeImageURL + '\'' +
-                ", publisherPlace='" + publisherPlace + '\'' +
-                ", review='" + review + '\'' +
-                ", keywords=" + keywords +
-                ", subjects=" + subjects +
-                ", categories='" + categories + '\'' +
-                ", type='" + type + '\'' +
-                ", typeDescription='" + typeDescription + '\'' +
-                ", book_type='" + book_type + '\'' +
-                ", DOI='" + doi + '\'' +
-                ", arXivId='" + arXivId + '\'' +
-                ", PMID='" + PMID + '\'' +
-                ", PMCID='" + PMCID + '\'' +
-                ", PII='" + PII + '\'' +
-                ", HALId='" + halId + '\'' +
-                ", ark='" + ark + '\'' +
-                ", istexId='" + istexId + '\'' +
-                ", inDOI='" + inDOI + '\'' +
-                ", abstract_='" + abstract_ + '\'' +
-                ", authors='" + authors + '\'' +
-                ", firstAuthorSurname='" + firstAuthorSurname + '\'' +
-                ", location='" + location + '\'' +
-                ", bookTitle='" + bookTitle + '\'' +
-                ", serieTitle='" + serieTitle + '\'' +
-                ", pageRange='" + pageRange + '\'' +
-                ", journal='" + journal + '\'' +
-                ", volumeBlock='" + volumeBlock + '\'' +
-                ", institution='" + institution + '\'' +
-                ", note='" + note + '\'' +
-                ", affiliation='" + affiliation + '\'' +
-                ", address='" + address + '\'' +
-                ", country='" + country + '\'' +
-                ", town='" + town + '\'' +
-                ", email='" + email + '\'' +
-                ", pubnum='" + pubnum + '\'' +
-                ", keyword='" + keyword + '\'' +
-                ", phone='" + phone + '\'' +
-                ", degree='" + degree + '\'' +
-                ", web='" + web + '\'' +
-                ", issue='" + issue + '\'' +
-                ", journal_abbrev='" + journal_abbrev + '\'' +
-                ", event='" + event + '\'' +
-                ", abstractHeader='" + abstractHeader + '\'' +
-                ", day='" + day + '\'' +
-                ", locationPublisher='" + locationPublisher + '\'' +
-                ", dedication='" + dedication + '\'' +
-                ", submission='" + submission + '\'' +
-                ", english_title='" + english_title + '\'' +
-                ", url='" + url + '\'' +
-                ", oaUrl='" + oaUrl + '\'' +
-                ", uri='" + uri + '\'' +
-                ", confidence='" + confidence + '\'' +
-                ", conf=" + conf +
-                ", e_year='" + e_year + '\'' +
-                ", e_month='" + e_month + '\'' +
-                ", e_day='" + e_day + '\'' +
-                ", s_year='" + s_year + '\'' +
-                ", s_month='" + s_month + '\'' +
-                ", s_day='" + s_day + '\'' +
-                ", d_year='" + d_year + '\'' +
-                ", d_month='" + d_month + '\'' +
-                ", d_day='" + d_day + '\'' +
-                ", a_year='" + a_year + '\'' +
-                ", a_month='" + a_month + '\'' +
-                ", a_day='" + a_day + '\'' +
-                ", authorList=" + authorList +
-                ", editorList=" + editorList +
-                ", affiliationList=" + affiliationList +
-                ", addressList=" + addressList +
-                ", emailList=" + emailList +
-                ", webList=" + webList +
-                ", phoneList=" + phoneList +
-                ", markers=" + markers +
-                ", fullAuthors=" + fullAuthors +
-                ", fullEditors=" + fullEditors +
-                ", fullAffiliations=" + fullAffiliations +
-                ", reference='" + reference + '\'' +
-                ", copyright='" + copyright + '\'' +
-                ", funding='" + funding + '\'' +
-                ", affiliationAddressBlock='" + affiliationAddressBlock + '\'' +
-                ", articleTitle='" + articleTitle + '\'' +
-                ", beginPage=" + beginPage +
-                ", endPage=" + endPage +
-                ", year='" + year + '\'' +
-                ", authorString='" + authorString + '\'' +
-                ", path='" + path + '\'' +
-                ", collaboration='" + collaboration + '\'' +
-                ", postProcessEditors=" + postProcessEditors +
-                ", crossrefError=" + crossrefError +
-                ", normalized_submission_date=" + normalized_submission_date +
-                ", normalized_download_date=" + normalized_download_date +
-                ", originalAffiliation='" + originalAffiliation + '\'' +
-                ", originalAbstract='" + originalAbstract + '\'' +
-                ", originalTitle='" + originalTitle + '\'' +
-                ", originalAuthors='" + originalAuthors + '\'' +
-                ", originalEditors='" + originalEditors + '\'' +
-                ", originalAddress='" + originalAddress + '\'' +
-                ", originalNote='" + originalNote + '\'' +
-                ", originalKeyword='" + originalKeyword + '\'' +
-                ", originalVolumeBlock='" + originalVolumeBlock + '\'' +
-                ", originalJournal='" + originalJournal + '\'' +
-                ", workingGroup='" + workingGroup + '\'' +
-                ", documentType='" + documentType + '\'' +
-                '}';
+        return ReflectionToStringBuilder.toString(this, ToStringStyle.SHORT_PREFIX_STYLE);
    }


Switching to ReflectionToStringBuilder.toString(this, ...) will include all non-static fields (e.g., labeledTokens, working token copies, authorEmailAssigner, etc.). This can create extremely large/expensive log strings and may unintentionally dump internal objects/PII when BiblioItem is logged or concatenated into error messages. Prefer an explicit ToStringBuilder that whitelists a small set of stable, safe fields (or keep the previous curated list).

lfoppiano force-pushed the feature/rewrite-overloaded branch from 996e66e to 4d93160 Compare April 18, 2026 16:49

lfoppiano requested a review from Copilot April 18, 2026 16:49

Copilot started reviewing on behalf of lfoppiano April 18, 2026 16:50 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

lfoppiano added 3 commits April 18, 2026 19:02

chore: enhance code formatting and add Spotless for code cleanup

930b4ae

chore: add code formatting checks and update Eclipse formatter settings

f8d58fe

chore: update import order and formatter settings in build.gradle and…

2a0f060

… eclipse-formatter.xml

lfoppiano force-pushed the feature/add-code-cleaning branch from 7e20453 to 2a0f060 Compare April 18, 2026 17:03

lfoppiano requested a review from Copilot April 18, 2026 17:17

Copilot started reviewing on behalf of lfoppiano April 18, 2026 17:19 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

feat: rewrite toString and other oveloaded methods

b84bff0

Signed-off-by: Luca Foppiano <luca@foppiano.org>

lfoppiano force-pushed the feature/rewrite-overloaded branch from 4d93160 to b84bff0 Compare April 19, 2026 06:38

feat: rewrite additional toString methods

e6fd4e7

lfoppiano requested a review from Copilot April 21, 2026 09:35

Copilot AI reviewed Apr 21, 2026

View reviewed changes

fix: address issue with Block.java

ce9a9bc

Base automatically changed from feature/add-code-cleaning to feature/code-formatting April 21, 2026 10:10

Merge branch 'feature/code-formatting' into feature/rewrite-overloaded

3ffc93e

lfoppiano merged commit c702a5b into feature/code-formatting Apr 21, 2026
0 of 4 checks passed

lfoppiano deleted the feature/rewrite-overloaded branch April 21, 2026 10:15

lfoppiano mentioned this pull request Apr 21, 2026

Implement code formatting #1431

Merged

3 tasks

-                    .append("results", results)
+                    .append("resultsCount", CollectionUtils.isEmpty(results) ? 0 : results.size())
+                    .append("errorMessage", errorMessage)
+                    .append("errorException", errorException)

		.append("classCounters", classCounters)
		.append("strCnts", strCnts)

Conversation

lfoppiano commented Mar 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants