feat: rewrite toString and other oveloaded methods#1401
feat: rewrite toString and other oveloaded methods#1401lfoppiano merged 7 commits intofeature/code-formattingfrom
Conversation
996e66e to
4d93160
Compare
There was a problem hiding this comment.
Pull request overview
This PR is a broad refactor/release-prep update that (a) standardizes many toString/equals/hashCode implementations using Apache Commons builders, (b) refactors training entrypoints and adds a centralized trainer registry, and (c) updates service/UI, Crossref handling, docs, Docker/CI, and shipped model configs for the 0.9.x line.
Changes:
- Standardize object
toString/equals/hashCodeimplementations across core data/layout/tokenization/counters classes. - Refactor trainer wiring (new
TrainerRegistry, sharedtrainAndEvaluatehelpers, CLI options for custom params/model paths, progress bar behavior). - Update REST/service behavior (BibTeX negotiation tweaks, BibTeX key generation), web UI health/version indicators, Crossref integration, docs, Dockerfiles, and CI workflows.
Reviewed changes
Copilot reviewed 128 out of 195 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| mkdocs.yml | Reorganizes MkDocs navigation and updates GitHub repo URL. |
| grobid-trainer/src/main/java/org/grobid/trainer/evaluation/EndToEndEvaluation.java | Adds progress bar auto/simple modes and builder-based configuration. |
| grobid-trainer/src/main/java/org/grobid/trainer/TrainerRunner.java | Refactors CLI: dynamic model list, new Wapiti params and -modelPath. |
| grobid-trainer/src/main/java/org/grobid/trainer/TrainerRegistry.java | New central mapping of model names to trainer factories. |
| grobid-trainer/src/main/java/org/grobid/trainer/TrainerFactory.java | Removes debug prints. |
| grobid-trainer/src/main/java/org/grobid/trainer/TableTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/ShorttextTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/SegmentationTrainer.java | Switches flavor-aware main() to shared helper. |
| grobid-trainer/src/main/java/org/grobid/trainer/ReferenceSegmenterTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/PatentParserTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/NameHeaderTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/NameCitationTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/MonographTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/HeaderTrainer.java | Switches flavor-aware main() to shared helper. |
| grobid-trainer/src/main/java/org/grobid/trainer/FundingAcknowledgementTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/FulltextTrainer.java | Switches flavor-aware main() to shared helper. |
| grobid-trainer/src/main/java/org/grobid/trainer/FigureTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/DateTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/CitationTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/ChemicalEntityTrainer.java | Switches main() to shared trainAndEvaluate and updates signature. |
| grobid-trainer/src/main/java/org/grobid/trainer/AffiliationAddressTrainer.java | Switches main() to shared trainAndEvaluate. |
| grobid-trainer/src/main/java/org/grobid/trainer/AbstractTrainer.java | Adds custom training/eval model paths, parameter overrides, shared main() helpers, custom tagger loading. |
| grobid-trainer/doc/eLife_984.results.* | Removes an outdated evaluation results artifact from repo. |
| grobid-service/src/main/resources/web/resources/css/style.css | Adds health-indicator styling and positions header container. |
| grobid-service/src/main/resources/web/grobid/grobid.js | Adds version fetch + periodic health polling and updates UI navigation behavior. |
| grobid-service/src/main/java/org/grobid/service/util/BibTexMediaType.java | Adds MEDIA_TYPE_QS to prefer XML when Accept is missing. |
| grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessTraining.java | Switches training model selection to TrainerRegistry. |
| grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessString.java | Changes BibTeX key generation for citation endpoints. |
| grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessFiles.java | Changes BibTeX key generation for file-based endpoints. |
| grobid-service/src/main/java/org/grobid/service/GrobidRestService.java | Applies qs media type for BibTeX endpoints to improve content negotiation defaults. |
| grobid-service/scripts/run-service.sh | Adds arch detection and adjusts native library path selection. |
| grobid-home/scripts/run_evaluation.sh | Improves evaluation invocation flags and Python env library path handling. |
| grobid-home/models/table-BidLSTM_CRF/preprocessor.json | Updates DL model preprocessor vocab mapping. |
| grobid-home/models/table-BidLSTM_CRF/config.json | Updates DL model config (char vocab size). |
| grobid-home/models/reference-segmenter-BidLSTM_ChainCRF_FEATURES/config.json | Adjusts DL training config (batch size). |
| grobid-home/models/reference-segmenter-BidLSTM_CRF_FEATURES/config.json | Adjusts DL training config (batch size). |
| grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/preprocessor.json | Updates DL model preprocessor vocab mapping. |
| grobid-home/models/patent-citation-BidLSTM_CRF_FEATURES/config.json | Updates DL model config (char vocab size, removes ELMo flag). |
| grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/preprocessor.json | Updates DL model preprocessor vocab mapping. |
| grobid-home/models/name-citation-BidLSTM_CRF_FEATURES/config.json | Updates DL model config (char vocab size). |
| grobid-home/models/header-article-light-ref-BidLSTM_ChainCRF_FEATURES/config.json | Removes obsolete ELMo flag. |
| grobid-home/models/header-article-light-ref-BidLSTM_CRF_FEATURES/config.json | Removes obsolete ELMo flag. |
| grobid-home/models/header-article-light-BidLSTM_ChainCRF_FEATURES/config.json | Removes obsolete ELMo flag. |
| grobid-home/models/header-article-light-BidLSTM_CRF_FEATURES/config.json | Removes obsolete ELMo flag. |
| grobid-home/models/date-BidLSTM_CRF_FEATURES/config.json | Updates DL model config (char vocab size). |
| grobid-home/models/citation-BidLSTM_ChainCRF_FEATURES/config.json | Updates DL model config (seq length, batch size, vocab size). |
| grobid-home/models/citation-BidLSTM_CRF_FEATURES/config.json | Updates DL model config (seq length, batch size, vocab size). |
| grobid-home/config/grobid.yaml | Extends Crossref consolidation settings (rate limiting, post-validation). |
| grobid-core/src/test/java/org/grobid/core/engines/AuthorParserTest.java | Strengthens tests for et al stripping and adds whitespace case. |
| grobid-core/src/test/java/org/grobid/core/engines/AuthorParserReDoSTest.java | Refactors ReDoS test to test regex directly without model loading. |
| grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestTask.java | Updates Crossref response handling to backoff/disable token and parse new headers. |
| grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequestListener.java | Adds extra response fields and modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/utilities/crossref/CrossrefRequest.java | Updates UA/version, token gating, reads new Crossref headers, handles 429. |
| grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CounterImpl.java | Uses Commons builders for equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/utilities/counters/impl/CntManagerImpl.java | Uses Commons builders for toString/equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/utilities/Triple.java | Uses Commons builders for toString/equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/utilities/Pair.java | Uses Commons builders for toString/equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/utilities/GrobidProperties.java | Adds Crossref min-request-interval and post-validation accessors. |
| grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java | Adds Crossref config fields for interval and post-validation. |
| grobid-core/src/main/java/org/grobid/core/utilities/ElementCounter.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/tokenization/TaggingTokenCluster.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/tokenization/LabeledTokensContainer.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/lexicon/Lexicon.java | Fixes URL destination/token offset alignment and clarifies comment. |
| grobid-core/src/main/java/org/grobid/core/layout/PDFAnnotation.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/layout/LayoutToken.java | Modernizes hashCode(). |
| grobid-core/src/main/java/org/grobid/core/layout/GraphicObject.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/layout/BoundingBox.java | Modernizes equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPoolClassifier.java | Sets env for legacy Keras 2 and imports tf_keras. |
| grobid-core/src/main/java/org/grobid/core/jni/JEPThreadPool.java | Sets env for legacy Keras 2 and imports tf_keras. |
| grobid-core/src/main/java/org/grobid/core/engines/tagging/WapitiTagger.java | Adds constructor to load a model from a File. |
| grobid-core/src/main/java/org/grobid/core/engines/tagging/TaggerFactory.java | Adds getTaggerFromPath for explicit model file loading. |
| grobid-core/src/main/java/org/grobid/core/engines/citations/LabeledReferenceResult.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/engines/AuthorParser.java | Avoids repeated trim() and ensures consistent regex matching input. |
| grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java | Adds status and source attributes on <biblStruct>. |
| grobid-core/src/main/java/org/grobid/core/document/DocumentPointer.java | Modernizes toString/equals/hashCode. |
| grobid-core/src/main/java/org/grobid/core/document/DocumentPiece.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/document/DocumentNode.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Person.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/PatentItem.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Keyword.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Funding.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Funder.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Date.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/BibDataSet.java | Modernizes toString(). |
| grobid-core/src/main/java/org/grobid/core/data/Affiliation.java | Modernizes toString(). |
| gradle.properties | Bumps project version to 0.9.1-SNAPSHOT. |
| doc/index.md | Restructures docs landing page into clearer sections. |
| doc/getting_started.md | Adds upgrade tip and updates links/wording and Docker tags. |
| doc/benchmarks/flavors/article_light/benchmaking-pmc.md | Updates benchmark numbers and formatting. |
| doc/benchmarks/flavors/article_light/benchmaking-plos.md | Updates benchmark numbers and formatting. |
| doc/benchmarks/flavors/article_light/benchmaking-elife.md | Updates benchmark numbers and formatting. |
| doc/benchmarks/flavors/article_light/benchmaking-biorxiv.md | Updates benchmark numbers and formatting. |
| doc/benchmarks/Benchmarking.md | Fixes link paths. |
| doc/Upgrading.md | Adds new upgrade guide (notably 0.9.0). |
| doc/Training-the-models-of-Grobid.md | Updates build command and adds extensive FAQ section. |
| doc/References.md | Updates repo links to new GitHub org. |
| doc/Principles.md | Fixes benchmarking links/paths. |
| doc/Introduction.md | Updates badges/links and Crossref references. |
| doc/Install-Grobid.md | Updates versions, URLs, and build command. |
| doc/Grobid-service.md | Updates versions, health/version docs, and adds clients/output format notes. |
| doc/Grobid-java-library.md | Updates version numbers and examples. |
| doc/Grobid-docker.md | Updates Docker tags/commands and links. |
| doc/End-to-end-evaluation.md | Updates dataset hosting info/links and benchmarking links. |
| doc/Deep-Learning-models.md | Updates DL stack guidance (DeLFT/TF/Python/ARM notes). |
| doc/Coordinates-in-PDF.md | Adds FAQ section and improves wording. |
| doc/Configuration.md | Fixes internal links to local docs. |
| build.gradle | Updates plugin/deps, adds Spotless, adds git revision expansion and progressbar system prop, adjusts release branching rule. |
| Readme.md | Updates badges/links, runtime requirements, citation snippet, and upgrade pointer. |
| Dockerfile.evaluation | Updates TF/DeLFT/JEP/Python versions and LD_LIBRARY_PATH handling. |
| Dockerfile.delft | Major rebuild for JDK21 + Python 3.11 + TF 2.17 + multi-arch handling. |
| Dockerfile.crf | Updates native lib handling and LD_LIBRARY_PATH for multi-arch. |
| CITATION.cff | Updates repository URL, version fields, and citation metadata. |
| CHANGELOG.md | Adds 0.9.0 release notes. |
| .gitignore | Ignores Python cache artifacts. |
| .github/workflows/ci-build-unstable.yml | Adds Spotless check (non-blocking), fetches tags, and passes computed version into Docker builds; triggers HF reboot. |
| .github/workflows/ci-build-tag-custom.yml | Retags Docker images digest-preservingly with Buildx imagetools and safer auth handling. |
| .github/workflows/ci-build-manual-onnx.yml | Fetches tags, computes version, passes build args. |
| .github/workflows/ci-build-manual-full.yml | Fetches tags, computes version, passes build args, clarifies summary. |
| .github/workflows/ci-build-manual-eval.yml | Fetches tags, computes version, passes build args. |
| .github/workflows/ci-build-manual-crf.yml | Splits amd64/arm64 builds, merges manifest, computes version, improves summary. |
| .editorconfig | Harmonizes formatting defaults (final newline, trimming, indent rules). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -108,12 +94,48 @@ public static void main(String[] args) { | |||
| } else if (args[i].equals("-i")) { | |||
| incremental = true; | |||
|
|
|||
| } else if (args[i].equals("-epsilon")) { | |||
| if (i + 1 == args.length) { | |||
| throw new IllegalStateException("Missing epsilon value. "); | |||
| } | |||
| try { | |||
| epsilon = Double.parseDouble(args[i + 1]); | |||
| } catch (Exception e) { | |||
| throw new IllegalStateException("Invalid epsilon value: " + args[i + 1]); | |||
| } | |||
|
|
|||
| } else if (args[i].equals("-w")) { | |||
| if (i + 1 == args.length) { | |||
| throw new IllegalStateException("Missing window value. "); | |||
| } | |||
| try { | |||
| window = Integer.parseInt(args[i + 1]); | |||
| } catch (Exception e) { | |||
| throw new IllegalStateException("Invalid window value: " + args[i + 1]); | |||
| } | |||
|
|
|||
| } else if (args[i].equals("-maxIter")) { | |||
| if (i + 1 == args.length) { | |||
| throw new IllegalStateException("Missing nbMaxIterations value. "); | |||
| } | |||
| try { | |||
| nbMaxIterations = Integer.parseInt(args[i + 1]); | |||
| } catch (Exception e) { | |||
| throw new IllegalStateException("Invalid nbMaxIterations value: " + args[i + 1]); | |||
| } | |||
|
|
|||
| } else if (args[i].equals("-modelPath")) { | |||
| if (i + 1 == args.length) { | |||
| throw new IllegalStateException("Missing model output path value. "); | |||
| } | |||
| outputModelFilePath = args[i + 1]; | |||
|
|
|||
There was a problem hiding this comment.
Argument parsing consumes the value at args[i+1] for flags like -gH/-s/-n/-epsilon/-w/-maxIter/-modelPath but does not advance the loop index. This means the value token is processed again on the next iteration and can be misinterpreted if it matches another flag (or starts with '-'), leading to incorrect parsing. Consider incrementing i after consuming a flag value (or switching to a dedicated CLI parser) so each option/value pair is handled atomically.
| protected GenericTagger getTagger() { | ||
| if (tagger == null) { | ||
| tagger = TaggerFactory.getTagger(model); | ||
| tagger = (evaluationModelPath != null) | ||
| ? TaggerFactory.getTaggerFromPath(evaluationModelPath, GrobidProperties.getGrobidEngine(model)) | ||
| : TaggerFactory.getTagger(model); |
There was a problem hiding this comment.
When evaluationModelPath is set, getTagger() routes to TaggerFactory.getTaggerFromPath(...), which currently only supports the Wapiti engine and throws for DeLFT/others. This makes the new -modelPath option fail at runtime for any model configured with a non-Wapiti engine. Either validate earlier (and fail with a clear message) or extend TaggerFactory.getTaggerFromPath to support the other engines used by trainers.
| response = Response.status(Status.NO_CONTENT).build(); | ||
| } else if (expectedResponseType == ExpectedResponseType.BIBTEX) { | ||
| response = Response.status(Status.OK) | ||
| .entity(biblioItem.toBibTeX("-1", config)) | ||
| .entity(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config)) | ||
| .header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8") | ||
| .build(); |
There was a problem hiding this comment.
Switching the BibTeX key from a guaranteed-unique placeholder ("-1") to biblioItem.generateBibTeXKey() can produce duplicate keys across requests (and even within a single response), because the key is derived from surname/year/title and has no disambiguation. This can result in invalid/ambiguous BibTeX output. Consider keeping a deterministic per-response disambiguation (e.g., append a counter when a key repeats) or preserve the previous unique key behavior for single-citation endpoints.
| response = Response.status(Status.NO_CONTENT).build(); | ||
| } else if (expectedResponseType == ExpectedResponseType.BIBTEX) { | ||
| StringBuilder responseContent = new StringBuilder(); | ||
| int n = 0; | ||
| for(BiblioItem biblioItem : biblioItems) { | ||
| responseContent.append(biblioItem.toBibTeX(""+n, config)); | ||
| responseContent.append(biblioItem.toBibTeX(biblioItem.generateBibTeXKey(), config)); | ||
| responseContent.append("\n"); |
There was a problem hiding this comment.
For citation lists, replacing the per-item numeric key ("0", "1", ...) with generateBibTeXKey() can yield duplicate BibTeX entry keys within the same response, which breaks BibTeX consumers. Consider tracking used keys within the loop and appending a suffix/counter when collisions occur (or keep the previous index-based keying for list responses).
| response = Response.status(Response.Status.NO_CONTENT).build(); | ||
| } else if (expectedResponseType == ExpectedResponseType.BIBTEX) { | ||
| response = Response.status(Response.Status.OK) | ||
| .entity(result.toBibTeX("-1")) | ||
| .entity(result.toBibTeX(result.generateBibTeXKey())) | ||
| .header(HttpHeaders.CONTENT_TYPE, BibTexMediaType.MEDIA_TYPE + "; charset=UTF-8") | ||
| .build(); |
There was a problem hiding this comment.
Replacing the BibTeX key "-1" with result.generateBibTeXKey() removes the previous guaranteed-unique placeholder and can cause key collisions across outputs (and makes output less predictable for callers). If the intent is to provide stable keys, consider adding collision-avoidance (e.g., suffixing) or documenting the new behavior, because BibTeX requires unique entry keys within a file.
| } else if (expectedResponseType == ExpectedResponseType.BIBTEX) { | ||
| StringBuilder result = new StringBuilder(); | ||
| GrobidAnalysisConfig config = new GrobidAnalysisConfig.GrobidAnalysisConfigBuilder().includeRawCitations(includeRawCitations).build(); | ||
| int p = 0; | ||
| for (BibDataSet res : bibDataSetList) { | ||
| result.append(res.getResBib().toBibTeX(Integer.toString(p), config)); | ||
| BiblioItem bib = res.getResBib(); | ||
| result.append(bib.toBibTeX(bib.generateBibTeXKey(), config)); | ||
| result.append("\n"); |
There was a problem hiding this comment.
For references lists, using bib.generateBibTeXKey() for every entry can generate duplicate keys within the same response, because the key derivation does not include any disambiguation. This can make the returned BibTeX invalid for downstream tools. Consider de-duplicating keys in the loop (append a counter/suffix on collision) or keep the previous index-based keys for list responses.
| ARCH=$(uname -m) | ||
| if [ "$ARCH" = "aarch64" ]; then | ||
| LIB_DIR="../grobid-home/lib/lin_arm-64" | ||
| else | ||
| LIB_DIR="../grobid-home/lib/lin-64" | ||
| fi | ||
|
|
||
| java -Djava.library.path="$LIB_DIR" -jar build/libs/grobid-service-*-onejar.jar server $config || exit $? |
There was a problem hiding this comment.
run-service.sh used to include the grobid-home JEP native library directory in -Djava.library.path, but now only points to the architecture-specific lib directory. If the JEP JNI library is still expected under "$LIB_DIR/jep" in non-Docker installs, this change will break Deep Learning (JEP) initialization. Consider adding the JEP subdirectory back (and/or incorporating the active Python env's JEP library path) so both Wapiti and JEP natives are discoverable.
| echo "DRY: would write log to ${log_dst}" | ||
| else | ||
| # execute gradlew from the directory where the script was invoked (START_PWD) | ||
| (cd "${START_PWD}" && "${cmd[@]}") 2>&1 | tee "${log_dst}" | ||
| (cd "${START_PWD}" && "${cmd[@]}") | tee "${log_dst}" | ||
| exit_code=${PIPESTATUS[0]} | ||
|
|
||
| if [ $exit_code -ne 0 ]; then |
There was a problem hiding this comment.
The evaluation script no longer redirects stderr into the tee pipeline (it removed 2>&1). This means stack traces and error diagnostics may not be captured in the log file, making failures harder to debug and potentially hiding important warnings. Consider restoring stderr redirection (or redirecting stderr to a separate log) while still keeping the console settings that avoid progress-bar/TTY issues.
7e20453 to
2a0f060
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 29 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| .append("status", status) | ||
| .append("interval", interval) | ||
| .append("limitIterations", limitIterations) | ||
| .append("results", results) |
There was a problem hiding this comment.
Response.toString() now appends the entire results list, which can create very large log lines (and potentially expensive nested toString() calls). Previously this was summarized via results.size(). Consider logging only a count (and maybe a small sample) instead of the full list, and include errorMessage/errorException if the intent is diagnostics.
| .append("results", results) | |
| .append("resultsCount", CollectionUtils.isEmpty(results) ? 0 : results.size()) | |
| .append("errorMessage", errorMessage) | |
| .append("errorException", errorException) |
| .append("classCounters", classCounters) | ||
| .append("strCnts", strCnts) |
There was a problem hiding this comment.
CntManagerImpl.toString() now prints classCounters/strCnts maps directly, which will emit CounterImpl@<hash> object identities rather than the actual counter values (and can become very large). The previous implementation converted counters to Map<String, Long> and included metrics; consider switching this to append getAllCounters() (or a dedicated representation) so the output remains value-focused and stable.
| .append("classCounters", classCounters) | |
| .append("strCnts", strCnts) | |
| .append("counters", getAllCounters()) | |
| .append("metrics", metrics) |
Signed-off-by: Luca Foppiano <luca@foppiano.org>
4d93160 to
b84bff0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @Override | ||
| public String toString() { | ||
| return "BiblioItem{" + | ||
| "submission_date='" + submission_date + '\'' + | ||
| ", download_date='" + download_date + '\'' + | ||
| ", server_date='" + server_date + '\'' + | ||
| ", languageUtilities=" + languageUtilities + | ||
| ", item=" + item + | ||
| ", parentItem=" + parentItem + | ||
| ", ISBN13='" + ISBN13 + '\'' + | ||
| ", ISBN10='" + ISBN10 + '\'' + | ||
| ", title='" + title + '\'' + | ||
| ", publisher='" + publisher + '\'' + | ||
| ", nbPages=" + nbPages + | ||
| ", edition='" + edition + '\'' + | ||
| ", language='" + language + '\'' + | ||
| ", subtitle='" + subtitle + '\'' + | ||
| ", publication_date='" + publication_date + '\'' + | ||
| ", normalized_publication_date=" + normalized_publication_date + | ||
| ", editors='" + editors + '\'' + | ||
| ", publisher_website='" + publisher_website + '\'' + | ||
| ", serie='" + serie + '\'' + | ||
| ", ISSN='" + ISSN + '\'' + | ||
| ", ISSNe='" + ISSNe + '\'' + | ||
| ", volume='" + volume + '\'' + | ||
| ", number='" + number + '\'' + | ||
| ", month='" + month + '\'' + | ||
| ", support_type='" + support_type + '\'' + | ||
| ", version='" + version + '\'' + | ||
| ", smallImageURL='" + smallImageURL + '\'' + | ||
| ", largeImageURL='" + largeImageURL + '\'' + | ||
| ", publisherPlace='" + publisherPlace + '\'' + | ||
| ", review='" + review + '\'' + | ||
| ", keywords=" + keywords + | ||
| ", subjects=" + subjects + | ||
| ", categories='" + categories + '\'' + | ||
| ", type='" + type + '\'' + | ||
| ", typeDescription='" + typeDescription + '\'' + | ||
| ", book_type='" + book_type + '\'' + | ||
| ", DOI='" + doi + '\'' + | ||
| ", arXivId='" + arXivId + '\'' + | ||
| ", PMID='" + PMID + '\'' + | ||
| ", PMCID='" + PMCID + '\'' + | ||
| ", PII='" + PII + '\'' + | ||
| ", HALId='" + halId + '\'' + | ||
| ", ark='" + ark + '\'' + | ||
| ", istexId='" + istexId + '\'' + | ||
| ", inDOI='" + inDOI + '\'' + | ||
| ", abstract_='" + abstract_ + '\'' + | ||
| ", authors='" + authors + '\'' + | ||
| ", firstAuthorSurname='" + firstAuthorSurname + '\'' + | ||
| ", location='" + location + '\'' + | ||
| ", bookTitle='" + bookTitle + '\'' + | ||
| ", serieTitle='" + serieTitle + '\'' + | ||
| ", pageRange='" + pageRange + '\'' + | ||
| ", journal='" + journal + '\'' + | ||
| ", volumeBlock='" + volumeBlock + '\'' + | ||
| ", institution='" + institution + '\'' + | ||
| ", note='" + note + '\'' + | ||
| ", affiliation='" + affiliation + '\'' + | ||
| ", address='" + address + '\'' + | ||
| ", country='" + country + '\'' + | ||
| ", town='" + town + '\'' + | ||
| ", email='" + email + '\'' + | ||
| ", pubnum='" + pubnum + '\'' + | ||
| ", keyword='" + keyword + '\'' + | ||
| ", phone='" + phone + '\'' + | ||
| ", degree='" + degree + '\'' + | ||
| ", web='" + web + '\'' + | ||
| ", issue='" + issue + '\'' + | ||
| ", journal_abbrev='" + journal_abbrev + '\'' + | ||
| ", event='" + event + '\'' + | ||
| ", abstractHeader='" + abstractHeader + '\'' + | ||
| ", day='" + day + '\'' + | ||
| ", locationPublisher='" + locationPublisher + '\'' + | ||
| ", dedication='" + dedication + '\'' + | ||
| ", submission='" + submission + '\'' + | ||
| ", english_title='" + english_title + '\'' + | ||
| ", url='" + url + '\'' + | ||
| ", oaUrl='" + oaUrl + '\'' + | ||
| ", uri='" + uri + '\'' + | ||
| ", confidence='" + confidence + '\'' + | ||
| ", conf=" + conf + | ||
| ", e_year='" + e_year + '\'' + | ||
| ", e_month='" + e_month + '\'' + | ||
| ", e_day='" + e_day + '\'' + | ||
| ", s_year='" + s_year + '\'' + | ||
| ", s_month='" + s_month + '\'' + | ||
| ", s_day='" + s_day + '\'' + | ||
| ", d_year='" + d_year + '\'' + | ||
| ", d_month='" + d_month + '\'' + | ||
| ", d_day='" + d_day + '\'' + | ||
| ", a_year='" + a_year + '\'' + | ||
| ", a_month='" + a_month + '\'' + | ||
| ", a_day='" + a_day + '\'' + | ||
| ", authorList=" + authorList + | ||
| ", editorList=" + editorList + | ||
| ", affiliationList=" + affiliationList + | ||
| ", addressList=" + addressList + | ||
| ", emailList=" + emailList + | ||
| ", webList=" + webList + | ||
| ", phoneList=" + phoneList + | ||
| ", markers=" + markers + | ||
| ", fullAuthors=" + fullAuthors + | ||
| ", fullEditors=" + fullEditors + | ||
| ", fullAffiliations=" + fullAffiliations + | ||
| ", reference='" + reference + '\'' + | ||
| ", copyright='" + copyright + '\'' + | ||
| ", funding='" + funding + '\'' + | ||
| ", affiliationAddressBlock='" + affiliationAddressBlock + '\'' + | ||
| ", articleTitle='" + articleTitle + '\'' + | ||
| ", beginPage=" + beginPage + | ||
| ", endPage=" + endPage + | ||
| ", year='" + year + '\'' + | ||
| ", authorString='" + authorString + '\'' + | ||
| ", path='" + path + '\'' + | ||
| ", collaboration='" + collaboration + '\'' + | ||
| ", postProcessEditors=" + postProcessEditors + | ||
| ", crossrefError=" + crossrefError + | ||
| ", normalized_submission_date=" + normalized_submission_date + | ||
| ", normalized_download_date=" + normalized_download_date + | ||
| ", originalAffiliation='" + originalAffiliation + '\'' + | ||
| ", originalAbstract='" + originalAbstract + '\'' + | ||
| ", originalTitle='" + originalTitle + '\'' + | ||
| ", originalAuthors='" + originalAuthors + '\'' + | ||
| ", originalEditors='" + originalEditors + '\'' + | ||
| ", originalAddress='" + originalAddress + '\'' + | ||
| ", originalNote='" + originalNote + '\'' + | ||
| ", originalKeyword='" + originalKeyword + '\'' + | ||
| ", originalVolumeBlock='" + originalVolumeBlock + '\'' + | ||
| ", originalJournal='" + originalJournal + '\'' + | ||
| ", workingGroup='" + workingGroup + '\'' + | ||
| ", documentType='" + documentType + '\'' + | ||
| '}'; | ||
| return ReflectionToStringBuilder.toString(this, ToStringStyle.SHORT_PREFIX_STYLE); | ||
| } |
There was a problem hiding this comment.
Switching to ReflectionToStringBuilder.toString(this, ...) will include all non-static fields (e.g., labeledTokens, working token copies, authorEmailAssigner, etc.). This can create extremely large/expensive log strings and may unintentionally dump internal objects/PII when BiblioItem is logged or concatenated into error messages. Prefer an explicit ToStringBuilder that whitelists a small set of stable, safe fields (or keep the previous curated list).
Rewrite all the overloaded methods using standard libraries.
We ignore the toString that are actually parsed from other methods /utilities.