Reusable React components on JSF: file uploader (DVWebloader v2) and lazy file tree view (#6691, #12179) #12382

Draft
ErykKul wants to merge 15 commits into develop from 6691_reusable_components

@ErykKul ErykKul (Collaborator) commented May 5, 2026

What this PR does / why we need it:

Lands the backend half of the reusable React components pattern that lets a single React component built in dataverse-frontend mount on either the SPA or a JSF page, behind a feature flag, with the legacy widget as the off-state. Two concrete components ride on that pattern in this PR:

  1. React file uploader (DVWebloader v2) — replaces the classic PrimeFaces p:fileUpload widget on the dataset edit page when dataverse.feature.react-uploader is enabled.
  2. React lazy file tree view — replaces the classic PrimeFaces tree on the dataset Files tab when dataverse.feature.react-tree-view is enabled. The Table view is unchanged.

Both flags default to off. JSF behaves exactly as before until an operator opts in.

Net-new in this PR:

  • GET /api/datasets/{id}/versions/{versionId}/tree — a paginated, lazy listing of the immediate folders + files inside a folder of a dataset version. Opaque keyset cursor, name ordering, include/order/originals filters, ETag + If-None-Match for published versions. Used by the tree component but generally useful to any client that wants to walk a dataset's directory structure without materialising all files at once.
  • DatasetVersionTreeService + unit + integration tests (DatasetsTreeIT).
  • Two FeatureFlags enum entries (REACT_UPLOADER, REACT_TREE_VIEW).
  • One new JVM setting dataverse.reusable-components.base-url (default /dvwebloader) so operators can host the bundle in a sidecar container, on their own nginx, or behind a CDN — see new operator-facing Sphinx page.
  • <ui:fragment> swaps in editFilesFragment.xhtml and filesFragment.xhtml, gated on the flags.
  • Server-authoritative S3 tagging: S3AccessIO.generateTemporaryS3UploadUrls now includes a tagging field in its JSON response when dataverse.files.<driverId>.disable-tagging is unset. The dataverse-client-javascript SDK reads this and decides whether to send x-amz-tagging — non-breaking, additive.
  • doc/Architecture/reusable_frontend_components.md (refactored to be a backend integration guide; cross-links the frontend half).
  • doc/sphinx-guides/source/container/running/reusable-components.rst (new operator guide).
  • Two release-notes snippets under doc/release-notes/.

Which issue(s) this PR closes:

Special notes for your reviewer:

  • The branch is rebased onto develop, decoupled from the 12178_* hardening track, so review and merge are independent.
  • Backend perf caveat: read this before enabling dataverse.feature.react-tree-view on a JSF-only install with large datasets. The first cut of DatasetVersionTreeService.listChildren walks version.getFileMetadatas() once per request and partitions the result in memory. The wire format is correct and the cursor behaviour is stable, but for a dataset with ~100k files, opening 10 folders costs roughly 10× the backend work that the table view does in 1 request. This is acceptable for a few-thousand-file install, an SPA-only opt-in, or a power user driving the URL bookmark; it is not yet suitable for recommending the JSF mount to operators with large datasets. Promotion to a native folder query + JPA Criteria for files (with Flyway indices and a side-by-side fixture-comparison IT) is tracked as the next focused PR. Until then, the JSF feature flag should stay off on big installs.
  • The streaming-zip download in the SPA / JSF tree is entirely client-side (browser builds the zip via client-zip); there is no server-side ZIP endpoint touched in this PR.
  • Two existing controllers gain inline error mapping for the new tree endpoint (invalid query, path-not-found bundle keys); the flow uses the same getDatasetVersionOrDie plumbing as /versions/{versionId}/files, so permissions / embargoes / restrictions / deaccession honour the same rules.
  • The reusable-component bundle URL is configurable via dataverse.reusable-components.base-url. Default /dvwebloader preserves backwards compatibility with the dev-environment nginx alias and with operators who already self-host. Three reasonable hosting patterns are documented (sidecar image, own nginx, CDN).
  • Merge ordering — this PR can merge in parallel with IQSS/dataverse-client-javascript#403; the two are independent because dataverse has no npm / js-dataverse dependency. The matching frontend PR (IQSS/dataverse-frontend#898) merges last, with a final commit that bumps the SDK pin from the GitHub Packages prerelease (2.2.0-pr403.<sha>) to the released semver the SDK PR cuts.
  • No bundle publish workflow yet. Hosting the dv-uploader.js and dv-tree-view.js bundles for production operators (npm package + Docker sidecar image) is a separate piece of work tracked in the cross-repo plan and deferred to a team discussion. Until it lands, operators wanting the JSF mount in production have to build dataverse-frontend and serve dist-uploader/ themselves. Documented in the operator guide as the current limitation. The dev-compose setup automates this for reviewers, so this PR is independently reviewable as-is.

Suggestions on how to test this:

Backend-only checks (no frontend repo needed):

mvn test -Dtest='DatasetVersionTreeServiceTest'         # unit (fast)
mvn -Pct itest -Dit.test='DatasetsTreeIT'               # integration: happy path,
                                                         # path normalisation, cursor
                                                         # stability, 400 / 403 paths,
                                                         # originals=true
mvn test                                                 # full unit suite

Then exercise the new endpoint directly:

export SERVER_URL=http://localhost:8080
export PID="doi:10.5072/FK2/AAAAAA"
curl "$SERVER_URL/api/datasets/:persistentId/versions/:latest/tree?path=&limit=100" \
    -H "X-Dataverse-key: $API_TOKEN" \
    -G --data-urlencode "persistentId=$PID"
# Then echo back the returned nextCursor:
curl "$SERVER_URL/api/datasets/:persistentId/versions/:latest/tree?cursor=<token>" ...
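
The cursor round trip shown in the curl commands can be sketched as a small client loop. This is a minimal sketch and not part of the PR: `walk_folder` and `fake_fetch` are hypothetical names, the `name` field on items is assumed for illustration, and the page object is assumed to arrive already unwrapped from any response envelope. It also assumes the last page reports a missing or null nextCursor.

```python
from typing import Callable, Iterator, Optional


def walk_folder(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item of one folder by echoing back nextCursor until it runs out."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break


def fake_fetch(cursor: Optional[str]) -> dict:
    """Offline stand-in for the HTTP call; a real client would issue the GET here."""
    pages = {
        None: {"items": [{"name": "data"}, {"name": "docs"}], "nextCursor": "t1"},
        "t1": {"items": [{"name": "README.md"}], "nextCursor": None},
    }
    return pages[cursor]


names = [item["name"] for item in walk_folder(fake_fetch)]
```

Passing the fetch function in keeps the loop independent of any particular HTTP library, so the same walker can be pointed at a real request function or at a fixture.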

End-to-end with the JSF mount (requires the matching frontend bundle to be reachable; the dev-compose setup in dataverse-frontend/dev-env/ serves it for you):

  1. Bring up the dev compose: cd dataverse-frontend/dev-env && docker compose up. It pulls a gdcc/dataverse:<DATAVERSE_IMAGE_TAG> image and mounts the locally-built bundle at /dvwebloader/. Both flags are on by default in the dev compose env.
  2. Browse to http://localhost:8000/editdatafiles.xhtml?datasetId=<id> → the React uploader replaces the PrimeFaces upload widget.
  3. Browse to http://localhost:8000/dataset.xhtml?persistentId=<pid> → flip the existing Tree toggle in the Files tab → the React lazy tree mounts.
  4. Toggle either flag off and reload — legacy JSF widget renders unchanged.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Yes, but only when one of the two new feature flags is enabled:

  • dataverse.feature.react-uploader=on → the file upload widget on the dataset edit page becomes the React uploader. Same dataset-edit flow, no DB-level differences.
  • dataverse.feature.react-tree-view=on → the existing Table / Tree toggle on the dataset Files tab keeps working; the Tree view is rendered by the React lazy tree (with selectable rows, keyboard navigation, URL bookmarkability, and a client-side streaming-zip download for the user's selection) instead of the PrimeFaces tree. The Table view is unchanged.

The frontend half (with screenshots / Storybook stories / Chromatic baselines) is in IQSS/dataverse-frontend#898.

Is there a release notes update needed for this change?:

Yes — included in this PR under doc/release-notes/:

  • 6691-reusable-frontend-components.md — covers the JSF mount pattern, both feature flags, the new JVM setting, the S3-tagging change, hosting options, and prerequisites.
  • 6691-dataset-version-tree-listing-api.md — covers the new GET .../tree endpoint, query parameters, response shape, ETag semantics, and the matching SDK helpers in dataverse-client-javascript.

Additional documentation:

  • New operator guide: doc/sphinx-guides/source/container/running/reusable-components.rst (Reusable Frontend Components) — how to host the bundle, sidecar vs CDN vs same-origin nginx, prerequisites, versioning.
  • New native-API section: doc/sphinx-guides/source/api/native-api.rst § List a Folder of a Dataset Version (Tree View) — endpoint contract, params, response, ETag, error codes.
  • Refactored: doc/Architecture/reusable_frontend_components.md — the backend half of the reusable-components contract (matches dataverse-frontend/docs/reusable-components.md on the frontend side).
  • Sphinx config entries for dataverse.feature.react-uploader, dataverse.feature.react-tree-view, and dataverse.reusable-components.base-url.

Cross-repo PRs that pair with this one:

A canonical, living plan for the cross-repo work is at tree_view_plan.md in the workspace dataverse-context repo.


AI assistance disclosure: This PR was developed with significant assistance from an AI coding assistant (Claude). All Java, JSF, Sphinx, and release-notes content was generated with AI involvement; the human author reviewed and curated each commit before pushing. Reviewers should treat the diff as if any human had written it — flag anything that looks off, especially around the in-memory paginator's behaviour on large datasets.

ErykKul added 14 commits May 5, 2026 11:08
When S3 tagging is enabled (DISABLE_S3_TAGGING is false or unset),
generateTemporaryS3UploadUrls now includes "tagging": "dv-state=temp" in
the JSON response. The client reads this field and sets x-amz-tagging
accordingly — making the server authoritative instead of duplicating the
JVM setting on the client.

Also adds doc/Architecture/reusable_frontend_components.md covering
the cross-repo uploader and tree view design decisions.
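
The server-authoritative tagging decision described in this commit can be sketched from the client side as follows. This is an illustrative Python sketch, not the SDK's code: `upload_headers` is a hypothetical helper, and the `tagging` field is assumed to sit at the top level of the object passed in.

```python
def upload_headers(upload_urls_response: dict) -> dict:
    """Build the extra headers for the S3 PUT from the server's response.

    The client sends x-amz-tagging only when the server included a
    "tagging" value, instead of consulting its own copy of the setting.
    """
    headers = {}
    tagging = upload_urls_response.get("tagging")
    if tagging:
        headers["x-amz-tagging"] = tagging
    return headers


with_tagging = upload_headers({"tagging": "dv-state=temp"})
without_tagging = upload_headers({"url": "https://example.invalid/upload"})
```

Because an absent field simply produces no header, a client that understands the field keeps working against a server that does not send it.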
- FeatureFlags.REACT_UPLOADER: replace @todo with @since 6.11; document
  the runtime requirement (api-session-auth) and the expected bundle URL.
- editFilesFragment.xhtml: short comment explaining why
  dropBoxUploadFinished is now hoisted out of the legacy upload block
  (the Dropbox panel renders independently of the React/JSF upload
  switch and still needs the callback).
- reusable_frontend_components.md: document the CSS isolation strategy
  and the remaining Bootstrap-globals limitation, with PostCSS scoping
  / Shadow DOM as the planned follow-ups.
The JSF page that mounts the React uploader currently hardcodes the
bundle path as `/dvwebloader/...` (legacy from DVWebloader v1). This
worked only when the dataverse-frontend dev environment served the
build output at that same-origin path.

To support institutions that don't run the SPA — and that may host
the bundle from a sidecar container, an existing nginx alias, or a
CDN — make the base URL configurable.

- JvmSettings: new entry REUSABLE_COMPONENTS_BASE_URL bound to
  `dataverse.reusable-components.base-url`.
- SystemConfig.getReusableComponentsBaseUrl(): returns the configured
  URL with any trailing slash trimmed, defaulting to `/dvwebloader`
  to preserve backward compatibility with the existing dev nginx
  alias and any same-origin operator setup.
- editFilesFragment.xhtml: the React-uploader script tag now reads
  `#{systemConfig.reusableComponentsBaseUrl}/reusable-components/
  dv-uploader.js` instead of the literal `/dvwebloader/...`. JSF
  fallback path is unchanged.

Non-breaking: default behaviour matches the previous hardcoded path.
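
The defaulting and trailing-slash behaviour described for SystemConfig.getReusableComponentsBaseUrl() can be illustrated with a short Python sketch. It mirrors the description only and is not the Java implementation: the function names are hypothetical, and stripping every trailing slash (rather than exactly one) is an assumption.

```python
from typing import Optional

DEFAULT_BASE_URL = "/dvwebloader"


def reusable_components_base_url(configured: Optional[str]) -> str:
    """Return the configured base URL without trailing slashes, or the default."""
    value = configured if configured else DEFAULT_BASE_URL
    return value.rstrip("/")


def uploader_script_url(configured: Optional[str]) -> str:
    """Compose the script URL the JSF page would point at."""
    return reusable_components_base_url(configured) + "/reusable-components/dv-uploader.js"


default_url = uploader_script_url(None)
cdn_url = uploader_script_url("https://cdn.example.org/dv/")
```

With no setting, the composed URL is the same path that was previously hardcoded, which is what makes the change non-breaking.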
Operator-facing documentation for the reusable React components track:
how to host the bundle, how to point the JSF page at it, and how
versioning flows through npm → Docker image → JVM setting.

- doc/sphinx-guides/source/container/running/reusable-components.rst
  is a new guide page modelled on previewers-provider in the demo
  guide. It explains the npm + sidecar-image distribution model,
  walks through three valid hosting choices (gdcc/dataverse-reusable-
  components container, operator-managed nginx, CDN), gives a sample
  Docker Compose service block, and cross-references the relevant
  feature flags + the frontend-side contract document.

- frontend-dev.rst now links to the new page so readers landing on
  the SPA-frontend guide find the JSF integration story.

- container/running/index.rst toctree includes the new page between
  frontend-dev and backend-dev.

- installation/config.rst adds:
  - dataverse.feature.react-uploader (the existing flag, finally
    documented) with prerequisite notes.
  - dataverse.reusable-components.base-url next to dataverse.siteUrl,
    with examples for sidecar / nginx / CDN setups.

- doc/release-notes/6691-reusable-frontend-components.md describes
  the React uploader feature flag, the new JVM setting, the S3
  tagging server-authoritative change, the prerequisites for
  enabling the feature, and the cross-repo coordination.
The original document mixed cross-repo decision-log content with
backend-side integration mechanics. Split that responsibility:

- This document (in dataverse) is now strictly the BACKEND HALF of
  the dual-mode contract: how JSF pages mount React components built
  in dataverse-frontend, how feature flags gate the swap, how nginx
  hosts the bundle, and how to add a new JSF page that mounts an
  SPA component.

- The matching FRONTEND HALF — config interfaces, build pipeline,
  CSS isolation, how to make a component reusable — lives in
  dataverse-frontend/docs/reusable-components.md (added in that repo).

- Cross-repo decisions, branch tracking, and active-track notes move
  out of this file entirely; they belong in the working plan rather
  than in committed Dataverse documentation.

The new content covers:
- Why dual-mode + the integration pattern diagram.
- Feature flag conventions and naming.
- Authentication prerequisites (session-cookie + hardening).
- Hosting options for the bundle (image / nginx / CDN).
- A worked example of replacing a JSF widget with an SPA component
  (the uploader).
- Adding a brand-new reusable component to a JSF page (the upcoming
  tree-view case).
- Currently shipped components (uploader, tree-view planned).
- Risks and trade-offs (Bootstrap collision, session-cookie, etc.).
New API endpoint that lazy-lists the immediate children (folders +
files) inside a folder of a dataset version, enabling tree-view UIs
to fetch on demand and paginate stably across very large datasets:

  GET /api/datasets/{id}/versions/{versionId}/tree

Query parameters: path, limit (default 100, clamped 1-1000), cursor
(opaque keyset token), include (all|folders|files), order
(NameAZ|NameZA), includeDeaccessioned, originals.

Response: {path, items[], nextCursor, limit, order, include,
approximateCount}. Folders come first, then files; both name-sorted
case-insensitively, files break ties on data file id for stability.
Folder items carry counts of distinct subfolders + descendant files.
File items carry id, size, contentType, access (public/restricted/
embargoed), optional checksum, and downloadUrl. Permissions and
embargoes are honoured exactly as on .../files.
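
The ordering rule above (folders first, case-insensitive names, file ties broken on data file id) can be sketched as follows. This is a minimal Python illustration, not the service code: the `type` field is a hypothetical discriminator, and reversing the id tie-break together with the name under NameZA is an assumption.

```python
from typing import Dict, List


def sort_items(items: List[Dict], order: str = "NameAZ") -> List[Dict]:
    """Folders first, then files; each group sorted by case-folded name."""
    descending = order == "NameZA"
    folders = [i for i in items if i["type"] == "folder"]
    files = [i for i in items if i["type"] == "file"]
    folders.sort(key=lambda i: i["name"].casefold(), reverse=descending)
    # Files that share a case-folded name fall back to the data file id.
    files.sort(key=lambda i: (i["name"].casefold(), i["id"]), reverse=descending)
    return folders + files


sample = [
    {"type": "file", "name": "b.txt", "id": 2},
    {"type": "folder", "name": "src"},
    {"type": "file", "name": "B.txt", "id": 1},
    {"type": "folder", "name": "Docs"},
    {"type": "file", "name": "a.txt", "id": 3},
]
ascending = [i["name"] for i in sort_items(sample)]
descending = [i["name"] for i in sort_items(sample, "NameZA")]
```

Sorting the two groups separately is what keeps folders ahead of files in both directions; only the order within each group flips.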

Implementation:
- DatasetVersionTreeService (new package edu.harvard.iq.dataverse.
  datasetversiontree): walks DatasetVersion.fileMetadatas once,
  groups files by their first segment relative to the requested
  path, applies include/order, paginates in memory with an opaque
  Base64 "offset=N" cursor. Wire format and cursor behaviour are
  stable; promotion to native keyset SQL is tracked as a follow-up
  and won't change the contract.
- Datasets.getVersionTree handler + jsonTreePage serialiser.
- Bundle.properties keys for invalid-query / not-found errors.
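
The in-memory approach described under Implementation (group by first path segment, then page with a Base64 "offset=N" cursor) can be sketched like this. It is a Python illustration under stated assumptions, not the Java service: all function names are hypothetical, the URL-safe Base64 variant is assumed, and files are represented by their full path strings.

```python
import base64
from typing import List, Optional, Tuple


def immediate_children(file_paths: List[str], folder: str) -> Tuple[List[str], List[str]]:
    """Split the files under `folder` into sub-folder names and direct file names."""
    prefix = folder + "/" if folder else ""
    folders, files = set(), []
    for path in file_paths:
        if not path.startswith(prefix):
            continue
        first, sep, _ = path[len(prefix):].partition("/")
        if sep:
            folders.add(first)   # something deeper lives under this segment
        else:
            files.append(first)  # the segment is the file itself
    return sorted(folders, key=str.casefold), sorted(files, key=str.casefold)


def encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(f"offset={offset}".encode("ascii")).decode("ascii")


def decode_cursor(token: str) -> int:
    try:
        text = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    except ValueError as exc:
        raise ValueError("invalid cursor") from exc
    key, _, value = text.partition("=")
    if key != "offset" or not (value.isascii() and value.isdigit()):
        raise ValueError("invalid cursor")
    return int(value)


def page(items: List[str], limit: int = 100, cursor: Optional[str] = None):
    """Return one page plus the cursor for the next one (None when exhausted)."""
    limit = max(1, min(limit, 1000))  # clamp to 1-1000 as the endpoint describes
    start = decode_cursor(cursor) if cursor else 0
    chunk = items[start:start + limit]
    end = start + len(chunk)
    return chunk, (encode_cursor(end) if end < len(items) else None)


paths = ["README.md", "data/a.csv", "data/sub/b.csv", "docs/guide.pdf"]
root = immediate_children(paths, "")
first_page, next_cursor = page(list("abcde"), limit=2)
```

Every call rescans the full path list before slicing, which is the per-request cost the perf caveat in the PR description refers to.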

Tests:
- DatasetVersionTreeServiceTest covers root grouping, folder-only
  immediate-children listing, path normalisation
  (/data//sub/// → data/sub), include filter, cursor-paginated
  retrieval, invalid-cursor / invalid-order rejection, originals
  toggle on the downloadUrl, descending order, restricted /
  embargoed access strings, and folder-counts semantics.

Sphinx native-api.rst gains a "List a Folder of a Dataset Version
(Tree View)" section. Release-notes snippet at
doc/release-notes/6691-dataset-version-tree-listing-api.md.
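
The path normalisation case covered by the tests (/data//sub/// → data/sub) can be reproduced with a one-line Python sketch. `normalise_path` is a hypothetical name, and the sketch only collapses empty segments; whether the service also treats "." or ".." specially is not stated here and is left out.

```python
def normalise_path(path: str) -> str:
    """Drop leading, trailing, and repeated slashes from a folder path."""
    return "/".join(segment for segment in path.split("/") if segment)


normalised = normalise_path("/data//sub///")
```

An empty or slash-only input collapses to the empty string, which corresponds to the root folder in the curl example (`path=`).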
End-to-end coverage of the new dataset-version tree endpoint, run
against a live container in CI. Complements the unit-level
DatasetVersionTreeServiceTest which only exercises the service bean.

Tests:
- root listing returns immediate children, folders first, with the
  expected counts {files, folders} on each folder item.
- folder listing returns only immediate children.
- path normalisation (/data//sub///) → "data/sub".
- cursor pagination is stable and exhausts cleanly.
- invalid cursor → 400.
- invalid order → 400.
- include filter restricts items to folders or files.
- descending order keeps folders-first but reverses the within-type
  sort.
- originals=true switches the file downloadUrl to ?format=original.
- unauthenticated access to a draft → 401/403.
- another authenticated user without permission → 404 (Dataverse's
  standard "draft not visible" behaviour, not 403).
- empty dataset → empty items list with approximateCount=0.
- a published dataset is readable via :latest.

UtilIT gains a getVersionTree helper that mirrors the existing
getVersionFiles helper.
For published, non-deaccessioned versions, the response now carries:

  ETag:          "<sha256-prefix>"
  Cache-Control: public, immutable

The ETag is derived from a stable hash of (version id, version
state, path, limit, cursor, include, order, originals,
includeDeaccessioned). Subsequent requests including a matching
If-None-Match header receive 304 Not Modified with no body.

Drafts and deaccessioned versions do not emit an ETag because their
content can change in place. The published-version assumption holds
because Dataverse versions are immutable once released; deaccession
is the only state change, and we exclude it explicitly.

Doc + release-notes updates describe the caching contract.
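
The caching contract above can be sketched as a hash over the listed inputs plus a conditional check. This is a Python illustration only: the function names, the field separator, the 16-character prefix length, and the "RELEASED" state name are assumptions, and If-None-Match handling is simplified to an exact match on a single value.

```python
import hashlib
from typing import Optional


def tree_etag(version_id, version_state, path, limit, cursor, include,
              order, originals, include_deaccessioned,
              prefix_len: int = 16) -> Optional[str]:
    """Return a quoted ETag for published versions, or None for anything else."""
    if version_state != "RELEASED":
        return None  # drafts and deaccessioned versions get no ETag
    parts = (version_id, version_state, path, limit, cursor, include,
             order, originals, include_deaccessioned)
    # A unit-separator keeps ("a", "bc") and ("ab", "c") from colliding.
    material = "\x1f".join("" if p is None else str(p) for p in parts)
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f'"{digest[:prefix_len]}"'


def status_for(etag: Optional[str], if_none_match: Optional[str]) -> int:
    """304 when the client already holds the current representation, else 200."""
    if etag is not None and if_none_match == etag:
        return 304
    return 200


etag_a = tree_etag(42, "RELEASED", "data", 100, None, "all", "NameAZ", False, False)
etag_b = tree_etag(42, "RELEASED", "docs", 100, None, "all", "NameAZ", False, False)
draft_etag = tree_etag(42, "DRAFT", "data", 100, None, "all", "NameAZ", False, False)
```

Including the query parameters in the hash is what makes two different listings of the same version carry different ETags.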
DatasetsTreeIT gains two tests:
- draft response must NOT carry an ETag
- published response carries ETag + Cache-Control, honours
  If-None-Match (returning 304), and changes the ETag on
  different query params.
Sphinx guide and the per-issue release-notes snippet now mention the
ETag / Cache-Control / If-None-Match contract added in the previous
commit. The behaviour itself is unchanged.
…#12179)

Mirrors the existing react-uploader pattern: a JVM feature flag
controls whether the JSF page renders the React reusable component
or the classic PrimeFaces widget.

- New feature flag dataverse.feature.react-tree-view in
  FeatureFlags.java + SystemConfig.isReactTreeViewEnabled().
- filesFragment.xhtml: when the flag is on AND the user selects the
  Tree mode of the existing Table/Tree toggle, the page renders
  <div id="dv-tree-view"> + a window.dvTreeViewConfig snippet + a
  module script tag pointing at #{systemConfig.reusableComponentsBaseUrl}
  /reusable-components/dv-tree-view.js. Otherwise the existing
  p:tree continues to render unchanged.
- Sphinx config.rst documents the new flag next to react-uploader
  and links to the operator guide.
- container/running/reusable-components.rst notes both shipped
  components share the same build/distribution.
- 6691-reusable-frontend-components.md release-notes file gains a
  bullet for the tree-view flag.
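
The gating rule for the Files tab can be summarised in a few lines. This is a Python sketch of the decision only, with hypothetical names and return labels; the actual switch lives in the `<ui:fragment>` conditions of filesFragment.xhtml.

```python
def files_tab_renderer(react_tree_view_enabled: bool, view_mode: str) -> str:
    """Pick which widget renders, given the feature flag and the Table/Tree toggle."""
    if view_mode == "tree":
        # The flag only matters once the user has chosen Tree mode.
        return "react-tree" if react_tree_view_enabled else "primefaces-tree"
    return "table"


flag_on_tree = files_tab_renderer(True, "tree")
flag_off_tree = files_tab_renderer(False, "tree")
flag_on_table = files_tab_renderer(True, "table")
```

The Table branch never consults the flag, which matches the statement that the Table view is unchanged.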

The React bundle is built by the dataverse-frontend
build-uploader script (vite.config.uploader.ts) and ships
alongside dv-uploader.js with shared chunks.

This satisfies #12179 (direct JS mount in JSF for tree view).
Replaces the 'in development' tree-view note with the shipped surface
(JSF mount path, config interface, backend endpoint, ETag, streaming
zip) and updates the greenfield-pattern paragraph to reflect that the
tree view has landed.
Drops the last two FQN references in the new tree handler's ETag
helper. Cosmetic; matches prevailing style in the file.
@github-actions github-actions Bot added labels May 5, 2026:
  • Component: JSF (Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React.)
  • Feature: File Upload & Handling
  • Type: Feature (a feature request)
  • Type: Suggestion (an idea)
  • User Role: Depositor (Creates datasets, uploads data, etc.)
  • User Role: Guest (Anyone using the system, even without an account)
@ErykKul ErykKul marked this pull request as draft May 5, 2026 16:33
Title 'List a Folder of a Dataset Version (Tree View)' is 46
characters; underline was 45. Sphinx 7.x treats this as a build error
('Warning, treated as error: Title underline too short.') under the
docs / readthedocs CI. One extra tilde fixes it.
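
The length mismatch behind this fix is easy to check mechanically. The following Python sketch is a hypothetical helper, not part of the docs build; it relies on the reStructuredText rule that a title underline must be at least as long as the title text.

```python
def underline_long_enough(title: str, underline: str) -> bool:
    """True when the underline is one repeated character at least as long as the title."""
    return len(set(underline)) == 1 and len(underline) >= len(title)


title = "List a Folder of a Dataset Version (Tree View)"
title_length = len(title)
before_fix = underline_long_enough(title, "~" * 45)
after_fix = underline_long_enough(title, "~" * 46)
```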
@coveralls

Coverage Status

coverage: 25.041% (+0.08%) from 24.958% — 6691_reusable_components into develop

@github-actions

github-actions Bot commented May 5, 2026

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:6691-reusable-components
ghcr.io/gdcc/configbaker:6691-reusable-components

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.


Development

Successfully merging this pull request may close these issues.

  • Feature Request: Feature-Flagged Direct JSF Mount for SPA Tree View (with iframe Fallback)
  • Allow selecting of files in Tree View to Edit or Download
