What happened?
WorkflowResource.deleteWorkflow opens a JDBC transaction and CASCADE-deletes the workflow row without first stopping any in-flight executions that target the same workflow. While the ComputingUnitWorker keeps writing to FK-child tables (workflow_view_count, workflow_executions, workflow_user_likes, …), the CASCADE check blocks on a row-level lock and never returns. Every subsequent createWorkflow / deleteWorkflow / view-count POST piles up behind it on the same lock.
From the user's perspective the Workflows page becomes fully unresponsive: uploads hang with no error, deletes hang, and the webpack-dev-server proxy eventually emits ECONNRESET then ECONNREFUSED. Recovery requires restarting the JVMs.
Problematic code at WorkflowResource.scala:631:
    context.transaction { _ =>
      for (wid <- workflowIDs.wids) {
        if (workflowOfUserExists(wid, user.getUid)) {
          workflowDao.deleteById(wid)
        } else {
          throw new BadRequestException("The workflow does not exist.")
        }
      }
    }
No active-execution check, no lock_timeout / statement_timeout, no error path — the request thread sits in executeQuery indefinitely.
Suggested fixes (in order of preference)
- Cancel running executions before deleting. In deleteWorkflow, look up active executions of the workflow via ExecutionResultService / WorkflowExecutionsResource and abort them before opening the delete transaction. Deleting a workflow should imply "stop everything that depends on it".
- Bound the delete transaction. Issue SET LOCAL lock_timeout = '10s'; SET LOCAL statement_timeout = '30s'; at the start of the transaction so that a hung child-table lock surfaces as a 5xx instead of freezing the entire workflow API.
- Independently, harden HubResource.postView. It blindly upserts into workflow_view_count for whatever wid the dashboard sends; if that wid was just deleted in another tab, the FK violation surfaces as a 500 and stale tabs keep retrying, exacerbating the contention. An existence check (context.fetchExists(BaseEntityTable(entityType).table, idColumn.eq(entityID))) before the upsert turns those requests into a no-op return 0.
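As a rough illustration of the first two suggestions, the sketch below aborts active executions before any transaction is opened, then runs the deletes in a transaction bounded by the timeouts quoted above. It uses plain JDBC instead of the jOOQ `context` so that it stays self-contained. `ExecutionControl`, `activeExecutionIds`, `abort`, and `deleteWorkflowRow` are hypothetical names introduced for illustration, not existing Texera APIs.

```scala
import java.sql.Connection

// Hypothetical abstraction over whatever service tracks running executions;
// the real Texera API for this is not shown in this report.
trait ExecutionControl {
  def activeExecutionIds(wid: Int): Seq[Int]
  def abort(eid: Int): Unit
}

object BoundedWorkflowDelete {

  // Timeouts taken from the suggestion above. SET LOCAL only lasts until the
  // enclosing transaction ends, so other requests are unaffected.
  val timeoutStatements: Seq[String] = Seq(
    "SET LOCAL lock_timeout = '10s'",
    "SET LOCAL statement_timeout = '30s'"
  )

  // Step 1: stop everything that depends on the workflows, outside any transaction.
  def abortActiveExecutions(executions: ExecutionControl, wids: Seq[Int]): Unit =
    for (wid <- wids; eid <- executions.activeExecutionIds(wid)) executions.abort(eid)

  // Step 2: run the deletes in one transaction that cannot wait on a lock forever.
  // `deleteWorkflowRow` is a hypothetical callback that deletes a single workflow row.
  def deleteBounded(conn: Connection, wids: Seq[Int])(
      deleteWorkflowRow: (Connection, Int) => Unit
  ): Unit = {
    val previousAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false) // begin an explicit transaction
    try {
      val st = conn.createStatement()
      try timeoutStatements.foreach(sql => st.execute(sql))
      finally st.close()
      wids.foreach(wid => deleteWorkflowRow(conn, wid))
      conn.commit()
    } catch {
      case e: Throwable =>
        conn.rollback() // a lock or statement timeout arrives here as a SQLException
        throw e
    } finally {
      conn.setAutoCommit(previousAutoCommit)
    }
  }
}
```

A caller would run `abortActiveExecutions` first and `deleteBounded` second, mapping a timeout exception to an error response. In the actual jOOQ code the same two SET LOCAL statements would presumably be issued through the transaction's own DSLContext; that wiring is an assumption and is not shown here.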
Workaround
Kill the Texera JVMs (TexeraWebApplication, ComputingUnitWorker, ComputingUnitMaster), restart them, then reload the Workflows page to clear any cached stale wids being POSTed for view-count.
How to reproduce?
- Open a workflow and start an execution that keeps the worker busy for >10 s (e.g. an iris ML pipeline).
- While the execution is still running, navigate to /dashboard/user/workflow and delete that workflow from the row's delete action.
- Try to upload another workflow (or delete a second one) from the same page.
Expected: upload completes; delete completes once the execution is canceled or finishes.
Observed: delete hangs forever, upload hangs forever, every subsequent workflow-table write piles up behind the same lock. After enough pileup the JVM closes connections under socket pressure and the dev-server proxy starts emitting ECONNRESET → ECONNREFUSED.
Branch
main
Commit Hash (Optional)
No response
What browsers are you seeing the problem on?
Not browser-specific — reproduces on any client; the freeze is server-side.
Relevant log output
# Thread dump of TexeraWebApplication while the API is frozen
# Problem: one open delete transaction holding the row lock,
# every other workflow-table write queued behind it.
"dw-NN" #N daemon (waiting on Postgres response)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:137)
at org.jooq.tools.jdbc.DefaultPreparedStatement.executeQuery(DefaultPreparedStatement.java:104)
at org.jooq.impl.AbstractDMLQuery.executeReturningQuery(AbstractDMLQuery.java:1249)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:428)
at org.jooq.impl.AbstractDMLQuery.execute(AbstractDMLQuery.java:961)
at org.jooq.impl.DAOImpl.deleteById(DAOImpl.java:284)
at org.apache.texera.web.resource.dashboard.user.workflow.WorkflowResource.$anonfun$deleteWorkflow$3(WorkflowResource.scala:634)
at org.jooq.impl.DefaultDSLContext.lambda$transaction$5(DefaultDSLContext.java:612)
at org.jooq.impl.DefaultDSLContext.transaction(DefaultDSLContext.java:611)
at org.apache.texera.web.resource.dashboard.user.workflow.WorkflowResource.deleteWorkflow(WorkflowResource.scala:631)
"dw-MM" / "dw-OO" / "dw-PP" ... (queued behind the open transaction)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:137)
at org.jooq.impl.AbstractDMLQuery.execute(AbstractDMLQuery.java:1074)
at org.jooq.impl.TableRecordImpl.storeInsert0(TableRecordImpl.java:193)
at org.jooq.impl.TableRecordImpl.insert(TableRecordImpl.java:140)
at org.jooq.impl.DAOImpl.insert(DAOImpl.java:156)
at org.apache.texera.web.resource.dashboard.user.workflow.WorkflowResource$.insertWorkflow(WorkflowResource.scala:89)
at org.apache.texera.web.resource.dashboard.user.workflow.WorkflowResource.createWorkflow(WorkflowResource.scala:573)
# Frontend webpack-dev-server proxy view of the same incident:
[HPM] Error occurred while proxying request localhost:4200/api/workflow/create to http://localhost:8080/ [ECONNRESET]
[HPM] Error occurred while proxying request localhost:4200/api/workflow/delete to http://localhost:8080/ [ECONNRESET]
... (many lines later, after enough socket exhaustion)
[HPM] Error occurred while proxying request localhost:4200/api/workflow/create to http://localhost:8080/ [ECONNREFUSED]
# Foreign-key violation that compounds the contention while the lock is held —
# fired by stale dashboard tabs POSTing /api/hub/view for the deleted wid:
org.jooq.exception.DataAccessException: SQL [insert into "texera_db"."workflow_view_count" ("wid", "view_count") values (?, ?)
on conflict ("wid") do update set "view_count" = ("texera_db"."workflow_view_count"."view_count" + ?)
returning "texera_db"."workflow_view_count"."view_count"];
ERROR: insert or update on table "workflow_view_count" violates foreign key constraint "workflow_view_count_wid_fkey"
Detail: Key (wid)=(173) is not present in table "workflow".
at org.apache.texera.web.resource.dashboard.hub.HubResource.postView(HubResource.scala:401)
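
The foreign-key violation logged above is the case the suggested existence check is meant to avoid. A minimal sketch of that guard follows, with the database calls hidden behind a hypothetical `ViewCountStore` trait so the logic can be read in isolation; neither the trait nor `GuardedPostView` exists in Texera.

```scala
// Hypothetical store interface standing in for the jOOQ calls made by
// HubResource.postView.
trait ViewCountStore {
  def workflowExists(wid: Int): Boolean
  def incrementViewCount(wid: Int): Int // the upsert; returns the new count
}

object GuardedPostView {
  // Returns the new view count, or 0 without writing anything when the
  // workflow row no longer exists.
  def postView(store: ViewCountStore, wid: Int): Int =
    if (store.workflowExists(wid)) store.incrementViewCount(wid) else 0
}
```

The check and the upsert are two separate statements, so a delete that commits between them can still trigger the same violation. The guard reduces the retry noise from stale tabs but does not remove the need to handle that exception.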