Jupyter kernel hangs after first visit when using default image template #10

@yueneiqi

Description

When creating a studio with the default image template that includes Jupyter, the Jupyter kernel only works on the first visit. Subsequent connections fail with the kernel stuck at "Connecting" and eventually timing out.

Root Cause

The issue is caused by debugpy deadlocking under QEMU amd64-on-arm64 emulation. During kernel startup, IPKernelApp.init_kernel() calls debugpy's debugger initialization, which spawns threads and uses low-level socket/fd operations that deadlock under QEMU's binary translation. The kernel process appears alive (ports bound, listening) but its event loop never starts processing messages — heartbeat, shell, and control channels all time out.
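The symptom described above (ports bound, nothing answering) is easy to misread as a healthy process. The sketch below illustrates only that general point with a stand-in TCP listener that never services its connections; it is not a Jupyter health check. The names `probe` and `demo` are hypothetical, and a real check would have to send a Jupyter protocol message (for example a kernel_info_request) through a client library rather than read raw bytes.

```python
import socket


def probe(host, port, timeout=1.0):
    """Return (connected, replied) for a plain TCP probe.

    connected: the TCP handshake completed (the port is bound and listening).
    replied:   the peer sent at least one byte before the timeout.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, unreachable, or timed out.
        return (False, False)
    with conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(1)
        except OSError:
            # Includes socket.timeout: connected, but nobody answered.
            return (True, False)
        return (True, bool(data))


def demo():
    """Stand-in for a hung process: it listens but never accepts or replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    try:
        return probe("127.0.0.1", listener.getsockname()[1], timeout=0.5)
    finally:
        listener.close()
```

Calling `demo()` should show a successful connection with no reply, which is why a port check alone cannot distinguish a working kernel from a deadlocked one.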

Workaround

Create a custom image that disables debugpy. Write a Dockerfile that extends the base image and applies the following fixes:

FROM ggo-studio-torch:flan-t5

RUN pip install --no-cache-dir jupyterlab

# Disable token auth and fix kernel hang under QEMU amd64-on-arm64 emulation.
# debugpy deadlocks under QEMU emulation, so we disable it and uninstall it.
RUN mkdir -p /root/.jupyter && echo "\
c.IdentityProvider.token = ''\n\
c.ServerApp.token = ''\n\
c.MappingKernelManager.kernel_info_timeout = 120\n\
" > /root/.jupyter/jupyter_server_config.py && \
    mkdir -p /etc/ipython && echo "\
c.IPKernelApp.capture_fd_output = False\n\
" > /etc/ipython/ipython_kernel_config.py && \
    pip uninstall -y debugpy && \
    python3 -c "import json; p='/usr/local/share/jupyter/kernels/python3/kernel.json'; \
    d=json.load(open(p)); d.setdefault('metadata',{})['debugger']=False; json.dump(d,open(p,'w'),indent=1)"

EXPOSE 8888

CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

Key fixes in the workaround

Fix | Why
--- | ---
pip uninstall -y debugpy | Removes the debugger that deadlocks under QEMU emulation
kernel.json metadata debugger: false | Tells Jupyter UI not to offer debug features
capture_fd_output = False | Disables fd-watching pipe redirects (precautionary)
IdentityProvider.token = '' | Allows accessing /lab without a token query parameter
kernel_info_timeout = 120 | Gives kernels more time to start under emulation
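The kernel.json change is done with a dense one-liner in the Dockerfile. As a more readable sketch of the same step, the function below loads a kernelspec file, sets `metadata.debugger` to false, and writes it back. The function name `disable_debugger` is hypothetical, and the use of `setdefault` is an added assumption to cope with a kernel.json that has no `metadata` key.

```python
import json
from pathlib import Path


def disable_debugger(kernel_json):
    """Set metadata.debugger to false in a Jupyter kernelspec file.

    kernel_json: path to a kernel.json file, e.g. the one patched in the
    Dockerfile above. Returns the updated spec as a dict.
    """
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    # setdefault avoids a KeyError if the spec has no "metadata" section.
    spec.setdefault("metadata", {})["debugger"] = False
    path.write_text(json.dumps(spec, indent=1) + "\n")
    return spec
```

In an image build this could be run against /usr/local/share/jupyter/kernels/python3/kernel.json, the path used in the workaround.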

Build with DOCKER_BUILDKIT=0 (required because BuildKit cannot resolve local-only base images):

DOCKER_BUILDKIT=0 docker build -f Dockerfile.flan-t5-jupyter -t ggo-studio-torch:flan-t5-jupyter .

Trade-offs

  • Lost: "Debug Cell" feature in Jupyter (step-through debugging)
  • Kept: All other Jupyter functionality — code execution, notebooks, terminals, extensions

Expected Fix

The default image template should either ship with debugpy disabled or detect QEMU emulation and disable debugpy automatically.
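One way the automatic detection could work is on the host side, where both the image architecture and the host architecture are known. The sketch below is an assumption about how a launcher might decide, not a description of the project's code: `normalize_arch` and `should_disable_debugpy` are hypothetical names, and the alias table covers only the two architectures mentioned in this issue.

```python
import platform

# Common spellings of the two architectures discussed in this issue.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name):
    """Map an architecture name to a canonical form (amd64 or arm64)."""
    key = name.strip().lower()
    return _ARCH_ALIASES.get(key, key)


def should_disable_debugpy(image_arch, host_arch=None):
    """Return True when the image would run under emulation on this host.

    image_arch: architecture the image was built for (e.g. "amd64").
    host_arch:  host architecture; defaults to platform.machine().
    """
    host = normalize_arch(host_arch or platform.machine())
    return normalize_arch(image_arch) != host
```

For example, `should_disable_debugpy("amd64", "arm64")` returns True, the case described in this issue, while matching architectures return False and leave the debugger enabled.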
