Description
When creating a studio with the default image template that includes Jupyter, the Jupyter kernel only works on the first visit. Subsequent connections fail with the kernel stuck at "Connecting" and eventually timing out.
Root Cause
The issue is caused by debugpy deadlocking under QEMU amd64-on-arm64 emulation. During kernel startup, IPKernelApp.init_kernel() calls debugpy's debugger initialization, which spawns threads and uses low-level socket/fd operations that deadlock under QEMU's binary translation. The kernel process appears alive (ports bound, listening) but its event loop never starts processing messages — heartbeat, shell, and control channels all time out.
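A two-step check can tell this "alive but unresponsive" state apart from a crashed kernel. The sketch below covers only the first step, and `probe_ports` is a hypothetical helper, not a Jupyter API. It assumes access to the kernel's connection file, a JSON file listing `ip`, `transport` and the five `*_port` values, normally found in the directory printed by `jupyter --runtime-dir`. It tests only whether each port accepts a TCP connection, which should still succeed in the hung state. The second step is a real message round trip, such as `jupyter_client`'s `BlockingKernelClient.wait_for_ready(timeout=...)`, and that is the part expected to time out here.

```python
import json
import socket
import sys

# Port fields defined by the Jupyter kernel connection-file format.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def probe_ports(connection_file: str, timeout: float = 1.0) -> dict:
    """Return {port_key: bool} saying whether each kernel port accepts TCP.

    True only means the socket is bound and listening; it says nothing about
    whether the kernel's event loop is actually processing messages.
    """
    with open(connection_file) as f:
        info = json.load(f)
    if info.get("transport", "tcp") != "tcp":
        raise ValueError("this sketch only supports the tcp transport")
    host = info.get("ip", "127.0.0.1")
    results = {}
    for key in PORT_KEYS:
        try:
            # A bare TCP connect; no ZeroMQ handshake is attempted.
            with socket.create_connection((host, int(info[key])), timeout=timeout):
                results[key] = True
        except (OSError, KeyError):
            results[key] = False
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, ok in probe_ports(sys.argv[1]).items():
        print(f"{key}: {'listening' if ok else 'not reachable'}")
```

If every port reports `listening` but `wait_for_ready` times out, the kernel matches the hang described above rather than a failed start.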
Workaround
Create a custom image that disables debugpy, using a Dockerfile (`Dockerfile.flan-t5-jupyter`) that extends the base image with the following fixes:
```dockerfile
FROM ggo-studio-torch:flan-t5
RUN pip install --no-cache-dir jupyterlab
# Disable token auth and fix the kernel hang under QEMU amd64-on-arm64 emulation.
# debugpy deadlocks under QEMU emulation, so we uninstall it and turn off the debugger in kernel.json.
# printf is used instead of echo "...\n..." because echo's handling of \n depends on which shell /bin/sh is.
RUN mkdir -p /root/.jupyter && \
    printf '%s\n' \
      "c.IdentityProvider.token = ''" \
      "c.ServerApp.token = ''" \
      "c.MappingKernelManager.kernel_info_timeout = 120" \
      > /root/.jupyter/jupyter_server_config.py && \
    mkdir -p /etc/ipython && \
    printf '%s\n' "c.IPKernelApp.capture_fd_output = False" \
      > /etc/ipython/ipython_kernel_config.py && \
    pip uninstall -y debugpy && \
    python3 -c "import json; p='/usr/local/share/jupyter/kernels/python3/kernel.json'; \
d=json.load(open(p)); d.setdefault('metadata', {})['debugger']=False; json.dump(d,open(p,'w'),indent=1)"
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```
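The inline `python3 -c` step is hard to read inside a `RUN` line. Below is a minimal standalone equivalent. It assumes the file is copied into the image as a script (the name `disable_debugger.py` is hypothetical) and that the kernelspec sits at the path used above, which can differ between installs (`jupyter kernelspec list` shows the actual location). It makes the same edit, keeps any other metadata, and also works when the kernelspec has no `metadata` key.

```python
import json
import sys
from pathlib import Path

# Path used by the Dockerfile; other installs may keep the kernelspec elsewhere.
DEFAULT_KERNEL_JSON = "/usr/local/share/jupyter/kernels/python3/kernel.json"


def disable_debugger(kernel_json: str = DEFAULT_KERNEL_JSON) -> dict:
    """Set metadata.debugger = false in a kernelspec, creating 'metadata' if absent."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    spec.setdefault("metadata", {})["debugger"] = False
    path.write_text(json.dumps(spec, indent=1) + "\n")
    return spec


# Require an explicit path so the script never edits a kernelspec by accident.
if __name__ == "__main__" and len(sys.argv) > 1:
    for target in sys.argv[1:]:
        disable_debugger(target)
        print(f"debugger disabled in {target}")
```

In the Dockerfile this could replace the one-liner with `COPY disable_debugger.py /tmp/` followed by `RUN python3 /tmp/disable_debugger.py /usr/local/share/jupyter/kernels/python3/kernel.json`.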
Key fixes in the workaround
| Fix | Why |
|---|---|
| `pip uninstall -y debugpy` | Removes the debugger that deadlocks under QEMU emulation |
| `kernel.json` metadata `debugger: false` | Tells the Jupyter UI not to offer debugging features |
| `capture_fd_output = False` | Stops ipykernel from redirecting file descriptors 1 and 2 through pipes watched by extra threads (precautionary) |
| `IdentityProvider.token = ''` | Allows accessing `/lab` without a token query parameter |
| `kernel_info_timeout = 120` | Gives kernels 120 s to answer the initial `kernel_info` request, since startup is slower under emulation |
Build with `DOCKER_BUILDKIT=0`. This was needed here because BuildKit could not resolve the local-only base image:

```shell
DOCKER_BUILDKIT=0 docker build -f Dockerfile.flan-t5-jupyter -t ggo-studio-torch:flan-t5-jupyter .
```
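After building, you can spot-check the fixes in the table from inside the container. Below is a hypothetical helper (`check_workaround` is not part of Jupyter). It assumes the file paths used in the Dockerfile and matches config settings as literal lines rather than loading them through traitlets, so it only recognizes the exact spelling the Dockerfile writes.

```python
import importlib.util
import json
import sys
from pathlib import Path

# Settings from the workaround table, matched as exact (stripped) text lines.
EXPECTED_LINES = {
    "server": ["c.IdentityProvider.token = ''",
               "c.MappingKernelManager.kernel_info_timeout = 120"],
    "kernel": ["c.IPKernelApp.capture_fd_output = False"],
}


def check_workaround(kernel_json, server_config, kernel_config, module="debugpy"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # 1. The debugger package should no longer be importable.
    if importlib.util.find_spec(module) is not None:
        problems.append(f"{module} is still importable")
    # 2. The kernelspec should advertise debugger: false.
    try:
        meta = json.loads(Path(kernel_json).read_text()).get("metadata", {})
        if meta.get("debugger") is not False:
            problems.append("kernel.json metadata.debugger is not false")
    except (OSError, ValueError) as exc:
        problems.append(f"cannot read {kernel_json}: {exc}")
    # 3. Both config files should contain the expected lines.
    for label, cfg in (("server", server_config), ("kernel", kernel_config)):
        try:
            lines = {ln.strip() for ln in Path(cfg).read_text().splitlines()}
        except OSError as exc:
            problems.append(f"cannot read {cfg}: {exc}")
            continue
        for expected in EXPECTED_LINES[label]:
            if expected not in lines:
                problems.append(f"{cfg} is missing: {expected}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 4:
    issues = check_workaround(*sys.argv[1:])
    print("\n".join(issues) or "all checks passed")
    sys.exit(1 if issues else 0)
```

For example: `python3 check_workaround.py /usr/local/share/jupyter/kernels/python3/kernel.json /root/.jupyter/jupyter_server_config.py /etc/ipython/ipython_kernel_config.py` (the script name is hypothetical).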
Trade-offs
- Lost: "Debug Cell" feature in Jupyter (step-through debugging)
- Kept: All other Jupyter functionality — code execution, notebooks, terminals, extensions
Expected Fix
The default image template should either ship with debugpy disabled or detect QEMU emulation and disable it automatically.
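A template could implement the "detect QEMU emulation" option with a startup heuristic. This minimal sketch rests on an assumption that has not been verified here: that the emulator leaves the host's `/proc/cpuinfo` visible. In that case an amd64 container on an arm64 host reports `x86_64` from `platform.machine()`, while `/proc/cpuinfo` lists ARM-only fields such as `CPU implementer`. The helper names are hypothetical. A template would still need to act on the result, for example by applying the `kernel.json` edit from the workaround.

```python
import platform
from pathlib import Path

# Field names present in ARM /proc/cpuinfo output but not in x86 output.
ARM_CPUINFO_MARKERS = ("CPU implementer", "CPU architecture")


def looks_like_x86_on_arm(machine: str, cpuinfo: str) -> bool:
    """Heuristic: the process reports x86_64 while cpuinfo describes an ARM CPU.

    Assumes (not guaranteed) that the emulator passes the host's
    /proc/cpuinfo through unchanged.
    """
    if machine not in ("x86_64", "AMD64"):
        return False
    return any(marker in cpuinfo for marker in ARM_CPUINFO_MARKERS)


def detect_emulation() -> bool:
    """Apply the heuristic to the current process; False if cpuinfo is unavailable."""
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
    except OSError:
        return False
    return looks_like_x86_on_arm(platform.machine(), cpuinfo)


if __name__ == "__main__":
    print("emulated x86_64 on ARM" if detect_emulation() else "no emulation detected")
```

A false negative only means debugpy stays enabled, which is the current default, so the heuristic fails safe in that direction.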