Fix: ValueError: filedescriptor out of range in select() for papermill notebook execution #2994

RexBearIU · 2026-01-22T09:42:20Z

Description

This pull request updates the RL and SFT demo notebooks so that they run in non-interactive execution environments such as Papermill. Notebook shell magic commands are replaced with Python subprocess calls, with more robust error handling and logging. As the error log below shows, the shell magic runs through pexpect and select(), which is where the ValueError is raised. The PR also updates the kernel and Python version metadata and makes the output of key initialization steps more visible.
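As a rough illustration of the change described above, a shell magic line can be replaced by a small wrapper around subprocess.run. This is a minimal sketch, not the code from this PR: the helper name run_command is hypothetical, and using sys.executable for the pip call is an assumption (the original cell called python3).

```python
import subprocess
import sys


def run_command(args, env=None):
    """Run a command without a pty, echo its output, and raise on failure.

    Hypothetical helper for illustration; not the code from this PR.
    """
    result = subprocess.run(
        args,
        env=env,                   # None inherits the notebook's environment
        stdout=subprocess.PIPE,    # capture output so it lands in the cell log
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        check=False,               # inspect returncode ourselves for a clearer error
    )
    print(result.stdout, end="")
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: {args}"
        )
    return result.stdout


# Example: the pip install from the failing cell, expressed as an argument list.
# run_command([sys.executable, "-m", "pip", "install", "torch",
#              "--index-url", "https://download.pytorch.org/whl/cpu"])
```

Because the call does not go through IPython's system() and pexpect, it avoids the code path shown in the traceback, and a non-zero exit code stops the notebook instead of passing silently.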

Error log:

ValueError                                Traceback (most recent call last)
Cell In[8], line 3
      1 if not os.path.exists(MODEL_CHECKPOINT_PATH):
      2     # install torch for the conversion script
----> 3     get_ipython().system('python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu')
      5     get_ipython().system('JAX_PLATFORMS=cpu PYTHONPATH={MAXTEXT_REPO_ROOT} {sys.executable} -m MaxText.utils.ckpt_conversion.to_maxtext        {MAXTEXT_REPO_ROOT}/configs/base.yml        model_name={MODEL_NAME}        base_output_directory={MODEL_CHECKPOINT_PATH}        hf_access_token={HF_TOKEN}        use_multimodal=false        scan_layers=true        skip_jax_distributed_system=True')
      7 if not os.path.exists(MODEL_CHECKPOINT_PATH):

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/ipykernel/zmqshell.py:788, in ZMQInteractiveShell.system_piped(self, cmd)
    786         self.user_ns["_exit_code"] = system(cmd)
    787 else:
--> 788     self.user_ns["_exit_code"] = system(self.var_expand(cmd, depth=1))

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/IPython/utils/_process_posix.py:130, in ProcessHandler.system(self, cmd)
    126 flush = sys.stdout.flush
    127 while True:
    128     # res is the index of the pattern that caused the match, so we
    129     # know whether we've finished (if we matched EOF) or not
--> 130     res_idx = child.expect_list(patterns, self.read_timeout)
    131     print(child.before[out_size:].decode(enc, 'replace'), end='')
    132     flush()

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/pexpect/spawnbase.py:383, in SpawnBase.expect_list(self, pattern_list, timeout, searchwindowsize, async_, **kw)
    381     return expect_async(exp, timeout)
    382 else:
--> 383     return exp.expect_loop(timeout)

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/pexpect/expect.py:169, in Expecter.expect_loop(self, timeout)
    167     return self.timeout()
    168 # Still have time left, so read more data
--> 169 incoming = spawn.read_nonblocking(spawn.maxread, timeout)
    170 if self.spawn.delayafterread is not None:
    171     time.sleep(self.spawn.delayafterread)

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/pexpect/pty_spawn.py:458, in spawn.read_nonblocking(self, size, timeout)
    450         return select_ignore_interrupts([self.child_fd], [], [], timeout)[0]
    452 # If there is data available to read right now, read as much as
    453 # we can. We do this to increase performance if there are a lot
    454 # of bytes to be read. This also avoids calling isalive() too
    455 # often. See also:
    456 # * https://github.com/pexpect/pexpect/pull/304
    457 # * http://trac.sagemath.org/ticket/10295
--> 458 if select(0):
    459     try:
    460         incoming = super(spawn, self).read_nonblocking(size)

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/pexpect/pty_spawn.py:450, in spawn.read_nonblocking.<locals>.select(timeout)
    449 def select(timeout):
--> 450     return select_ignore_interrupts([self.child_fd], [], [], timeout)[0]

File ~/maxtext/maxtext_venv/lib/python3.12/site-packages/pexpect/utils.py:143, in select_ignore_interrupts(iwtd, owtd, ewtd, timeout)
    141 while True:
    142     try:
--> 143         return select.select(iwtd, owtd, ewtd, timeout)
    144     except InterruptedError:
    145         err = sys.exc_info()[1]

ValueError: filedescriptor out of range in select()
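
For context on the last frame: on POSIX systems, Python's select.select() rejects any descriptor number at or above FD_SETSIZE (commonly 1024) before it makes the system call. A process that already holds many open descriptors, which is presumably the case for the worker running Papermill here, can hand pexpect a pty descriptor above that limit. The sketch below is not from the PR; the helper name select_rejects is hypothetical, and it only probes the limit using a plain integer descriptor.

```python
import select


def select_rejects(fd):
    """Return True if select() refuses fd because its number is too large.

    Hypothetical helper for illustration; assumes a POSIX platform.
    """
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Same error as in the traceback: the number is >= FD_SETSIZE.
        return True
    except OSError:
        # The number is in range, but the descriptor is not open.
        return False
    return False
```

On a typical Linux system, select_rejects(100000) reports a rejection, while a freshly opened pipe descriptor in a small process does not.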

Tests

Manually triggered the three notebooks and monitored the execution flow step by step. Confirmed that the training loop finished and resources were released.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

SurbhiJainUSC · 2026-01-22T20:54:05Z

Please check why notebook CI test is failing

RexBearIU · 2026-01-23T19:57:55Z

> Please check why notebook CI test is failing

The notebook CI is blocked right now, but it works on my local machine with papermill after adding back the import of pyconfig.

SurbhiJainUSC · 2026-01-30T01:30:23Z

Please rebase after this is merged: #3000

RexBearIU requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jesselu-google, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners January 22, 2026 09:42

RexBearIU force-pushed the jackyf/fix_posttraining_notebook branch 3 times, most recently from 39f3531 to 2b26a59 January 22, 2026 10:23

RexBearIU force-pushed the jackyf/fix_posttraining_notebook branch from 2b26a59 to 9ccdbea January 23, 2026 06:14

Fix: ValueError: filedescriptor out of range in select()

3d8d9ad

RexBearIU force-pushed the jackyf/fix_posttraining_notebook branch from 9ccdbea to 3d8d9ad January 23, 2026 09:22
