
Here's what I noticed when reviewing the notebooks... #9

@pancakereport

Description

  • https://github.com/ds-modules/Small_Models_SP26/blob/main/1-Instructor_Utils/1-1-API_Key_Setup.ipynb:
    1. Might be worth explaining in Step 1 that ls lists all the files in a given directory. You're trying to show that the directory exists, right?
    2. All keys written to .env are capitalized, but later you use openai_API_KEY. It still loads, but it's an odd inconsistency.
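For context on why the spelling matters, a minimal sketch (the key names are from the notebook; the value is a placeholder, not a real key): environment-variable lookups are case-sensitive on Linux/macOS, so the two spellings name different variables.

```python
import os

# Case-sensitivity sketch: on Linux/macOS, environment variable names are
# case-sensitive, so OPENAI_API_KEY and openai_API_KEY are different names.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"  # placeholder, not a real key

print(os.environ.get("OPENAI_API_KEY"))  # → sk-placeholder
print(os.environ.get("openai_API_KEY"))  # → None (different variable)
```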
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/1-Instructor_Utils/1-2-HuggingFace_Hub_Download_gguf.ipynb
    1. In model_file = os.path.join(shared_model_path, "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"), shared_model_path points to the read-write directory (not the read directory), which could confuse students with similar setups if this cell is taken out of context.
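One way to make the read-only vs. read-write distinction fail fast (a hedged sketch; the helper and the example directory are hypothetical, not the notebook's actual code or paths):

```python
import os

# Sketch: resolve the model path and return None if the .gguf isn't there,
# so students immediately see which directory was expected.
def find_model(shared_model_path, filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"):
    model_file = os.path.join(shared_model_path, filename)
    if not os.path.isfile(model_file):
        return None  # caller can print a hint pointing at the download notebook
    return model_file

# Example with a directory that doesn't exist:
print(find_model("/no/such/readwrite-dir"))  # → None
```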
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/2-Beginner_NBs/2-1-LlamaCpp_SmallLM_Demo.ipynb
    1. The intro says: "This notebook assumes that at least one 'Small model' file ending in .gguf has already been downloaded into a directory (see GPT4All_Download_gguf.ipynb for more)." Should this be "see 1-2-HuggingFace_Hub_Download_gguf.ipynb for more"?
    2. chat_format="chatml" # Qwen uses ChatML format suggests to me that the chat_format is determined by the model, but otherwise this wasn't apparent. It does get addressed later, so that discussion could be moved up (probably not important), and/or a note could be added there about how to determine the proper format for a given model.
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/2-Beginner_NBs/2-2-Model_Weights_and_Tokens.ipynb
    1. how a language model actually works with numbers -> how a language model actually works with numbers (embeddings)
    2. Cells with ## 1. Environment Setup, ## 5. Inside a GGUF File, ## 7. How Concepts Map to Numbers: Token Embeddings, and ## 8. Putting It All Together: The Full Pipeline have --- at the beginning so they don't render correctly.
    3. The first time the tokenizer is shown breaking a word into smaller pieces is midway through the notebook, with "comparative." Is there a quick way to introduce this sooner? Even just showing that word tokenized by itself could be good.
    4. I don't think I'm getting correct numbers in section 5.1 Reading GGUF Metadata when I run the notebook... the context length can't be 3, right?! (screenshots attached)
    5. Similarly, this doesn't feel right (and if it is correct, it could use additional explanatory text). (screenshot attached)
    6. Is it worth it to mention A size (bytes on disk — much smaller than the raw float32 equivalent) in 5.3 if this isn't printed out?
    7. The cell in 7.1 doesn't run for me (screenshot attached). Also, you may want to silence the deprecation warning.
    8. The next steps at the end of the notebook don't make sense: Inside_Small_Model.ipynb doesn't exist in the cloned repo, and LlamaCpp_SmallLM_Demo.ipynb is numbered 2-1 (i.e., before this notebook, 2-2).
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/3-API_NBs/3-1-Anthropic_API.ipynb
    1. The Experiment 1 haiku cell doesn't run for me (sonnet is fine): NotFoundError: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-haiku-4-20250514'}, 'request_id': 'req_011CYsz6HZhhv1XJuj5STKya'}. Changing to haiku_model_id = "claude-3-haiku-20240307" let me run the rest of the notebook.
    2. The discussion of temperature slightly conflicts with notebook 2-1, at least when it comes to setting the temperature to 1.
    3. Cell that goes with Putting It All Together doesn't run for me: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'stop_sequences: each stop sequence must contain non-whitespace'}, 'request_id': 'req_011CYszyuv9rRMfYwWhotTnS'} Removing the stop_sequences parameter makes the cell run.
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/3-API_NBs/3-2-OpenAI_API.ipynb
    1. The cell that begins with # Send a chat message to GPT-4o-mini should be removed or moved down, because it doesn't belong in the Checking Available Models section.
    2. The two cells after the interactive widget are nearly identical. Do you need both? Can you provide more context for either or both?
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/4-SAT-TestTaker/QwenTestTaker-shared.ipynb
    1. Reflection checkpoints are mentioned at the beginning, and the framing reads oddly out of context / on a first read. It also appears that only Checkpoint 3 actually writes to an external file...
    2. The tables of SAT questions include the columns visuals.type and visuals.svg_content, which are all NaNs. I'd exclude these from the output.
    3. The dataframes created with pd.concat() seem to contain some questions twice. What's going on there? Either more explanation is needed or the duplicates should be removed. I don't think the dataframes are even used...
    4. Ask the model - English code cell doesn't define random_para_text. It should include random_para_text = random_entry["question"]["paragraph"]
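For items 2 and 3 above, a minimal pandas sketch of the suggested cleanup (the column names come from the review; the data values are invented):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the SAT-question tables: all-NaN visuals columns
# plus a duplicated question row produced by pd.concat().
df = pd.concat([
    pd.DataFrame({
        "question": ["Q1", "Q2"],
        "visuals.type": [np.nan, np.nan],
        "visuals.svg_content": [np.nan, np.nan],
    }),
    pd.DataFrame({
        "question": ["Q2"],
        "visuals.type": [np.nan],
        "visuals.svg_content": [np.nan],
    }),
], ignore_index=True)

# Drop columns that are entirely NaN, then drop duplicate rows.
cleaned = df.dropna(axis=1, how="all").drop_duplicates(ignore_index=True)
print(list(cleaned.columns))  # → ['question']
print(len(cleaned))           # → 2
```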
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/5-RAG/RAG_Tutorial.ipynb
    1. Cells with ## Our Knowledge Base, ## Finding the Right Document, ### The Real Test: Different Words, Same Meaning, ### Visualizing Meaning Space, ## The Full Pipeline: Retrieve + Generate, ## Try It Yourself!, and ## Key Takeaways have --- at the beginning so they don't render correctly.
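Since the same leading --- issue also shows up in 2-2-Model_Weights_and_Tokens.ipynb, a small sketch (illustrative only; the helper is mine, not from the repo) for locating the offending markdown cells programmatically:

```python
import json

def cells_starting_with_hr(nb_json):
    """Return indices of markdown cells whose source begins with '---',
    which can swallow the heading that follows when rendered."""
    nb = json.loads(nb_json)
    return [
        i for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "markdown"
        and "".join(cell.get("source", [])).lstrip().startswith("---")
    ]

# Tiny inline notebook standing in for the real file:
nb = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["---\n", "## Our Knowledge Base"]},
    {"cell_type": "markdown", "source": ["## Key Takeaways"]},
]})
print(cells_starting_with_hr(nb))  # → [0]
```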
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/6-NoCodeNBs/LLM_Context_Management_and_Dynamic_Prompts.ipynb
    1. I highly doubt that many of the color choices for text have sufficient contrast.
    2. (screenshot attached)
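On item 1, contrast can be checked numerically rather than by eye; a sketch of the WCAG 2.x contrast-ratio formula (the colors below are examples, not the notebook's actual palette):

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance for an sRGB color given as 0-255 ints."""
    def channel(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Ratio from 1:1 (identical colors) to 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# WCAG AA asks for at least 4.5:1 for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
print(contrast_ratio((255, 255, 0), (255, 255, 255)) < 4.5)  # yellow on white fails AA → True
```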
  • https://github.com/ds-modules/Small_Models_SP26/blob/main/6-NoCodeNBs/Personal_Writing_Assistant.ipynb
    1. Buttons don't render. (screenshot attached)
