Fix the bug in md content extraction #297
I don’t think we should tell the LLM whether the file is a PDF or a Markdown file just so it can decide which method to call. That logic should be handled by the code, not the prompt. Also, if we later add support for other formats such as PPT, would we need to update the prompt again? Separately, I’m still unclear about the root cause of the issue. Is it a failure to extract the correct line-numbered content, a prompt design issue, or a function naming problem?
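To make the suggestion concrete, here is a rough sketch of what code-side dispatch could look like. It assumes the format can be inferred from the file extension; `extract_pdf_content`, `extract_md_content`, and `extract_content` are hypothetical placeholder names, not functions from this repository.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_pdf_content(path: str) -> str:
    # Hypothetical placeholder: a real version would read the PDF pages.
    return f"pdf:{path}"


def extract_md_content(path: str) -> str:
    # Hypothetical placeholder: a real version would read the Markdown lines.
    return f"md:{path}"


# Map lowercase file extensions to extractor functions.
# Supporting a new format (e.g. PPT) would only need a new entry here,
# with no change to the prompt.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".pdf": extract_pdf_content,
    ".md": extract_md_content,
    ".markdown": extract_md_content,
}


def extract_content(path: str) -> str:
    """Pick the extractor from the file extension instead of asking the LLM."""
    suffix = Path(path).suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return extractor(path)
```

With something like this, the prompt would only need to ask for line or page ranges, and the caller would decide which extractor runs.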
Yes, the prompt can stay relatively open, and there is no need to restrict the file format. But if we add this restriction for the LLM, it reliably infers the correct line numbers. For example, in my test case the LLM infers the line ranges "37-39, 4-5" for a Markdown file, which will output [4, 5, 37, 38, 39] after
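For reference, the range expansion in that test case could be sketched roughly as below. `parse_line_ranges` is a hypothetical helper written for illustration, not the function used in this PR, and the choice to deduplicate and sort the result is an assumption based on the example output.

```python
from typing import List


def parse_line_ranges(spec: str) -> List[int]:
    """Expand a spec such as "37-39, 4-5" into a sorted list of line numbers."""
    lines = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                # Assumption: treat a reversed range like "5-4" as "4-5".
                start, end = end, start
            lines.update(range(start, end + 1))
        else:
            lines.add(int(part))
    return sorted(lines)


print(parse_line_ranges("37-39, 4-5"))  # [4, 5, 37, 38, 39]
```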
This PR aims to fix the bug in md content extraction (#296). @BukeLy @KylinMountain