[Bug] #6614

dab512 · 2025-03-01T09:51:53Z

📦 Platform

Other

📦 Deployment mode

server db (lobe-chat-database image)

📌 Version

Latest

💻 Operating System

Windows

🌐 Browser

Chrome

🐛 Bug Description

When I chunk files, they are not split into semantic blocks. Every time, I get chunks of equal length.

A recursive text splitter or Markdown splitter should split on headers or line breaks instead.

Maybe I am missing a setting, but I could not find anything like that.
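
To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is only an illustration and not LobeChat's actual chunking code: `fixed_chunks`, `split_by_headers`, and the sample `doc` are hypothetical names made up for this example.

```python
import re


def fixed_chunks(text: str, size: int) -> list[str]:
    """Cut text into equal-length pieces, ignoring document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_headers(markdown: str) -> list[str]:
    """Start a new chunk at every Markdown ATX header line (#, ##, ...)."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A header closes the chunk collected so far and opens a new one.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty after stripping whitespace.
    return [c for c in chunks if c]


# Hypothetical sample document with two sections.
doc = "# Intro\nShort text.\n\n## Details\nA longer paragraph about details.\n"

print(fixed_chunks(doc, 20))   # pieces cut at arbitrary character offsets
print(split_by_headers(doc))   # one chunk per header section
```

The first call is what I currently seem to get (cuts in the middle of sections), while the second is roughly what I would expect from a header-aware splitter.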

📷 Recurrence Steps

No response

🚦 Expected Behavior

No response

📝 Additional Information

No response

lobehubbot · 2025-03-01T09:52:06Z

👀 @dab512

Thank you for raising an issue. We will look into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.

dosubot · 2025-03-01T09:53:35Z

The issue you're experiencing with file chunking not splitting by semantic blocks might be related to the current limitations or settings in the lobe-chat framework. As of now, there isn't specific documentation or configuration settings available in the repository that control how files are chunked by semantic blocks or headers. The default behavior seems to result in equal-length chunks, which might not align with your expectations for semantic or header-based splitting.

There is a known limitation where the chatbot processes only a limited portion of chunks, which might affect how the data is handled ^[1]. Additionally, the logic for different file types and their loaders is implemented in the ChunkingLoader class, but specific settings for semantic block chunking are not detailed ^[2].

If you are looking for specific settings or configurations, they might not be explicitly defined in the current version. You might need to explore the codebase further or wait for future updates that address these limitations.


dab512 added the 🐛 Bug Something isn't working | 缺陷 label Mar 1, 2025

github-project-automation bot added this to Lobe Chat Routine Mar 1, 2025

github-project-automation bot moved this to Roadmap - Chat 1.x in Lobe Chat Routine Mar 1, 2025

dosubot bot added the files 上传文件/知识库 label Mar 1, 2025
