[Bug] #6614

Open
dab512 opened this issue Mar 1, 2025 · 2 comments
Labels
🐛 Bug (Something isn't working | Defect), files (File upload / Knowledge base)

Comments

dab512 commented Mar 1, 2025

📦 Platform

Other

📦 Deployment mode

server db (lobe-chat-database image)

📌 Version

Latest

💻 Operating System

Windows

🌐 Browser

Chrome

🐛 Bug Description

When I chunk files, they are not split by semantic blocks. Every time, I get equal-length chunks.
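For reference, equal-length chunks are what plain fixed-size chunking produces. The sketch below is an assumption about that kind of splitter, not lobe-chat's actual code; `fixedSizeChunks`, `chunkSize`, and `overlap` are hypothetical names. It cuts every `chunkSize` characters regardless of headings or line breaks.

```typescript
// Hypothetical fixed-size chunker: cuts every `chunkSize` characters and
// repeats `overlap` characters between neighbours. It ignores headings and
// line breaks entirely, so every chunk except the last has the same length.
function fixedSizeChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const doc = "# Intro\nHello world.\n\n# Usage\nRun the tool.\n";
for (const chunk of fixedSizeChunks(doc, 16, 4)) {
  console.log(JSON.stringify(chunk), chunk.length);
}
```

With `chunkSize = 16`, no chunk contains the whole `# Usage` heading: it is cut across two neighbouring chunks, which is the kind of boundary a structure-aware splitter is meant to avoid.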

But a recursive text splitter or a Markdown splitter should split by headers or lines.
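A recursive splitter of the kind mentioned above tries coarse separators first (Markdown headings, then blank lines, then single newlines, then spaces) and falls back to a finer separator only when a piece is still longer than the limit. The sketch below is a simplified illustration under that assumption; the separator list, `recursiveSplit`, and `chunkSize` are hypothetical and do not describe lobe-chat's implementation. Chunk overlap is omitted for brevity.

```typescript
// Separators tried from coarsest to finest. This list is an assumption
// modelled on common recursive splitters, not lobe-chat's configuration.
const SEPARATORS = ["\n# ", "\n## ", "\n### ", "\n\n", "\n", " ", ""];

// Split on `sep` but re-attach it to the start of each following piece,
// so joining the pieces reproduces the original text exactly.
function splitKeeping(text: string, sep: string): string[] {
  if (sep === "") return text.split(""); // individual UTF-16 code units
  return text
    .split(sep)
    .map((p, i) => (i === 0 ? p : sep + p))
    .filter((p) => p.length > 0);
}

function recursiveSplit(text: string, chunkSize: number, separators: string[] = SEPARATORS): string[] {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];
  // Use the coarsest separator that actually occurs ("" always matches).
  const idx = separators.findIndex((s) => s === "" || text.includes(s));
  const sep = separators[idx];
  const finer = separators.slice(idx + 1);
  const chunks: string[] = [];
  let current = "";
  for (const piece of splitKeeping(text, sep)) {
    if (piece.length > chunkSize) {
      // Flush the buffer, then split the oversized piece with finer separators.
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(...recursiveSplit(piece, chunkSize, finer));
    } else if (current.length + piece.length <= chunkSize) {
      current += piece; // merge small neighbouring pieces
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = "# Intro\nHello world.\n\n# Usage\nRun the tool.\n";
for (const chunk of recursiveSplit(doc, 30)) {
  console.log(JSON.stringify(chunk), chunk.length);
}
```

On this sample, `recursiveSplit(doc, 30)` returns two chunks, one per `#` section, with different lengths (21 and 23 characters). Unequal chunks that line up with headings are what this approach should produce, so uniformly sized chunks are more consistent with a fixed-size splitter.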

Maybe I am missing some setting, but I can't find anything like this.

📷 Recurrence Steps

No response

🚦 Expected Behavior

No response

📝 Additional Information

No response

dab512 added the 🐛 Bug (Something isn't working | Defect) label Mar 1, 2025
github-project-automation bot moved this to Roadmap - Chat 1.x in Lobe Chat Routine Mar 1, 2025
lobehubbot (Member)

👀 @dab512

Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.

dosubot bot commented Mar 1, 2025

The issue you're experiencing with file chunking not splitting by semantic blocks might be related to current limitations or settings in the lobe-chat framework. At present, the repository has no specific documentation or configuration setting that controls whether files are chunked by semantic blocks or headers. The default behavior appears to produce equal-length chunks, which does not match the semantic or header-based splitting you expected.

There is a known limitation where the chatbot processes only a limited portion of chunks, which might affect how the data is handled [1]. Additionally, the logic for different file types and their loaders is implemented in the ChunkingLoader class, but specific settings for semantic block chunking are not detailed [2].
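To make the idea of per-file-type loaders concrete, here is a hypothetical dispatch table that picks a splitting strategy from the file extension. It does not reproduce lobe-chat's ChunkingLoader; `Strategy`, `pickStrategy`, and the extension mapping are illustrative assumptions.

```typescript
// Hypothetical strategy names; lobe-chat may use entirely different ones.
type Strategy = "markdown-headers" | "recursive-text" | "fixed-size";

// Illustrative mapping from lower-case file extension to strategy.
const STRATEGY_BY_EXT: Partial<Record<string, Strategy>> = {
  md: "markdown-headers",
  markdown: "markdown-headers",
  txt: "recursive-text",
};

function pickStrategy(filename: string): Strategy {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  // Unknown or missing extensions fall back to fixed-size splitting.
  return STRATEGY_BY_EXT[ext] ?? "fixed-size";
}

console.log(pickStrategy("Notes.MD"), pickStrategy("report.pdf"));
```

A table like this is one place where a header-aware strategy for Markdown files could be selected instead of a fixed-size default.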

If you are looking for specific settings or configurations, they might not be explicitly defined in the current version. You might need to explore the codebase further or wait for future updates that address these limitations.

dosubot bot added the files (File upload / Knowledge base) label Mar 1, 2025