OCR dataset results #46
I’m very sorry, but we couldn’t find the OCR results after conducting the experiments.
Could you please explain this in more detail? I’m having trouble understanding it.
Hi, thanks for the response. When the eval.sh script is called, it produces embeddings.corpus, embeddings.query, test_result.log and test..rec files.
Let me try to find it~
Not too important; if there are multiple, then the one with the best results.
We extract text page by page. |
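Read literally, extracting text page by page means each page's OCR output becomes its own retrieval unit for text-based RAG. A minimal sketch, assuming the OCR step (for example PPOCR) has already produced one string per page; `PagePassage` and `pages_to_passages` are hypothetical names for illustration, not part of this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagePassage:
    # Hypothetical record: one retrieval unit per document page.
    doc_id: str
    page: int  # 1-based page number
    text: str


def pages_to_passages(doc_id: str, page_texts: list[str]) -> list[PagePassage]:
    # Turn per-page OCR strings into passages, one per page.
    # Pages where OCR returned only whitespace are skipped.
    return [
        PagePassage(doc_id, i, text.strip())
        for i, text in enumerate(page_texts, start=1)
        if text.strip()
    ]


passages = pages_to_passages("doc1", ["Hello", "  ", "World "])
print(passages)
```

Keeping the page number on each passage lets retrieved text be mapped back to the original page image when comparing text-based and image-based retrieval.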
Could you please share results from using OCR text-extraction approaches on the datasets shared on HF? I am having trouble with PPOCR and would like to replicate the text-based RAG results.
What would work best for me is the output of calling eval on the ChartQA, InfoVQA, MP-DocVQA and SlideVQA datasets, but with their text content used for RAG instead of images. The model and OCR method do not matter much; I would prefer whichever gives the best results.