Does anyone have experience optimizing the conversion of Chinese PDFs to Markdown? The conversion runs, but the results are unsatisfactory and less accurate than running OCR directly. #758

RuiZheZhangQ · 2025-01-16T08:56:50Z

Question

...
Does anyone have experience optimizing the conversion of Chinese PDFs to Markdown? I can get the conversion to run, but the results are unsatisfactory: the output is less accurate than what I get by running OCR on the document directly.

I am using Tesseract for OCR.

Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux

PeterStaar-IBM · 2025-01-28T07:38:39Z

@RuiZheZhangQ Do you have an example PDF that reproduces this?
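
Alongside a sample PDF, a rough accuracy figure for each output can make the gap concrete. The following is a minimal sketch, not part of the original thread: it assumes you have a hand-checked reference transcription of one page and compares it per character against the Markdown output and the direct OCR output, using only the Python standard library. The function names and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Drop all whitespace so layout differences do not count as errors."""
    return "".join(text.split())


def char_similarity(reference: str, candidate: str) -> float:
    """Return a similarity ratio in [0, 1], compared character by character."""
    ref, cand = normalize(reference), normalize(candidate)
    if not ref and not cand:
        return 1.0
    # autojunk=False turns off the heuristic that treats very frequent
    # characters as junk in long inputs, which would skew the ratio.
    return SequenceMatcher(None, ref, cand, autojunk=False).ratio()


if __name__ == "__main__":
    # Hypothetical strings standing in for a hand-checked reference,
    # the Markdown conversion output, and the direct OCR output.
    reference = "中文文档转换测试"
    markdown_output = "中文文挡转换测试"
    ocr_output = "中文文档转换测试"
    print(f"markdown: {char_similarity(reference, markdown_output):.3f}")
    print(f"ocr:      {char_similarity(reference, ocr_output):.3f}")
```

Markdown markup such as `#` or `|` counts as a difference here, so strip it first if you only want to compare the recognized text.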

RuiZheZhangQ added the question (Further information is requested) label Jan 16, 2025

PeterStaar-IBM added the pdf label Jan 28, 2025
