get overlapped BoundingBox with RIL_SYMBOL #4384

Open
benjerming opened this issue Jan 22, 2025 · 0 comments

Current Behavior

Hello,
I iterated over the recognition results at the RIL_SYMBOL level and ran into two problems. I am reporting them in the hope that someone can help. Thanks very much!
Q1. Some symbols' bounding boxes overlap each other.
Q2. Some symbols' bounding boxes are too tall.

The output looks like this (c = character, l = left, t = top, r = right, b = bottom):

  c    l    t    r    b

...
 智  330  330  352  353
 慧  357  329  391  354
 物  392  331  414  354
 流  414  329  443  353
 地  443  330  459  353
 图  466  331  484  354
 计  891  330  914  354
 算  918  329 1019  354
 机  951  325  977  366
 视  976  329 1005  354
 觉 1004  329 1019  354
 智   81  488  119  512
 能  121  488  132  512
 硬  144  489  168  512
 件  169  489  185  512
 数  615  488  639  512
 字  642  488  676  513
 化  676  489  692  512

Here I can see:
Q1: 算 (918 329 1019 354) spans the full width of 机 (951 325 977 366), so the two boxes overlap.
Q2: 机 (951 325 977 366) has a height of 41 pixels, but in the real image it is the same height as the other characters, only about 24 pixels.
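
The two observations can be checked numerically from the listing alone. The following is a minimal sketch, assuming the boxes are (left, top, right, bottom) in image pixels as printed above; the names `SymbolBox`, `horizontal_overlap`, `spans_horizontally`, `median_height` and `sample_word` are hypothetical helpers for illustration and are not part of Tesseract or Leptonica.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One recognized symbol and its box (left, top, right, bottom),
// in the same order as the columns of the listing.
struct SymbolBox {
    std::string text;
    int left, top, right, bottom;
    int height() const { return bottom - top; }
};

// Width in pixels of the horizontal overlap between two boxes
// (0 if they are disjoint or only touch).
int horizontal_overlap(const SymbolBox &a, const SymbolBox &b) {
    return std::max(0, std::min(a.right, b.right) - std::max(a.left, b.left));
}

// True if the horizontal extent of `outer` fully covers that of `inner`.
bool spans_horizontally(const SymbolBox &outer, const SymbolBox &inner) {
    return outer.left <= inner.left && inner.right <= outer.right;
}

// Median box height, used as a reference for "normal" character height.
// Takes the vector by value because nth_element reorders it.
int median_height(std::vector<SymbolBox> boxes) {
    if (boxes.empty()) return 0;
    auto mid = boxes.begin() + boxes.size() / 2;
    std::nth_element(boxes.begin(), mid, boxes.end(),
                     [](const SymbolBox &a, const SymbolBox &b) {
                         return a.height() < b.height();
                     });
    return mid->height();
}

// The five consecutive symbols from the listing that show the problem.
std::vector<SymbolBox> sample_word() {
    return {
        {"计", 891, 330, 914, 354},
        {"算", 918, 329, 1019, 354},
        {"机", 951, 325, 977, 366},
        {"视", 976, 329, 1005, 354},
        {"觉", 1004, 329, 1019, 354},
    };
}
```

With this data, `spans_horizontally` is true for 算 against 机, their horizontal overlap is the full 26-pixel width of 机, and 机's height of 41 stands out against a median height of 25 for the group.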

Here is the command line:
./main ./demo.png ./tessdata chi_sim

Here is the code:

#include <memory>
#include <string>

#include <stdio.h>
#include <tesseract/capi.h>
#include <leptonica/allheaders.h>

static int ocr(const std::string &image_path, const std::string &tessdata, const std::string &lang)
{
    auto api = std::shared_ptr<TessBaseAPI>(
        TessBaseAPICreate(), 
        [](TessBaseAPI *p) { TessBaseAPIDelete(p); }
    );
    if (api->Init(tessdata.c_str(), lang.c_str()))
    {
        fprintf(stderr, "Could not initialize tesseract.\n");
        return -1;
    }
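    auto image = std::shared_ptr<Pix>(
        pixRead(image_path.c_str()),
        [](Pix *p) { pixDestroy(&p); }
    );
    // pixRead returns nullptr if the file cannot be read or decoded.
    if (!image)
    {
        fprintf(stderr, "Could not read image %s\n", image_path.c_str());
        return -1;
    }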

    api->SetImage(image.get());

    if (api->Recognize(nullptr))
    {
        fprintf(stderr, "Recognize failed\n");
        return -1;
    }

    auto res_it = std::shared_ptr<tesseract::ResultIterator>(api->GetIterator());


    fprintf(stderr, "%4s %4s %4s %4s %4s\n", "c", "l", "t", "r", "b");

    while (!res_it->Empty(tesseract::RIL_TEXTLINE))
    {
        if (res_it->Empty(tesseract::RIL_WORD))
        {
            res_it->Next(tesseract::RIL_WORD);
            continue;
        }

        int line_bbox[4], word_bbox[4];
        int line_conf, word_conf;
        res_it->BoundingBox(tesseract::RIL_TEXTLINE, &line_bbox[0], &line_bbox[1], &line_bbox[2], &line_bbox[3]);
        res_it->BoundingBox(tesseract::RIL_WORD, &word_bbox[0], &word_bbox[1], &word_bbox[2], &word_bbox[3]);
        line_conf = res_it->Confidence(tesseract::RIL_TEXTLINE);
        word_conf = res_it->Confidence(tesseract::RIL_WORD);

        // auto line_box = std::shared_ptr<Box>(
        //     boxCreate(line_bbox[0], line_bbox[1], line_bbox[2] - line_bbox[0], line_bbox[3] - line_bbox[1]),
        //     [](Box *p){ boxDestroy(&p);}
        // );
        // pixRenderBoxArb(image.get(), line_box.get(), 1, 0xff, 0xff, 0);

        // auto word_box = std::shared_ptr<Box>(
        //     boxCreate(word_bbox[0], word_bbox[1], word_bbox[2] - word_bbox[0], word_bbox[3] - word_bbox[1]),
        //     [](Box *p){ boxDestroy(&p);}
        // );
        // pixRenderBoxArb(image.get(), word_box.get(), 1, 0, 0xff, 0);

        do
        {
            int char_bbox[4];
            res_it->BoundingBox(tesseract::RIL_SYMBOL, &char_bbox[0], &char_bbox[1], &char_bbox[2], &char_bbox[3]);
            // GetUTF8Text returns a new[]-allocated string, so it must be released with delete[].
            auto text = std::unique_ptr<char[]>(res_it->GetUTF8Text(tesseract::RIL_SYMBOL));

            fprintf(stderr, "%4s %4d %4d %4d %4d\n",
                    text.get(), char_bbox[0], char_bbox[1], char_bbox[2], char_bbox[3]);

            auto box = std::shared_ptr<Box>(
                boxCreate(char_bbox[0], char_bbox[1], char_bbox[2] - char_bbox[0], char_bbox[3] - char_bbox[1]),
                [](Box *p){ boxDestroy(&p);}
            );
            pixRenderBoxArb(image.get(), box.get(), 1, 0, 0, 0xff);

            res_it->Next(tesseract::RIL_SYMBOL);
        } while (!res_it->Empty(tesseract::RIL_BLOCK) && !res_it->IsAtBeginningOf(tesseract::RIL_WORD));
    }

    const auto ocr_box_image_path = image_path + ".ocr_box.png";
    if (pixWrite(ocr_box_image_path.c_str(), image.get(), IFF_PNG))
    {
        fprintf(stderr, "Failed to write ocr box image to %s\n", ocr_box_image_path.c_str());
        return -1;
    }

    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 4)
    {
        fprintf(stderr, "Usage: %s <image_path> <tessdata_path> <lang>\n", argv[0]);   
        return 1;
    }
    return ocr(argv[1], argv[2], argv[3]);
}
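
One detail in the code above is easy to get wrong: `BoundingBox` fills in corner coordinates (left, top, right, bottom), while `boxCreate` is called with an origin plus a width and height, which is why the code passes `r - l` and `b - t`. The conversion can be sketched in isolation as follows; `Ltrb`, `Xywh` and `to_xywh` are hypothetical names for illustration, and rejecting empty or inverted boxes is an assumption of this sketch, not something the original code does.

```cpp
// Corner form, as filled in by BoundingBox: left, top, right, bottom.
struct Ltrb { int left, top, right, bottom; };

// Origin-plus-size form, as passed to boxCreate: x, y, width, height.
struct Xywh { int x, y, w, h; };

// Converts corner form to origin-plus-size form.
// Returns false and leaves `out` untouched if the box is empty or inverted.
bool to_xywh(const Ltrb &in, Xywh &out) {
    const int w = in.right - in.left;
    const int h = in.bottom - in.top;
    if (w <= 0 || h <= 0) return false;
    out = Xywh{in.left, in.top, w, h};
    return true;
}
```

For the 机 box from the listing (951 325 977 366) this gives x = 951, y = 325, w = 26, h = 41, so the drawn rectangle has the same 41-pixel height as the printed coordinates imply.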

I have also uploaded the original image and the image with the bounding boxes drawn on it, for comparison:

Image
Image

Expected Behavior

I expect BoundingBox to return values with a smaller error.

Suggested Fix

I have no suggested fix. I am reporting this in the hope of getting some help. Thanks!

tesseract -v

tesseract 5.5.0
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.3
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.1 OpenSSL/3.4.0 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.64.0 nghttp3/1.6.0

Operating System

No response

Other Operating System

Manjaro Linux x86_64

uname -a

Linux assasin-21d8a009cd 6.11.11-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu, 05 Dec 2024 16:26:44 +0000 x86_64 GNU/Linux

Compiler

gcc (GCC) 14.2.1 20240910

CPU

CPU: 12th Gen Intel(R) Core(TM) i7-12700H (20) @ 4.70 GHz

Virtualization / Containers

none

Other Information

OS: Manjaro Linux x86_64
Kernel: Linux 6.11.11-1-MANJARO
Shell: zsh 5.9
Display (BOE098E): 1920x1080 @ 60 Hz in 16" [Built-in]
DE: KDE Plasma 6.2.4
WM: KWin (Wayland)
WM Theme: Breeze
Terminal: konsole 24.8.3
Terminal Font: Hack Nerd Font Mono (11pt)
CPU: 12th Gen Intel(R) Core(TM) i7-12700H (20) @ 4.70 GHz
GPU 1: NVIDIA T600 Laptop GPU
GPU 2: Intel Alder Lake-P Integrated Graphics Controller @ 1.40 GHz [Integrated]
