
Adding Differential Binarization model from PaddleOCR to Keras3 #1739

Status: Open. Wants to merge 33 commits into base: master.

Conversation

@gowthamkpr (Collaborator) commented Aug 6, 2024:

This adds the Differentiable Binarization model for scene text detection.

I implemented the architecture based on ResNet50_vd from PaddleOCR and ported the weights.

Demo colab: https://colab.research.google.com/gist/gowthamkpr/bd4a7f7742e92e66cfc57827052b8619/keras_paddleocr_v3.ipynb

@mattdangerw changed the base branch from master to keras-hub on August 6, 2024, 17:36.
@mattdangerw (Member):

Let's split this up. Start with ResNetVD backbone?

Some notes...

  • Remove the aliases. One ResNetVDBackbone can handle all of these with different presets (see the sketch after this list).
  • Conversion scripts as scripts, not colabs.
  • Follow the local style for backbones as closely as possible. See some comments on Add VGG16 and VGG19 backbone #1737.
  • Keep models in a flat directory. No backbones/xx etc.
  • Add some tests.
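
Roughly, the idea (the preset name here is just a placeholder, not a published one):

import keras_hub

# One backbone class covers all variants via presets, so no per-variant
# alias classes are needed.
backbone = keras_hub.models.ResNetVDBackbone.from_preset("resnet50_vd")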

@divyashreepathihalli (Collaborator):

@gowthamkpr is the PR ready for review?

@divyashreepathihalli (Collaborator) left a comment:

Thanks for the PR! I have left a reorganization comment.

example for structuring the code - https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/sam

keras_nlp/src/models/diffbin/diffbin.py (2 outdated review threads, resolved)
keras_nlp/src/models/diffbin/losses.py (outdated review thread, resolved)
@gowthamkpr changed the base branch from keras-hub to master on October 22, 2024, 20:24.
@divyashreepathihalli (Collaborator):

Hi @gowthamkpr! Can you please refactor the code to KerasHub style?

  • Add a preprocessor flow
  • Subclass the image segmenter model for the task class
  • Add a preset class
  • Add standard test routines

@divyashreepathihalli (Collaborator) left a comment:

Thanks Gowtham! Left a few comments!

56,
256,
),
run_mixed_precision_check=False,
Collaborator:

Does the mixed precision check pass?

Collaborator (Author):

No. I tried adding an explicit dtype argument, but the problem remains: the mixed precision check verifies each sublayer of the model, and the ResNet backbone, which is instantiated separately, therefore has the wrong dtype. A minimal illustration follows.
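
Minimal illustration with generic Keras layers (not this PR's code):

import keras

# A layer created before the mixed-precision policy is set keeps its
# original dtype policy, so a per-sublayer check flags it.
backbone = keras.layers.Dense(8)
keras.mixed_precision.set_global_policy("mixed_float16")
head = keras.layers.Dense(2)
print(backbone.dtype_policy.name)  # float32
print(head.dtype_policy.name)  # mixed_float16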

@divyashreepathihalli (Collaborator) left a comment:

Thanks for the PR, Gowtham! Left a few comments. Can you please also add a demo colab to the PR description to verify the model is working before merging?


@keras_hub_export("keras_hub.layers.DifferentialBinarizationImageConverter")
class DifferentialBinarizationImageConverter(ImageConverter):
backbone_cls = DifferentialBinarizationBackbone
Collaborator:

There should be some resizing/rescaling ops here, right?

Collaborator (Author):

It depends. These image operations are implemented in the superclass, ImageConverter, and can be used as shown in the demo colab I've added to the PR description. Dedicated code in this class might make sense for resizing to resolutions that are multiples of 32, which the model requires; see the sketch below. On the other hand, it might be confusing for the user if the predicted masks have a different resolution than the input.
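
For illustration, a helper along these lines could bring inputs to the required resolution (the helper is hypothetical, and zero-padding is just one option besides resizing):

import math

import numpy as np

def pad_to_multiple_of_32(image):
    # Zero-pad height and width up to the next multiple of 32, as the
    # model requires; predicted maps then come back at the padded size.
    h, w = image.shape[0], image.shape[1]
    new_h = math.ceil(h / 32) * 32
    new_w = math.ceil(w / 32) * 32
    padded = np.zeros((new_h, new_w) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image
    return padded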

Collaborator:

You might want to look into Segformer for this. The output masks will need to be resized as well.



@keras_hub_export("keras_hub.models.DifferentialBinarizationOCR")
class DifferentialBinarizationOCR(ImageSegmenter):
Collaborator:

We need to add a new base class for OCR; I don't think ImageSegmenter is a good one. Do you have a specific reason you chose to subclass ImageSegmenter?

Collaborator (Author):

Actually, you suggested subclassing ImageSegmenter (here), if I understood correctly. Technically, the task is somewhat similar to segmentation tasks. We can of course add a separate base class for it to capture the semantic differences, but I would rather name it "scene text detection".

Collaborator:

Yeah, I think it is better to add a new base class for OCR.

Collaborator (Author):

Sure. I suggest creating an ImageTextDetector base class and including in it the code (from the notebook) for translating the segmentation mask output into polygons, which are often needed in such applications; a rough skeleton is sketched below. I'll try to get rid of the OpenCV and Shapely dependencies in this code, since I believe we don't have them in our requirements.
Does this work for you?
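
Rough skeleton (everything beyond the class name is up for discussion):

import keras_hub

class ImageTextDetector(keras_hub.models.Task):
    """Base class for scene text detection tasks."""

    def postprocess_to_polygons(self, masks):
        # Translate predicted score maps into per-image lists of
        # polygons (lists of (x, y) points), without OpenCV/Shapely.
        raise NotImplementedError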

Collaborator:

Sounds good!

Member:

@gowthamkpr what's the high-level user journey we'd expect this to be used with? Is this eventually going to be part of an OCR pipeline? What would the full OCR pipeline code look like? E.g.

ocr = keras_hub.OCR.from_preset("something_paddle_paddle")
ocr.predict(image) # returns text and polygons for text?

Or does this look like a two-stage thing where we first go image -> polygons, and second go image and polygons -> text?

I think the thing we are missing is the high level look of the user journey we are trying to cover. Once we have that figured out, the rest of this will become a lot clearer.

@gowthamkpr (Collaborator, Author) commented Dec 3, 2024:

I would suggest exposing this two-stage process to the user, i.e. image -> polygons and image + polygons -> text.

Differential Binarization is not able to perform the actual OCR, so we'll have to staple another model to it for the full pipeline. Additionally, it might be useful to have additional preprocessing steps in between, e.g. to rotate a region if a flat but tilted bounding box has been detected. This is too much detail to hide from the user, imho.

Maybe something like

# prediction
text_detector = keras_hub.OCR.from_preset("diffbin")
map_output = text_detector.predict(image)
# get polygons
polygons = text_detector.postprocess_to_polygons(map_output)

# OCR (not yet available)
ocr = keras_hub.OCR.from_preset("something")
for polygon in polygons:
  image_patch = keras_hub.utils.extract_patch(image, polygon)
  text = ocr.predict(image_patch)

Alternatively, we could combine predict() and postprocess_to_polygons() into something like

# prediction
text_detector = keras_hub.OCR.from_preset("diffbin")
polygons = text_detector.predict_polygons(image)

# OCR (not yet available)
ocr = keras_hub.OCR.from_preset("something")
for polygon in polygons:
  image_patch = keras_hub.utils.extract_patch(image, polygon)
  text = ocr.predict(image_patch)

Member:

Some rules for designing here.

  • We should aim for simple usage that stays close to regular Keras abstractions.
  • A task should define a type of input and output data for any model that we build for that task. The goal is standardization: swap the model you are using without changing the high-level code that uses the model. So in this case, using OCR for two different types of inputs and outputs is not good.
  • Task class name is always SomeModelSomeTask(SomeTask). So if the task is called ImageTextExtractor the subclass is DiffBinImageTextExtractor. If the task is OCR the subclass is SomeModelOCR. I don't think we need exceptions to this.
  • We can bake preprocessing and postprocessing into the preprocessor layer, and leave it configurable there.

Questions.

  • What's the format of a polygon list? Do we keep it as a padded tensor?
  • What's the format of a patch? It's not rectangular, right? Is it an image plus a mask?
  • What will training or fine-tuning look like? What's the format of the input data?

Seems like our main options are: splitting detection and extraction, as you are doing so far...

# Inference.
detector = keras_hub.TextDetector.from_preset("diff_bin_small")
# Some form of masking here so these can stay square tensors?
polygons = detector.predict(images)
extractor = keras_hub.TextExtractor.from_preset("other_thing")
texts = extractor.predict({
    "images": images,
    "polygons": polygons,
})

# Fine-tuning.
# load some data. think this through! what is the format? do we need masking?
images, polygons, texts = ...
detector.fit(x=images, y=polygons)
extractor.fit(x={"images": images, "polygons": polygons}, y=texts)

Or trying to bake this all into one.

# Inference.
ocr = keras_hub.OCR.from_preset("diff_bin_and_other_thing")
# Some form of masking here so these can stay square tensors?
texts, polygons = ocr.predict(images)

# Fine-tuning. All at once? Split?

Member:

I think the two-stage process is fine if it's standard, but we might want to clean it up a bit.

@gowthamkpr (Collaborator, Author) commented Jan 27, 2025:

Some rules for designing here.

* We should aim for simple usage that stays close to regular Keras abstractions.

* A task should define a type of input and output data for any model that we build for that task. The goal is standardization: swap the model you are using without changing the high-level code that uses the model. So in this case, using OCR for two different types of inputs and outputs is not good.

* Task class name is always `SomeModelSomeTask(SomeTask)`. So if the task is called `ImageTextExtractor` the subclass is `DiffBinImageTextExtractor`. If the task is `OCR` the subclass is `SomeModelOCR`. I don't think we need exceptions to this.

Changed.

* What's the format of a polygon list? Do we keep it as a padded tensor?

At the moment I return a list (batch) of lists (image) of lists (polygon) of 2D tuples (points in a polygon); see the example below. Considering this nested structure, and considering that the number of vertices in a polygon might vary wildly, I think it makes sense to avoid forcing this structure into a tensor.

I'd have to check whether it would even be possible to rewrite the contour-finding algorithms to use only tensor operations. When using OpenCV as the contour finder (which I currently allow as an option), it of course isn't.
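
For concreteness, the returned structure looks like this (coordinates are made up):

polygons = [  # batch
    [  # image 0: two detected text regions
        [(10, 12), (54, 12), (54, 30), (10, 30)],  # 4-point polygon
        [(70, 40), (95, 38), (98, 60), (66, 52)],  # another polygon
    ],
    [],  # image 1: no text detected
]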

* What's the format of a patch? It's not rectangular, right? Is it an image plus a mask?

The text recognition model will accept a rectangular patch that only contains the text region. Within this patch, we could possibly mask out regions that are outside the polygon.

* What will training or fine-tuning look like? What's the format of the input data?

Currently the segmentation masks need to be provided as input for fine-tuning, just as the model outputs segmentation masks before postprocessing. Changing this to accept a polygon representation for training should be possible, but then I'd prefer to override train_step to do the conversion; a rough sketch follows. I'm not sure how well this can be performed in the computation graph, and I'm also not sure how sophisticated Keras' support for ragged tensors is for representing this nested structure of points.
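
Roughly (polygons_to_masks is a hypothetical helper that rasterizes the polygons; it is not part of this PR):

class DiffBinImageTextDetector(ImageTextDetector):
    def train_step(self, data):
        images, polygons = data
        # Rasterize the polygon targets into the segmentation masks the
        # loss expects. Whether this conversion can run inside the
        # compiled graph is the open question mentioned above.
        masks = polygons_to_masks(polygons, mask_shape=images.shape[1:3])
        return super().train_step((images, masks))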

Seems like our main options are: splitting detection and extraction, as you are doing so far... Or trying to bake this all into one.

As a user, I'd have a strong preference for the split approach. Both model types are pretty complex, and when using them I'd like to see what comes out of the first one, and possibly identify problems, before passing the results on to the second model.

@mattdangerw (Member) left a comment:

Mostly questions and some style stuff.

I am curious what to do with the task here. Does this output a segmentation mask? Maybe most importantly, does this fit into a bigger picture of an OCR system? If so, how do we expect the whole thing to work?



@keras_hub_export("keras_hub.models.DifferentialBinarizationOCR")
class DifferentialBinarizationOCR(ImageSegmenter):
Member:

What's the output of the task? Seems like this is not quite OCR, right? Not text output?

Is this just a piece of what we would need for a full OCR setup?

backbone=backbone
)

detector(input_data)
Member:

What does the output of the task look like?

Collaborator (Author):

The model outputs the probability map, threshold map, and binary map, stacked along the last dimension; an illustrative snippet follows. I've added some documentation here.
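
Illustratively (the channel order shown is an assumption; the documentation is authoritative):

outputs = detector(input_data)  # shape: (batch, height, width, 3)
probability_map = outputs[..., 0]
threshold_map = outputs[..., 1]
binary_map = outputs[..., 2]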

keras_hub/src/models/differential_binarization/losses.py (3 outdated review threads, resolved)
@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label on Dec 10, 2024.
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Dec 10, 2024.
@gowthamkpr added the kokoro:force-run (Runs Tests on GPU) label on Dec 17, 2024.
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Dec 17, 2024.

`ImageTextDetector` tasks wrap a `keras_hub.models.Task` and
a `keras_hub.models.Preprocessor` to create a model that can be used for
image segmentation.
Member:

This is false, right? This task will detect polygons in an image?

@gowthamkpr (Collaborator, Author) commented Jan 27, 2025:

Yes, sure. Fixed.

can be used to load a pre-trained config and weights.

Args:
detection_thresh: The value for thresholding predicted mask outputs.
Member:

Are these general concepts that will apply to other models? We should only leave common stuff here. If this would not apply to a non-diff bin model, we should leave it out.

Collaborator (Author):

It's pretty typical for such models to output a mask, which then needs to be translated into polygon form, but there are also models that follow different approaches. So, as you prefer: we can move it into diffbin_textdetector.py or even create a separate file for the postprocessing.

)
return config

def postprocess_to_polygons(self, masks, contour_finder="simple"):
Member:

We might want to consider rolling this into the preprocessing layer. That is what we do for generation.

Collaborator (Author):

Not sure what you mean here. Could you please elaborate or provide pointers for how it is implemented for generation?




@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label on Jan 22, 2025.
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Jan 22, 2025.
@divyashreepathihalli (Collaborator) left a comment:

@gowthamkpr if you have addressed the comments, can you please respond or resolve them, so we can merge this PR?



@keras_hub_export("keras_hub.models.DiffBinPreprocessor")
class DiffBinPreprocessor(ImageSegmenterPreprocessor):
Collaborator:

If this model is going to be returning polygons, we might need to change the parent class.

Collaborator (Author):

Yes. The functionality we require here is pretty similar, though (resizing the target masks in addition to the input images). So basically, we'd have to copy ImageSegmenterPreprocessor; a rough sketch follows.
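
Sketch (the class name is hypothetical; it mirrors the resize-images-and-masks behavior described above):

import keras
import keras_hub

class ImageTextDetectorPreprocessor(keras_hub.models.Preprocessor):
    def __init__(self, image_converter=None, **kwargs):
        super().__init__(**kwargs)
        self.image_converter = image_converter

    def call(self, x, y=None, sample_weight=None):
        if self.image_converter is not None:
            x = self.image_converter(x)
            # y (the target masks) would be resized here to match x.
        return keras.utils.pack_x_y_sample_weight(x, y, sample_weight)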
