Here is a solution on how to run yolov4 tflite-int8 model #214

Open
murdockhou opened this issue Aug 23, 2020 · 35 comments

@murdockhou

Hi, thanks for your nice work. As you say, YOLOv4 and YOLOv4-tiny int8 quantization still have some issues. I have a solution for this, and the code is below:

First, when converting the Darknet weights to TensorFlow, we should not include the post-processing in the saved_model:

# save_model.py

import tensorflow as tf
from absl import app, flags, logging
from absl.flags import FLAGS
from core.yolov4 import YOLO, decode, filter_boxes
import core.utils as utils
from core.config import cfg

flags.DEFINE_string('weights', './data/yolov4.weights', 'path to weights file')
flags.DEFINE_string('output', './checkpoints/yolov4-416', 'path to output')
flags.DEFINE_boolean('tiny', False, 'is yolo-tiny or not')
flags.DEFINE_integer('input_size', 416, 'define input size of export model')
flags.DEFINE_float('score_thres', 0.2, 'define score threshold')
flags.DEFINE_string('framework', 'tf', 'define what framework do you want to convert (tf, trt, tflite)')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')

def save_tf():
  STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)

  input_layer = tf.keras.layers.Input([FLAGS.input_size, FLAGS.input_size, 3])
  feature_maps = YOLO(input_layer, NUM_CLASS, FLAGS.model, FLAGS.tiny)
  # bbox_tensors = []
  # prob_tensors = []
  # if FLAGS.tiny:
  #   for i, fm in enumerate(feature_maps):
  #     if i == 0:
  #       output_tensors = decode(fm, FLAGS.input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     else:
  #       output_tensors = decode(fm, FLAGS.input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     bbox_tensors.append(output_tensors[0])
  #     prob_tensors.append(output_tensors[1])
  # else:
  #   for i, fm in enumerate(feature_maps):
  #     if i == 0:
  #       output_tensors = decode(fm, FLAGS.input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     elif i == 1:
  #       output_tensors = decode(fm, FLAGS.input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     else:
  #       output_tensors = decode(fm, FLAGS.input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     bbox_tensors.append(output_tensors[0])
  #     prob_tensors.append(output_tensors[1])
  # pred_bbox = tf.concat(bbox_tensors, axis=1)
  # pred_prob = tf.concat(prob_tensors, axis=1)
  # if FLAGS.framework == 'tflite':
  #   pred = (pred_bbox, pred_prob)
  # else:
  #   boxes, pred_conf = filter_boxes(pred_bbox, pred_prob, score_threshold=FLAGS.score_thres, input_shape=tf.constant([FLAGS.input_size, FLAGS.input_size]))
  #   pred = tf.concat([boxes, pred_conf], axis=-1)
  model = tf.keras.Model(input_layer, feature_maps)
  utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
  model.summary()
  model.save(FLAGS.output)

def main(_argv):
  save_tf()

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

Then, run this command to convert it to tflite-int8:

python convert_tflite.py --weights ./checkpoints/yolov4-416 --output ./checkpoints/yolov4-416-int8.tflite --quantize_mode int8 

I disabled the --dataset ./coco_dataset/coco/val207.txt parameter since I do not have the COCO dataset; you also need to comment out line 43.
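
For reference, the int8 branch of convert_tflite.py roughly boils down to the standard TFLiteConverter flow sketched below. The paths come from the commands above, but the rest is an illustration rather than the repo's exact code (what line 43 does in your copy of the script may differ):

import tensorflow as tf

# Sketch only: convert the saved_model produced by the modified save_model.py above.
converter = tf.lite.TFLiteConverter.from_saved_model('./checkpoints/yolov4-416')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Without a representative dataset this yields dynamic-range quantization only;
# full int8 additionally needs converter.representative_dataset (discussed later in this thread).
tflite_model = converter.convert()

with open('./checkpoints/yolov4-416-int8.tflite', 'wb') as f:
    f.write(tflite_model)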

Finally, add some code to detect.py (this is actually the post-processing code we commented out in the first step):

if FLAGS.framework == 'tflite':
    interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print(input_details)
    print(output_details)
    interpreter.set_tensor(input_details[0]['index'], images_data)
    interpreter.invoke()
    pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
    # Post-processing moved here from save_model.py. Note that the TFLite output
    # order differs from the feature-map scale order, hence the pred[2]/pred[0]/pred[1] indexing.
    bbox_tensors = []
    prob_tensors = []
    for i, fm in enumerate(pred):
        if i == 0:
            output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
        elif i == 1:
            output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
        else:
            output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
        bbox_tensors.append(output_tensors[0])
        prob_tensors.append(output_tensors[1])
    pred_bbox = tf.concat(bbox_tensors, axis=1)
    pred_prob = tf.concat(prob_tensors, axis=1)
    pred = (pred_bbox, pred_prob)

    if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
        boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
    else:
        boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))

Run python detect.py --weights ./checkpoints/yolov4-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite and we will get a result like this:

(result image: result-416-int8)

Hope this helps.

@murdockhou murdockhou changed the title from "Here is a solution on how to run tflite-int8 model" to "Here is a solution on how to run yolov4 tflite-int8 model" on Aug 23, 2020
@alexanderswerdlow

Were you able to fully compile the model for the edge TPU? I was only able to get 39 supported ops + 8 that will run on the CPU for Yolo v4 Tiny.

@murdockhou
Author

@alexanderswerdlow I only tried this on CPU; I don't know how compatible it is with the TPU.

@hhk7734
Contributor

hhk7734 commented Aug 26, 2020

@alexanderswerdlow
The total number of layers is only 47?

@alexanderswerdlow

@hhk7734 Yep. I’ll double check tomorrow but I’m almost positive. The only significant change I made was leakyRelu -> Relu. I was able to get it running on the edge TPU but it runs quite horribly, compared to even yolo v3 tiny. The performance is fine with ~20ms per frame but the accuracy is very poor.

@hhk7734
Contributor

hhk7734 commented Aug 26, 2020

@alexanderswerdlow
Can you share the .tflite file (not the _edgetpu.tflite)?
I want to compare it with the layers I created.

@alexanderswerdlow

alexanderswerdlow commented Aug 27, 2020

Can't share the tflite but happy to share my config. It's basically the stock config just without leakyRelu, 416x416, the right filter value, etc.

yolo-tiny-v4-obj.txt

Edit: if you're curious, here's my edgetpu_compiler output:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 599 ms.

Input model: v4.tflite
Input size: 5.72MiB
Output model: v4_edgetpu.tflite
Output size: 5.75MiB
On-chip memory used for caching model parameters: 4.82MiB
On-chip memory remaining for caching model parameters: 1.88MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 47
Operation log: v4_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 39
Number of operations that will run on CPU: 8

Operator                       Count      Status

PAD                            2          Mapped to Edge TPU
CONCATENATION                  1          More than one subgraph is not supported
CONCATENATION                  6          Mapped to Edge TPU
SPLIT                          3          Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
DEQUANTIZE                     2          Operation is working on an unsupported data type
RESIZE_BILINEAR                1          Operation version not supported
QUANTIZE                       6          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
QUANTIZE                       1          More than one subgraph is not supported
CONV_2D                        19         Mapped to Edge TPU
CONV_2D                        2          More than one subgraph is not supported

@hhk7734
Contributor

hhk7734 commented Aug 28, 2020

@alexanderswerdlow
If you have time, please take a look at the links below.
https://github.com/hhk7734/tensorflow-yolov4/blob/master/test/make_edgetpu_tflite.ipynb
hhk7734#20

using input_size=416, inference time is ~60ms.

@ItsMeTheBee

@alexanderswerdlow how did you manage to convert the yolo model?
I'm using the same config file as you and I managed to run the model with detect.py after quantizing it with
python convert_tflite.py --weights ./checkpoints/yolov4 --output ./checkpoints/yolov4-int8.tflite --quantize_mode int8

But if I try to convert it with
edgetpu_compiler ./checkpoints/yolov4-int8.tflite

I get

Edge TPU Compiler version 14.1.317412892
Invalid model: ./checkpoints/yolov4-int8.tflite
Model not quantized

I thought that running convert_tflite.py with --quantize_mode int8 takes care of this so I don't really get what I'm missing.
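
For what it's worth, one quick diagnostic (a sketch only, with a placeholder model path) is to check how many tensors in the produced .tflite are still float32; the Edge TPU compiler only accepts models whose weights and activations are integer-quantized, so a mostly-float32 model means the int8 path never really took effect:

import tensorflow as tf

# Diagnostic sketch: count tensor dtypes in the converted model.
interpreter = tf.lite.Interpreter(model_path='./checkpoints/yolov4-int8.tflite')
dtypes = [t['dtype'].__name__ for t in interpreter.get_tensor_details()]
for name in sorted(set(dtypes)):
    print(name, dtypes.count(name))
# A fully integer-quantized model is dominated by int8/int32 tensors;
# mostly float32 tensors mean the quantization did not happen.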

@JimBratsos

@alexanderswerdlow @murdockhou I would like to know this too. I've been trying to convert the weights of YOLOv3 and v4 to tflite (fully int8 quantized) using these helpful steps, but sadly I didn't get a result. I get the exact same result as @ItsMeTheBee above. I am a beginner and I would like to use this for a project. Any guidance would be extremely appreciated.

@4yougames

4yougames commented Sep 5, 2020

Tell me please, how do I use this int8 model in an Android/iOS project?
I get this error in my Android project:
Cannot copy from a TensorFlowLite tensor (Identity) with shape [1, 13, 13, 993] to a Java object with shape [1, 2535, 4].

The float16 and float32 models run OK.

@JimBratsos

No idea. It does not compile with the Coral compiler, as it says it needs quantization even though I already did it.

@ItsMeTheBee

ItsMeTheBee commented Sep 17, 2020

Okay a little something about that:

I was able to compile the model after some modifications in core/common.py and convert_tflite.py
In core/common.py I changed leaky relu to relu.
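
For anyone wondering what that change looks like, here is a minimal sketch; the real helper in core/common.py has a different name and structure, so this only illustrates the activation swap (the usual motivation being that ReLU maps onto the Edge TPU while LeakyReLU generally does not):

import tensorflow as tf

def activate(conv, activate_type='relu'):
    # Sketch only: mirrors the kind of change made in core/common.py.
    if activate_type == 'leaky':
        return tf.nn.leaky_relu(conv, alpha=0.1)   # original activation
    return tf.nn.relu(conv)                        # Edge TPU-friendly replacement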

In convert_tflite.py I tried using the old converter:

elif FLAGS.quantize_mode == 'int8':
    converter.experimental_new_converter = False
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
    converter.allow_custom_ops = True
    converter.representative_dataset = representative_data_gen

With this, the edgetpu_compiler can map 17 ops to the Edge TPU:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 340 ms.

Input model: checkpoints/yolov3-1800-int8.tflite
Input size: 8.42MiB
Output model: yolov3-1800-int8_edgetpu.tflite
Output size: 8.44MiB
On-chip memory used for caching model parameters: 1.84MiB
On-chip memory remaining for caching model parameters: 256.75KiB
Off-chip memory used for streaming uncached model parameters: 5.64MiB
Number of Edge TPU subgraphs: 1
Total number of operations: 135
Operation log: yolov3-1800-int8_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 17
Number of operations that will run on CPU: 118

Operator                       Count      Status

RESHAPE                        18         More than one subgraph is not supported
ADD                            6          More than one subgraph is not supported
MUL                            18         More than one subgraph is not supported
DEQUANTIZE                     8          Operation is working on an unsupported data type
CONV_2D                        2          More than one subgraph is not supported
CONV_2D                        11         Mapped to Edge TPU
CONCATENATION                  9          More than one subgraph is not supported
STRIDED_SLICE                  12         More than one subgraph is not supported
MAX_POOL_2D                    6          Mapped to Edge TPU
LOGISTIC                       12         More than one subgraph is not supported
QUANTIZE                       17         More than one subgraph is not supported
QUANTIZE                       7          Operation is otherwise supported, but not mapped due to some unspecified limitation
SPLIT_V                        2          Operation not supported
RESIZE_BILINEAR                1          Operation version not supported
EXP                            6          Operation is working on an unsupported data type

Now, this was done for a tiny v3 model so it might not work for you, but you can still give it a try =)
I've been hoping for better conversion performance, so I'd still be glad about any hint in the right direction.

@JimBratsos

JimBratsos commented Sep 17, 2020

Wow, great work there. Have you tested it on the Coral? What FPS are you getting, @ItsMeTheBee?

EDIT: I tried following your steps, but adding this line
converter.representative_dataset = representative_data_gen
caused an error about min/max tensors. Any idea for a workaround? I renamed all the leaky_relu instances to relu and then changed tf.nn.leaky_relu(conv, alpha=something) to tf.nn.relu(conv) in common.py. I also made the changes specified for convert_tflite.py, except for the line I mentioned, to avoid the error. After that, I tried to compile the tflite model and the compiler stated the model was not quantized.
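
For context, the representative dataset is what lets the converter record the min/max ranges that the error complains about, so the generator has to yield real preprocessed images shaped exactly like the model input. A minimal sketch (the annotation-file path and the 416 input size are assumptions; adapt them to your setup):

import cv2
import numpy as np

def representative_data_gen():
    # Each annotation line is assumed to start with an image path.
    lines = open('./data/dataset/val2017.txt').read().splitlines()
    for line in lines[:100]:  # a few hundred images is usually enough for calibration
        image = cv2.imread(line.split()[0])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (416, 416)) / 255.0
        yield [np.expand_dims(image.astype(np.float32), axis=0)]

If the annotation file is missing or empty, the generator yields nothing, calibration records no ranges, and you get exactly this kind of min/max error.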

@w840401

w840401 commented Sep 29, 2020

Hi @murdockhou,
After using your method, I successfully ran the YOLOv4 int8 model and detected kite.jpg.
But with the yolov4-tiny int8 version, the model converted successfully, yet an error occurred when running detect.py.
The error is as follows:

(screenshot: error)

TensorFlow version: tensorflow-gpu 2.3.0 / 2.3.0rc0 (tested both)
Command:
python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

@JimBratsos

Hey @w840401,
I've had this error too; it occurs because the code expects more outputs than the model actually has. Our model has 2 output branches while the code expects 3. Sadly I do not know the solution, but I'd like one too, @murdockhou.

@w840401

w840401 commented Sep 30, 2020

I naively thought it was caused by an OS problem,
but the same error occurred when I switched from Windows to Linux 😆 Please, somebody help us. @murdockhou
(screenshot: Linux error)

@JimBratsos

I tried using the COCO dataset to convert to tflite, but I got empty min/max tensors. I really want to know how you got this to work, @murdockhou. If, as I said, I omit the converter.representative_dataset = representative_data_gen line, I fail to quantize the model: the script runs, but the Coral model compiler reports that the model is not quantized. Also, if I try to run DeepSORT using this model, no bounding boxes are produced, which is weird. Could you help?

@w840401

w840401 commented Oct 12, 2020

I tried using the COCO dataset to convert to tflite, but I got empty min/max tensors. I really want to know how you got this to work, @murdockhou. If, as I said, I omit the converter.representative_dataset = representative_data_gen line, I fail to quantize the model: the script runs, but the Coral model compiler reports that the model is not quantized. Also, if I try to run DeepSORT using this model, no bounding boxes are produced, which is weird. Could you help?

I managed to solve the problem you mentioned; it was caused by the val2017.txt path. Re-run with the correct coco_dataset path and it should be fine.

@JimBratsos

Hey @w840401 ,
I tried changing the path of val2017.txt, but it still has the same problem.

RuntimeError: Max and min for dynamic tensors should be recorded during calibration: Failed for tensor StatefulPartitionedCall/functional_1/zero_padding2d/Pad
Empty min/max for tensor StatefulPartitionedCall/functional_1/zero_padding2d/Pad

Any ideas? My val2017.txt is in the same folder as the pictures and points to them. For example:

000000000139.jpg ..... for this specific picture.

Thanks for the help

@w840401

w840401 commented Oct 14, 2020

(screenshot)

@JimBratsos I don't think it needs to be in the same folder as the pictures.
Maybe it just can't point to them!

Quantization still had problems after this.
I did not use tensorflow==2.3.0;
I used tf-nightly instead.
Resource: https://pypi.org/project/tf-nightly/

My English is poor, sorry.

@deep-rooteddz

When using the yolov4-tiny int8 version and running the quantized model with detect.py:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

I got this error too.

output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
IndexError: list index out of range

Then I added some additional code in detect.py, and I can successfully detect images now. The same applies to detectvideo.py.

        # add post process code here    
        bbox_tensors = []
        prob_tensors = []
        if FLAGS.tiny:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[1], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[0], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])
        else:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                elif i == 1:
                    output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])

        pred_bbox = tf.concat(bbox_tensors, axis=1)
        pred_prob = tf.concat(prob_tensors, axis=1)
        pred = (pred_bbox, pred_prob)

Although the yolov4-tiny int8 model now detects successfully, it runs slowly, taking about 300 ms per image on CPU. Does anyone have suggestions about this?

@zero90169

@deep-rooteddz Thanks for sharing. I followed your steps to modify my detect.py file and ran it, but the results came out wrong; I didn't get results similar to @murdockhou's. Would you mind showing your inference results on kite.jpg? Thanks.

@deep-rooteddz

deep-rooteddz commented Jan 9, 2021

@deep-rooteddz Thanks for sharing. I followed your steps to modify my detect.py file and ran it, but the results came out wrong; I didn't get results similar to @murdockhou's. Would you mind showing your inference results on kite.jpg? Thanks.

@zero90169
I found that I can run @hunglc007 's code without problems, maybe he has fixed it. Of course, you can continue to use @murdockhou 's method. Using yolov4-tiny-int8.tflite to detect kite.jpg, the accuracy is the same.

If you don't get a similar result, check config.py to confirm:

YOLO.CLASSES = "./data/classes/coco.names"
TRAIN.ANNOT_PATH = "./data/dataset/val2017.txt"
TEST.ANNOT_PATH = "./data/dataset/val2017.txt"

You may also need to modify YOLO.ANCHORS_TINY to match yolov4-tiny.weights (not strictly necessary, but it affects the bounding box sizes), e.g.

__C.YOLO.ANCHORS_TINY         = [10,14, 23,27, 37,58, 81,82, 135,169, 344,319]
# __C.YOLO.ANCHORS_TINY         = [23,27, 37,58, 81,82, 81,82, 135,169, 344,319]

Here are my yolov4-tiny-int8.tflite results on kite.jpg:
(image: detection result)

@zero90169

zero90169 commented Jan 11, 2021

@deep-rooteddz Thanks for your kind and quick reply. I expected to get as many boxes as in the kite.jpg result shown by @murdockhou, but my int8.tflite inference result on kite.jpg looks more like yours. Maybe the difference comes from "yolov4" vs. "yolov4-tiny". I will try @hunglc007's code again!

@ybloch

ybloch commented Feb 9, 2021

@deep-rooteddz Thanks, your solution works for me. Like you, I see significant slowness relative to FP32: around 300 ms instead of 100 ms inference time. Have you been able to resolve this? I use tiny YOLOv4.

@farhantandia

farhantandia commented Feb 16, 2021

When using the yolov4-tiny int8 version and running the quantized model with detect.py:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

I got this error too.

output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
IndexError: list index out of range

Then I added some additional code in detect.py, and I can successfully detect images now. The same applies to detectvideo.py.

        # add post process code here    
        bbox_tensors = []
        prob_tensors = []
        if FLAGS.tiny:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[1], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[0], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])
        else:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                elif i == 1:
                    output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])

        pred_bbox = tf.concat(bbox_tensors, axis=1)
        pred_prob = tf.concat(prob_tensors, axis=1)
        pred = (pred_bbox, pred_prob)

Although the yolov4-tiny int8 model now detects successfully, it runs slowly, taking about 300 ms per image on CPU. Does anyone have suggestions about this?

Thanks, it works on my Raspberry Pi 4, but why does the int8 model run slower than fp16 or fp32? I'm not using any accelerator, just the CPU.
For int8 I get 1.3 fps, while fp16 and fp32 are almost the same at 2.3 fps.

@mhyeonsoo

mhyeonsoo commented Apr 6, 2021

Did anyone solve this issue?
My int8-converted model does not work with the script:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

and it also does not work in the Android app (the fp16-converted model works, though).

When I test with the int8-converted model, this error message is shown:

ValueError: Shapes (1, 13, 13) and (1, 26, 26) are incompatible

@mipsan

mipsan commented May 18, 2021

@deep-rooteddz

Thanks, it works!

@KuoEuran

@murdockhou @deep-rooteddz @mipsan
Hi everyone,
I am using @hunglc007's original detect.py to run my own yolov4-tiny-int8.tflite.
Unfortunately, it shows the same error as @mhyeonsoo:
(screenshot of the error)
My command: python detect.py --weights ./checkpoints/yolov4-tiny-224-int8.tflite --size 224 --model yolov4 --image ./data/person.jpg --framework tflite --tiny
My environment: Python 3.8, tf-nightly 2.6.0
Please give me some advice about the error, thanks.

@Hanseyyyy

Did anyone solve this issue? My int8-converted model does not work with the script:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

and it also does not work in the Android app (the fp16-converted model works, though)

When I test with the int8-converted model, this error message is shown:

ValueError: Shapes (1, 13, 13) and (1, 26, 26) are incompatible

Have you solved it? I have encountered the same problem. Looking forward to your reply, thanks!

@juandoso

juandoso commented May 22, 2022

I'm also having problems trying to convert a YOLOv4 model to full int8 quantization. I don't have an answer yet, but I found a bug in the convert_tflite.py script that will always result in a model that is not fully int8-quantized. The lines:

  elif FLAGS.quantize_mode == 'int8':
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]

The second assignment to converter.target_spec.supported_ops supersedes the first one, which contains the int8 flag.
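
A sketch of how that branch could look with the duplicate assignment removed (whether you also want SELECT_TF_OPS as a fallback depends on your target; keeping it reintroduces Flex ops that plain TFLite runtimes cannot execute):

  elif FLAGS.quantize_mode == 'int8':
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    # Single assignment, so the int8 ops set is no longer overwritten:
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]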

@UcefMountacer

Did anyone solve this issue? My int8-converted model does not work with the script:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

and it also does not work in the Android app (the fp16-converted model works, though)
When I test with the int8-converted model, this error message is shown:

ValueError: Shapes (1, 13, 13) and (1, 26, 26) are incompatible

Have you solved it? I have encountered the same problem. Looking forward to your reply, thanks!

The reason is that you need to add the post-processing after invoking the model, since it was removed from the save_model.py script (see the detect.py code earlier in this thread).

@UcefMountacer

UcefMountacer commented Aug 26, 2022

Hello,

I have tried to run the int8 model on a Raspberry Pi 4. I got this error:

RuntimeError: Select TensorFlow op(s), included in the given model, is(are) not supported by this interpreter. Make sure you apply/link the Flex delegate before inference. For the Android, it can be resolved by adding "org.tensorflow:tensorflow-lite-select-tf-ops" dependency. See instructions: https://www.tensorflow.org/lite/guide/ops_selectNode number 79 (FlexFusedBatchNormV3) failed to prepare.

EDIT:

The same happens with float16.
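
That FlexFusedBatchNormV3 node is a TensorFlow-select (Flex) op; it can end up in the file because the script's supported_ops include tf.lite.OpsSet.SELECT_TF_OPS (see the lines quoted in the comment above), and the plain tflite-runtime on a Pi ships without the Flex delegate. Below is a sketch of a conversion restricted to builtin int8 ops, with assumed paths and a stub calibration generator; if the converter then fails on an op it cannot express as a builtin, the alternative is a runtime built with Flex support:

import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Stub only: yield real preprocessed images here (see the generator sketch earlier in the thread).
    for _ in range(100):
        yield [np.random.uniform(0, 1, (1, 416, 416, 3)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('./checkpoints/yolov4-416')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Builtin int8 ops only, so no Flex delegate is needed at inference time.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
open('./checkpoints/yolov4-416-int8.tflite', 'wb').write(tflite_model)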

@UcefMountacer

Thanks, it works on my Raspberry Pi 4, but why does the int8 model run slower than fp16 or fp32? I'm not using any accelerator, just the CPU. For int8 I get 1.3 fps, while fp16 and fp32 are almost the same at 2.3 fps.

Hello,
Can you confirm that the float16 model is faster on the Raspberry Pi? Thanks.

@ashray21

ashray21 commented Feb 17, 2023

Hello,

While running this
python convert_tflite.py --weights ./checkpoints/yolov4-416 --output ./checkpoints/yolov4-416-int8.tflite --quantize_mode int8

I'm getting the following error:

[{'name': 'input_1', 'index': 0, 'shape': array([ 1, 416, 416, 3], dtype=int32), 'shape_signature': array([ -1, 416, 416, 3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'Identity', 'index': 937, 'shape': array([1, 1, 1], dtype=int32), 'shape_signature': array([-1, -1, -1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
Traceback (most recent call last):
File "convert_tflite.py", line 76, in
app.run(main)
File "/home/ashray/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ashray/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "convert_tflite.py", line 72, in main
demo()
File "convert_tflite.py", line 66, in demo
output_data = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
File "convert_tflite.py", line 66, in
output_data = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
File "/home/ashray/lib/python3.8/site-packages/tensorflow/lite/python/interpreter.py", line 459, in get_tensor
return self._interpreter.GetTensor(tensor_index)
ValueError: Invalid tensor size.

Can you please comment on how to solve this error?
