Questions about v4 tiny on Edge TPU #20

hhk7734 · 2020-08-08T17:15:58Z

@hhk7734 Very interesting to see v4 tiny on Edge TPU. I have two questions:

Which ops were not mapped to the TPU?
Did you quantize to INT8 (post-training or quantization-aware training)?

Thanks

Originally posted by @ankandrew in #4 (comment)

paradigmn · 2020-12-10T10:18:58Z

My results are quite the opposite. I used tiny_yolov4 with ReLU activation and the weights provided by your repo. The input tensor has a size of (608, 608, 3). With the -a flag I get three subgraphs, with 97 of 128 operations running on the TPU. Without the flag I get one subgraph with 42 of 128 operations mapped. This gives the following inference times for 5 runs:

                 Run 1    Run 2    Run 3    Run 4    Run 5
with -a flag     0.0722s  0.0650s  0.0642s  0.0629s  0.0687s
without -a flag  0.2012s  0.1675s  0.1743s  0.1665s  0.1702s
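
To make the gap easier to read, the five runs can be averaged and turned into FPS. This is only a small sketch over the numbers in the table above; it assumes the Run 5 value with the -a flag is in seconds like the others.

```python
from statistics import mean

# Inference times in seconds, copied from the table above.
with_a = [0.0722, 0.0650, 0.0642, 0.0629, 0.0687]
without_a = [0.2012, 0.1675, 0.1743, 0.1665, 0.1702]

mean_with = mean(with_a)
mean_without = mean(without_a)
speedup = mean_without / mean_with

print(f"with -a:    {mean_with:.4f} s/frame ({1 / mean_with:.1f} FPS)")
print(f"without -a: {mean_without:.4f} s/frame ({1 / mean_without:.1f} FPS)")
print(f"speedup:    {speedup:.2f}x")
```

That works out to roughly 0.067 s versus 0.176 s per inference, so the -a build is about 2.6 times faster on these runs.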

hhk7734 · 2021-02-21T16:11:41Z

Model

EdgeTPU Ops: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

yolov4-tiny

image -> conv2d -> ... -> conv2d -> yolo_0
                   ... -> conv2d -> yolo_1

yolo layer

input
x, y, w, h, o, c0, c1, ...

output
(scale * logistic(x) - 0.5 * (scale - 1) + cx) / grid_width
(scale * logistic(y) - 0.5 * (scale - 1) + cy) / grid_height
prior * exp(w) / net_width
prior * exp(h) / net_height
logistic(o)
logistic(c0)
logistic(c1)
...

prior == anchor == biases
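
As a rough illustration of the formulas above, here is a plain-Python sketch that decodes the raw values of one anchor at one grid cell. The function name `decode_cell`, its parameter names, and the example numbers are assumptions made for this sketch; it is not the code used in the repo.

```python
import math


def logistic(v: float) -> float:
    """Standard sigmoid, 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, cx, cy, grid_width, grid_height,
                prior_w, prior_h, net_width, net_height, scale):
    """Decode one anchor's raw [x, y, w, h, o, c0, c1, ...] at cell (cx, cy)."""
    x, y, w, h, o, *classes = raw
    bx = (scale * logistic(x) - 0.5 * (scale - 1) + cx) / grid_width
    by = (scale * logistic(y) - 0.5 * (scale - 1) + cy) / grid_height
    bw = prior_w * math.exp(w) / net_width
    bh = prior_h * math.exp(h) / net_height
    return [bx, by, bw, bh, logistic(o)] + [logistic(c) for c in classes]


# Illustrative values only: all-zero raw output with two classes,
# centre cell of a 13x13 grid, a (81, 82) prior, 416x416 network input.
box = decode_cell([0.0] * 7, cx=6, cy=6, grid_width=13, grid_height=13,
                  prior_w=81, prior_h=82, net_width=416, net_height=416,
                  scale=1.05)
print(box)
```

With an all-zero raw vector the box centre lands in the middle of the cell and the box size equals the prior divided by the network size, which is a quick sanity check on the formulas.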

EdgeTPU

In the current situation, not all layers are mapped to the TPU because of ops such as SPLIT_V, EXP, ...
Even if all of them could be mapped, too many layers lose too much information at 8-bit precision.

We have to choose between changing the model so that more of it runs on the TPU, or giving up on some layers and running them on the CPU.
This comes down to choosing between speed and precision.
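
To give a feel for the 8-bit precision problem with EXP, the sketch below pushes exp(w) through a simplified uniform 8-bit round trip. It is only an assumption-laden model: the helper names are made up, the output range [0, exp(4)] is picked for illustration, and the real Edge TPU / TFLite quantization parameters for this model are not shown in this thread.

```python
import math


def quantize_roundtrip(value: float, lo: float, hi: float) -> float:
    """Map value onto 256 evenly spaced levels in [lo, hi] and back."""
    step = (hi - lo) / 255.0
    level = max(0, min(255, round((value - lo) / step)))
    return lo + level * step


def exp_relative_error(w: float, w_max: float = 4.0) -> float:
    """Relative error of exp(w) after an 8-bit round trip.

    Assumes the EXP output tensor spans [0, exp(w_max)]; w_max = 4.0 is
    an illustrative choice, not a value taken from the model.
    """
    exact = math.exp(w)
    approx = quantize_roundtrip(exact, 0.0, math.exp(w_max))
    return abs(approx - exact) / exact


for w in (-2.0, 0.0, 2.0):
    print(f"w={w:+.1f}  relative error={exp_relative_error(w):.1%}")
```

Under these assumed ranges, exp(-2) comes back more than 50% off while exp(2) stays within a few percent, because one quantization step covers a large share of the small outputs.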

When using the TPU, I removed all operations from the yolo layer except logistic.

Converted model

Identity - x0, Identity_1 - logistic(x0)
Identity_2 - x1, Identity_3 - logistic(x1)

FPS test

Model only

input shape (1, 416, 416, 3)

In [9]: def model(x):
   ...:     yolo._interpreter.set_tensor(yolo._input_details["index"], x)
   ...:     yolo._interpreter.invoke()
   ...:     # [yolo0, yolo1, ...]
   ...:     # yolo == Dim(1, height, width, channels)
   ...:     # yolo_tpu == x, logistic(x)
   ...:
   ...:     return [
   ...:         yolo._interpreter.get_tensor(output_detail["index"])
   ...:         for output_detail in yolo._output_details
   ...:     ]
   ...:

In [10]: 100/timeit.timeit(lambda: model(x), number=100)
Out[10]: 31.288735650288498

model + scale_x_y + copy x[..., wh] to logistic(x)[..., wh]

tensorflow-yolov4/py_src/yolov4/tflite/__init__.py

Lines 97 to 129 in b67ca45

    
    def _predict(self, x: np.ndarray) -> List[np.ndarray]:
        self._interpreter.set_tensor(self._input_details["index"], x)
        self._interpreter.invoke()
        # [yolo0, yolo1, ...]
        # yolo == Dim(1, height, width, channels)
        # yolo_tpu == x, logistic(x)
        yolos = [
            self._interpreter.get_tensor(output_detail["index"])
            for output_detail in self._output_details
        ]
        if self._tpu:
            _yolos = []
            if self._new_coords:
                for i, scale_x_y in enumerate(self._scale_x_y):
                    _yolo_tpu_layer_new_coords(
                        yolos[i], self._num_masks, scale_x_y
                    )
                    _yolos.append(yolos[i])
            else:
                for i, scale_x_y in enumerate(self._scale_x_y):
                    _yolo_tpu_layer(
                        yolos[2 * i],
                        yolos[2 * i + 1],
                        self._num_masks,
                        scale_x_y,
                    )
                    _yolos.append(yolos[2 * i + 1])
            return _yolos
        return yolos
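
The body of `_yolo_tpu_layer` is not shown here. Going only by the description "scale_x_y + copy x[..., wh] to logistic(x)[..., wh]", the host-side step for a single grid cell might look roughly like the sketch below. The function name `combine_cell_outputs` and the flat per-cell channel layout are assumptions for illustration, not the repo's implementation.

```python
def combine_cell_outputs(raw, logi, num_masks, scale_x_y):
    """Patch one grid cell's logistic(x) channels in place.

    raw  -- channels of x for one cell (first model output)
    logi -- channels of logistic(x) for the same cell (second model output)
    Assumed layout: num_masks blocks of [x, y, w, h, o, c0, c1, ...].
    """
    box_size = len(raw) // num_masks
    for m in range(num_masks):
        base = m * box_size
        # Apply the scale_x_y correction to the already-logistic x and y.
        for k in (0, 1):
            logi[base + k] = scale_x_y * logi[base + k] - 0.5 * (scale_x_y - 1)
        # Copy raw w and h over their logistic versions, since the yolo
        # formulas use exp(w) and exp(h), not logistic(w) and logistic(h).
        logi[base + 2] = raw[base + 2]
        logi[base + 3] = raw[base + 3]
    return logi


# Two anchors with one class each; values are made up for the example.
raw_cell = [0.0, 0.0, 1.5, -0.5, 0.0, 0.0] * 2
logi_cell = [0.5] * 12
combine_cell_outputs(raw_cell, logi_cell, num_masks=2, scale_x_y=1.05)
print(logi_cell)
```

Mutating the logistic output in place mirrors how `_predict` above appends `yolos[2 * i + 1]` after the call instead of using a return value.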

input shape (1, 416, 416, 3)

In [14]: 100/timeit.timeit(lambda: yolo._predict(x), number=100)                                                         
Out[14]: 30.969583151128262

resize -> ... -> diounms

tensorflow-yolov4/py_src/yolov4/common/base_class.py

Lines 189 to 191 in b67ca45

    
    predict_start_time = time.time()
    bboxes = self.predict(frame, prob_thresh=prob_thresh)
    predict_exec_time = time.time() - predict_start_time

yolo.predict(x, prob_thresh) runs resize image -> _predict -> diounms -> fit predicted bboxes to the original image shape.

probability threshold 25%
image shape (640, 480, 3)
input shape (1, 416, 416, 3)

24 ~ 29 FPS depending on the number of objects found.

Plan

Train yolov4-tiny-relu and yolov4-tiny-relu-new_coords on darknet to reach an AP50 of 35% or more (COCO val2017)

farhantandia · 2021-02-23T03:38:43Z

@hhk7734 Which TPU are you using?

hhk7734 · 2021-02-23T06:28:33Z

@farhantandia Coral dev board

hhk7734 · 2021-06-21T23:38:17Z

#49 #86