This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

Questions about v4 tiny on Edge TPU #20

Open
hhk7734 opened this issue Aug 8, 2020 · 46 comments

@hhk7734 (Owner) commented Aug 8, 2020

@hhk7734 Very interesting to see v4 tiny on Edge TPU. I have two questions:

  1. What ops were not mapped to the TPU?
  2. Did you quantize (post-training/training-aware) to INT8?

Thanks

Originally posted by @ankandrew in #4 (comment)

@hhk7734 hhk7734 added the ❤️ Happy feedback label Aug 11, 2020
@paradigmn

My results are quite the opposite. I used tiny_yolov4 with ReLU activation and the weights provided by your repo. The input tensor has a size of (608, 608, 3). With the -a flag I get three subgraphs, with 97 of 128 operations running on the TPU. Without the flag I get one subgraph with 42 of 128 operations mapped. This gives me the following inference times for 5 runs:

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| with -a flag | 0.0722 s | 0.0650 s | 0.0642 s | 0.0629 s | 0.0687 s |
| without -a flag | 0.2012 s | 0.1675 s | 0.1743 s | 0.1665 s | 0.1702 s |
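Averaging the five runs in each row gives one number per configuration. This small sketch only restates the measured values from the table; it does not add new measurements.

```python
import statistics

# Inference times in seconds, copied from the table.
with_a = [0.0722, 0.0650, 0.0642, 0.0629, 0.0687]
without_a = [0.2012, 0.1675, 0.1743, 0.1665, 0.1702]

mean_with = statistics.mean(with_a)
mean_without = statistics.mean(without_a)
speedup = mean_without / mean_with

print(f"with -a:    {mean_with * 1000:.1f} ms")     # → 66.6 ms
print(f"without -a: {mean_without * 1000:.1f} ms")  # → 175.9 ms
print(f"speedup:    {speedup:.2f}x")                # → 2.64x
```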

@hhk7734 (Owner, Author) commented Feb 21, 2021

Model

EdgeTPU Ops: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

yolov4-tiny

image -> conv2d -> ... -> conv2d -> yolo_0
                   ... -> conv2d -> yolo_1

yolo layer

input
x, y, w, h, o, c0, c1, ...

output
(scale * logistic(x) - 0.5 * (scale - 1) + cx) / grid_width,
(scale * logistic(y) - 0.5 * (scale - 1) + cy) / grid_height,
prior * exp(w) / net_width,
prior * exp(h) / net_height,
logistic(o),
logistic(c0),
logistic(c1),
...

prior == anchor == biases, scale == scale_x_y, (cx, cy) == column/row index of the grid cell
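To make the yolo layer formulas concrete, here is a NumPy sketch that decodes one raw yolo output. It assumes the darknet channel layout (num_masks blocks of x, y, w, h, o, c0, c1, ... stacked along the last axis), anchors given in input-image pixels, and a single image with the batch dimension removed. `decode_yolo_layer` and `logistic` are hypothetical helpers, not functions from this repo.

```python
import numpy as np


def logistic(v):
    # Sigmoid, written out to match the notation above.
    return 1.0 / (1.0 + np.exp(-v))


def decode_yolo_layer(raw, anchors, net_size, scale_x_y=1.0):
    """Decode one yolo output (batch dimension removed) with the formulas above.

    raw:      (grid_height, grid_width, num_masks * (5 + num_classes))
    anchors:  (num_masks, 2) priors (w, h) in input-image pixels
    net_size: (net_width, net_height)
    Returns (grid_height, grid_width, num_masks, 5 + num_classes) holding
    x, y, w, h relative to the network input, then logistic(o), logistic(c0), ...
    """
    anchors = np.asarray(anchors, dtype=np.float32)
    grid_h, grid_w, channels = raw.shape
    num_masks = anchors.shape[0]
    t = raw.reshape(grid_h, grid_w, num_masks, channels // num_masks).astype(np.float32)

    # cx is the column index and cy the row index of each grid cell.
    cy, cx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    cx = cx[..., np.newaxis]  # broadcast over masks
    cy = cy[..., np.newaxis]
    offset = 0.5 * (scale_x_y - 1.0)

    out = np.empty_like(t)
    out[..., 0] = (scale_x_y * logistic(t[..., 0]) - offset + cx) / grid_w
    out[..., 1] = (scale_x_y * logistic(t[..., 1]) - offset + cy) / grid_h
    out[..., 2] = anchors[:, 0] * np.exp(t[..., 2]) / net_size[0]
    out[..., 3] = anchors[:, 1] * np.exp(t[..., 3]) / net_size[1]
    out[..., 4:] = logistic(t[..., 4:])
    return out
```

For a model output of shape (1, height, width, channels), this would be called on `output[0]`.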

EdgeTPU

Currently, not all layers are mapped to the TPU because of ops such as SPLIT_V, EXP, ...
Even if all of them could be mapped, too many layers would lose too much information at 8-bit precision.

We have to choose between changing the model so that it can use the TPU more, or giving up on some layers and running them on the CPU.
In other words, it is a choice between speed and precision.

When using the TPU, I removed all operations from the yolo layer except logistic.
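The 8-bit information-loss point can be illustrated with plain affine uint8 quantization (real ≈ scale * (q - zero_point)). This is a toy sketch: the [-8, 8] range for a raw w/h channel is a hypothetical choice, and the real quantization ranges come from the converter's calibration data. It shows how one quantization step on w becomes a few percent of relative error once exp(w) is applied, which is consistent with keeping exp on the CPU.

```python
import numpy as np


def quantize_uint8(x, lo, hi):
    """Affine (asymmetric) 8-bit quantization: real ~= scale * (q - zero_point)."""
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)


# Hypothetical range for a raw w/h channel.
lo, hi = -8.0, 8.0
w = np.linspace(lo, hi, 10001, dtype=np.float32)
q, scale, zp = quantize_uint8(w, lo, hi)
w_hat = dequantize(q, scale, zp)

# Error on w is bounded by half a step; exp() turns it into a relative error.
abs_err_w = np.max(np.abs(w_hat - w))
rel_err_exp = np.max(np.abs(np.exp(w_hat) - np.exp(w)) / np.exp(w))
print(f"step = {scale:.4f}, max |w error| = {abs_err_w:.4f}")
print(f"max relative error of exp(w) = {rel_err_exp:.2%}")
```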

Converted model

model

Identity - x0, Identity_1 - logistic(x0)
Identity_2 - x1, Identity_3 - logistic(x1)

FPS test

Model only

  • input shape (1, 416, 416, 3)
```
In [9]: def model(x):
   ...:     yolo._interpreter.set_tensor(yolo._input_details["index"], x)
   ...:     yolo._interpreter.invoke()
   ...:     # [yolo0, yolo1, ...]
   ...:     # yolo == Dim(1, height, width, channels)
   ...:     # yolo_tpu == x, logistic(x)
   ...:     return [
   ...:         yolo._interpreter.get_tensor(output_detail["index"])
   ...:         for output_detail in yolo._output_details
   ...:     ]
   ...:

In [10]: 100/timeit.timeit(lambda: model(x), number=100)
Out[10]: 31.288735650288498
```
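`100/timeit.timeit(fn, number=100)` is simply calls per second. A variant that also skips an untimed warm-up call may give steadier numbers, since Coral's examples note that the first Edge TPU inference is slower because it includes loading the model into Edge TPU memory. `measure_fps` is a hypothetical helper, not part of this repo.

```python
import time


def measure_fps(fn, runs=100, warmup=1):
    """Return calls per second of fn(), after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return runs / elapsed
```

Usage with the session above would be `measure_fps(lambda: model(x))`.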

Model + scale_x_y + copy of x[..., wh] into logistic(x)[..., wh]

```python
def _predict(self, x: np.ndarray) -> List[np.ndarray]:
    self._interpreter.set_tensor(self._input_details["index"], x)
    self._interpreter.invoke()
    # [yolo0, yolo1, ...]
    # yolo == Dim(1, height, width, channels)
    # yolo_tpu == x, logistic(x)
    yolos = [
        self._interpreter.get_tensor(output_detail["index"])
        for output_detail in self._output_details
    ]
    if self._tpu:
        _yolos = []
        if self._new_coords:
            for i, scale_x_y in enumerate(self._scale_x_y):
                _yolo_tpu_layer_new_coords(
                    yolos[i], self._num_masks, scale_x_y
                )
                _yolos.append(yolos[i])
        else:
            for i, scale_x_y in enumerate(self._scale_x_y):
                _yolo_tpu_layer(
                    yolos[2 * i],
                    yolos[2 * i + 1],
                    self._num_masks,
                    scale_x_y,
                )
                _yolos.append(yolos[2 * i + 1])
        return _yolos
    return yolos
```
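The body of `_yolo_tpu_layer` is not shown in the thread. Going only by the heading (apply scale_x_y, then copy x[..., wh] into logistic(x)[..., wh]), a hypothetical in-place version might look like this. It assumes float arrays in the darknet channel layout; `yolo_tpu_layer_sketch` is an illustrative name, not the repo's function.

```python
import numpy as np


def yolo_tpu_layer_sketch(x, logistic_x, num_masks, scale_x_y):
    """Hypothetical in-place post-processing of one (x, logistic(x)) output pair.

    Both arrays: (1, height, width, num_masks * (5 + num_classes)), float.
    Afterwards logistic_x holds, per mask:
      [scaled logistic(x), scaled logistic(y), raw w, raw h, logistic(o), logistic(c0), ...]
    """
    step = x.shape[-1] // num_masks
    for m in range(num_masks):
        base = m * step
        # scale_x_y on the center terms: scale * logistic - 0.5 * (scale - 1)
        logistic_x[..., base:base + 2] = (
            scale_x_y * logistic_x[..., base:base + 2] - 0.5 * (scale_x_y - 1.0)
        )
        # w, h must stay raw because they still go through exp() later.
        logistic_x[..., base + 2:base + 4] = x[..., base + 2:base + 4]
```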

  • input shape (1, 416, 416, 3)
```
In [14]: 100/timeit.timeit(lambda: yolo._predict(x), number=100)
Out[14]: 30.969583151128262
```

resize -> ... -> diounms

```python
predict_start_time = time.time()
bboxes = self.predict(frame, prob_thresh=prob_thresh)
predict_exec_time = time.time() - predict_start_time
```

yolo.predict(x, prob_thresh) resizes the image, runs _predict, applies diounms (DIoU-NMS), and fits the predicted bboxes to the original image shape.
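For readers unfamiliar with the diounms step: DIoU-NMS is greedy NMS where the overlap test uses DIoU = IoU - (squared center distance / squared diagonal of the smallest enclosing box). Below is a generic NumPy sketch, not the repo's diounms implementation; boxes are assumed to be (cx, cy, w, h), and the 0.45 threshold is a hypothetical default.

```python
import numpy as np


def diou(box, boxes):
    """DIoU between one box and an array of boxes, all as (cx, cy, w, h)."""
    b1 = np.concatenate([box[:2] - box[2:] / 2, box[:2] + box[2:] / 2])
    b2 = np.concatenate([boxes[:, :2] - boxes[:, 2:] / 2,
                         boxes[:, :2] + boxes[:, 2:] / 2], axis=1)
    inter_wh = np.clip(np.minimum(b1[2:], b2[:, 2:]) - np.maximum(b1[:2], b2[:, :2]), 0, None)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    iou = inter / np.maximum(union, 1e-9)
    # Center distance penalty, normalized by the enclosing box diagonal.
    rho2 = np.sum((boxes[:, :2] - box[:2]) ** 2, axis=1)
    enc_wh = np.maximum(b1[2:], b2[:, 2:]) - np.minimum(b1[:2], b2[:, :2])
    c2 = np.sum(enc_wh ** 2, axis=1)
    return iou - rho2 / np.maximum(c2, 1e-9)


def diou_nms(boxes, scores, classes, threshold=0.45):
    """Greedy per-class NMS; a box is dropped if DIoU with a kept box > threshold."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        same_class = classes[rest] == classes[i]
        suppress = same_class & (diou(boxes[i], boxes[rest]) > threshold)
        order = rest[~suppress]
    return keep
```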

  • probability thresh 25%
  • image shape (640, 480, 3)
  • input shape (1, 416, 416, 3)

24 to 29 FPS, depending on the number of objects found.
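Converting the FPS figures in this comment to per-frame latency makes the cost of the steps outside _predict easier to see. The numbers are only approximate, because the timeit averages and the per-frame time.time() measurement were taken differently.

```python
# FPS numbers reported in this comment.
fps_model_only = 31.288735650288498  # Out[10]: interpreter invoke only
fps_predict = 30.969583151128262     # Out[14]: _predict (invoke + scale_x_y + wh copy)
fps_pipeline = (24.0, 29.0)          # predict(): resize -> _predict -> diounms -> rescale


def ms_per_frame(fps):
    return 1000.0 / fps


# Rough extra time spent outside _predict (resize, NMS, bbox rescaling).
overhead_ms = [ms_per_frame(f) - ms_per_frame(fps_predict) for f in fps_pipeline]

print(f"model only: {ms_per_frame(fps_model_only):.2f} ms")
print(f"_predict:   {ms_per_frame(fps_predict):.2f} ms")
for fps, extra in zip(fps_pipeline, overhead_ms):
    print(f"pipeline at {fps:.0f} FPS: {ms_per_frame(fps):.2f} ms (+{extra:.2f} ms)")
```

With these values, the full pipeline spends roughly 2.2 to 9.4 ms per frame beyond _predict.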

Plan

  • Train yolov4-tiny-relu and yolov4-tiny-relu-new_coords on darknet to reach an AP50 of 35% or more (COCO val2017)

@farhantandia

@hhk7734 which TPU are you using?

@hhk7734 (Owner, Author) commented Feb 23, 2021

@farhantandia Coral Dev Board

@hhk7734 (Owner, Author) commented Jun 21, 2021

#49 #86
