This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

segmentation fault at training if classes too few #81

Open
tino926 opened this issue Apr 20, 2021 · 2 comments

tino926 commented Apr 20, 2021

Hi,
I followed https://wiki.loliot.net/docs/lang/python/libraries/yolov4/python-yolov4-edge-tpu/ to train a model with only one class.

If I use the original yolov4-tiny.cfg, the training works normally.

However, if I set classes to any value below 59 in the .cfg file, I get this error:

2021-04-20 14:54:05.484701: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.1
2021-04-20 14:54:05.601503: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
Segmentation fault (core dumped)

Maybe this is a bug?
This is the training script I used:

from tensorflow.keras import callbacks

from yolov4.tf import YOLOv4, YOLODataset, SaveWeightsCallback

import os

yolo = YOLOv4()

yolo.config.parse_names("high_top_4.names")
yolo.config.parse_cfg("yolov4-tiny-relu_1.cfg")

yolo.make_model()
# yolo.load_weights(
#     "yolov4-tiny.conv.29",
#     weights_type="yolo",
# )
yolo.summary(summary_type="yolo")

# for i in range(29):
#     yolo.model.get_layer(index=i).trainable = False

yolo.summary()

train_dataset = YOLODataset(
    config=yolo.config,
    dataset_list="train_high_top_4.txt",
    image_path_prefix="./",
    training=True,
)

val_dataset = YOLODataset(
    config=yolo.config,
    dataset_list="val_high_top_4.txt",
    image_path_prefix="./",
    training=False,
)

yolo.compile()

_callbacks = [
    callbacks.TerminateOnNaN(),
    callbacks.TensorBoard(
        log_dir="./logs",
        update_freq=200,
        histogram_freq=1,
    ),
    SaveWeightsCallback(
        yolo=yolo,
        dir_path="./trained",
        weights_type="yolo",
        step_per_save=2000,
    ),
]

yolo.fit(
    train_dataset,
    callbacks=_callbacks,
    validation_data=val_dataset,
    verbose=3,  # 3: print step info
)

The "yolov4-tiny-relu_1.cfg" file contains:

[net]
batch=32
width=416
height=416
channels=3

learning_rate=0.00261
burn_in=1000

max_batches=240000
policy=steps
steps=192000,216000
scales=.1,.1

mosaic=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=relu

[route]
layers=-6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=relu

[route]
layers=-6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=relu

[route]
layers=-1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=relu

[route]
layers=-6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=relu

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=relu

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=relu

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask=3,4,5
anchors=10,14, 23,27, 37,58, 81,82, 135,169, 344,319
num=6
scale_x_y=1.05
classes=1
iou_thresh=0.213
iou_loss=ciou
iou_normalizer=0.7
obj_normalizer=1.0
label_smooth_eps=0.01
cls_normalizer=1.0
nms_kind=greedynms
beta_nms=0.6

[route]
layers=-4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=relu

[upsample]
stride=2

[route]
layers=-1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=relu

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask=1,2,3
anchors=10,14, 23,27, 37,58, 81,82, 135,169, 344,319
num=6
scale_x_y=1.05
classes=1
iou_thresh=0.213
iou_loss=ciou
iou_normalizer=0.7
obj_normalizer=1.0
label_smooth_eps=0.01
cls_normalizer=1.0
nms_kind=greedynms
beta_nms=0.6
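
One thing that may be worth checking in a cfg like the one above: Darknet's usual convention is that the convolutional layer directly before each [yolo] section has filters = (classes + 5) * (number of entries in mask). The cfg above keeps filters=255 (the 80-class value) while setting classes=1. The thread does not confirm this as the cause of the segmentation fault, so treat the following as a rough sanity check only. The helper name `check_yolo_filters` is hypothetical and is not part of the yolov4 package.

```python
def check_yolo_filters(cfg_text):
    """Return (yolo_index, filters, expected) for each [yolo] head whose
    preceding [convolutional] layer does not match (classes + 5) * len(mask).

    Hypothetical helper: a minimal cfg reader, not the yolov4 package's parser.
    """
    sections = []
    current = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {"type": line[1:-1]}
            sections.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()

    mismatches = []
    yolo_index = 0
    for i, section in enumerate(sections):
        if section["type"] != "yolo":
            continue
        prev = sections[i - 1] if i > 0 else None
        if prev is not None and prev["type"] == "convolutional":
            filters = int(prev["filters"])
            classes = int(section["classes"])
            num_masks = len(section["mask"].split(","))
            expected = (classes + 5) * num_masks
            if filters != expected:
                mismatches.append((yolo_index, filters, expected))
        yolo_index += 1
    return mismatches


# A cut-down fragment mirroring the first detection head of the cfg above.
sample = """
[convolutional]
size=1
filters=255
activation=linear

[yolo]
mask=3,4,5
classes=1
"""
print(check_yolo_filters(sample))  # [(0, 255, 18)]
```

Under that convention, a one-class model with three masks per head would use filters=18 in both layers that currently say filters=255.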

hitch22 commented Apr 20, 2021

I think you will have better luck using darknet for the training part. I had issues with inaccurate training results, but training directly with darknet resolved them.

Owner

hhk7734 commented Apr 20, 2021

v3 training is not yet supported.
I agree with @hitch22. Train with darknet.

I've seen a lot of code implemented with tensorflow and pytorch, but I haven't yet seen a library that implements the training part of darknet well.
