A YOLOv7 Computer Vision Model with Inference API that:
- Preprocesses Synthetic Data (Images and Labels)
- Fine-tunes a YOLOv7 model on Custom Synthetic Data
- Classifies material handling equipment (jacks, dollies, and bins)
- Working on Dockerizing the ONNX model API
- This repo covers 2 approaches: the first fine-tunes a YOLOv7 model, and the second implements a YOLOv3 model from scratch and trains it.
- The purpose of this repo is practice only. The code offers a solid foundation for CNN and computer vision models: fine-tuning them on custom, preprocessed data and running inference with them.
- The data files were not included in this repo due to their size. The data was stored as follows:
```
BMW_data
├── train.csv
├── test.csv
├── images
│   ├── train
│   ├── test
│   └── val
└── labels
    ├── train
    ├── test
    └── val
```
- In order to train our YOLOv7 model, we need to convert the data from JSON to the YOLO txt format
- The YOLO format is one line per object, with normalized coordinates: `class_label x_center y_center width height`
- This has been handled in the `data_pipeline.ipynb` notebook; a conversion sketch follows below.
- Additionally, the data has been visualized for ground-truth reference.
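Since the original annotation schema is not included in this repo, the sketch below assumes a hypothetical JSON layout (pixel-space `[x_min, y_min, width, height]` boxes plus image dimensions); the actual conversion lives in `data_pipeline.ipynb`:

```python
import json
from pathlib import Path

def json_to_yolo(json_path: str, out_dir: str) -> None:
    """Convert one hypothetical JSON annotation file to a YOLO txt file."""
    record = json.loads(Path(json_path).read_text())
    img_w, img_h = record["width"], record["height"]  # assumed fields
    lines = []
    for ann in record["annotations"]:  # assumed field
        x_min, y_min, w, h = ann["bbox"]  # assumed pixel-space box
        x_c = (x_min + w / 2) / img_w  # normalized center x
        y_c = (y_min + h / 2) / img_h  # normalized center y
        lines.append(f"{ann['class_label']} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    (Path(out_dir) / (Path(json_path).stem + ".txt")).write_text("\n".join(lines))
```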
- In some instances, noisy data caused some of the normalized bounding-box dimensions to be larger than 1. This means that in the original data, the bounding-box coordinates extended beyond the boundaries of the image itself.
- To address this issue, a condition was added in the code to clamp these noisy data points, since they could affect data augmentation and model training (see the sketch below).
- The faulty data accounted for < 5% of the overall data (136 instances).
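A minimal sketch of one way to implement the clamp (the exact condition in `data_pipeline.ipynb` may differ): convert to corner coordinates, clip into [0, 1], and convert back, so boxes that spill past the image edges are trimmed rather than dropped.

```python
def clamp_yolo_box(x_c, y_c, w, h):
    """Clip a normalized YOLO box so it stays inside the image bounds."""
    x1 = max(x_c - w / 2, 0.0)
    y1 = max(y_c - h / 2, 0.0)
    x2 = min(x_c + w / 2, 1.0)
    y2 = min(y_c + h / 2, 1.0)
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
```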
- The data was then augmented using an Albumentations transformation pipeline
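A sketch of what such a pipeline can look like; the specific transforms and probabilities used in the notebook are not shown here, so these are placeholders. `bbox_params` with `format="yolo"` keeps the normalized boxes in sync with the image transforms:

```python
import albumentations as A

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),  # placeholder transforms
        A.RandomBrightnessContrast(p=0.3),
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.5),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
```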
- After formatting our data and augmenting it, we split the data into train and val
- Although some references suggest not including augmented data in validation, in our scenario augmented data can help by exposing the model to a wider variety of examples during validation
- The train/val split was handled in `data_pipeline.ipynb`
- The split ratio was chosen as 20% of the training data for validation, as sketched below
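A minimal sketch of the 80/20 split, assuming the `BMW_data` layout above and jpg images (the notebook's actual logic may differ):

```python
import random
import shutil
from pathlib import Path

random.seed(42)  # reproducible split
images = sorted(Path("BMW_data/images/train").glob("*.jpg"))
for img in random.sample(images, k=int(0.2 * len(images))):
    label = Path("BMW_data/labels/train") / (img.stem + ".txt")
    shutil.move(str(img), "BMW_data/images/val")
    shutil.move(str(label), "BMW_data/labels/val")
```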
- While cleaning the data, we realized that the class labels were:
  - unordered, which may cause issues when encoding during training, since the model might infer that there are other classes in between. Additional classes would then have to be passed to the model, which is inefficient, unnecessary, and could lead to inconveniences.
  - different between train (and val) and test. In train, the classes 4, 5, and 7 were used for `dollies`, `bins`, and `jacks` respectively, while in test the labels were 4, 11, and 9 respectively.
- To address this issue, a method parsing the entire txt dataset ensured that the classes for train and test were remapped as follows: `dollies`: 0, `bins`: 1, `jacks`: 2 (see the sketch below).
- This ensured data consistency across the whole dataset.
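A sketch of such a parsing method, applying the per-split mappings described above to every label file:

```python
from pathlib import Path

def remap_labels(label_dir: str, mapping: dict) -> None:
    """Rewrite the class id at the start of every line in every txt label file."""
    for txt in Path(label_dir).glob("*.txt"):
        lines = []
        for line in txt.read_text().splitlines():
            cls, *coords = line.split()
            lines.append(" ".join([str(mapping[int(cls)]), *coords]))
        txt.write_text("\n".join(lines))

for split in ("train", "val"):
    remap_labels(f"BMW_data/labels/{split}", {4: 0, 5: 1, 7: 2})
remap_labels("BMW_data/labels/test", {4: 0, 11: 1, 9: 2})
```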
- The first step was cloning the repo https://github.com/WongKinYiu/yolov7/, then `cd yolov7` and `pip install -r requirements.txt`
- In addition, we downloaded pretrained YOLOv7 frozen weights in order to fine-tune on top of them.
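The pretrained weights are published on the YOLOv7 releases page; for example (release tag assumed to be v0.1, the one that ships `yolov7.pt`):

```bash
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
```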
- We made sure to create the `bmw_data.yaml` file for the YOLOv7 model to look for. This file is the reference for the locations of the train, test, and val images and labels.
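A minimal sketch of what `bmw_data.yaml` can contain; the exact paths are assumptions based on the `BMW_data` layout above:

```yaml
train: ./BMW_data/images/train
val: ./BMW_data/images/val
test: ./BMW_data/images/test

nc: 3                                # number of classes
names: ['dollies', 'bins', 'jacks']  # must match the remapped ids 0, 1, 2
```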
- To train the model:
```bash
!python train.py --img-size 640 --cfg cfg/training/yolov7.yaml --hyp data/hyp.scratch.custom.yaml --batch 4 --epochs 50 --data data/bmw_data.yaml --weights yolov7.pt --workers 24 --name yolo_bmw_det2
```
- This trains the model on images of size 3 × 640 × 640, with the default `yolov7.yaml` configuration, custom hyperparameters that will later be tuned, and 50 epochs
- The new weights will be saved under the name `yolo_bmw_det2` in the `runs` folder
- While training the model, it is important to always monitor the performance and evolution of critical metrics such as precision, recall, mAP, and loss.
- TensorBoard was used to monitor these metrics, as shown below:
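YOLOv7 writes its training logs under `runs/train` by default, so TensorBoard can be pointed there:

```bash
tensorboard --logdir runs/train
```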
- Moreover, training was performed locally on an NVIDIA GeForce RTX 2060
- After training for 50 epochs, the loss curves as well as the precision, recall, and F1 scores all indicated that a plateau was reached. This could hint at overfitting, as well as unnecessary computing beyond which accuracy stays the same.
- After hyperparameter tuning, most notably reducing epochs to 30 instead of 50, training was performed again
- After training was finished, we obtained the results below:
- The `yolo_bmw_det3` model weights were exported to ONNX:
```bash
!python export.py --weights runs/train/yolo_bmw_det3/weights/best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
```
- Netron .svg and .png graphs can be found here
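A quick sanity check that the exported model loads under ONNX Runtime (file name assumed; the input/output names and shapes depend on the export flags used above):

```python
import onnxruntime as ort

session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
for tensor in session.get_inputs() + session.get_outputs():
    print(tensor.name, tensor.shape)
```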
- The purpose of working on this model was to acquire a deeper understanding of the fundamentals of backbone transfer learning.
- ResNet acts as a new backbone network architecture, which is larger and more accurate than one trained from scratch. Below is a sketch of the idea, followed by some figures from the training phase:
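A minimal sketch of the backbone idea, using torchvision's pretrained ResNet-50 as a frozen feature extractor (the actual detection head and training loop in this repo are not shown here):

```python
import torch
import torchvision

resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc
for p in backbone.parameters():
    p.requires_grad = False  # freeze the pretrained weights

features = backbone(torch.randn(1, 3, 640, 640))  # -> torch.Size([1, 2048, 20, 20])
```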
- In order to take this project to the next level, we implemented a FastAPI interfacing framework that interacts with ONNX inference sessions in order to provide a more convenient and scalable way to interact with our developed models.
- Even though the APIs are under development, they are a proof of scalability and constitute a stable foundation for later improvements. A sketch of the setup follows below.
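A minimal sketch of what `views.py` can look like; the `/predict` endpoint, model path, and pre/post-processing are illustrative assumptions, not the exact code in the `api` folder:

```python
import io

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])  # assumed path

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Naive preprocessing: resize to the export size and scale to [0, 1].
    image = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((640, 640))
    tensor = (np.asarray(image, dtype=np.float32) / 255.0).transpose(2, 0, 1)[None]
    outputs = session.run(None, {session.get_inputs()[0].name: tensor})
    return {"detections": outputs[0].tolist()}
```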
- To activate the runtime, navigate to the api folder from the root dir: `cd ./api`
- Run the command: `uvicorn views:app --reload`
- Open the localhost port that shows in the terminal, e.g. http://127.0.0.1:8000, followed by /docs
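With the server running, the hypothetical `/predict` endpoint from the sketch above can be exercised with:

```bash
curl -X POST "http://127.0.0.1:8000/predict" -F "file=@sample.jpg"
```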