- Beat TA’s Baseline
- You will be given a dataset of interest (with a predetermined data split)
- You are NOT allowed to use any external dataset or techniques such as transfer learning
- Squeeze Your Model
- Design a model (e.g., one with fewer parameters, a simpler design, or a compact/simplified version) that achieves comparable performance while saving computation or storage costs.
- Network architecture borrowed from a GitHub repo
- Related paper 1: "FaceNet: A Unified Embedding for Face Recognition and Clustering"
- Related paper 2: "Deep Face Recognition"
- Center loss is used to prevent overfitting (see the first sketch after this list)
- Hard instance mining keeps training from overfitting to easy samples
- Triplet loss is also applied to an additional embedding layer as a multi-task training trick (see the second sketch after this list)
- Paper: "FaceNet: A Unified Embedding for Face Recognition and Clustering"
- Code borrowed from GitHub: tensorflow-triplet-loss
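As a reference for the center-loss term, here is a minimal TensorFlow sketch in the spirit of Wen et al. (2016); the function name and the externally maintained `centers` variable are illustrative assumptions, not this repo's actual API:

```python
import tensorflow as tf

def center_loss(features, labels, centers):
    # `centers` is assumed to be a [num_classes, dim] variable maintained
    # outside this function (updated with a moving average of the batch
    # embeddings, as in Wen et al., 2016).
    centers_batch = tf.gather(centers, labels)  # each sample's class center, [B, dim]
    # Mean squared distance between each embedding and its class center.
    return tf.reduce_mean(
        tf.reduce_sum(tf.square(features - centers_batch), axis=1))
```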
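And a hedged sketch of batch-hard triplet mining in the spirit of the borrowed tensorflow-triplet-loss code; this is a re-derivation for illustration (the margin value is an assumption), not the exact borrowed implementation:

```python
import tensorflow as tf

def batch_hard_triplet_loss(labels, embeddings, margin=0.2):
    # Pairwise squared Euclidean distances, shape [B, B].
    dot = tf.matmul(embeddings, embeddings, transpose_b=True)
    sq_norm = tf.linalg.diag_part(dot)
    dist = tf.maximum(sq_norm[:, None] - 2.0 * dot + sq_norm[None, :], 0.0)

    labels = tf.reshape(labels, [-1, 1])
    same = tf.cast(tf.equal(labels, tf.transpose(labels)), tf.float32)
    not_self = same - tf.eye(tf.shape(labels)[0])  # same label, self excluded

    # Hardest positive: farthest sample sharing the anchor's label.
    hardest_pos = tf.reduce_max(dist * not_self, axis=1)
    # Hardest negative: closest different-label sample (same-label entries
    # are pushed out of the min by adding the max distance to them).
    hardest_neg = tf.reduce_min(dist + tf.reduce_max(dist) * same, axis=1)

    return tf.reduce_mean(tf.maximum(hardest_pos - hardest_neg + margin, 0.0))
```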
- Training policy - multi-stage training
- Train without data augmentation:
`python train_teacher.py --finetune_level 0`
- Fine-tune with basic data augmentations:
`python train_teacher.py --finetune_level 1`
- Augmentations include rotation, horizontal flip, scale, crop, hue, contrast, brightness, and gray-scale
- Center loss is included with a weighting factor of 1e-5
- Pre-logit norm (PLN) loss with a weight of 1e-5 is also included in this stage
- Fine-tune with seaweed augmentation:
`python train_teacher.py --finetune_level 2`
- Triplet loss is included (see the combined objective sketch after this list)
- Learning rate is decayed to 5e-5
- Weights of the center loss and the PLN loss are increased to 1e-4
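Putting the final stage together, the objective is plausibly the classification loss plus the weighted auxiliary terms above. A sketch reusing the two functions defined earlier; the reading of the PLN term as an L2 penalty on pre-logit activations, and the way `train_teacher.py` actually combines the terms, are both assumptions:

```python
import tensorflow as tf

def final_stage_loss(logits, labels_onehot, embeddings, pre_logits, centers):
    # Main classification loss.
    ce = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=labels_onehot, logits=logits))
    # Assumed reading of PLN: an L2 penalty on the pre-logit activations.
    pln = tf.reduce_mean(tf.reduce_sum(tf.square(pre_logits), axis=1))
    labels = tf.argmax(labels_onehot, axis=1)
    return (ce
            + 1e-4 * center_loss(embeddings, labels, centers)  # raised from 1e-5
            + 1e-4 * pln                                       # raised from 1e-5
            + batch_hard_triplet_loss(labels, embeddings))
```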
- "SqueezeNext: Hardware-Aware Neural Network Design"
- Implemented SqNxt-23v5 following this GitHub repo
- Related paper: "Distilling the Knowledge in a Neural Network" (2015)
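A minimal sketch of the Hinton-style soft-target loss from that paper, as one way the Teacher-Student (T-S) training below could be wired up; the temperature and mixing weight here are illustrative, not the values used in this project:

```python
import tensorflow as tf

def distillation_loss(student_logits, teacher_logits, labels_onehot,
                      temperature=4.0, alpha=0.9):
    t = temperature
    # Soften the teacher's distribution and match the student against it.
    soft_targets = tf.nn.softmax(teacher_logits / t)
    soft = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=soft_targets, logits=student_logits / t)) * (t * t)
    # A small hard-label term keeps the student anchored to ground truth.
    hard = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=labels_onehot, logits=student_logits))
    return alpha * soft + (1.0 - alpha) * hard
```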
Model | Size | # of params | P. V. | P. T. | FPS | Weights
---|---|---|---|---|---|---
In.-Res. | 124 MB | 26,781,288 | 88.91% | 85.59% | 418.96 | link
2.0 SqNxt-23v5 | 13.7 MB | 3,399,352 | 71.28% | ~ | 635.68 | None
2.0 SqNxt-23v5 (T-S) | 13.7 MB | 3,399,352 | 85.98% | 82.48% | 635.68 | link
1.0 SqNxt-23v5 (T-S) | 4.5 MB | 1,106,456 | 78.42% | 73.6% | 805.36 | link
- T-S refers to the Teacher-Student training strategy
- P. V. / P. T. refer to performance on the validation / test set
- The T-S weights are saved for fine-tuning and therefore include the embedding-layer weights
Basic A. | Seaweed | Center L. | P. N. L. | H. I. M. | Triplet L. | P. V. | P. T. |
---|---|---|---|---|---|---|---|
✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ~30% | |
✔️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | 67.4% | |
✔️ | ✔️ | ✖️ | ✖️ | ✖️ | ✖️ | 72% | |
✔️ | ✖️ | ✔️ | ✖️ | ✖️ | ✖️ | 75.75% | |
✔️ | ✔️ | ✔️ | ✖️ | ✖️ | ✖️ | 78.11% | 79.11% |
✔️ | ✔️ | ✔️ | ✔️ | ✖️ | ✖️ | 81.81% | 82.45% |
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | 88.91% | 85.59% |
- Basic A. refers to basic augmentations
- L. refers to loss
- P. N. L. refers to pre-logit norm loss
- H. I. M. refers to hard instance mining
- P. V. refers to Performance on Validation set
- P. T. refers to Performance on Test set (scores on Kaggle)