commit 8ce0c8b
Showing 35 changed files with 2,739 additions and 0 deletions.

LICENSE

MIT License

Copyright (c) 2017 Max deGroot, Ellis Brown

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

# SSD: Single Shot MultiBox Object Detector, in PyTorch
A [PyTorch](http://pytorch.org/) implementation of [Single Shot MultiBox Detector](http://arxiv.org/abs/1512.02325) from the 2016 paper by Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. The official and original Caffe code can be found [here](https://github.com/weiliu89/caffe/tree/ssd).

<img align="right" src="https://github.com/amdegroot/ssd.pytorch/blob/master/doc/ssd.png" height=400/>

### Table of Contents
- <a href='#installation'>Installation</a>
- <a href='#datasets'>Datasets</a>
- <a href='#training-ssd'>Train</a>
- <a href='#evaluation'>Evaluate</a>
- <a href='#performance'>Performance</a>
- <a href='#demos'>Demos</a>
- <a href='#todo'>Future Work</a>
- <a href='#references'>References</a>

## Installation
- Install [PyTorch](http://pytorch.org/) by selecting your environment on the website and running the appropriate command.
- Clone this repository.
  * Note: We currently only support Python 3+.
- Then download the dataset by following the [instructions](#datasets) below.
- We now support [Visdom](https://github.com/facebookresearch/visdom) for real-time loss visualization during training! (A minimal logging sketch appears after this list.)
  * To use Visdom in the browser:
  ```Shell
  # First install the Python server and client
  pip install visdom
  # Start the server (probably in a screen or tmux)
  python -m visdom.server
  ```
  * Then (during training) navigate to http://localhost:8097/ (see the Train section below for training details).
- Note: For training, we currently support [VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/), and aim to add [ImageNet](http://www.image-net.org/) support soon.
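
As a rough illustration only, here is a minimal sketch of pushing per-iteration loss values to a running Visdom server. The window title and placeholder loss value are illustrative and are not taken from `train.py`.

```python
import numpy as np
import visdom

# Assumes `python -m visdom.server` is already running on http://localhost:8097/.
viz = visdom.Visdom()

# Create a line plot, then append one point per iteration.
win = viz.line(X=np.array([0]), Y=np.array([1.0]),
               opts=dict(title='Training loss (illustrative)',
                         xlabel='iteration', ylabel='loss'))

for iteration in range(1, 101):
    loss = 1.0 / iteration  # placeholder standing in for the SSD multibox loss
    viz.line(X=np.array([iteration]), Y=np.array([loss]),
             win=win, update='append')
```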

## Datasets
To make things easy, we provide bash scripts to handle the dataset downloads and setup for you. We also provide simple dataset loaders that inherit from `torch.utils.data.Dataset`, making them fully compatible with the `torchvision.datasets` [API](http://pytorch.org/docs/torchvision/datasets.html). A minimal sketch of what that compatibility looks like appears just below.
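
For illustration only, here is a hedged sketch of the kind of `Dataset` subclass the claim refers to; the class name and shapes below are made up for the example and are not the loaders shipped in this repo.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDetectionDataset(Dataset):
    """Illustrative detection dataset: each item is (image, targets)."""

    def __init__(self, num_samples=8):
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # A fake 3x300x300 image and one [xmin, ymin, xmax, ymax, label] box.
        image = torch.rand(3, 300, 300)
        targets = torch.tensor([[0.1, 0.1, 0.5, 0.5, 1.0]])
        return image, targets

# Because it is a standard Dataset, it plugs into DataLoader like any
# torchvision dataset would.
loader = DataLoader(ToyDetectionDataset(), batch_size=4)
for images, targets in loader:
    print(images.shape, targets.shape)
```

Note that real detection loaders usually pass a custom `collate_fn` to `DataLoader`, since the number of boxes varies per image; the toy example sidesteps that by always returning a single box.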

### COCO
Microsoft COCO: Common Objects in Context

##### Download COCO 2014
```Shell
# specify a directory for the dataset to be downloaded into; the default is ~/data/
sh data/scripts/COCO2014.sh
```

### VOC Dataset
PASCAL VOC: Visual Object Classes

##### Download VOC2007 trainval & test
```Shell
# specify a directory for the dataset to be downloaded into; the default is ~/data/
sh data/scripts/VOC2007.sh # <directory>
```

##### Download VOC2012 trainval
```Shell
# specify a directory for the dataset to be downloaded into; the default is ~/data/
sh data/scripts/VOC2012.sh # <directory>
```

## Training SSD
- First download the fc-reduced [VGG-16](https://arxiv.org/abs/1409.1556) PyTorch base network weights at: https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
- By default, we assume you have downloaded the file into the `ssd.pytorch/weights` dir:

```Shell
mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
```

- To train SSD using the train script, simply specify the parameters listed in `train.py` as flags or change them manually.

```Shell
python train.py
```

- Note:
  * For training, an NVIDIA GPU is strongly recommended for speed.
  * For instructions on Visdom usage/installation, see the <a href='#installation'>Installation</a> section.
  * You can pick up training from a checkpoint by specifying its path as one of the training parameters (again, see `train.py` for options); a hedged example appears after this list.
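
For example, resuming from a saved checkpoint might look like the following. The `--resume` flag name and the checkpoint filename are assumptions for illustration; check `train.py` for the exact argument names.

```Shell
# Illustrative only -- confirm the exact flag names in train.py
python train.py --resume weights/ssd300_checkpoint_example.pth
```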

## Evaluation
To evaluate a trained network:

```Shell
python eval.py
```

You can specify the parameters listed in `eval.py` by passing them as flags or changing them manually.
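
For instance, pointing the evaluation script at a specific set of trained weights might look like this; the `--trained_model` flag name is an assumption for illustration and should be verified against `eval.py`.

```Shell
# Illustrative only -- confirm the exact flag name in eval.py
python eval.py --trained_model weights/ssd300_mAP_77.43_v2.pth
```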

<img align="left" src="https://github.com/amdegroot/ssd.pytorch/blob/master/doc/detection_examples.png">

## Performance

#### VOC2007 Test

##### mAP

| Original | Converted weiliu89 weights | From scratch w/o data aug | From scratch w/ data aug |
|:-:|:-:|:-:|:-:|
| 77.2% | 77.26% | 58.12% | 77.43% |

##### FPS
**GTX 1060:** ~45.45 FPS

## Demos

### Use a pre-trained SSD network for detection

#### Download a pre-trained network
- We are trying to provide PyTorch `state_dicts` (dicts of weight tensors) of the latest SSD model definitions trained on different datasets.
- Currently, we provide the following PyTorch models (a short loading sketch appears after the results image below):
  * SSD300 trained on VOC0712 (newest PyTorch weights)
    - https://s3.amazonaws.com/amdegroot-models/ssd300_mAP_77.43_v2.pth
  * SSD300 trained on VOC0712 (original Caffe weights)
    - https://s3.amazonaws.com/amdegroot-models/ssd_300_VOC0712.pth
- Our goal is to reproduce this table from the [original paper](http://arxiv.org/abs/1512.02325):
<p align="left">
<img src="http://www.cs.unc.edu/~wliu/papers/ssd_results.png" alt="SSD results on multiple datasets" width="800px"></p>
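
As a quick sketch, loading one of these `state_dicts` mirrors what `demo/live.py` in this commit does; it assumes the weights file has already been downloaded into `weights/` and that you are running from the repo root.

```python
import torch
from ssd import build_ssd  # model definition shipped with this repo

# Build an SSD300 with 21 VOC classes (20 object classes + background) in
# test mode, then load the downloaded weights. The file path is an example.
net = build_ssd('test', 300, 21)
net.load_state_dict(torch.load('weights/ssd300_mAP_77.43_v2.pth'))
net.eval()
```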

### Try the demo notebook
- Make sure you have [jupyter notebook](http://jupyter.readthedocs.io/en/latest/install.html) installed.
- Two alternatives for installing jupyter notebook:
  1. If you installed PyTorch with [conda](https://www.continuum.io/downloads) (recommended), then you should already have it. Just navigate to the cloned ssd.pytorch repo and run:
     `jupyter notebook`

  2. If using [pip](https://pypi.python.org/pypi/pip):

  ```Shell
  # make sure pip is upgraded
  pip3 install --upgrade pip
  # install jupyter notebook
  pip install jupyter
  # run this inside ssd.pytorch
  jupyter notebook
  ```

- Now navigate to `demo/demo.ipynb` at http://localhost:8888 (by default) and have at it!

### Try the webcam demo
- Works on CPU (you may have to tweak `cv2.waitKey` for optimal fps) or on an NVIDIA GPU.
- This demo currently requires OpenCV 2+ with Python bindings and an onboard webcam.
  * You can change the default webcam in `demo/live.py` (see the snippet after this list).
- Install the [imutils](https://github.com/jrosebr1/imutils) package to leverage multi-threading on the CPU:
  * `pip install imutils`
- Running `python -m demo.live` opens the webcam and begins detecting!
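
Concretely, the webcam index is set where `demo/live.py` creates its threaded video stream; switching cameras is a one-line change (the index value below is just an example).

```python
# In demo/live.py: src=0 is the default onboard camera; pass src=1 (for
# example) to select an external USB camera instead.
stream = WebcamVideoStream(src=1).start()
```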

## TODO
We have accumulated the following to-do list, which we hope to complete in the near future.
- Still to come:
  * [x] Support for the MS COCO dataset
  * [ ] Support for SSD512 training and testing
  * [ ] Support for training on custom datasets

## Authors

* [**Max deGroot**](https://github.com/amdegroot)
* [**Ellis Brown**](http://github.com/ellisbrown)

***Note:*** Unfortunately, this is just a hobby of ours and not a full-time job, so we'll do our best to keep things up to date, but no guarantees. That said, thanks to everyone for your continued help and feedback; it is really appreciated. We will try to address everything as soon as possible.

## References
- Wei Liu, et al. "SSD: Single Shot MultiBox Detector." [ECCV 2016](http://arxiv.org/abs/1512.02325).
- [Original Implementation (Caffe)](https://github.com/weiliu89/caffe/tree/ssd)
- A huge thank you to [Alex Koltun](https://github.com/alexkoltun) and his team at [Webyclip](http://www.webyclip.com) for their help in finishing the data augmentation portion.
- A list of other great SSD ports that were sources of inspiration (especially the Chainer repo):
  * [Chainer](https://github.com/Hakuyume/chainer-ssd), [Keras](https://github.com/rykov8/ssd_keras), [MXNet](https://github.com/zhreshold/mxnet-ssd), [Tensorflow](https://github.com/balancap/SSD-Tensorflow)

demo/live.py

from __future__ import print_function
import torch
from torch.autograd import Variable
import cv2
import time
from imutils.video import FPS, WebcamVideoStream
import argparse

parser = argparse.ArgumentParser(description='Single Shot MultiBox Detection')
parser.add_argument('--weights', default='weights/ssd_300_VOC0712.pth',
                    type=str, help='Trained state_dict file path')
parser.add_argument('--cuda', default=False, type=bool,
                    help='Use cuda in live demo')
args = parser.parse_args()

COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
FONT = cv2.FONT_HERSHEY_SIMPLEX


def cv2_demo(net, transform):
    def predict(frame):
        height, width = frame.shape[:2]
        x = torch.from_numpy(transform(frame)[0]).permute(2, 0, 1)
        x = Variable(x.unsqueeze(0))
        y = net(x)  # forward pass
        # detections has shape [batch, num_classes, top_k, 5]; the last dim
        # is (score, xmin, ymin, xmax, ymax) in relative coordinates
        detections = y.data
        # scale each detection back up to the image
        scale = torch.Tensor([width, height, width, height])
        for i in range(detections.size(1)):
            j = 0
            # draw every detection of class i with confidence >= 0.6
            while detections[0, i, j, 0] >= 0.6:
                pt = (detections[0, i, j, 1:] * scale).cpu().numpy()
                cv2.rectangle(frame,
                              (int(pt[0]), int(pt[1])),
                              (int(pt[2]), int(pt[3])),
                              COLORS[i % 3], 2)
                cv2.putText(frame, labelmap[i - 1], (int(pt[0]), int(pt[1])),
                            FONT, 2, (255, 255, 255), 2, cv2.LINE_AA)
                j += 1
        return frame

    # start video stream thread, allow buffer to fill
    print("[INFO] starting threaded video stream...")
    stream = WebcamVideoStream(src=0).start()  # default camera
    time.sleep(1.0)
    # loop over frames from the video stream
    # (`fps` is the FPS timer created in __main__ below)
    while True:
        # grab next frame
        frame = stream.read()
        key = cv2.waitKey(1) & 0xFF

        # update FPS counter
        fps.update()
        frame = predict(frame)

        # keybindings for display
        if key == ord('p'):  # pause
            while True:
                key2 = cv2.waitKey(1) & 0xFF
                cv2.imshow('frame', frame)
                if key2 == ord('p'):  # resume
                    break
        cv2.imshow('frame', frame)
        if key == 27:  # exit
            break

    # stop the threaded video stream before returning
    stream.stop()


if __name__ == '__main__':
    import sys
    from os import path
    sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))

    from data import BaseTransform, VOC_CLASSES as labelmap
    from ssd import build_ssd

    net = build_ssd('test', 300, 21)    # initialize SSD300 with 21 VOC classes
    net.load_state_dict(torch.load(args.weights))
    transform = BaseTransform(net.size, (104/256.0, 117/256.0, 123/256.0))

    fps = FPS().start()
    cv2_demo(net.eval(), transform)
    # stop the timer and display FPS information
    fps.stop()

    print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
    print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

    # cleanup
    cv2.destroyAllWindows()