initial commit of dlrm files
mnaumovfb committed Jun 11, 2019
1 parent 48d4748 commit a73259b
Showing 17 changed files with 4,078 additions and 33 deletions.
5 changes: 5 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,5 @@
# Code of Conduct

Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
Please read the [full text](https://code.fb.com/codeofconduct/)
so that you can understand what actions will and will not be tolerated.
36 changes: 36 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,36 @@
# Contributing to DLRM
We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and includes sufficient instructions to reproduce the issue.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## Coding Style
* 4 spaces for indentation rather than tabs
* 80 character line length
* in general, please maintain a consistent style with the rest of the code
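As an illustration only (not taken from the repository), a tiny function formatted per these guidelines — 4-space indentation and lines kept under 80 characters:

```python
# Hypothetical snippet illustrating the style rules above:
# 4-space indentation, lines kept under 80 characters.
def interaction_sum(vectors):
    """Return the elementwise sum of equal-length vectors."""
    result = [0.0] * len(vectors[0])
    for vec in vectors:
        for i, value in enumerate(vec):
            result[i] += value
    return result

print(interaction_sum([[1.0, 2.0], [3.0, 4.0]]))  # [4.0, 6.0]
```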

## License
By contributing to DLRM, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Facebook, Inc. and its affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
154 changes: 121 additions & 33 deletions README.md
@@ -1,11 +1,14 @@
Description:
===========
Copyright (c) Facebook, Inc. and its affiliates.

An implementation of a deep learning recommendation model (DLRM).
The model input consists of dense and sparse features. The former is a vector
of floating-point values. The latter is a list of sparse indices into
embedding tables, which consist of vectors of floating-point values.
The selected vectors are passed to MLP networks (denoted by triangles below);
in some cases the vectors are interacted through operators (Ops).

```
output:
                    probability of a click
model:                        |
                             /\
                            /__\
                              |
      _____________________> Op  <___________________
    /                         |                       \
   /\                        /\                        /\
  /__\                      /__\           ...        /__\
   |                          |                         |
   |                         Op                        Op
   |                    ____/__\_____           ____/__\____
   |                   |_Emb_|____|__|    ...  |_Emb_|__|___|
input:
[ dense features ]     [sparse indices] , ..., [sparse indices]
```
More precise definition of model layers:
1) fully connected layers of an mlp

    z = f(y)

    y = Wx + b

2) embedding lookup (for a list of sparse indices p=[p1,...,pk])

    z = Op(e1,...,ek)

    obtain vectors e1=E[:,p1], ..., ek=E[:,pk]

3) Operator Op can be one of the following

    Sum(e1,...,ek) = e1 + ... + ek

    Dot(e1,...,ek) = [e1'e1, ..., e1'ek, ..., ek'e1, ..., ek'ek]

    Cat(e1,...,ek) = [e1', ..., ek']'

where ' denotes transpose operation
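The three layer types above can be sketched in a few lines of NumPy. The shapes, index values, and the choice of f (ReLU) are assumptions for illustration, not values taken from the DLRM code:

```python
# Illustrative NumPy sketch of the three layer types defined above.
import numpy as np

rng = np.random.default_rng(0)

# 1) Fully connected layer: y = Wx + b, z = f(y)
W = rng.standard_normal((4, 8))   # weight matrix
b = rng.standard_normal(4)        # bias
x = rng.standard_normal(8)        # dense input features
z_mlp = np.maximum(W @ x + b, 0.0)  # f = ReLU, chosen for the example

# 2) Embedding lookup: select columns of E by sparse indices p
E = rng.standard_normal((4, 10))  # embedding table (dim 4, 10 entries)
p = [1, 5, 7]                     # sparse indices p1,...,pk
e = [E[:, pi] for pi in p]        # e1=E[:,p1], ..., ek=E[:,pk]

# 3) Operators over the selected vectors
sum_op = np.sum(e, axis=0)                            # Sum(e1,...,ek)
dot_op = np.array([ei @ ej for ei in e for ej in e])  # Dot: all pairwise dot products
cat_op = np.concatenate(e)                            # Cat: stacked vectors

print(z_mlp.shape, sum_op.shape, dot_op.shape, cat_op.shape)  # (4,) (4,) (9,) (12,)
```

Note how the Dot operator turns k embedding vectors into k*k scalar interactions, while Cat preserves all k*d entries.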

Reference:
> Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang,
Narayanan Sundaram, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu,
Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii,
Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko,
Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong,
Misha Smelyanskiy, "Deep Learning Recommendation Model for Personalization and
Recommendation Systems", CoRR, [arXiv:1906.00091](https://arxiv.org/abs/1906.00091), May 2019

Implementation
--------------
**DLRM PyTorch**. Implementation of DLRM in PyTorch framework:

    dlrm_s_pytorch.py

**DLRM Caffe2**. Implementation of DLRM in Caffe2 framework:

    dlrm_s_caffe2.py

**DLRM Data**. Implementation of DLRM data generation and loading:

    dlrm_data_pytorch.py, dlrm_data_caffe2.py, data_utils.py

**DLRM Tests**. Implementation of DLRM tests in ./test

    dlrm_s_test.sh

**DLRM Benchmarks**. Implementation of DLRM benchmarks in ./bench

    dlrm_s_benchmark.sh, dlrm_s_criteo_kaggle.sh

How to run dlrm code?
--------------------
1) A sample run of the code with a tiny model is shown below:
```
$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6
time/loss/accuracy (if enabled):
Finished training it 1/3 of epoch 0, -1.00 ms/it, loss 0.451893, accuracy 0.000%
Finished training it 2/3 of epoch 0, -1.00 ms/it, loss 0.402002, accuracy 0.000%
Finished training it 3/3 of epoch 0, -1.00 ms/it, loss 0.275460, accuracy 0.000%
```
2) A sample run of the code with a tiny model in debug mode:
```
$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6 --debug-mode
model arch:
mlp top arch 3 layers, with input to output dimensions:
[8 4 2 1]
...
updated parameters (weights and bias):
[0.92754 0.75067]
[[0.57379 0.7514 ]]
[0.07908]
```

Testing
-------
Testing scripts to confirm functional correctness of the code

```
./test/dlrm_s_tests.sh
Running commands ...
python dlrm_s_pytorch.py
python dlrm_s_caffe2.py
Checking results ...
diff test1 (no numeric values in the output = SUCCESS)
diff test2 (no numeric values in the output = SUCCESS)
diff test3 (no numeric values in the output = SUCCESS)
diff test4 (no numeric values in the output = SUCCESS)
```

*NOTE: Testing scripts accept extra arguments which will be passed along, such as --use-gpu.*

Benchmarking
------------
1) Performance benchmarking
```
./bench/dlrm_s_benchmark.sh
```
2) The code supports an interface to the [Kaggle Display Advertising Challenge Dataset](https://labs.criteo.com/2014/09/kaggle-contest-dataset-now-available-academic-use/).
Please do the following to prepare the dataset for use with the DLRM code:
   - First, specify the raw data file (train.txt) as downloaded with --raw-data-file=<path/train.txt>
   - This is then pre-processed (categorized, concatenated across days, etc.) for use with the dlrm code
   - The processed data is stored as a *.npz file in <root_dir>/input/kaggle_data/*.npz
   - Once generated, the processed file (*.npz) can be used for subsequent runs with --processed-data-file=<path/*.npz>
```
./bench/dlrm_s_criteo_kaggle.sh
```
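As a sketch, the two-stage workflow above might look like the following. The paths and the .npz file name are placeholders, and only the --raw-data-file and --processed-data-file flags come from the steps above; `echo` is used so the commands are printed rather than run — drop it to actually execute them:

```shell
# Hypothetical paths; adjust to where you downloaded/preprocessed the data.
RAW=./input/train.txt
PROCESSED=./input/kaggle_data/processed.npz

# First run: pre-process the raw Criteo data, then train
echo python dlrm_s_pytorch.py --raw-data-file="$RAW"

# Later runs: skip pre-processing by reusing the generated .npz file
echo python dlrm_s_pytorch.py --processed-data-file="$PROCESSED"
```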
<img src="./kaggle_dac_loss_accuracy_plots.png" width="900" height="320">

*NOTE: Benchmarking scripts accept extra arguments which will be passed along, such as --num-batches=100 to limit the number of data samples.*

Version
-------
0.1 : Initial release of the DLRM code

Requirements
------------
pytorch-nightly (*6/10/19*)

onnx (*optional*)

torchviz (*optional*)

License
-------
This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.
149 changes: 149 additions & 0 deletions bench/dlrm_s_benchmark.sh
@@ -0,0 +1,149 @@
#!/bin/bash
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

#check if extra argument is passed to the test
if [[ $# == 1 ]]; then
dlrm_extra_option=$1
else
dlrm_extra_option=""
fi
#echo $dlrm_extra_option

build=1
cpu=1
gpu=1
pt=1
c2=1

ncores=28 #12 #6
nsockets="0"

ngpus="1 2 4 8"

numa_cmd="numactl --physcpubind=0-$((ncores-1)) -m $nsockets" #run on one socket, without HT
dlrm_pt_bin="python dlrm_s_pytorch.py"
dlrm_c2_bin="python dlrm_s_caffe2.py"

data=random #synthetic
print_freq=100
rand_seed=727

c2_net="async_scheduling"

#Model param
mb_size=2048 #1024 #512 #256
nbatches=1000 #500 #100
bot_mlp="512-512-64"
top_mlp="1024-1024-1024-1"
emb_size=64
nindices=100
emb="1000000-1000000-1000000-1000000-1000000-1000000-1000000-1000000"
interaction="dot"

#_args="--mini-batch-size="${mb_size}\
_args=" --num-batches="${nbatches}\
" --data-generation="${data}\
" --arch-mlp-bot="${bot_mlp}\
" --arch-mlp-top="${top_mlp}\
" --arch-sparse-feature-size="${emb_size}\
" --arch-embedding-size="${emb}\
" --num-indices-per-lookup="${nindices}\
" --arch-interaction-op="${interaction}\
" --numpy-rand-seed="${rand_seed}\
" --print-freq="${print_freq}\
" --print-time"\
" --enable-profiling "

c2_args=" --caffe2-net-type="${c2_net}

if [ $build = 1 ]; then
BUCK_DISTCC=0 buck build @mode/opt //experimental/mnaumov/hw/dlrm:dlrm_s_pytorch //experimental/mnaumov/hw/dlrm:dlrm_s_caffe2
fi

# CPU Benchmarking
if [ $cpu = 1 ]; then
echo "--------------------------------------------"
echo "CPU Benchmarking - running on $ncores cores"
echo "--------------------------------------------"
if [ $pt = 1 ]; then
outf="model1_CPU_PT_$ncores.log"
outp="dlrm_s_pytorch.prof"
echo "-------------------------------"
echo "Running PT (log file: $outf)"
echo "-------------------------------"
cmd="$numa_cmd $dlrm_pt_bin --mini-batch-size=$mb_size $_args $dlrm_extra_option > $outf"
echo $cmd
eval $cmd
min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
echo "Min time per iteration = $min"
# move profiling file(s)
mv $outp ${outf//".log"/".prof"}
mv ${outp//".prof"/".json"} ${outf//".log"/".json"}

fi
if [ $c2 = 1 ]; then
outf="model1_CPU_C2_$ncores.log"
outp="dlrm_s_caffe2.prof"
echo "-------------------------------"
echo "Running C2 (log file: $outf)"
echo "-------------------------------"
cmd="$numa_cmd $dlrm_c2_bin --mini-batch-size=$mb_size $_args $c2_args $dlrm_extra_option 1> $outf 2> $outp"
echo $cmd
eval $cmd
min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
echo "Min time per iteration = $min"
# move profiling file (collected from stderr above)
mv $outp ${outf//".log"/".prof"}
fi
fi

# GPU Benchmarking
if [ $gpu = 1 ]; then
echo "--------------------------------------------"
echo "GPU Benchmarking - running on $ngpus GPUs"
echo "--------------------------------------------"
for _ng in $ngpus
do
# weak scaling
# _mb_size=$((mb_size*_ng))
# strong scaling
_mb_size=$((mb_size*1))
_gpus=$(seq -s, 0 $((_ng-1)))
cuda_arg="CUDA_VISIBLE_DEVICES=$_gpus"
echo "-------------------"
echo "Using GPUS: "$_gpus
echo "-------------------"
if [ $pt = 1 ]; then
outf="model1_GPU_PT_$_ng.log"
outp="dlrm_s_pytorch.prof"
echo "-------------------------------"
echo "Running PT (log file: $outf)"
echo "-------------------------------"
cmd="$cuda_arg $dlrm_pt_bin --mini-batch-size=$_mb_size $_args --use-gpu $dlrm_extra_option > $outf"
echo $cmd
eval $cmd
min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
echo "Min time per iteration = $min"
# move profiling file(s)
mv $outp ${outf//".log"/".prof"}
mv ${outp//".prof"/".json"} ${outf//".log"/".json"}
fi
if [ $c2 = 1 ]; then
outf="model1_GPU_C2_$_ng.log"
outp="dlrm_s_caffe2.prof"
echo "-------------------------------"
echo "Running C2 (log file: $outf)"
echo "-------------------------------"
cmd="$cuda_arg $dlrm_c2_bin --mini-batch-size=$_mb_size $_args $c2_args --use-gpu $dlrm_extra_option 1> $outf 2> $outp"
echo $cmd
eval $cmd
min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
echo "Min time per iteration = $min"
# move profiling file (collected from stderr above)
mv $outp ${outf//".log"/".prof"}
fi
done
fi
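For reference, the grep/awk min-extraction used throughout the script can be exercised on fabricated input. The log lines below are invented so that the per-iteration time lands in awk's $7, matching the one-liner's field assumption; real DLRM log lines may place the time elsewhere:

```shell
# Fabricated log lines: the time (ms/it) is the 7th whitespace-separated field.
min=$(printf '%s\n' \
  "Finished benchmark iteration 1 of 3 12.50 ms/it" \
  "Finished benchmark iteration 2 of 3 11.25 ms/it" \
  "Finished benchmark iteration 3 of 3 11.75 ms/it" |
  grep "iteration" |
  awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
echo "Min time per iteration = $min"
```

The BEGIN block seeds `best` with a sentinel larger than any plausible time, so the first matching line always replaces it.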