Deployed d3fd609 to main with MkDocs 1.5.3 and mike 2.0.0
github-actions[bot] committed Mar 15, 2024
1 parent 68dfbf1 commit 7a98519
Showing 9 changed files with 177 additions and 158 deletions.
42 changes: 21 additions & 21 deletions main/benchmarking/index.html
<h2 id="benchmarking-machines">Benchmarking machines:</h2>
<li>inf2.xlarge instance (2 Neuron Cores with 1 used)</li>
</ul>
<h2 id="reproduction-steps">Reproduction steps:</h2>
<p>Install the environment</p>
<pre><code class="language-bash">pip install "infinity_emb[all]==0.0.25"
</code></pre>
<h3 id="sentence-transformers-fastembed-infinity">sentence-transformers, fastembed, infinity</h3>
<pre><code class="language-bash">git clone https://github.com/michaelfeil/infinity.git
cd infinity
git checkout tags/0.0.25
python ./docs/benchmarks/simple_app.py
</code></pre>
<h3 id="huggingfacetext-embeddings-inference">huggingface/text-embeddings-inference</h3>
<p>using the <em>cpu</em> and <em>89-cuda</em> containers (note that compute capability 8.9 corresponds to the Nvidia L4)</p>
<pre><code class="language-bash">docker run -it -p 7997:80 --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-0.6 \
  --model-id BAAI/bge-small-en-v1.5 --max-client-batch-size 256
</code></pre>
<pre><code class="language-bash">docker run -it -p "7997:80" --gpus all --pull always ghcr.io/huggingface/text-embeddings-inference:89-0.6 \
  --model-id BAAI/bge-large-en-v1.5 --max-client-batch-size 256
</code></pre>
<h3 id="tensorrt-onnx-gpu">tensorrt, onnx-gpu:</h3>
<pre><code class="language-bash">docker buildx build --target production-tensorrt -t inf-trt . &amp;&amp; docker run -it -p "7997:7997" --gpus all inf-trt --model-name-or-path BAAI/bge-large-en-v1.5 --engine optimum --device "cuda OR tensorrt"
</code></pre>
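<p>Once any of the servers above is running, it can be queried over its OpenAI-compatible embeddings route. The following client sketch uses only the Python standard library; the endpoint path, port, and payload shape are assumptions modeled on the OpenAI embeddings API and should be adjusted to your deployment.</p>

```python
import json
import urllib.request

# Hypothetical endpoint: infinity and text-embeddings-inference expose
# OpenAI-compatible embedding routes; adjust host/port/path as needed.
url = "http://localhost:7997/embeddings"
payload = {
    "model": "BAAI/bge-small-en-v1.5",
    "input": ["This is a test sentence.", "And a second one."],
}
body = json.dumps(payload).encode("utf-8")
request = urllib.request.Request(
    url, data=body, headers={"Content-Type": "application/json"}
)
# Uncomment once a server from above is running:
# with urllib.request.urlopen(request) as response:
#     data = json.loads(response.read())["data"]
#     print(len(data))  # one embedding per input sentence
```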
<h2 id="results">Results</h2>
<p>To launch the benchmarks:</p>
<pre><code class="language-bash">make benchmark_embed
</code></pre>
<p>The following metrics are reported:
- Requests / sec (1 request = 256 sentences / 115_000 tokens)
- Time to run the benchmark (10 requests / 1_150_000 tokens)</p>
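<p>As a sanity check on the units above, the relation between requests, sentences, and tokens can be worked out directly from the figures quoted in this section (the 100-second run below is purely hypothetical):</p>

```python
# Figures quoted above: one benchmark request carries 256 sentences,
# roughly 115_000 tokens; the full benchmark sends 10 such requests.
sentences_per_request = 256
tokens_per_request = 115_000
requests_per_benchmark = 10

total_tokens = tokens_per_request * requests_per_benchmark
print(total_tokens)  # 1_150_000 tokens, as stated above


def throughput(benchmark_seconds: float) -> tuple[float, float]:
    """Return (requests/sec, tokens/sec) for a measured benchmark time."""
    return (
        requests_per_benchmark / benchmark_seconds,
        total_tokens / benchmark_seconds,
    )


# e.g. a hypothetical 100-second run:
req_per_s, tok_per_s = throughput(100.0)
print(req_per_s, tok_per_s)  # 0.1 requests/sec, 11500.0 tokens/sec
```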
32 changes: 16 additions & 16 deletions main/contribution/index.html
<article class="md-content__inner md-typeset">
<h1 id="contribute-and-develop">Contribute and Develop</h1>
<h2 id="developer-setup">Developer setup</h2>
<p>Install via Poetry 1.7.1 and Python 3.11 on Ubuntu 22.04:</p>
<pre><code class="language-bash">git clone https://github.com/michaelfeil/infinity
cd infinity
cd libs/infinity_emb
poetry install --extras all --with test
</code></pre>
<p>To pass the CI:</p>
<pre><code class="language-bash">cd libs/infinity_emb
make format
make lint
poetry run pytest ./tests
</code></pre>
<p>As an alternative, you may also use:</p>
<pre><code class="language-bash">cd libs/infinity_emb
make precommit
</code></pre>
<h2 id="cla">CLA</h2>
<p>All contributions must be compatible with the MIT License of this repository.</p>
</article>
44 changes: 26 additions & 18 deletions main/deploy/index.html
<article class="md-content__inner md-typeset">
<h1 id="deployment">Deployment</h1>
<h3 id="docker-launch-the-cli-using-a-pre-built-docker-container">Docker: Launch the CLI using a pre-built docker container</h3>
<pre><code class="language-bash">model=BAAI/bge-small-en-v1.5
port=7997
docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port
</code></pre>
<p>The download path at runtime can be controlled via the environment variable <code>HF_HOME</code>.</p>
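<p>For illustration, the cache location the Hugging Face stack resolves can be sketched as follows; the default path mirrors the usual <code>huggingface_hub</code> convention, so treat it as an assumption rather than a guarantee for every image.</p>

```python
import os
from pathlib import Path


def resolve_hf_cache() -> Path:
    """Resolve the model download directory: HF_HOME if set,
    otherwise the conventional ~/.cache/huggingface default."""
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home)
    return Path.home() / ".cache" / "huggingface"


# e.g. pin downloads to a mounted volume before starting the container:
os.environ["HF_HOME"] = "/data/hf-cache"
print(resolve_hf_cache())  # /data/hf-cache
```

In a Docker deployment the same idea applies by passing <code>-e HF_HOME=…</code> together with a <code>-v</code> volume mount, so downloaded weights survive container restarts.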
<h3 id="dstack">dstack</h3>
<p>dstack allows you to provision a VM instance on the cloud of your choice. Write a service configuration file like the one below to deploy the <code>BAAI/bge-small-en-v1.5</code> model wrapped in Infinity.</p>
<pre><code class="language-yaml">type: service

image: michaelf34/infinity:latest
env:
- MODEL_ID=BAAI/bge-small-en-v1.5
commands:
- infinity_emb --model-name-or-path $MODEL_ID --port 80
port: 80
</code></pre>
<p>Then, run the following dstack command. A prompt will appear letting you choose which VM instance to deploy Infinity on.</p>
<pre><code class="language-shell">dstack run . -f infinity/serve.dstack.yml --gpu 16GB
</code></pre>
</article>
</div>
<script>var target=document.getElementById(location.hash.slice(1));target&&target.name&&(target.checked=target.name.startsWith("__tabbed_"))</script>