Commit

Deployed 1e4f705 to main with MkDocs 1.5.3 and mike 2.0.0
github-actions[bot] committed Mar 16, 2024
1 parent d3d3690 commit 8785217
Showing 10 changed files with 43 additions and 32 deletions.
2 changes: 1 addition & 1 deletion main/404.html
@@ -253,7 +253,7 @@


<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>


2 changes: 1 addition & 1 deletion main/benchmarking/index.html
@@ -201,7 +201,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="../python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
2 changes: 1 addition & 1 deletion main/contribution/index.html
@@ -137,7 +137,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="../python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
2 changes: 1 addition & 1 deletion main/deploy/index.html
@@ -137,7 +137,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="../python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
4 changes: 2 additions & 2 deletions main/index.html
@@ -161,7 +161,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
@@ -237,7 +237,7 @@ <h2 id="why-infinity">Why Infinity:</h2>
<p>Infinity provides the following features:</p>
<ul>
<li><strong>Deploy any model from MTEB</strong>: deploy the model you know from <a href="https://github.com/UKPLab/sentence-transformers/">SentenceTransformers</a></li>
<li><strong>Fast inference backends</strong>: The inference server is built on top of <a href="https://github.com/pytorch/pytorch">torch</a>, <a href="https://github.com/qdrant/fastembed">optimum(onnx/tensorrt)</a> and <a href="https://github.com/OpenNMT/CTranslate2">CTranslate2</a>, using FlashAttention to get the most out of <strong>CUDA</strong>, <strong>ROCM</strong>, <strong>CPU</strong> or <strong>MPS</strong> chips.</li>
<li><strong>Fast inference backends</strong>: The inference server is built on top of <a href="https://github.com/pytorch/pytorch">torch</a>, <a href="https://huggingface.co/docs/optimum/index">optimum(onnx/tensorrt)</a> and <a href="https://github.com/OpenNMT/CTranslate2">CTranslate2</a>, using FlashAttention to get the most out of <strong>CUDA</strong>, <strong>ROCM</strong>, <strong>CPU</strong> or <strong>MPS</strong> device.</li>
<li><strong>Dynamic batching</strong>: New embedding requests are queued while GPU is busy with the previous ones. New requests are squeezed intro your device as soon as ready. Similar max throughput on GPU as text-embeddings-inference.</li>
<li><strong>Correct and tested implementation</strong>: Unit and end-to-end tested. Embeddings via infinity are identical to <a href="https://github.com/UKPLab/sentence-transformers/">SentenceTransformers</a> (up to numerical precision). Lets API users create embeddings till infinity and beyond.</li>
<li><strong>Easy to use</strong>: The API is built on top of <a href="https://fastapi.tiangolo.com/">FastAPI</a>, <a href="https://swagger.io/">Swagger</a> makes it fully documented. API are aligned to <a href="https://platform.openai.com/docs/guides/embeddings/what-are-embeddings">OpenAI's Embedding specs</a>. See below on how to get started.</li>
2 changes: 1 addition & 1 deletion main/integrations/index.html
@@ -155,7 +155,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="../python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
55 changes: 33 additions & 22 deletions main/python_engine/index.html
@@ -9,7 +9,7 @@
<link href="../swagger_ui/" rel="next"/>
<link href="../assets/images/favicon.png" rel="icon"/>
<meta content="mkdocs-1.5.3, mkdocs-material-9.5.13" name="generator"/>
<title>Python engine - Infinity</title>
<title>Python Engine Integration - Infinity</title>
<link href="../assets/stylesheets/main.7e359304.min.css" rel="stylesheet"/>
<link crossorigin="" href="https://fonts.gstatic.com" rel="preconnect"/>
<link href="https://fonts.googleapis.com/css?family=Roboto:300,300i,400,400i,700,700i%7CRoboto+Mono:400,400i,700,700i&amp;display=fallback" rel="stylesheet"/>
@@ -47,7 +47,7 @@
<div class="md-header__topic" data-md-component="header-topic">
<span class="md-ellipsis">

Python engine
Python Engine Integration

</span>
</div>
@@ -109,13 +109,13 @@
<input class="md-nav__toggle md-toggle" id="__toc" type="checkbox"/>
<label class="md-nav__link md-nav__link--active" for="__toc">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
<span class="md-nav__icon md-icon"></span>
</label>
<a class="md-nav__link md-nav__link--active" href="./">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
<nav aria-label="Table of contents" class="md-nav md-nav--secondary">
@@ -197,28 +197,39 @@
</div>
<div class="md-content" data-md-component="content">
<article class="md-content__inner md-typeset">
<p>Enhancing the document involves improving clarity, structure, and adding helpful context where necessary. Here's an enhanced version:</p>
<h1 id="python-engine-integration">Python Engine Integration</h1>
<h2 id="launching-embedding-generation-with-python">Launching Embedding generation with Python</h2>
<p>Use asynchronous programming in Python using <code>asyncio</code> for flexible and efficient embedding processing with Infinity. This advanced method allows for concurrent execution, making it ideal for high-throughput embedding generation.</p>
<p>Use asynchronous programming in Python using <code>asyncio</code> for flexible and efficient embedding processing with Infinity. This advanced method allows for concurrent execution of different requests, making it ideal for high-throughput embedding generation.</p>
<div class="language-python highlight"><pre><span></span><code><span id="__span-0-1"><a href="#__codelineno-0-1" id="__codelineno-0-1" name="__codelineno-0-1"></a><span class="kn">import</span> <span class="nn">asyncio</span>
</span><span id="__span-0-2"><a href="#__codelineno-0-2" id="__codelineno-0-2" name="__codelineno-0-2"></a><span class="kn">from</span> <span class="nn">infinity_emb</span> <span class="kn">import</span> <span class="n">AsyncEmbeddingEngine</span><span class="p">,</span> <span class="n">EngineArgs</span>
</span><span id="__span-0-3"><a href="#__codelineno-0-3" id="__codelineno-0-3" name="__codelineno-0-3"></a>
</span><span id="__span-0-4"><a href="#__codelineno-0-4" id="__codelineno-0-4" name="__codelineno-0-4"></a><span class="c1"># Define sentences for embedding</span>
</span><span id="__span-0-5"><a href="#__codelineno-0-5" id="__codelineno-0-5" name="__codelineno-0-5"></a><span class="n">sentences</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"Embed this sentence via Infinity."</span><span class="p">,</span> <span class="s2">"Paris is in France."</span><span class="p">]</span>
</span><span id="__span-0-6"><a href="#__codelineno-0-6" id="__codelineno-0-6" name="__codelineno-0-6"></a><span class="c1"># Initialize the embedding engine with model specifications</span>
</span><span id="__span-0-7"><a href="#__codelineno-0-7" id="__codelineno-0-7" name="__codelineno-0-7"></a><span class="n">engine</span> <span class="o">=</span> <span class="n">AsyncEmbeddingEngine</span><span class="o">.</span><span class="n">from_args</span><span class="p">(</span>
</span><span id="__span-0-8"><a href="#__codelineno-0-8" id="__codelineno-0-8" name="__codelineno-0-8"></a> <span class="n">EngineArgs</span><span class="p">(</span><span class="n">model_name_or_path</span><span class="o">=</span><span class="s2">"BAAI/bge-small-en-v1.5"</span><span class="p">,</span> <span class="n">engine</span><span class="o">=</span><span class="s2">"torch"</span><span class="p">,</span>
</span><span id="__span-0-9"><a href="#__codelineno-0-9" id="__codelineno-0-9" name="__codelineno-0-9"></a> <span class="n">lengths_via_tokenize</span><span class="o">=</span><span class="kc">True</span>
</span><span id="__span-0-10"><a href="#__codelineno-0-10" id="__codelineno-0-10" name="__codelineno-0-10"></a> <span class="p">)</span>
</span><span id="__span-0-11"><a href="#__codelineno-0-11" id="__codelineno-0-11" name="__codelineno-0-11"></a><span class="p">)</span>
</span><span id="__span-0-12"><a href="#__codelineno-0-12" id="__codelineno-0-12" name="__codelineno-0-12"></a>
</span><span id="__span-0-13"><a href="#__codelineno-0-13" id="__codelineno-0-13" name="__codelineno-0-13"></a><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span><span id="__span-0-14"><a href="#__codelineno-0-14" id="__codelineno-0-14" name="__codelineno-0-14"></a> <span class="k">async</span> <span class="k">with</span> <span class="n">engine</span><span class="p">:</span> <span class="c1"># Context manager initializes and terminates the engine</span>
</span><span id="__span-0-15"><a href="#__codelineno-0-15" id="__codelineno-0-15" name="__codelineno-0-15"></a> <span class="c1"># usage is total token count according to tokenizer.</span>
</span><span id="__span-0-16"><a href="#__codelineno-0-16" id="__codelineno-0-16" name="__codelineno-0-16"></a> <span class="n">embeddings</span><span class="p">,</span> <span class="n">usage</span> <span class="o">=</span> <span class="k">await</span> <span class="n">engine</span><span class="o">.</span><span class="n">embed</span><span class="p">(</span><span class="n">sentences</span><span class="o">=</span><span class="n">sentences</span><span class="p">)</span>
</span><span id="__span-0-17"><a href="#__codelineno-0-17" id="__codelineno-0-17" name="__codelineno-0-17"></a> <span class="c1"># Embeddings are now available for use</span>
</span><span id="__span-0-18"><a href="#__codelineno-0-18" id="__codelineno-0-18" name="__codelineno-0-18"></a><span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</span><span id="__span-0-3"><a href="#__codelineno-0-3" id="__codelineno-0-3" name="__codelineno-0-3"></a><span class="kn">from</span> <span class="nn">infinity_emb.log_handler</span> <span class="kn">import</span> <span class="n">logger</span>
</span><span id="__span-0-4"><a href="#__codelineno-0-4" id="__codelineno-0-4" name="__codelineno-0-4"></a><span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="c1"># Debug</span>
</span><span id="__span-0-5"><a href="#__codelineno-0-5" id="__codelineno-0-5" name="__codelineno-0-5"></a>
</span><span id="__span-0-6"><a href="#__codelineno-0-6" id="__codelineno-0-6" name="__codelineno-0-6"></a><span class="c1"># Define sentences for embedding</span>
</span><span id="__span-0-7"><a href="#__codelineno-0-7" id="__codelineno-0-7" name="__codelineno-0-7"></a><span class="n">sentences</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"Embed this sentence via Infinity."</span><span class="p">,</span> <span class="s2">"Paris is in France."</span><span class="p">]</span>
</span><span id="__span-0-8"><a href="#__codelineno-0-8" id="__codelineno-0-8" name="__codelineno-0-8"></a><span class="c1"># Initialize the embedding engine with model specifications</span>
</span><span id="__span-0-9"><a href="#__codelineno-0-9" id="__codelineno-0-9" name="__codelineno-0-9"></a><span class="n">engine</span> <span class="o">=</span> <span class="n">AsyncEmbeddingEngine</span><span class="o">.</span><span class="n">from_args</span><span class="p">(</span>
</span><span id="__span-0-10"><a href="#__codelineno-0-10" id="__codelineno-0-10" name="__codelineno-0-10"></a> <span class="n">EngineArgs</span><span class="p">(</span>
</span><span id="__span-0-11"><a href="#__codelineno-0-11" id="__codelineno-0-11" name="__codelineno-0-11"></a> <span class="n">model_name_or_path</span><span class="o">=</span><span class="s2">"BAAI/bge-small-en-v1.5"</span><span class="p">,</span>
</span><span id="__span-0-12"><a href="#__codelineno-0-12" id="__codelineno-0-12" name="__codelineno-0-12"></a> <span class="n">engine</span><span class="o">=</span><span class="s2">"torch"</span><span class="p">,</span>
</span><span id="__span-0-13"><a href="#__codelineno-0-13" id="__codelineno-0-13" name="__codelineno-0-13"></a> <span class="n">lengths_via_tokenize</span><span class="o">=</span><span class="kc">True</span>
</span><span id="__span-0-14"><a href="#__codelineno-0-14" id="__codelineno-0-14" name="__codelineno-0-14"></a> <span class="p">)</span>
</span><span id="__span-0-15"><a href="#__codelineno-0-15" id="__codelineno-0-15" name="__codelineno-0-15"></a><span class="p">)</span>
</span><span id="__span-0-16"><a href="#__codelineno-0-16" id="__codelineno-0-16" name="__codelineno-0-16"></a>
</span><span id="__span-0-17"><a href="#__codelineno-0-17" id="__codelineno-0-17" name="__codelineno-0-17"></a><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span><span id="__span-0-18"><a href="#__codelineno-0-18" id="__codelineno-0-18" name="__codelineno-0-18"></a> <span class="k">async</span> <span class="k">with</span> <span class="n">engine</span><span class="p">:</span> <span class="c1"># Context manager initializes and terminates the engine</span>
</span><span id="__span-0-19"><a href="#__codelineno-0-19" id="__codelineno-0-19" name="__codelineno-0-19"></a>
</span><span id="__span-0-20"><a href="#__codelineno-0-20" id="__codelineno-0-20" name="__codelineno-0-20"></a> <span class="n">job1</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">engine</span><span class="o">.</span><span class="n">embed</span><span class="p">(</span><span class="n">sentences</span><span class="o">=</span><span class="n">sentences</span><span class="p">))</span>
</span><span id="__span-0-21"><a href="#__codelineno-0-21" id="__codelineno-0-21" name="__codelineno-0-21"></a> <span class="c1"># submit a second job in parallel</span>
</span><span id="__span-0-22"><a href="#__codelineno-0-22" id="__codelineno-0-22" name="__codelineno-0-22"></a> <span class="n">job2</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">engine</span><span class="o">.</span><span class="n">embed</span><span class="p">(</span><span class="n">sentences</span><span class="o">=</span><span class="p">[</span><span class="s2">"Hello world"</span><span class="p">]))</span>
</span><span id="__span-0-23"><a href="#__codelineno-0-23" id="__codelineno-0-23" name="__codelineno-0-23"></a> <span class="c1"># usage is total token count according to tokenizer.</span>
</span><span id="__span-0-24"><a href="#__codelineno-0-24" id="__codelineno-0-24" name="__codelineno-0-24"></a> <span class="n">embeddings</span><span class="p">,</span> <span class="n">usage</span> <span class="o">=</span> <span class="k">await</span> <span class="n">job1</span>
</span><span id="__span-0-25"><a href="#__codelineno-0-25" id="__codelineno-0-25" name="__codelineno-0-25"></a> <span class="n">embeddings2</span><span class="p">,</span> <span class="n">usage2</span> <span class="o">=</span> <span class="k">await</span> <span class="n">job2</span>
</span><span id="__span-0-26"><a href="#__codelineno-0-26" id="__codelineno-0-26" name="__codelineno-0-26"></a> <span class="c1"># Embeddings are now available for use - they ran in the same batch.</span>
</span><span id="__span-0-27"><a href="#__codelineno-0-27" id="__codelineno-0-27" name="__codelineno-0-27"></a> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"for </span><span class="si">{</span><span class="n">sentences</span><span class="si">}</span><span class="s2">, generated embeddings </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">embeddings</span><span class="p">)</span><span class="si">}</span><span class="s2"> with tot_tokens=</span><span class="si">{</span><span class="n">usage</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</span><span id="__span-0-28"><a href="#__codelineno-0-28" id="__codelineno-0-28" name="__codelineno-0-28"></a><span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span>
</span><span id="__span-0-29"><a href="#__codelineno-0-29" id="__codelineno-0-29" name="__codelineno-0-29"></a> <span class="n">main</span><span class="p">()</span>
</span><span id="__span-0-30"><a href="#__codelineno-0-30" id="__codelineno-0-30" name="__codelineno-0-30"></a><span class="p">)</span>
</span></code></pre></div>
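The concurrent-submission pattern in the example above can be tried without downloading a model. The following is a minimal sketch, assuming a hypothetical `ToyBatchingEngine` stand-in that is not part of `infinity_emb`: it only imitates the idea of queueing requests and serving whatever is waiting as one batch, and its "embeddings" and token counts are placeholders rather than real model output.

```python
import asyncio


class ToyBatchingEngine:
    """Hypothetical stand-in for an embedding engine (NOT the infinity_emb API)."""

    def __init__(self, max_batch_size=8):
        self.max_batch_size = max_batch_size
        self.batch_sizes = []  # number of sentences handled per simulated batch
        self._queue = None
        self._worker = None

    async def __aenter__(self):
        # Create the queue inside the running event loop and start the worker.
        self._queue = asyncio.Queue()
        self._worker = asyncio.create_task(self._run())
        return self

    async def __aexit__(self, *exc_info):
        # Stop the background worker when leaving the context manager.
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass

    async def embed(self, sentences):
        loop = asyncio.get_running_loop()
        futures = []
        for sentence in sentences:
            future = loop.create_future()
            self._queue.put_nowait((sentence, future))
            futures.append(future)
        embeddings = await asyncio.gather(*futures)
        # Placeholder "usage": whitespace-separated words, not real tokens.
        usage = sum(len(sentence.split()) for sentence in sentences)
        return list(embeddings), usage

    async def _run(self):
        while True:
            # Wait for one request, then drain whatever else is already queued.
            batch = [await self._queue.get()]
            while len(batch) < self.max_batch_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            self.batch_sizes.append(len(batch))
            await asyncio.sleep(0.01)  # simulated forward pass
            for sentence, future in batch:
                future.set_result([float(len(sentence))])  # placeholder vector


async def main():
    engine = ToyBatchingEngine()
    async with engine:
        job1 = asyncio.create_task(
            engine.embed(["Embed this sentence via Infinity.", "Paris is in France."])
        )
        # Submit a second job before awaiting the first one.
        job2 = asyncio.create_task(engine.embed(["Hello world"]))
        embeddings, usage = await job1
        embeddings2, usage2 = await job2
    return embeddings, usage, embeddings2, usage2, engine.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because both jobs are created before either is awaited, the worker can pick up sentences from both requests together; `batch_sizes` records how the toy engine grouped them.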
<h2 id="reranker">Reranker</h2>
<p>Enhance search results by reranking based on the similarity between a query and a set of documents. This feature is particularly useful in conjunction with vector databases and embeddings, or as a standalone solution for small datasets. Ensure you choose a Hugging Face model designed for sequence classification with a single output class, e.g. "BAAI/bge-reranker-base". Further models are usually listed as <code>rerank</code> models on HuggingFace https://huggingface.co/models?pipeline_tag=text-classification&amp;sort=trending&amp;search=rerank. </p>
Binary file modified main/sitemap.xml.gz
Binary file not shown.
4 changes: 2 additions & 2 deletions main/swagger_ui/index.html
@@ -107,7 +107,7 @@
<li class="md-nav__item">
<a class="md-nav__link" href="../python_engine/">
<span class="md-ellipsis">
Python engine
Python Engine Integration
</span>
</a>
</li>
@@ -137,7 +137,7 @@
<h1 id="swagger-ui">Swagger UI</h1>
<p>Disclaimer: This is the current Swagger UI based on the main branch - which may differ from the Swagger UI of this release.
The Swagger UI and <code>openapi.json</code> will be available under <code>{url}:{port}/docs</code>, in this case <code>http://localhost:7997/docs</code>.</p>
<p><iframe class="swagger-ui-iframe" frameborder="0" id="d72d40d9" src="swagger-d72d40d9.html" style="overflow:hidden;width:100%;" width="100%"></iframe></p>
<p><iframe class="swagger-ui-iframe" frameborder="0" id="3d6482e2" src="swagger-3d6482e2.html" style="overflow:hidden;width:100%;" width="100%"></iframe></p>
</article>
</div>
<script>var target=document.getElementById(location.hash.slice(1));target&&target.name&&(target.checked=target.name.startsWith("__tabbed_"))</script>
@@ -67,7 +67,7 @@
}

const resize_ob = new ResizeObserver(function(entries) {
parent.update_swagger_ui_iframe_height("d72d40d9");
parent.update_swagger_ui_iframe_height("3d6482e2");
});

// start observing for resizing
