2025.0 is a major release that adds support for native Windows deployments and brings improvements to the generative use cases.
New feature - Windows native server deployment
- This release enables model server deployment on Windows operating systems as a binary application
- Full support for generative endpoints: text generation and embeddings based on the OpenAI API, and reranking based on the Cohere API
- Functional parity with the Linux version, with several minor differences: cloud storage, C API interface, and DAG pipelines - read more
- Targeted at client machines running Windows 11 and data center environments running Windows Server 2022
- Demos have been updated to work on both Linux and Windows. Check the installation guide
Other Changes and Improvements
- Added official support for Battlemage GPU, Arrow Lake CPU, iGPU, and NPU, and Lunar Lake CPU, iGPU, and NPU
- Updated base Docker images: added Ubuntu 24 and RedHat UBI 9; dropped Ubuntu 20 and RedHat UBI 8
- Extended the chat/completions API to support the `max_completion_tokens` parameter and message content passed as an array. These changes keep the API compatible with the OpenAI API.
- Truncate option in the embeddings endpoint: it is now possible to export the embeddings model with an option to automatically truncate the input to match the embeddings context length. By default, an error is raised when the input is too long.
- Added a speculative decoding algorithm to text generation. Check the demo.
- Added direct support for models without named outputs: when a model has no named outputs, generic names following the pattern `out_<index>` are assigned during model initialization
- Added a histogram metric for tracking MediaPipe graph processing duration
- Performance improvements
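The extended chat/completions request shape described above can be sketched as follows; the model name is an illustrative placeholder, and the request body follows the OpenAI API conventions:

```python
import json

# Example chat/completions request body using the newly supported
# `max_completion_tokens` parameter and message content given as an
# array of content parts. The model name is a placeholder.
payload = {
    "model": "my-chat-model",
    "messages": [
        {
            "role": "user",
            # Content may be an array of parts instead of a plain string
            "content": [{"type": "text", "text": "What is OpenVINO?"}],
        }
    ],
    "max_completion_tokens": 128,
}

print(json.dumps(payload, indent=2))
```

Such a body would be POSTed to the server's `/v3/chat/completions` endpoint by any OpenAI-compatible client.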
Breaking changes
- Discontinued support for NVIDIA plugin
Bug fixes
- Corrected behavior of cancelling text generation for disconnected clients
- Fixed detection of the model context length for the embeddings endpoint
- Security and stability improvements
You can pull the OpenVINO Model Server public Docker images, based on Ubuntu, with the following commands:
- `docker pull openvino/model_server:2025.0` - CPU device support
- `docker pull openvino/model_server:2025.0-gpu` - GPU, NPU and CPU device support

Alternatively, use the provided binary packages. The prebuilt image is also available in the RedHat Ecosystem Catalog.
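A minimal sketch of pulling the CPU image and starting the server, assuming a model lives in a local `./models` directory; the model name, path, and port are illustrative placeholders:

```shell
# Pull the CPU image and serve a single model from a local directory.
# "my_model", ./models, and port 9000 are illustrative placeholders.
docker pull openvino/model_server:2025.0
docker run -d --rm -p 9000:9000 \
  -v "$(pwd)/models:/models" \
  openvino/model_server:2025.0 \
  --model_name my_model --model_path /models/my_model --port 9000
```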