Resolve

h2oai · Aug 7, 2024 · 757f4bf
2 parents ac3bf0d + d3146d2
commit 757f4bf

Showing 123 changed files with 6,878 additions and 2,321 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,74 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3.11] - 2024-08-02
+
+### Added
+
+- **📊 Model Information Display**: Added visuals for model selection, including images next to model names for more intuitive navigation.
+- **🗣 ElevenLabs Voice Adaptations**: Voice enhancements, including support for looking up ElevenLabs voice IDs by name, for personalized vocal interactions.
+- **⌨️ Arrow Keys Model Selection**: Users can now use arrow keys for quicker model selection, enhancing accessibility.
+- **🔍 Fuzzy Search in Model Selector**: The model selector now supports fuzzy search, which also matches model descriptions, so models can be located quickly.
+- **🕹️ ComfyUI Flux Image Generation**: Added support for the new Flux image generation model, including controls for weight precision and CLIP model options in Settings.
+- **💾 Display File Size for Uploads**: Enhanced file interface now displays file size, preparing for upcoming upload restrictions.
+- **🎚️ Advanced Params "Min P"**: Added the 'Min P' sampling parameter to the advanced settings; it discards tokens whose probability falls below a set fraction of the most likely token's probability.
+- **🔒 Enhanced OAuth**: Introduced custom redirect URI support for OAuth, so authentication works correctly behind reverse proxies.
+- **🖥 Enhanced LaTeX Rendering**: Improved LaTeX rendering so that LaTeX in message text is detected and displayed more accurately.
+- **🌐 Internationalization**: Added Romanian translations and updated the Vietnamese and Ukrainian translations, broadening accessibility for international users.
+
+### Fixed
+
+- **🔧 Tags Handling in Document Upload**: Tags are now properly sent to the upload document handler, resolving issues with missing metadata.
+- **🖥️ Sensitive Input Fields**: Fixed browsers misclassifying sensitive input fields as password fields.
+- **📂 Static Path Resolution in PDF Generation**: Static paths used in PDF generation are now resolved dynamically, preventing path errors across different environments.
+
+### Changed
+
+- **🎨 UI/UX Styling Enhancements**: Multiple minor styling updates for a cleaner and more intuitive user interface.
+- **🚧 Refactoring Various Components**: Numerous refactoring changes across styling, file handling, and function simplifications for clarity and performance.
+- **🎛️ User Valves Management**: Moved user valves from settings to direct chat controls for more user-friendly access during interactions.
+
+### Removed
+
+- **⚙️ Health Check Logging**: Removed verbose logging from the health checking processes to declutter logs and improve backend performance.
+
+## [0.3.10] - 2024-07-17
+
+### Fixed
+
+- **🔄 Improved File Upload**: Addressed the issue where file uploads lacked animation.
+- **💬 Chat Continuity**: Fixed a problem where existing chats were not functioning properly in some instances.
+- **🗂️ Chat File Reset**: Resolved the issue of chat files not resetting for new conversations, now ensuring a clean slate for each chat session.
+- **📁 Document Workspace Uploads**: Corrected the handling of document uploads in the workspace using the Files API.
+
+## [0.3.9] - 2024-07-17
+
+### Added
+
+- **📁 Files Chat Controls**: We've reverted to the old file handling behavior where uploaded files are always included. You can now manage files directly within the chat controls section, giving you the ability to remove files as needed.
+- **🔧 "Action" Function Support**: Introduced a new "Action" function type for adding custom buttons to the message toolbar. This enables more interactive messaging; documentation is coming soon.
+- **📜 Citations Handling**: For files newly uploaded to the Documents workspace, citations now display the actual filename. You can also click a filename to open the file in a new tab.
+- **🛠️ Event Emitter and Call Updates**: Enhanced 'event_emitter' to allow message replacement and 'event_call' to support text input for Tools and Functions. Detailed documentation will be provided shortly.
+- **🎨 Styling Refactor**: Various styling updates for a cleaner and more cohesive user interface.
+- **🌐 Enhanced Translations**: Improved translations for Catalan, Ukrainian, and Brazilian Portuguese.
+
+### Fixed
+
+- **🔧 Chat Controls Priority**: Resolved an issue where Chat Controls values were being overridden by model information parameters. The priority is now Chat Controls, followed by Global Settings, then Model Settings.
+- **🪲 Debug Logs**: Fixed an issue where debug logs were not being logged properly.
+- **🔑 Automatic1111 Auth Key**: The auth key for Automatic1111 is no longer required.
+- **📝 Title Generation**: Ensured that the title generation runs only once, even when multiple models are in a chat.
+- **✅ Boolean Values in Params**: Added support for boolean values in parameters.
+- **🖼️ Files Overlay Styling**: Fixed the styling issue with the files overlay.
+
+### Changed
+
+- **⬆️ Dependency Updates**
+  - Upgraded 'pydantic' from version 2.7.1 to 2.8.2.
+  - Upgraded 'sqlalchemy' from version 2.0.30 to 2.0.31.
+  - Upgraded 'unstructured' from version 0.14.9 to 0.14.10.
+  - Upgraded 'chromadb' from version 0.5.3 to 0.5.4.
+
 ## [0.3.8] - 2024-07-09
 
 ### Added

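The "Min P" entry in the 0.3.11 changelog refers to min-p sampling. As a rough illustration only (not code from Open WebUI or from any model backend), min-p keeps the tokens whose probability is at least `min_p` times the probability of the most likely token and then renormalizes the survivors. The function name `min_p_filter` and the example distribution are hypothetical.

```python
from typing import Dict


def min_p_filter(probs: Dict[str, float], min_p: float) -> Dict[str, float]:
    """Hypothetical min-p filter: keep tokens with p >= min_p * max(p), then renormalize."""
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be between 0 and 1")
    if not probs:
        return {}
    # The cut-off scales with the most likely token's probability.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    if total == 0:
        # Degenerate all-zero distribution: nothing sensible to sample from.
        return {}
    # Renormalize so the surviving probabilities sum to 1.
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    dist = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
    # threshold = 0.2 * 0.5 = 0.1, so "zebra" (0.05) is dropped
    print(min_p_filter(dist, 0.2))
```

Because the threshold is relative, min-p prunes aggressively when the model is confident (one dominant token) and keeps more candidates when the distribution is flat.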
diff --git a/Dockerfile b/Dockerfile
@@ -151,7 +151,7 @@ COPY --chown=$UID:$GID ./backend .
 
 EXPOSE 8080
 
-HEALTHCHECK CMD curl --silent --fail http://localhost:8080/health | jq -e '.status == true' || exit 1
+HEALTHCHECK CMD curl --silent --fail http://localhost:${PORT:-8080}/health | jq -ne 'input.status == true' || exit 1
 
 USER $UID:$GID
 

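The updated `HEALTHCHECK` replaces the hard-coded port with `${PORT:-8080}` and switches jq to `-n` with `input`, which, as we read the jq manual, makes jq exit with an error when curl yields no body at all. For scripting the same probe outside Docker, a rough standard-library Python equivalent follows. `health_url`, `is_healthy`, and `probe` are hypothetical names, and the sketch assumes `/health` returns a JSON object such as `{"status": true}`, which is what the jq filter checks.

```python
import json
import os
import urllib.request


def health_url() -> str:
    # Mirror ${PORT:-8080}: use $PORT when set and non-empty, else 8080.
    port = os.environ.get("PORT") or "8080"
    return f"http://localhost:{port}/health"


def is_healthy(body: str) -> bool:
    # Mirror jq -ne 'input.status == true': an empty or non-JSON body fails,
    # and only a JSON object whose "status" is literally true passes.
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") is True


def probe(timeout: float = 5.0) -> bool:
    # Roughly like curl --fail: urlopen raises HTTPError (an OSError subclass)
    # for 4xx/5xx responses, and connection failures are OSErrors too.
    try:
        with urllib.request.urlopen(health_url(), timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        return False
```

A container-external monitor could then exit with `0 if probe() else 1`, matching the `|| exit 1` convention Docker expects from a health check command.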
diff --git a/backend/apps/audio/main.py b/backend/apps/audio/main.py
@@ -10,12 +10,12 @@
     File,
     Form,
 )
-
 from fastapi.responses import StreamingResponse, JSONResponse, FileResponse
 
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
 
+from typing import List
 import uuid
 import requests
 import hashlib
@@ -31,6 +31,7 @@
 )
 from utils.misc import calculate_sha256
 
+
 from config import (
     SRC_LOG_LEVELS,
     CACHE_DIR,
@@ -43,6 +44,7 @@
     AUDIO_STT_OPENAI_API_KEY,
     AUDIO_TTS_OPENAI_API_BASE_URL,
     AUDIO_TTS_OPENAI_API_KEY,
+    AUDIO_TTS_API_KEY,
     AUDIO_STT_ENGINE,
     AUDIO_STT_MODEL,
     AUDIO_TTS_ENGINE,
@@ -75,6 +77,7 @@
 app.state.config.TTS_ENGINE = AUDIO_TTS_ENGINE
 app.state.config.TTS_MODEL = AUDIO_TTS_MODEL
 app.state.config.TTS_VOICE = AUDIO_TTS_VOICE
+app.state.config.TTS_API_KEY = AUDIO_TTS_API_KEY
 
 # setting device type for whisper model
 whisper_device_type = DEVICE_TYPE if DEVICE_TYPE and DEVICE_TYPE == "cuda" else "cpu"
@@ -87,6 +90,7 @@
 class TTSConfigForm(BaseModel):
     OPENAI_API_BASE_URL: str
     OPENAI_API_KEY: str
+    API_KEY: str
     ENGINE: str
     MODEL: str
     VOICE: str
@@ -137,6 +141,7 @@ async def get_audio_config(user=Depends(get_admin_user)):
         "tts": {
             "OPENAI_API_BASE_URL": app.state.config.TTS_OPENAI_API_BASE_URL,
             "OPENAI_API_KEY": app.state.config.TTS_OPENAI_API_KEY,
+            "API_KEY": app.state.config.TTS_API_KEY,
             "ENGINE": app.state.config.TTS_ENGINE,
             "MODEL": app.state.config.TTS_MODEL,
             "VOICE": app.state.config.TTS_VOICE,
@@ -156,6 +161,7 @@ async def update_audio_config(
 ):
     app.state.config.TTS_OPENAI_API_BASE_URL = form_data.tts.OPENAI_API_BASE_URL
     app.state.config.TTS_OPENAI_API_KEY = form_data.tts.OPENAI_API_KEY
+    app.state.config.TTS_API_KEY = form_data.tts.API_KEY
     app.state.config.TTS_ENGINE = form_data.tts.ENGINE
     app.state.config.TTS_MODEL = form_data.tts.MODEL
     app.state.config.TTS_VOICE = form_data.tts.VOICE
@@ -169,6 +175,7 @@ async def update_audio_config(
         "tts": {
             "OPENAI_API_BASE_URL": app.state.config.TTS_OPENAI_API_BASE_URL,
             "OPENAI_API_KEY": app.state.config.TTS_OPENAI_API_KEY,
+            "API_KEY": app.state.config.TTS_API_KEY,
             "ENGINE": app.state.config.TTS_ENGINE,
             "MODEL": app.state.config.TTS_MODEL,
             "VOICE": app.state.config.TTS_VOICE,
@@ -194,55 +201,111 @@ async def speech(request: Request, user=Depends(get_verified_user)):
     if file_path.is_file():
         return FileResponse(file_path)
 
-    headers = {}
-    headers["Authorization"] = f"Bearer {app.state.config.TTS_OPENAI_API_KEY}"
-    headers["Content-Type"] = "application/json"
-
-    try:
-        body = body.decode("utf-8")
-        body = json.loads(body)
-        body["model"] = app.state.config.TTS_MODEL
-        body = json.dumps(body).encode("utf-8")
-    except Exception as e:
-        pass
-
-    r = None
-    try:
-        r = requests.post(
-            url=f"{app.state.config.TTS_OPENAI_API_BASE_URL}/audio/speech",
-            data=body,
-            headers=headers,
-            stream=True,
-        )
-
-        r.raise_for_status()
-
-        # Save the streaming content to a file
-        with open(file_path, "wb") as f:
-            for chunk in r.iter_content(chunk_size=8192):
-                f.write(chunk)
-
-        with open(file_body_path, "w") as f:
-            json.dump(json.loads(body.decode("utf-8")), f)
-
-        # Return the saved file
-        return FileResponse(file_path)
+    if app.state.config.TTS_ENGINE == "openai":
+        headers = {}
+        headers["Authorization"] = f"Bearer {app.state.config.TTS_OPENAI_API_KEY}"
+        headers["Content-Type"] = "application/json"
+
+        try:
+            body = body.decode("utf-8")
+            body = json.loads(body)
+            body["model"] = app.state.config.TTS_MODEL
+            body = json.dumps(body).encode("utf-8")
+        except Exception as e:
+            pass
+
+        r = None
+        try:
+            r = requests.post(
+                url=f"{app.state.config.TTS_OPENAI_API_BASE_URL}/audio/speech",
+                data=body,
+                headers=headers,
+                stream=True,
+            )
 
-    except Exception as e:
-        log.exception(e)
-        error_detail = "Open WebUI: Server Connection Error"
-        if r is not None:
-            try:
-                res = r.json()
-                if "error" in res:
-                    error_detail = f"External: {res['error']['message']}"
-            except:
-                error_detail = f"External: {e}"
+            r.raise_for_status()
+
+            # Save the streaming content to a file
+            with open(file_path, "wb") as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    f.write(chunk)
+
+            with open(file_body_path, "w") as f:
+                json.dump(json.loads(body.decode("utf-8")), f)
+
+            # Return the saved file
+            return FileResponse(file_path)
+
+        except Exception as e:
+            log.exception(e)
+            error_detail = "Open WebUI: Server Connection Error"
+            if r is not None:
+                try:
+                    res = r.json()
+                    if "error" in res:
+                        error_detail = f"External: {res['error']['message']}"
+                except:
+                    error_detail = f"External: {e}"
+
+            raise HTTPException(
+                status_code=r.status_code if r != None else 500,
+                detail=error_detail,
+            )
 
-        raise HTTPException(
-            status_code=r.status_code if r != None else 500,
-            detail=error_detail,
-        )
+    elif app.state.config.TTS_ENGINE == "elevenlabs":
+        payload = None
+        try:
+            payload = json.loads(body.decode("utf-8"))
+        except Exception as e:
+            log.exception(e)
+            raise HTTPException(status_code=400, detail="Invalid JSON payload")
+
+        voice_id = payload.get("voice", "")
+        url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
+
+        headers = {
+            "Accept": "audio/mpeg",
+            "Content-Type": "application/json",
+            "xi-api-key": app.state.config.TTS_API_KEY,
+        }
+
+        data = {
+            "text": payload["input"],
+            "model_id": app.state.config.TTS_MODEL,
+            "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
+        }
+
+        r = None
+        try:
+            r = requests.post(url, json=data, headers=headers, stream=True)
+
+            r.raise_for_status()
+
+            # Save the streaming content to a file
+            with open(file_path, "wb") as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    f.write(chunk)
+
+            with open(file_body_path, "w") as f:
+                json.dump(json.loads(body.decode("utf-8")), f)
+
+            # Return the saved file
+            return FileResponse(file_path)
+
+        except Exception as e:
+            log.exception(e)
+            error_detail = "Open WebUI: Server Connection Error"
+            if r is not None:
+                try:
+                    res = r.json()
+                    if "error" in res:
+                        error_detail = f"External: {res['error']['message']}"
+                except:
+                    error_detail = f"External: {e}"
+
+            raise HTTPException(
+                status_code=r.status_code if r != None else 500,
+                detail=error_detail,
+            )
 
 
 @app.post("/transcriptions")
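The `elevenlabs` branch of `speech()` translates the OpenAI-style request body (`input`, `voice`) into an ElevenLabs text-to-speech request, then reuses the same error-reporting pattern as the `openai` branch. The sketch below pulls both steps into pure functions so they can be tested without network access. `build_elevenlabs_request` and `error_detail` are hypothetical helpers, not part of the commit; the URL, headers, and fixed voice settings are copied from the diff, while rejecting a missing `voice` or `input` up front is an added assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Endpoint as used in the diff; {voice_id} is filled in per request.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_elevenlabs_request(
    raw_body: bytes, api_key: str, model_id: str
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Hypothetical helper: OpenAI-style speech body -> (url, headers, json payload).

    Raises ValueError on invalid UTF-8/JSON or when 'voice' or 'input' is missing.
    """
    payload = json.loads(raw_body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("speech body must be a JSON object")
    voice_id = payload.get("voice")
    text = payload.get("input")
    if not voice_id or text is None:
        raise ValueError("speech body needs both 'voice' and 'input'")
    headers = {
        "Accept": "audio/mpeg",
        "Content-Type": "application/json",
        "xi-api-key": api_key,
    }
    data = {
        "text": text,
        "model_id": model_id,
        # Same fixed voice settings as the diff.
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
    }
    return ELEVENLABS_TTS_URL.format(voice_id=voice_id), headers, data


def error_detail(response_text: Optional[str], exc: Exception) -> str:
    """Mirror the diff's error handler; response_text is None when no response arrived."""
    detail = "Open WebUI: Server Connection Error"
    if response_text is None:
        return detail
    try:
        res = json.loads(response_text)
        if "error" in res:
            detail = f"External: {res['error']['message']}"
    except (ValueError, TypeError, KeyError):
        detail = f"External: {exc}"
    return detail
```

In an endpoint, the response would start as `r = None` before the `try` (as the `openai` branch does), be assigned from `requests.post(url, json=data, headers=headers, stream=True)`, and the handler would call `error_detail(r.text if r is not None else None, e)`, so a connection failure never touches an unbound name.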
@@ -373,3 +436,69 @@ def transcribe(
             status_code=status.HTTP_400_BAD_REQUEST,
             detail=ERROR_MESSAGES.DEFAULT(e),
         )
+
+
+def get_available_models() -> List[dict]:
+    if app.state.config.TTS_ENGINE == "openai":
+        return [{"id": "tts-1"}, {"id": "tts-1-hd"}]
+    elif app.state.config.TTS_ENGINE == "elevenlabs":
+        headers = {
+            "xi-api-key": app.state.config.TTS_API_KEY,
+            "Content-Type": "application/json",
+        }
+
+        try:
+            response = requests.get(
+                "https://api.elevenlabs.io/v1/models", headers=headers
+            )
+            response.raise_for_status()
+            models = response.json()
+            return [
+                {"name": model["name"], "id": model["model_id"]} for model in models
+            ]
+        except requests.RequestException as e:
+            log.error(f"Error fetching models: {str(e)}")
+    return []
+
+
+@app.get("/models")
+async def get_models(user=Depends(get_verified_user)):
+    return {"models": get_available_models()}
+
+
+def get_available_voices() -> List[dict]:
+    if app.state.config.TTS_ENGINE == "openai":
+        return [
+            {"name": "alloy", "id": "alloy"},
+            {"name": "echo", "id": "echo"},
+            {"name": "fable", "id": "fable"},
+            {"name": "onyx", "id": "onyx"},
+            {"name": "nova", "id": "nova"},
+            {"name": "shimmer", "id": "shimmer"},
+        ]
+    elif app.state.config.TTS_ENGINE == "elevenlabs":
+        headers = {
+            "xi-api-key": app.state.config.TTS_API_KEY,
+            "Content-Type": "application/json",
+        }
+
+        try:
+            response = requests.get(
+                "https://api.elevenlabs.io/v1/voices", headers=headers
+            )
+            response.raise_for_status()
+            voices_data = response.json()
+
+            voices = []
+            for voice in voices_data.get("voices", []):
+                voices.append({"name": voice["name"], "id": voice["voice_id"]})
+            return voices
+        except requests.RequestException as e:
+            log.error(f"Error fetching voices: {str(e)}")
+
+    return []
+
+
+@app.get("/voices")
+async def get_voices(user=Depends(get_verified_user)):
+    return {"voices": get_available_voices()}
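The new `/models` and `/voices` endpoints normalize each provider's response into a list of `{"name", "id"}` dicts. The sketch below isolates that normalization as pure functions, assuming (as the diff's code does) that ElevenLabs `/v1/models` returns a JSON list of objects with `name` and `model_id`, and that `/v1/voices` returns an object with a `voices` list whose entries carry `name` and `voice_id`. All function names are hypothetical; `voice_id_for_name` is one possible way to support the changelog's "voice ID by name" lookup, and its case-insensitive matching is an assumption.

```python
from typing import Any, Dict, List, Optional

# The fixed OpenAI voice list from get_available_voices().
OPENAI_TTS_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def parse_elevenlabs_models(models: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    # Keep only the fields the UI needs; extra keys are ignored.
    return [{"name": m["name"], "id": m["model_id"]} for m in models]


def parse_elevenlabs_voices(voices_data: Dict[str, Any]) -> List[Dict[str, str]]:
    # A missing "voices" key yields an empty list, as in the diff.
    return [
        {"name": v["name"], "id": v["voice_id"]}
        for v in voices_data.get("voices", [])
    ]


def openai_voices() -> List[Dict[str, str]]:
    # OpenAI voices use the same string as both display name and id.
    return [{"name": v, "id": v} for v in OPENAI_TTS_VOICES]


def voice_id_for_name(voices: List[Dict[str, str]], name: str) -> Optional[str]:
    # Hypothetical lookup: case-insensitive match on the display name.
    wanted = name.casefold()
    for voice in voices:
        if voice["name"].casefold() == wanted:
            return voice["id"]
    return None
```

Keeping the parsing separate from `requests.get` means the HTTP error handling in `get_available_models()` and `get_available_voices()` stays unchanged while the shape of the returned data can be unit-tested directly.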