wagtail · tomusher · Mar 26, 2024 · Jan 5, 2024 · Feb 29, 2024 · Feb 29, 2024
diff --git a/docs/.pages b/docs/.pages
@@ -1,6 +1,7 @@
 nav:
   - installation.md
   - editor-integration.md
+  - images-integration.md
   - ai-backends.md
   - text-splitting.md
   - contributing.md
diff --git a/docs/ai-backends.md b/docs/ai-backends.md
@@ -2,11 +2,11 @@
 
 Wagtail AI can be configured to use different backends to support different AI services.
 
-Currently the only (and default) backend available in Wagtail AI is the ["LLM" backend](#the-llm-backend).
+The default backend for text completion in Wagtail AI is the ["LLM" backend](#the-llm-backend). To enable [image description](../images-integration/), use the ["OpenAI" backend](#the-openai-backend).
 
 ## The "LLM" backend
 
-This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins.
+This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins. At the moment it only supports [text completion](../editor-integration/).
 
 By default, it is configured to use OpenAI's `gpt-3.5-turbo` model.
 
@@ -155,3 +155,40 @@ You can find the "LLM" library specific instructions at: https://llm.datasette.i
        }
    }
    ```
+
+## The "OpenAI" backend
+
+Wagtail AI includes a backend for OpenAI that supports both [text completion](../editor-integration/) and [image description](../images-integration/).
+
+To use the OpenAI backend, you need an API key, which must be set in the `OPENAI_API_KEY` environment variable. Then, configure it in your Django project settings:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "default": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4",
+            },
+        },
+    },
+}
+```
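The OpenAI backend expects its key in the `OPENAI_API_KEY` environment variable. If you would rather catch a missing key when Django starts than when an editor first uses the backend, here is a minimal sketch. `require_env` is a hypothetical helper and is not part of Wagtail AI. It raises a plain `RuntimeError`; in a real project you might prefer Django's `ImproperlyConfigured`.

```python
import os
from collections.abc import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a stripped environment variable, raising if it is unset or blank.

    Hypothetical settings.py helper; not part of Wagtail AI.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} must be set to use the OpenAI backend")
    return value
```

In `settings.py`, you could call `require_env("OPENAI_API_KEY")` just before defining `WAGTAIL_AI`.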
+
+### Specifying another OpenAI model
+
+You can use the OpenAI backend with other models by changing `MODEL_ID`. For newer models that Wagtail AI does not know about, you must also set a token limit with `TOKEN_LIMIT`:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "vision": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4-vision-preview",
+                "TOKEN_LIMIT": 300,
+            },
+        },
+    },
+}
+```
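The rule above can be sketched as: known models get a built-in limit, and unknown models need an explicit `TOKEN_LIMIT`. This is an illustration, not Wagtail AI's implementation. `KNOWN_TOKEN_LIMITS` and `resolve_token_limit` are hypothetical names, the single table entry is only an example (the models Wagtail AI actually knows about may differ), and the assumption that an explicit `TOKEN_LIMIT` wins over the table is mine.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical lookup table of model IDs to context-window sizes.
KNOWN_TOKEN_LIMITS: dict[str, int] = {"gpt-4": 8192}


def resolve_token_limit(config: Mapping[str, Any]) -> int:
    """Prefer an explicit TOKEN_LIMIT, then fall back to the known-model table."""
    explicit = config.get("TOKEN_LIMIT")
    if explicit is not None:
        return explicit
    model_id = config["MODEL_ID"]
    try:
        return KNOWN_TOKEN_LIMITS[model_id]
    except KeyError:
        # Unknown model and no explicit limit: nothing safe to fall back to.
        raise ValueError(
            f"Unknown model {model_id!r}: set TOKEN_LIMIT in CONFIG"
        ) from None
```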
diff --git a/docs/editor-integration.md b/docs/editor-integration.md
@@ -15,3 +15,21 @@ When creating prompts you can provide a label and description to help describe t
 
 - 'Append after existing content' - keep your existing content intact and add the response from the AI to the end (useful for completions/suggestions).
 - 'Replace content' - replace the content in the editor with the response from the AI (useful for corrections, rewrites and translations.)
+
+### Configuring the AI backend
+
+By default, the `"default"` backend is used for text operations in the editor. To use a different backend, set `TEXT_COMPLETION_BACKEND` to the name of another backend configured in `BACKENDS`:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "gpt4": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4",
+            },
+        },
+    },
+    "TEXT_COMPLETION_BACKEND": "gpt4",
+}
+```
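Choosing the editor backend comes down to reading `TEXT_COMPLETION_BACKEND`, falling back to `"default"`, and looking that name up in `BACKENDS`. Here is a minimal sketch over a plain dictionary, assuming you want to validate the setting yourself, for example in a project test. `text_completion_backend_alias` is hypothetical and not part of Wagtail AI. It only recognises names listed explicitly in `BACKENDS`, so it does not account for any built-in default configuration.

```python
from collections.abc import Mapping
from typing import Any


def text_completion_backend_alias(wagtail_ai: Mapping[str, Any]) -> str:
    """Return the backend name used for editor text operations.

    Uses TEXT_COMPLETION_BACKEND if set, otherwise "default", and raises
    ValueError if that name is not a key of BACKENDS.
    """
    alias = wagtail_ai.get("TEXT_COMPLETION_BACKEND", "default")
    if alias not in wagtail_ai.get("BACKENDS", {}):
        raise ValueError(f"TEXT_COMPLETION_BACKEND {alias!r} is not in BACKENDS")
    return alias
```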
diff --git a/docs/images-integration.md b/docs/images-integration.md
@@ -0,0 +1,48 @@
+# Images integration
+
+Wagtail AI integrates with the image edit form to generate descriptions for images. The integration requires a backend that supports image descriptions, such as [the OpenAI backend](../ai-backends/#the-openai-backend).
+
+## Configuration
+
+1. In the Django project settings, configure an AI backend with a model that supports images, then set `IMAGE_DESCRIPTION_BACKEND` to the name of that backend:
+   ```python
+   WAGTAIL_AI = {
+       "BACKENDS": {
+           "vision": {
+               "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+               "CONFIG": {
+                   "MODEL_ID": "gpt-4-vision-preview",
+                   "TOKEN_LIMIT": 300,
+               },
+           },
+       },
+       "IMAGE_DESCRIPTION_BACKEND": "vision",
+   }
+   ```
+2. In the Django project settings, configure a [custom Wagtail image base form](https://docs.wagtail.org/en/stable/reference/settings.html#wagtailimages-image-form-base):
+   ```python
+   WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
+   ```
+
+Now, when you upload or edit an image, a magic wand icon should appear next to the _title_ field. Clicking on the icon will invoke the AI backend to generate an image description.
+
+## Custom prompt
+
+Wagtail AI includes a simple prompt to ask the AI to generate an image description:
+
+> Describe this image. Make the description suitable for use as an alt-text.
+
+If you want to use a different prompt, override the `IMAGE_DESCRIPTION_PROMPT` value:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        # ...
+    },
+    "IMAGE_DESCRIPTION_PROMPT": "Describe this image in the voice of Sir David Attenborough.",
+}
+```
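The precedence is simple: use `IMAGE_DESCRIPTION_PROMPT` when it is set, otherwise use the built-in prompt quoted above. A minimal sketch of that rule, with a hypothetical function name. It also assumes that an empty string counts as "not set", which may not match Wagtail AI's own handling.

```python
from collections.abc import Mapping
from typing import Any

# Default prompt text as quoted in this documentation.
DEFAULT_IMAGE_DESCRIPTION_PROMPT = (
    "Describe this image. Make the description suitable for use as an alt-text."
)


def image_description_prompt(wagtail_ai: Mapping[str, Any]) -> str:
    """Return IMAGE_DESCRIPTION_PROMPT if non-empty, else the default prompt."""
    return wagtail_ai.get("IMAGE_DESCRIPTION_PROMPT") or DEFAULT_IMAGE_DESCRIPTION_PROMPT
```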
+
+## Custom form
+
+Wagtail AI includes an image form that enhances the `title` field with an AI button. If you are using a [custom image model](https://docs.wagtail.org/en/stable/advanced_topics/images/custom_image_model.html), you can provide your own form to target another field. Check out the implementation of `DescribeImageForm` in [`forms.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/forms.py), adapt it to your needs, and set it as `WAGTAILIMAGES_IMAGE_FORM_BASE`.
diff --git a/docs/index.md b/docs/index.md
@@ -4,6 +4,7 @@ Wagtail AI integrates Wagtail with OpenAI's APIs (think ChatGPT) to help you wri
 
 Right now, it can:
 
-* Finish what you've started - write some text and tell Wagtail AI to finish it off for you
-* Correct your spelling/grammar
-* Let you add your own custom prompts
+* Finish what you've started - write some text and tell Wagtail AI to finish it off for you.
+* Correct your spelling/grammar.
+* Generate image descriptions - useful for [image alt text](https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement/alt).
+* Let you add your own custom prompts.
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -22,6 +22,7 @@
     "postcss": "^8.4.5",
     "postcss-loader": "^6.2.1",
     "postcss-preset-env": "^8.0.1",
+    "raw-loader": "^4.0.2",
     "style-loader": "^3.3.1",
     "ts-loader": "^9.2.6",
     "typescript": "^4.5.5",

diff --git a/src/wagtail_ai/ai/__init__.py b/src/wagtail_ai/ai/__init__.py
@@ -11,7 +11,7 @@
 from ..text_splitters.length import NaiveTextSplitterCalculator
 from ..types import TextSplitterLengthCalculatorProtocol, TextSplitterProtocol
 from ..utils import deprecation
-from .base import AIBackend, BaseAIBackendConfigSettings
+from .base import AIBackend, BackendFeature, BaseAIBackendConfigSettings
 
 
 class TextSplittingSettingsDict(TypedDict):
@@ -161,3 +161,22 @@ def get_ai_backend(alias: str) -> AIBackend:
     )
 
     return ai_backend_cls(config=config)
+
+
+class BackendNotFound(Exception):
+    pass
+
+
+def get_backend(feature: BackendFeature = BackendFeature.TEXT_COMPLETION) -> AIBackend:
+    match feature:
+        case BackendFeature.TEXT_COMPLETION:
+            alias = settings.WAGTAIL_AI.get("TEXT_COMPLETION_BACKEND", "default")
+        case BackendFeature.IMAGE_DESCRIPTION:
+            alias = settings.WAGTAIL_AI.get("IMAGE_DESCRIPTION_BACKEND")
+        case _:
+            alias = None
+
+    if alias is None:
+        raise BackendNotFound(f"No backend found for {feature.name}")
+
+    return get_ai_backend(alias)
diff --git a/src/wagtail_ai/ai/base.py b/src/wagtail_ai/ai/base.py
@@ -1,5 +1,6 @@
-from abc import ABCMeta, abstractmethod
+from abc import ABCMeta
 from dataclasses import dataclass
+from enum import Enum
 from typing import (
     Any,
     ClassVar,
@@ -13,6 +14,7 @@
 )
 
 from django.core.exceptions import ImproperlyConfigured
+from django.core.files import File
 
 from .. import tokens
 from ..types import (
@@ -22,6 +24,11 @@
 )
 
 
+class BackendFeature(Enum):
+    TEXT_COMPLETION = "TEXT_COMPLETION"
+    IMAGE_DESCRIPTION = "IMAGE_DESCRIPTION"
+
+
 class BaseAIBackendConfigSettings(TypedDict):
     MODEL_ID: Required[str]
     TOKEN_LIMIT: NotRequired[int | None]
@@ -99,14 +106,13 @@ def __init__(
     ) -> None:
         self.config = config
 
-    @abstractmethod
     def prompt_with_context(
         self, *, pre_prompt: str, context: str, post_prompt: str | None = None
     ) -> AIResponse:
         """
         Given a prompt and a context, return a response.
         """
-        ...
+        raise NotImplementedError("This backend does not support text completion")
 
     def get_text_splitter(self) -> TextSplitterProtocol:
         return self.config.text_splitter_class(
@@ -116,3 +122,6 @@ def get_text_splitter(self) -> TextSplitterProtocol:
 
     def get_splitter_length_calculator(self) -> TextSplitterLengthCalculatorProtocol:
         return self.config.text_splitter_length_calculator_class()
+
+    def describe_image(self, *, image_file: File, prompt: str) -> AIResponse:
+        raise NotImplementedError("This backend does not support image description")
diff --git a/src/wagtail_ai/ai/echo.py b/src/wagtail_ai/ai/echo.py
@@ -5,6 +5,7 @@
 from typing import Any, NotRequired, Self
 
 from django.core.exceptions import ImproperlyConfigured
+from django.core.files import File
 
 from .base import (
     AIBackend,
@@ -62,10 +63,18 @@ class EchoBackend(AIBackend[EchoBackendConfig]):
     def prompt_with_context(
         self, *, pre_prompt: str, context: str, post_prompt: str | None = None
     ) -> AIResponse:
+        return self.get_response(
+            ["This", "is", "an", "echo", "backend:", *context.split()]
+        )
+
+    def describe_image(self, *, image_file: File, prompt: str) -> AIResponse:
+        return self.get_response(
+            ["This", "is", "an", "echo", "backend:", image_file.name]
+        )
+
+    def get_response(self, words: list[str]) -> AIResponse:
         def response_iterator() -> Generator[str, None, None]:
-            response = ["This", "is", "an", "echo", "backend:"]
-            response += context.split()
-            for word in response:
+            for word in words:
                 if (
                     self.config.max_word_sleep_seconds is not None
                     and self.config.max_word_sleep_seconds > 0