Merge pull request #81 from mgax/image-description-openai

Image description with OpenAI
wagtail · Mar 26, 2024 · 49eada0
2 parents ded87e6 + 7cdf140
Showing 37 changed files with 1,170 additions and 2,620 deletions.
diff --git a/docs/.pages b/docs/.pages
@@ -1,6 +1,7 @@
 nav:
   - installation.md
   - editor-integration.md
+  - images-integration.md
   - ai-backends.md
   - text-splitting.md
   - contributing.md
diff --git a/docs/ai-backends.md b/docs/ai-backends.md
@@ -2,11 +2,11 @@
 
 Wagtail AI can be configured to use different backends to support different AI services.
 
-Currently the only (and default) backend available in Wagtail AI is the ["LLM" backend](#the-llm-backend).
+The default backend for text completion in Wagtail AI is the ["LLM" backend](#the-llm-backend). To enable [image description](../images-integration/), you can use the ["OpenAI" backend](#the-openai-backend).
 
 ## The "LLM" backend
 
-This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins.
+This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/), which offers support for many AI services through plugins. At the moment, it only supports [text completion](../editor-integration/).
 
 By default, it is configured to use OpenAI's `gpt-3.5-turbo` model.
 
@@ -155,3 +155,40 @@ You can find the "LLM" library specific instructions at: https://llm.datasette.i
        }
    }
    ```
+
+## The "OpenAI" backend
+
+Wagtail AI includes a backend for OpenAI that supports both [text completion](../editor-integration/) and [image description](../images-integration/).
+
+To use the OpenAI backend, you need an API key, which must be set in the `OPENAI_API_KEY` environment variable. Then, configure it in your Django project settings:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "default": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4",
+            },
+        },
+    },
+}
+```
+
+### Specifying another OpenAI model
+
+The OpenAI backend supports the use of custom models. For newer models that are not known to Wagtail AI, you must also specify a token limit:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "vision": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4-vision-preview",
+                "TOKEN_LIMIT": 300,
+            },
+        },
+    },
+}
+```
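
The token-limit rule in the "Specifying another OpenAI model" section can be checked before deployment. The following is a minimal sketch, not part of Wagtail AI: `check_openai_backends` and `KNOWN_MODELS` are hypothetical names, and the set of models treated as "known" is an assumption made for illustration only.

```python
import os

# Assumption: an illustrative set of model IDs that need no explicit token limit.
# The real list of models known to Wagtail AI may differ.
KNOWN_MODELS = {"gpt-3.5-turbo", "gpt-4"}


def check_openai_backends(wagtail_ai, environ=os.environ):
    """Hypothetical helper: return a list of problems found in a WAGTAIL_AI dict."""
    problems = []
    # The docs require the API key to be set in the OPENAI_API_KEY variable.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name, backend in wagtail_ai.get("BACKENDS", {}).items():
        # Only inspect backends that use the OpenAI backend class.
        if not backend.get("CLASS", "").endswith("OpenAIBackend"):
            continue
        config = backend.get("CONFIG", {})
        model_id = config.get("MODEL_ID")
        # Models outside the known set must declare a token limit.
        if model_id is not None and model_id not in KNOWN_MODELS and "TOKEN_LIMIT" not in config:
            problems.append(f"{name}: TOKEN_LIMIT is required for {model_id}")
    return problems
```

Calling `check_openai_backends(settings.WAGTAIL_AI)` from a test or a management command would surface a missing key or token limit early, rather than on the first request to the backend.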
diff --git a/docs/editor-integration.md b/docs/editor-integration.md
@@ -15,3 +15,21 @@ When creating prompts you can provide a label and description to help describe t
 
 - 'Append after existing content' - keep your existing content intact and add the response from the AI to the end (useful for completions/suggestions).
 - 'Replace content' - replace the content in the editor with the response from the AI (useful for corrections, rewrites and translations.)
+
+### Configuring the AI backend
+
+By default, the `"default"` backend is used for text operations in the editor. To use a different backend, set `TEXT_COMPLETION_BACKEND` to the name of another backend:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "gpt4": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4",
+            },
+        },
+    },
+    "TEXT_COMPLETION_BACKEND": "gpt4",
+}
+```
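
The fallback described in "Configuring the AI backend" can be pictured as a dictionary lookup with a default. This is a sketch under assumptions: `text_completion_backend_config` is a hypothetical helper and does not reflect how Wagtail AI resolves the setting internally.

```python
def text_completion_backend_config(wagtail_ai):
    """Hypothetical helper: return (name, config dict) of the text completion backend."""
    # Fall back to the backend named "default" when the setting is absent.
    name = wagtail_ai.get("TEXT_COMPLETION_BACKEND", "default")
    backends = wagtail_ai.get("BACKENDS", {})
    if name not in backends:
        raise LookupError(f"No backend named {name!r} in WAGTAIL_AI['BACKENDS']")
    return name, backends[name]
```

With the settings shown above, the helper would return the `"gpt4"` entry. Without `TEXT_COMPLETION_BACKEND`, it would look for a backend named `"default"` and raise if none exists.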
diff --git a/docs/images-integration.md b/docs/images-integration.md
@@ -0,0 +1,73 @@
+# Images Integration
+
+Wagtail AI integrates with the image edit form to provide AI-generated descriptions for images. The integration requires a backend that supports image descriptions, such as [the OpenAI backend](../ai-backends/#the-openai-backend).
+
+## Configuration
+
+1. In the Django project settings, configure an AI backend with a model that supports images, then set `IMAGE_DESCRIPTION_BACKEND` to the name of that backend:
+   ```python
+   WAGTAIL_AI = {
+       "BACKENDS": {
+           "vision": {
+               "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+               "CONFIG": {
+                   "MODEL_ID": "gpt-4-vision-preview",
+                   "TOKEN_LIMIT": 300,
+               },
+           },
+       },
+       "IMAGE_DESCRIPTION_BACKEND": "vision",
+   }
+   ```
+2. In the Django project settings, configure a [custom Wagtail image base form](https://docs.wagtail.org/en/stable/reference/settings.html#wagtailimages-image-form-base):
+   ```python
+   WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
+   ```
+
+Now, when you upload or edit an image, a magic wand icon should appear next to the _title_ field. Clicking on the icon will invoke the AI backend to generate an image description.
+
+## Separate backends for text completion and image description
+
+Multi-modal models are fairly new, so you may want to configure two different backends for text completion and image description. The `default` backend will be used for text completion:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        "default": {
+            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-3.5-turbo",
+            },
+        },
+        "vision": {
+            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
+            "CONFIG": {
+                "MODEL_ID": "gpt-4-vision-preview",
+                "TOKEN_LIMIT": 300,
+            },
+        },
+    },
+    "IMAGE_DESCRIPTION_BACKEND": "vision",
+}
+```
+
+## Custom prompt
+
+Wagtail AI includes a simple prompt to ask the AI to generate an image description:
+
+> Describe this image. Make the description suitable for use as an alt-text.
+
+If you want to use a different prompt, override the `IMAGE_DESCRIPTION_PROMPT` value:
+
+```python
+WAGTAIL_AI = {
+    "BACKENDS": {
+        # ...
+    },
+    "IMAGE_DESCRIPTION_PROMPT": "Describe this image in the voice of Sir David Attenborough.",
+}
+```
+
+## Custom form
+
+Wagtail AI includes an image form that enhances the `title` field with an AI button. If you are using a [custom image model](https://docs.wagtail.org/en/stable/advanced_topics/images/custom_image_model.html), you can provide your own form to target another field. Check out the implementation of `DescribeImageForm` in [`forms.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/forms.py), adapt it to your needs, and set it as `WAGTAILIMAGES_IMAGE_FORM_BASE`.
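
The two image settings introduced in this file, `IMAGE_DESCRIPTION_BACKEND` and `IMAGE_DESCRIPTION_PROMPT`, could be read along these lines. This is a sketch only: `image_description_settings` is a hypothetical helper, and treating a missing `IMAGE_DESCRIPTION_BACKEND` as "feature disabled" is an assumption, not documented behavior.

```python
# Default prompt, as quoted in the "Custom prompt" section.
DEFAULT_IMAGE_DESCRIPTION_PROMPT = (
    "Describe this image. Make the description suitable for use as an alt-text."
)


def image_description_settings(wagtail_ai):
    """Hypothetical helper: return (backend_name, prompt), or None if not configured."""
    backend_name = wagtail_ai.get("IMAGE_DESCRIPTION_BACKEND")
    if backend_name is None:
        # Assumption: no backend name means image description is not enabled.
        return None
    if backend_name not in wagtail_ai.get("BACKENDS", {}):
        raise LookupError(f"No backend named {backend_name!r} in WAGTAIL_AI['BACKENDS']")
    # Use the custom prompt when provided, otherwise the default one.
    prompt = wagtail_ai.get("IMAGE_DESCRIPTION_PROMPT", DEFAULT_IMAGE_DESCRIPTION_PROMPT)
    return backend_name, prompt
```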
diff --git a/docs/index.md b/docs/index.md
@@ -4,6 +4,7 @@ Wagtail AI integrates Wagtail with OpenAI's APIs (think ChatGPT) to help you wri
 
 Right now, it can:
 
-* Finish what you've started - write some text and tell Wagtail AI to finish it off for you
-* Correct your spelling/grammar
-* Let you add your own custom prompts
+* Finish what you've started - write some text and tell Wagtail AI to finish it off for you.
+* Correct your spelling/grammar.
+* Generate image descriptions - useful for [image alt text](https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement/alt).
+* Let you add your own custom prompts.