Image description with OpenAI #81

Merged
merged 21 commits into from
Mar 26, 2024
Changes from 13 commits
1 change: 1 addition & 0 deletions docs/.pages
@@ -1,6 +1,7 @@
nav:
- installation.md
- editor-integration.md
- images-integration.md
- ai-backends.md
- text-splitting.md
- contributing.md
41 changes: 39 additions & 2 deletions docs/ai-backends.md
@@ -2,11 +2,11 @@

Wagtail AI can be configured to use different backends to support different AI services.

-Currently the only (and default) backend available in Wagtail AI is the ["LLM" backend](#the-llm-backend).
+The default backend for text completion in Wagtail AI is the ["LLM" backend](#the-llm-backend). To enable [image description](../images-integration/), use the ["OpenAI" backend](#the-openai-backend).

## The "LLM" backend

-This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins.
+This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/), which offers support for many AI services through plugins. At the moment, it only supports [text completion](../editor-integration/).

By default, it is configured to use OpenAI's `gpt-3.5-turbo` model.

@@ -155,3 +155,40 @@ You can find the "LLM" library specific instructions at: https://llm.datasette.i
}
}
```

## The "OpenAI" backend

Wagtail AI includes a backend for OpenAI that supports both [text completion](../editor-integration/) and [image description](../images-integration/).

To use the OpenAI backend, you need an API key, which must be set in the `OPENAI_API_KEY` environment variable. Then, configure it in your Django project settings:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4",
            },
        },
    },
}
```
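The API key is read from the environment, not from `WAGTAIL_AI`. As a minimal sketch, assuming you want the project to fail fast when the key is missing, a check like the following could run at startup. `require_openai_api_key` is a hypothetical helper, not part of Wagtail AI:

```python
import os


def require_openai_api_key() -> str:
    """Return OPENAI_API_KEY, or raise if it is missing or empty.

    Hypothetical helper for illustration; Wagtail AI does not ship it.
    """
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; the OpenAI backend needs it.")
    return key
```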

### Specifying another OpenAI model

The OpenAI backend can be configured with other OpenAI models by changing `MODEL_ID`. For newer models that Wagtail AI does not know about, you must also set a token limit with `TOKEN_LIMIT`:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
}
```
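The rule above (a model unknown to Wagtail AI needs an explicit token limit) can be sketched as a small settings check. This is an illustration only: `validate_backend_config` and its `known_models` argument are hypothetical, and Wagtail AI's own handling of known models may work differently.

```python
from typing import Any


def validate_backend_config(config: dict[str, Any], known_models: set[str]) -> None:
    """Hypothetical check for one backend's "CONFIG" dict.

    Raises ValueError if MODEL_ID is missing, or if the model is not in
    known_models and no TOKEN_LIMIT is given.
    """
    model_id = config.get("MODEL_ID")
    if not model_id:
        raise ValueError("MODEL_ID is required")
    if model_id not in known_models and config.get("TOKEN_LIMIT") is None:
        raise ValueError(f"TOKEN_LIMIT is required for unknown model {model_id!r}")


# The "vision" example passes: the model is treated as unknown, but TOKEN_LIMIT is set.
validate_backend_config(
    {"MODEL_ID": "gpt-4-vision-preview", "TOKEN_LIMIT": 300},
    known_models={"gpt-4"},
)
```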
18 changes: 18 additions & 0 deletions docs/editor-integration.md
@@ -15,3 +15,21 @@ When creating prompts you can provide a label and description to help describe t

- 'Append after existing content' - keep your existing content intact and add the response from the AI to the end (useful for completions/suggestions).
- 'Replace content' - replace the content in the editor with the response from the AI (useful for corrections, rewrites and translations.)

### Configuring the AI backend

By default, the backend named `"default"` is used for text operations in the editor. To use a different backend, set `TEXT_COMPLETION_BACKEND` to the name of another configured backend:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "gpt4": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4",
            },
        },
    },
    "TEXT_COMPLETION_BACKEND": "gpt4",
}
```
48 changes: 48 additions & 0 deletions docs/images-integration.md
@@ -0,0 +1,48 @@
# Images integration

Wagtail AI integrates with the image edit form to provide AI-generated descriptions for images. The integration requires a backend that supports image description, such as [the OpenAI backend](../ai-backends/#the-openai-backend).

## Configuration

1. In the Django project settings, configure an AI backend with a model that supports images, and set `IMAGE_DESCRIPTION_BACKEND` to the name of that backend:
```python
WAGTAIL_AI = {
    "BACKENDS": {
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}
```

Review thread on this example:

Member: Suggestion: This seems pretty clear, but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both image description and text completion features.

Do you think specifying an `IMAGE_DESCRIPTION_MODEL_ID` config setting (with a default value) on the `OpenAIBackend` would make it less of a burden to configure?

Member Author: I've extended the docs to have an example with two backends. I don't think it would help to add another config and have the backend switch models on the fly; that feels confusing.

Member: This seems OK. I am still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:

    WAGTAIL_AI = {
        "BACKENDS": {
            "default": {
                "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            },
        },
    }

but I appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations, so will just be copying examples from the docs. But again, I'm sure we can review at another time.

Member Author:

> We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations, so will just be copying examples from the docs. But again, I'm sure we can review at another time.

I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called `gpt-4-vision-preview`, I think we can hold off until the functionality becomes part of the mainline offering.
2. In the Django project settings, configure a [custom Wagtail image base form](https://docs.wagtail.org/en/stable/reference/settings.html#wagtailimages-image-form-base):
```python
WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
```

Now, when you upload or edit an image, a magic wand icon should appear next to the _title_ field. Clicking on the icon will invoke the AI backend to generate an image description.
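If you want both features, you configure one backend per feature. A sketch of the combined settings, assuming `gpt-4` for text completion and `gpt-4-vision-preview` for images (the models used in the examples above); the backend names `"default"` and `"vision"` are arbitrary:

```python
# Text completion uses the "default" backend; image description uses "vision".
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4"},
        },
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4-vision-preview", "TOKEN_LIMIT": 300},
        },
    },
    # Optional: "default" is already the fallback for text completion.
    "TEXT_COMPLETION_BACKEND": "default",
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}

WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
```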

## Custom prompt

Wagtail AI includes a simple prompt to ask the AI to generate an image description:

> Describe this image. Make the description suitable for use as an alt-text.

If you want to use a different prompt, override the `IMAGE_DESCRIPTION_PROMPT` value:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        # ...
    },
    "IMAGE_DESCRIPTION_PROMPT": "Describe this image in the voice of Sir David Attenborough.",
}
```
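A minimal sketch of the override behaviour, assuming the setting is read with a plain dictionary lookup that falls back to the built-in prompt. `get_image_description_prompt` is a hypothetical helper, not Wagtail AI's code:

```python
DEFAULT_IMAGE_DESCRIPTION_PROMPT = (
    "Describe this image. Make the description suitable for use as an alt-text."
)


def get_image_description_prompt(wagtail_ai_settings: dict) -> str:
    """Return the configured prompt, or the built-in default if none is set.

    Hypothetical helper for illustration; not Wagtail AI's implementation.
    """
    return wagtail_ai_settings.get(
        "IMAGE_DESCRIPTION_PROMPT", DEFAULT_IMAGE_DESCRIPTION_PROMPT
    )
```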

## Custom form

Wagtail AI includes an image form that enhances the `title` field with an AI button. If you are using a [custom image model](https://docs.wagtail.org/en/stable/advanced_topics/images/custom_image_model.html), you can provide your own form to target another field. Check out the implementation of `DescribeImageForm` in [`forms.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/forms.py), adapt it to your needs, and set it as `WAGTAILIMAGES_IMAGE_FORM_BASE`.
7 changes: 4 additions & 3 deletions docs/index.md
@@ -4,6 +4,7 @@ Wagtail AI integrates Wagtail with OpenAI's APIs (think ChatGPT) to help you wri

Right now, it can:

-* Finish what you've started - write some text and tell Wagtail AI to finish it off for you
-* Correct your spelling/grammar
-* Let you add your own custom prompts
+* Finish what you've started - write some text and tell Wagtail AI to finish it off for you.
+* Correct your spelling/grammar.
+* Generate image descriptions - useful for [image alt text](https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement/alt).
+* Let you add your own custom prompts.
21 changes: 21 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions package.json
@@ -22,6 +22,7 @@
"postcss": "^8.4.5",
"postcss-loader": "^6.2.1",
"postcss-preset-env": "^8.0.1",
"raw-loader": "^4.0.2",
"style-loader": "^3.3.1",
"ts-loader": "^9.2.6",
"typescript": "^4.5.5",
21 changes: 20 additions & 1 deletion src/wagtail_ai/ai/__init__.py
@@ -11,7 +11,7 @@
from ..text_splitters.length import NaiveTextSplitterCalculator
from ..types import TextSplitterLengthCalculatorProtocol, TextSplitterProtocol
from ..utils import deprecation
-from .base import AIBackend, BaseAIBackendConfigSettings
+from .base import AIBackend, BackendFeature, BaseAIBackendConfigSettings


class TextSplittingSettingsDict(TypedDict):
@@ -161,3 +161,22 @@ def get_ai_backend(alias: str) -> AIBackend:
)

return ai_backend_cls(config=config)


class BackendNotFound(Exception):
    pass


def get_backend(feature: BackendFeature = BackendFeature.TEXT_COMPLETION) -> AIBackend:
    match feature:
        case BackendFeature.TEXT_COMPLETION:
            alias = settings.WAGTAIL_AI.get("TEXT_COMPLETION_BACKEND", "default")
        case BackendFeature.IMAGE_DESCRIPTION:
            alias = settings.WAGTAIL_AI.get("IMAGE_DESCRIPTION_BACKEND")
        case _:
            alias = None

    if alias is None:
        raise BackendNotFound(f"No backend found for {feature.name}")

    return get_ai_backend(alias)
15 changes: 12 additions & 3 deletions src/wagtail_ai/ai/base.py
@@ -1,5 +1,6 @@
-from abc import ABCMeta, abstractmethod
+from abc import ABCMeta
from dataclasses import dataclass
+from enum import Enum
from typing import (
    Any,
    ClassVar,
@@ -13,6 +14,7 @@
)

from django.core.exceptions import ImproperlyConfigured
from django.core.files import File

from .. import tokens
from ..types import (
@@ -22,6 +24,11 @@
)


class BackendFeature(Enum):
    TEXT_COMPLETION = "TEXT_COMPLETION"
    IMAGE_DESCRIPTION = "IMAGE_DESCRIPTION"


class BaseAIBackendConfigSettings(TypedDict):
    MODEL_ID: Required[str]
    TOKEN_LIMIT: NotRequired[int | None]
@@ -99,14 +106,13 @@ def __init__(
    ) -> None:
        self.config = config

-    @abstractmethod
    def prompt_with_context(
        self, *, pre_prompt: str, context: str, post_prompt: str | None = None
    ) -> AIResponse:
        """
        Given a prompt and a context, return a response.
        """
-        ...
+        raise NotImplementedError("This backend does not support text completion")

    def get_text_splitter(self) -> TextSplitterProtocol:
        return self.config.text_splitter_class(
@@ -116,3 +122,6 @@ def get_text_splitter(self) -> TextSplitterProtocol:

    def get_splitter_length_calculator(self) -> TextSplitterLengthCalculatorProtocol:
        return self.config.text_splitter_length_calculator_class()

    def describe_image(self, *, image_file: File, prompt: str) -> AIResponse:
        raise NotImplementedError("This backend does not support image description")
15 changes: 12 additions & 3 deletions src/wagtail_ai/ai/echo.py
@@ -5,6 +5,7 @@
from typing import Any, NotRequired, Self

from django.core.exceptions import ImproperlyConfigured
from django.core.files import File

from .base import (
    AIBackend,
@@ -62,10 +63,18 @@ class EchoBackend(AIBackend[EchoBackendConfig]):
    def prompt_with_context(
        self, *, pre_prompt: str, context: str, post_prompt: str | None = None
    ) -> AIResponse:
+        return self.get_response(
+            ["This", "is", "an", "echo", "backend:", *context.split()]
+        )
+
+    def describe_image(self, *, image_file: File, prompt: str) -> AIResponse:
+        return self.get_response(
+            ["This", "is", "an", "echo", "backend:", image_file.name]
+        )
+
+    def get_response(self, words):
        def response_iterator() -> Generator[str, None, None]:
-            response = ["This", "is", "an", "echo", "backend:"]
-            response += context.split()
-            for word in response:
+            for word in words:
                if (
                    self.config.max_word_sleep_seconds is not None
                    and self.config.max_word_sleep_seconds > 0