Image description with OpenAI #81

Merged
merged 21 commits into from
Mar 26, 2024
1 change: 1 addition & 0 deletions docs/.pages
@@ -1,6 +1,7 @@
nav:
- installation.md
- editor-integration.md
- images-integration.md
- ai-backends.md
- text-splitting.md
- contributing.md
41 changes: 39 additions & 2 deletions docs/ai-backends.md
@@ -2,11 +2,11 @@

Wagtail AI can be configured to use different backends to support different AI services.

-Currently the only (and default) backend available in Wagtail AI is the ["LLM" backend](#the-llm-backend).
+The default backend for text completion available in Wagtail AI is the ["LLM" backend](#the-llm-backend). To enable [image description](../images-integration/), you can use the ["OpenAI" backend](#the-openai-backend).

## The "LLM" backend

-This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins.
+This backend uses the ["LLM" library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins. At the moment it only supports [text completion](../editor-integration/).

By default, it is configured to use OpenAI's `gpt-3.5-turbo` model.

@@ -155,3 +155,40 @@ You can find the "LLM" library specific instructions at: https://llm.datasette.i
}
}
```

## The "OpenAI" backend

Wagtail AI includes a backend for OpenAI that supports both [text completion](../editor-integration/) and [image description](../images-integration/).

To use the OpenAI backend, you need an API key, which must be set in the `OPENAI_API_KEY` environment variable. Then, configure it in your Django project settings:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4",
            },
        },
    },
}
```
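
If you want the project to fail fast with a clear message when the key is missing, one option is a small check in your settings module. This is a minimal sketch: `require_env` is a hypothetical helper written for this example and is not part of Wagtail AI.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} must be set to use the OpenAI backend")
    return value
```

Calling `require_env("OPENAI_API_KEY")` near the top of your settings would then stop startup early instead of leaving the problem to be discovered later.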

### Specifying another OpenAI model

The OpenAI backend supports the use of custom models. For newer models that are not known to Wagtail AI, you must also specify a token limit:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
}
```
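
If you define several OpenAI backends, the repeated dictionaries can be generated by a small function. The sketch below is only an illustration: `openai_backend` and `OPENAI_BACKEND_CLASS` are hypothetical names introduced here, and the only settings keys it relies on are the `CLASS`, `CONFIG`, `MODEL_ID` and `TOKEN_LIMIT` keys shown above.

```python
OPENAI_BACKEND_CLASS = "wagtail_ai.ai.openai.OpenAIBackend"


def openai_backend(model_id, token_limit=None):
    """Build one entry for WAGTAIL_AI["BACKENDS"] (hypothetical helper)."""
    config = {"MODEL_ID": model_id}
    if token_limit is not None:
        # Only add TOKEN_LIMIT when one is given, e.g. for models
        # that Wagtail AI does not already know about.
        config["TOKEN_LIMIT"] = token_limit
    return {"CLASS": OPENAI_BACKEND_CLASS, "CONFIG": config}


WAGTAIL_AI = {
    "BACKENDS": {
        "default": openai_backend("gpt-4"),
        "vision": openai_backend("gpt-4-vision-preview", token_limit=300),
    },
}
```

The resulting `WAGTAIL_AI` dictionary has the same shape as the hand-written examples, so the helper is purely a convenience.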
18 changes: 18 additions & 0 deletions docs/editor-integration.md
@@ -15,3 +15,21 @@ When creating prompts you can provide a label and description to help describe t

- 'Append after existing content' - keep your existing content intact and add the response from the AI to the end (useful for completions/suggestions).
- 'Replace content' - replace the content in the editor with the response from the AI (useful for corrections, rewrites and translations).

### Configuring the AI backend

By default, the `"default"` backend is used for text operations in the editor. To use a different one, set `TEXT_COMPLETION_BACKEND` to the name of another backend:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "gpt4": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4",
            },
        },
    },
    "TEXT_COMPLETION_BACKEND": "gpt4",
}
```
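
To make the fallback concrete, the following sketch shows how the backend name could be looked up from the settings dictionary. It illustrates the behaviour described above and is not Wagtail AI's internal code; `text_completion_backend` is a hypothetical function.

```python
def text_completion_backend(wagtail_ai_settings):
    """Return (name, definition) of the backend used for text completion.

    Falls back to the "default" backend when TEXT_COMPLETION_BACKEND is not set.
    """
    name = wagtail_ai_settings.get("TEXT_COMPLETION_BACKEND", "default")
    return name, wagtail_ai_settings["BACKENDS"][name]


settings_with_override = {
    "BACKENDS": {
        "gpt4": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4"},
        },
    },
    "TEXT_COMPLETION_BACKEND": "gpt4",
}

name, definition = text_completion_backend(settings_with_override)
```

With the settings above, `name` is `"gpt4"`. Without the `TEXT_COMPLETION_BACKEND` key, the lookup would use the entry named `"default"` instead.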
73 changes: 73 additions & 0 deletions docs/images-integration.md
@@ -0,0 +1,73 @@
# Images Integration

Wagtail AI integrates with the image edit form to provide AI-generated descriptions for images. The integration requires a backend that supports image descriptions, such as [the OpenAI backend](../ai-backends/#the-openai-backend).

## Configuration

1. In the Django project settings, configure an AI backend and a model that support images, then set `IMAGE_DESCRIPTION_BACKEND` to the name of that backend:
```python
WAGTAIL_AI = {
    "BACKENDS": {
Member

Suggestion: This seems pretty clear but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both image description and text completion features.

Do you think specifying an `IMAGE_DESCRIPTION_MODEL_ID` config setting (with a default value) on the `OpenAIBackend` would make it less of a burden to configure?

Member Author

I've extended the docs to include an example with two backends. I don't think it would help to add another config setting and have the backend switch models on the fly; that feels confusing.

Member

This seems OK, I am still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:

WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
        },
    },
}

but appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

Member Author

> We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called gpt-4-vision-preview, I think we can hold off until the functionality becomes part of the mainline offering.

        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}
```
2. In the Django project settings, configure a [custom Wagtail image base form](https://docs.wagtail.org/en/stable/reference/settings.html#wagtailimages-image-form-base):
```python
WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
```

Now, when you upload or edit an image, a magic wand icon should appear next to the _title_ field. Clicking on the icon will invoke the AI backend to generate an image description.

## Separate backends for text completion and image description

Multi-modal models are fairly new, so you may want to configure two different backends for text completion and image description. The `default` backend will be used for text completion:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
            },
        },
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}
```
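
With more than one backend defined, a mistyped backend name is an easy error to make. The sketch below checks that the `TEXT_COMPLETION_BACKEND` and `IMAGE_DESCRIPTION_BACKEND` settings refer to entries that exist under `BACKENDS`. It is an illustration only: `find_undefined_backends` is a hypothetical helper, and whether Wagtail AI performs a similar check itself is not covered here.

```python
def find_undefined_backends(wagtail_ai_settings):
    """Return {setting_key: backend_name} for references to backends that are not defined."""
    defined = set(wagtail_ai_settings.get("BACKENDS", {}))
    reference_keys = ("TEXT_COMPLETION_BACKEND", "IMAGE_DESCRIPTION_BACKEND")
    return {
        key: wagtail_ai_settings[key]
        for key in reference_keys
        if key in wagtail_ai_settings and wagtail_ai_settings[key] not in defined
    }


WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {"MODEL_ID": "gpt-3.5-turbo"},
        },
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4-vision-preview", "TOKEN_LIMIT": 300},
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}

# An empty dict means every referenced backend name is defined.
problems = find_undefined_backends(WAGTAIL_AI)
```

For the two-backend configuration above, `problems` is empty. A typo such as `"IMAGE_DESCRIPTION_BACKEND": "vison"` would be reported under that key.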

## Custom prompt

Wagtail AI includes a simple prompt to ask the AI to generate an image description:

> Describe this image. Make the description suitable for use as an alt-text.

If you want to use a different prompt, override the `IMAGE_DESCRIPTION_PROMPT` value:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        # ...
    },
    "IMAGE_DESCRIPTION_PROMPT": "Describe this image in the voice of Sir David Attenborough.",
}
```

## Custom form

Wagtail AI includes an image form that enhances the `title` field with an AI button. If you are using a [custom image model](https://docs.wagtail.org/en/stable/advanced_topics/images/custom_image_model.html), you can provide your own form to target another field. Check out the implementation of `DescribeImageForm` in [`forms.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/forms.py), adapt it to your needs, and set it as `WAGTAILIMAGES_IMAGE_FORM_BASE`.
7 changes: 4 additions & 3 deletions docs/index.md
@@ -4,6 +4,7 @@ Wagtail AI integrates Wagtail with OpenAI's APIs (think ChatGPT) to help you wri

Right now, it can:

-* Finish what you've started - write some text and tell Wagtail AI to finish it off for you
-* Correct your spelling/grammar
-* Let you add your own custom prompts
+* Finish what you've started - write some text and tell Wagtail AI to finish it off for you.
+* Correct your spelling/grammar.
+* Generate image descriptions - useful for [image alt text](https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement/alt).
+* Let you add your own custom prompts.