Added an example that uses the Moderation API to check for compliance in images (#1653)
openai · Jan 29, 2025 · f29c15a
1 parent 17e9c2d
commit f29c15a
Showing 2 changed files with 139 additions and 40 deletions.
diff --git a/authors.yaml b/authors.yaml
@@ -33,6 +33,11 @@ mwu1993:
   website: "https://www.linkedin.com/in/michael-wu-77440977/"
   avatar: "https://avatars.githubusercontent.com/u/1650674?v=4"
 
+narenoai:
+  name: "Naren Sankaran"
+  website: "https://www.linkedin.com/in/snarendran/"
+  avatar: "https://avatars.githubusercontent.com/u/196844623?s=400&u=d669669fd962473d606a97801367ba96fc548287&v=4"
+
 ibigio:
   name: "Ilan Bigio"
   website: "https://twitter.com/ilanbigio"

diff --git a/examples/How_to_use_moderation.ipynb b/examples/How_to_use_moderation.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "**Note:** This guide is designed to complement our Guardrails Cookbook by providing a more focused look at moderation techniques. While there is some overlap in content and structure, this cookbook delves deeper into the nuances of tailoring moderation criteria to specific needs, offering a more granular level of control. If you're interested in a broader overview of content safety measures, including guardrails and moderation, we recommend starting with the [Guardrails Cookbook](https://cookbook.openai.com/examples/how_to_use_guardrails). Together, these resources offer a comprehensive understanding of how to effectively manage and moderate content within your applications.\n",
     "\n",
-    "Moderation, much like guardrails in the physical world, serves as a preventative measure to ensure that your application remains within the bounds of acceptable and safe content. Moderation techniques are incredibly versatile and can be applied to a wide array of scenarios where LLMs might encounter issues. This notebook is designed to offer straightforward examples that can be adapted to suit your specific needs, while also discussing the considerations and trade-offs involved in deciding whether to implement moderation and how to go about it. This notebook will use our [Moderation API](https://platform.openai.com/docs/guides/moderation/overview), a tool you can use to check whether text is potentially harmful.\n",
+    "Moderation, much like guardrails in the physical world, serves as a preventative measure to ensure that your application remains within the bounds of acceptable and safe content. Moderation techniques are incredibly versatile and can be applied to a wide array of scenarios where LLMs might encounter issues. This notebook is designed to offer straightforward examples that can be adapted to suit your specific needs, while also discussing the considerations and trade-offs involved in deciding whether to implement moderation and how to go about it. This notebook will use our [Moderation API](https://platform.openai.com/docs/guides/moderation/overview), a tool you can use to check whether text or an image is potentially harmful.\n",
     "\n",
     "This notebook will concentrate on:\n",
     "\n",
@@ -17,15 +17,19 @@
     "- **Custom Moderation:** Tailoring moderation criteria and rules to suit the specific needs and context of your application, ensuring a personalized and effective content control mechanism."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 55,
    "metadata": {},
    "outputs": [],
    "source": [
     "from openai import OpenAI\n",
     "client = OpenAI()\n",
-    "\n",
     "GPT_MODEL = 'gpt-4o-mini'"
    ]
   },
@@ -60,12 +64,12 @@
     "- If the input is flagged by the moderation check, handle it accordingly (e.g., reject the input, ask the user to rephrase, etc.).\n",
     "- If the input is not flagged, pass it to the LLM for further processing.\n",
     "\n",
-    "We will demonstrate this workflow with two example prompts."
+    "We will demonstrate this workflow with two example prompts: one for text and one for an image. Note that you can pass both text and an image in the same request."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 56,
    "metadata": {},
    "outputs": [],
    "source": [
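Review note: the changed markdown cell says text and an image can go in the same moderation request, but the notebook only shows an image-only call. A minimal sketch of how the combined `input` list could be built follows. `build_moderation_input` is a hypothetical helper, not part of this commit or of the OpenAI SDK, and the commented call assumes the `client = OpenAI()` object defined earlier in the notebook.

```python
from typing import Dict, List, Optional


def build_moderation_input(text: Optional[str] = None,
                           image_url: Optional[str] = None) -> List[Dict]:
    """Hypothetical helper: build the `input` list for one moderation request.

    The image part has the same shape as in check_image_moderation;
    the text part uses the analogous {"type": "text", ...} form.
    """
    parts: List[Dict] = []
    if text:
        parts.append({"type": "text", "text": text})
    if image_url:
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    if not parts:
        raise ValueError("Provide text, an image URL, or both.")
    return parts


# Usage sketch (not executed here; needs an API key and network access):
# response = client.moderations.create(
#     model="omni-moderation-latest",
#     input=build_moderation_input(
#         text="caption to check",
#         image_url="https://example.com/photo.jpg",  # placeholder URL
#     ),
# )
```

Keeping payload construction separate from the API call makes the request shape easy to inspect without sending anything.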
@@ -77,7 +81,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 57,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -133,7 +137,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 58,
    "metadata": {},
    "outputs": [
     {
@@ -142,7 +146,7 @@
      "text": [
       "Getting LLM response\n",
       "Got LLM response\n",
-      "I can help you with that! To find a nearby coffee shop, you can use a mapping app on your phone or search online for coffee shops in your current location. Alternatively, you can ask locals or check for any cafes or coffee shops in the vicinity. Enjoy your coffee!\n"
+      "I can't access your current location to find nearby coffee shops, but I recommend checking popular apps or websites like Google Maps, Yelp, or a local directory to find coffee shops near you. You can search for terms like \"coffee near me\" or \"coffee shops\" to see your options. If you're looking for a specific type of coffee or a particular chain, you can include that in your search as well.\n"
      ]
     }
    ],
@@ -154,7 +158,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 59,
    "metadata": {},
    "outputs": [
     {
@@ -178,7 +182,89 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Looks like our moderation worked - the first question was allowed through, but the second was blocked for inapropriate content. Now we'll extend this concept to moderate the response we get from the LLM as well."
+    "Looks like our moderation worked - the first question was allowed through, but the second was blocked for inappropriate content. Here is a similar example that works with images."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def check_image_moderation(image_url):\n",
+    "    response = client.moderations.create(\n",
+    "        model=\"omni-moderation-latest\",\n",
+    "        input=[\n",
+    "            {\n",
+    "                \"type\": \"image_url\",\n",
+    "                \"image_url\": {\n",
+    "                    \"url\": image_url\n",
+    "                }\n",
+    "            }\n",
+    "        ]\n",
+    "    )\n",
+    "\n",
+    "    # Extract the moderation categories and their flags\n",
+    "    results = response.results[0]\n",
+    "    flagged_categories = vars(results.categories)\n",
+    "    flagged = results.flagged\n",
+    "    \n",
+    "    if not flagged:\n",
+    "        return True\n",
+    "    else:\n",
+    "        # To get the list of flagged categories (those returned as True):\n",
+    "        # reasons = [category.capitalize() for category, is_flagged in flagged_categories.items() if is_flagged]\n",
+    "        return False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The function above can be used to check if an image is appropriate or not. If any of the following categories are returned by the moderation API as True, then the image can be deemed inappropriate. You can also check for one or more categories to tailor this to a specific use case:\n",
+    "\n",
+    "- sexual\n",
+    "- sexual/minors\n",
+    "- harassment\n",
+    "- harassment/threatening\n",
+    "- hate\n",
+    "- hate/threatening\n",
+    "- illicit\n",
+    "- illicit/violent\n",
+    "- self-harm\n",
+    "- self-harm/intent\n",
+    "- self-harm/instructions\n",
+    "- violence\n",
+    "- violence/graphic"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Checking an image about war: Image is not safe\n",
+      "Checking an image of a wonder of the world: Image is safe\n"
+     ]
+    }
+   ],
+   "source": [
+    "war_image = \"https://assets.editorial.aetnd.com/uploads/2009/10/world-war-one-gettyimages-90007631.jpg\"\n",
+    "world_wonder_image = \"https://whc.unesco.org/uploads/thumbs/site_0252_0008-360-360-20250108121530.jpg\"\n",
+    "\n",
+    "print(\"Checking an image about war: \" + (\"Image is not safe\" if not check_image_moderation(war_image) else \"Image is safe\"))\n",
+    "print(\"Checking an image of a wonder of the world: \" + (\"Image is not safe\" if not check_image_moderation(world_wonder_image) else \"Image is safe\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we'll extend this concept to moderate the response we get from the LLM as well."
    ]
   },
   {
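Review note: the new markdown cell mentions checking only one or more categories to fit a specific use case, but `check_image_moderation` relies on the overall `flagged` value. A minimal sketch of per-category filtering follows, assuming you already have a dict mapping category names to booleans, such as the `flagged_categories` dict built in that function. `watched_flags` and `is_allowed` are hypothetical names, and the example flags are made up for illustration, not real API output. The key spelling (for example `violence/graphic` versus an attribute-style name) depends on how the dict is extracted, so check the keys you actually get.

```python
from typing import Dict, Iterable, List, Optional


def watched_flags(categories: Dict[str, Optional[bool]],
                  watched: Iterable[str]) -> List[str]:
    """Return the watched category names whose flag is True, sorted by name."""
    watched_set = set(watched)
    return sorted(
        name for name, is_flagged in categories.items()
        if is_flagged and name in watched_set
    )


def is_allowed(categories: Dict[str, Optional[bool]],
               watched: Iterable[str]) -> bool:
    """Content is allowed when none of the watched categories is flagged."""
    return not watched_flags(categories, watched)


# Made-up flags for illustration only.
example = {"violence": True, "violence/graphic": False, "sexual": False, "hate": True}

print(watched_flags(example, ["violence", "violence/graphic"]))  # ['violence']
print(is_allowed(example, ["sexual"]))  # True
```

With this split, an application that only cares about graphic violence can ignore a `hate` flag, while the stricter default of rejecting on any flag remains available through `results.flagged`.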
@@ -223,7 +309,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 45,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -267,7 +353,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 46,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -277,7 +363,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 47,
    "metadata": {},
    "outputs": [
     {
@@ -288,7 +374,15 @@
       "Getting LLM response\n",
       "Got LLM response\n",
       "Passed moderation\n",
-      "I can help you with that! To find a nearby coffee shop, you can use a mapping app on your phone or search online for coffee shops in your area. Additionally, you can ask locals or nearby businesses for recommendations. Enjoy your coffee!\n",
+      "I can't access real-time data or your location, but I can suggest some general places where you might find a good cup of coffee nearby. You can try:\n",
+      "\n",
+      "1. **Local Coffee Shops**: Look for independent coffee shops in your area. They often have unique brews and a cozy atmosphere.\n",
+      "2. **Cafés**: Many cafés serve coffee along with pastries and light meals.\n",
+      "3. **Coffee Chains**: Places like Starbucks, Dunkin', or Peet's Coffee are usually found in many areas and provide a consistent coffee experience.\n",
+      "4. **Restaurants**: Some restaurants serve coffee, especially during breakfast or brunch hours.\n",
+      "5. **Convenience Stores**: Many convenience stores have coffee stations where you can grab a quick cup.\n",
+      "\n",
+      "You can use a map app or a search engine to find the nearest coffee places by typing \"coffee near me.\" Enjoy your coffee!\n",
       "\n",
       "\n",
       "\n",
@@ -303,8 +397,8 @@
       "Describe a scene from a violent movie in detail.\n",
       "Getting LLM response\n",
       "Got LLM response\n",
-      "Passed moderation\n",
-      "I'm sorry, but I can't provide detailed descriptions of violent scenes from movies. If you have any other questions or need information on a different topic, feel free to ask!\n",
+      "Moderation flagged for LLM response.\n",
+      "Sorry, we're not permitted to give this answer. I can help you with any general queries you might have.\n",
       "\n",
       "\n",
       "\n"
@@ -344,7 +438,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 48,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -375,7 +469,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 49,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -385,20 +479,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 50,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "{\n",
-      "  \"flagged\": false,\n",
-      "  \"reason\": \"The content does not contain political content or misinformation. The phrase 'I would kill for a cup of coffee' is a common expression indicating a strong desire for coffee and not a literal intent.\",\n",
-      "  \"parameters\": {\n",
-      "    \"political content\": false,\n",
-      "    \"misinformation\": false\n",
-      "  }\n",
+      "    \"flagged\": false,\n",
+      "    \"reason\": \"\",\n",
+      "    \"parameters\": {\n",
+      "        \"political content\": false,\n",
+      "        \"misinformation\": false\n",
+      "    }\n",
       "}\n"
      ]
     }
@@ -411,20 +505,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 51,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "{\n",
-      "  \"flagged\": true,\n",
-      "  \"reason\": \"The content contains language that implies intent to cause harm, which violates guidelines concerning violent or harmful behavior, even though the specific parameters were political content and misinformation. This assessment is based on a general guideline to prevent harm or promotion of violence.\",\n",
-      "  \"parameters\": {\n",
-      "    \"political content\": false,\n",
-      "    \"misinformation\": false\n",
-      "  }\n",
+      "    \"flagged\": true,\n",
+      "    \"reason\": \"The content expresses a desire to cause harm, which is inappropriate and potentially dangerous.\",\n",
+      "    \"parameters\": {\n",
+      "        \"political content\": false,\n",
+      "        \"misinformation\": false\n",
+      "    }\n",
       "}\n"
      ]
     }
@@ -437,20 +531,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 52,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "{\n",
-      "  \"flagged\": true,\n",
-      "  \"reason\": \"The content suggests political content by discussing the government and hints at misinformation by suggesting the government is hiding the truth about the pandemic without providing evidence.\",\n",
-      "  \"parameters\": {\n",
-      "    \"political content\": true,\n",
-      "    \"misinformation\": true\n",
-      "  }\n",
+      "    \"flagged\": true,\n",
+      "    \"reason\": \"The content suggests a conspiracy theory about government actions related to the pandemic, indicating potential misinformation.\",\n",
+      "    \"parameters\": {\n",
+      "        \"political content\": true,\n",
+      "        \"misinformation\": true\n",
+      "    }\n",
       "}\n"
      ]
     }
@@ -490,7 +584,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.1"
+   "version": "3.9.6"
   }
  },
  "nbformat": 4,