update notebooks #309

Merged · 4 commits · Jul 20, 2024
206 changes: 159 additions & 47 deletions docs/source/apis.ipynb

693 changes: 338 additions & 355 deletions docs/source/document_extraction.ipynb

9 changes: 7 additions & 2 deletions docs/source/guidelines.ipynb
@@ -9,14 +9,19 @@
"\n",
"`Kor` is a wrapper around LLMs to help with information extraction.\n",
"\n",
"*Kor* is best used with LLMs that do **NOT** natively support function calling.\n",
"\n",
"If you're working with a chat model that **does** support native function calling, please read through\n",
"this guide first (https://python.langchain.com/v0.2/docs/how_to/tool_calling/).\n",
"\n",
"The quality of the results depends on a lot of factors. \n",
"\n",
"Here are a few things to experiment with to improve quality:\n",
"\n",
"* Add more examples. Diverse examples can help, including examples where nothing should be extracted.\n",
"* Improve the descriptions of the attributes.\n",
"* If working with multi-paragraph text, specify an `input_formatter` of `\"triple_quotes\"` when creating the chain.\n",
"* Try a better model (e.g., text-davinci-003, gpt-4).\n",
"* Try a better model.\n",
"* Break the schema into a few smaller schemas, run separate extractions, and merge the results.\n",
"* If possible to flatten the schema, and use a CSV encoding instead of a JSON encoding.\n",
"* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n",
@@ -44,7 +49,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
"version": "3.11.4"
}
},
"nbformat": 4,
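The guideline to break the schema into a few smaller schemas, run separate extractions, and merge the results can be sketched in plain Python. This is a minimal sketch, not part of Kor's API: `merge_extractions` is a hypothetical helper, and it assumes each extraction result is shaped like the `chain.invoke(...)` results in these notebooks, i.e. a dict whose `"data"` key maps each schema id to a list of records. The schema ids and records below are illustrative.

```python
from typing import Any


def merge_extractions(results: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Merge the "data" payloads of several extraction results.

    Hypothetical helper: assumes every result looks like
    {"data": {schema_id: [record, ...]}} and concatenates the record
    lists per schema id, skipping records that are exact duplicates.
    """
    merged: dict[str, list[Any]] = {}
    for result in results:
        for schema_id, records in result.get("data", {}).items():
            bucket = merged.setdefault(schema_id, [])
            for record in records:
                if record not in bucket:  # drop verbatim duplicates
                    bucket.append(record)
    return merged


# Two smaller (illustrative) schemas extracted separately from the same text.
people = {"data": {"person": [{"first_name": "Eugene", "age": "18"}]}}
phones = {"data": {"phone": [{"number": "(123)-444-9999"}]}}
print(merge_extractions([people, phones]))
```

Deduplicating on exact equality is the simplest choice; records that differ only slightly between runs would need a domain-specific key to merge.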
73 changes: 30 additions & 43 deletions docs/source/nested_objects.ipynb
@@ -14,24 +14,15 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 1,
"id": "0b4597b2-2a43-4491-8830-bf9f79428074",
"metadata": {
"nbsphinx": "hidden",
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The autoreload extension is already loaded. To reload it, use:\n",
" %reload_ext autoreload\n"
]
}
],
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
@@ -43,7 +34,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 2,
"id": "718c66a7-6186-4ed8-87e9-5ed28e3f209e",
"metadata": {
"tags": []
@@ -57,7 +48,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 3,
"id": "9bc98f35-ea5f-4b74-a32e-a300a22c0c89",
"metadata": {
"tags": []
@@ -83,7 +74,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 4,
"id": "f75990e6-5973-4618-9f15-f3b60a14bfa5",
"metadata": {
"tags": []
@@ -147,7 +138,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 5,
"id": "54a199a5-24b4-442c-8907-1449e437a880",
"metadata": {
"tags": []
@@ -161,7 +152,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 6,
"id": "193e257b-df01-45ec-af77-076d2070533b",
"metadata": {
"tags": []
@@ -178,20 +169,20 @@
" 'to_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'}}]}"
]
},
"execution_count": 29,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\n",
"chain.invoke(\n",
" \"Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite.\"\n",
")[\"data\"]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 8,
"id": "c8295f36-f986-4db2-97bc-ef2e6cdbcc87",
"metadata": {
"tags": []
@@ -201,29 +192,24 @@
"data": {
"text/plain": [
"{'information': [{'person_name': 'Alice Doe',\n",
" 'from_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" 'from_address': {'city': 'New York', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" 'from_address': {'city': 'New York', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Andrew',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Joana',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Paul',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Betty',\n",
" 'from_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'},\n",
" 'to_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'}}]}"
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Joana', 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Paul', 'to_address': {'city': 'Boston', 'country': 'USA'}}]}"
]
},
"execution_count": 30,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\n",
"chain.invoke(\n",
" \"Alice Doe and Bob Smith moved from New York to Boston. Andrew was 12 years\"\n",
" \" old. He also moved to Boston. So did Joana and Paul. Betty did the opposite.\"\n",
")[\"data\"]"
@@ -247,7 +233,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 9,
"id": "e528f20c-46d3-40b6-b1ba-11024002deb8",
"metadata": {
"tags": []
@@ -300,7 +286,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 10,
"id": "23b81b06-118a-4ebe-9e20-5df1bf269ce3",
"metadata": {
"tags": []
@@ -312,7 +298,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 11,
"id": "29219fae-41cb-4235-92fa-07b16ded2296",
"metadata": {
"tags": []
@@ -325,19 +311,20 @@
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}]},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" {'city': 'Boston', 'state': 'MA', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'},\n",
" {'city': 'LA', 'state': 'CA', 'country': 'USA'}]}]}"
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}]},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'LA', 'state': 'CA', 'country': 'USA'}]}]}"
]
},
"execution_count": 33,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\n",
"chain.invoke(\n",
" \"Alice Doe and Bob Smith moved from New York to Boston. Bob later moved to LA.\"\n",
")[\"data\"]"
]
@@ -359,7 +346,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
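The nested outputs in nested_objects.ipynb (a person with `from_address` and `to_address` sub-objects) can be flattened into one row per person, which is the shape the "flatten the schema and use a CSV encoding" guideline has in mind. A minimal standard-library sketch: the records are copied from the updated output of the "Alice Doe and Bob Smith moved from New York to Boston..." example in this diff, while `flatten_person` and the column names are hypothetical.

```python
import csv
import io
from typing import Any

# Records copied from the updated nested_objects.ipynb output in this diff.
records: list[dict[str, Any]] = [
    {"person_name": "Alice Doe",
     "from_address": {"city": "New York", "country": "USA"},
     "to_address": {"city": "Boston", "country": "USA"}},
    {"person_name": "Bob Smith",
     "from_address": {"city": "New York", "country": "USA"},
     "to_address": {"city": "Boston", "country": "USA"}},
    {"person_name": "Andrew", "to_address": {"city": "Boston", "country": "USA"}},
    {"person_name": "Joana", "to_address": {"city": "Boston", "country": "USA"}},
    {"person_name": "Paul", "to_address": {"city": "Boston", "country": "USA"}},
]

# Hypothetical flat column layout.
FIELDS = ["person_name", "from_city", "from_country", "to_city", "to_country"]


def flatten_person(record: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: turn one nested record into a flat row.

    A missing sub-object (e.g. no from_address) becomes empty strings.
    """
    row = {"person_name": record.get("person_name", "")}
    for prefix in ("from", "to"):
        address = record.get(f"{prefix}_address") or {}
        row[f"{prefix}_city"] = address.get("city", "")
        row[f"{prefix}_country"] = address.get("country", "")
    return row


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(flatten_person(r) for r in records)
print(buffer.getvalue())
```

Flattening trades the nested structure for a fixed set of columns, so it only works when the number of sub-objects per record is bounded (here, at most one `from_address` and one `to_address`).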
28 changes: 10 additions & 18 deletions docs/source/objects.ipynb
@@ -171,7 +171,7 @@
}
],
"source": [
"print(chain.prompt.format_prompt(text=\"[user input]\").to_string())"
"print(chain.get_prompts()[0].format_prompt(text=\"[user input]\").to_string())"
]
},
{
@@ -194,14 +194,6 @@
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/eugene/.pyenv/versions/3.9.6/envs/kor/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"data": {
"text/plain": [
@@ -214,7 +206,7 @@
}
],
"source": [
"chain.run(\"Eugene was 18 years old a long time ago.\")[\"data\"]"
"chain.invoke(\"Eugene was 18 years old a long time ago.\")[\"data\"]"
]
},
{
@@ -236,7 +228,7 @@
"source": [
"chain = create_extraction_chain(llm, schema)\n",
"print(\n",
" chain.run(\n",
" chain.invoke(\n",
" \"My name is Bob Alice and my phone number is (123)-444-9999. I found my true love one\"\n",
" \" on a blue sunday. Her number was (333)1232832. Her name was Moana Sunrise and she was 10 years old.\"\n",
" )[\"data\"]\n",
@@ -271,7 +263,7 @@
}
],
"source": [
"chain.run(\n",
"chain.invoke(\n",
" \"My phone number is (123)-444-9999. I found my true love one on a blue sunday.\"\n",
" \" Her number was (333)1232832\"\n",
")[\"data\"]"
@@ -333,7 +325,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 10,
"id": "5c694d79-e72c-4712-b891-111bc0279032",
"metadata": {
"tags": []
@@ -347,14 +339,14 @@
" 'age': '20'}]}"
]
},
"execution_count": 12,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = create_extraction_chain(llm, schema)\n",
"chain.run(\n",
"chain.invoke(\n",
" \"My name is Bob Alice and my phone number is (123)-444-9999. I found my true love one\"\n",
" \" on a blue sunday. Her number was (333)1232832. Her name was Moana Sunrise and she was 20 years old.\"\n",
")[\"data\"]"
@@ -370,7 +362,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 11,
"id": "a2944e8c-4630-4b29-b505-b2ca6fceba01",
"metadata": {
"tags": []
@@ -408,7 +400,7 @@
}
],
"source": [
"print(chain.prompt.format_prompt(text=\"[user input]\").to_string())"
"print(chain.get_prompts()[0].format_prompt(text=\"[user input]\").to_string())"
]
}
],
@@ -428,7 +420,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
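Re-running objects.ipynb also drops the stale `stderr` output that carried the `LangChainDeprecationWarning` about `Chain.run`. If a warning like that had to be removed from a committed notebook without re-executing it, a small script over the notebook JSON could filter stream outputs by name. This is a hypothetical sketch, not how this PR was produced; `strip_stream_outputs` is an illustrative helper that assumes the nbformat 4 layout visible in this diff (`outputs` entries with `"output_type": "stream"` and `"name": "stderr"`).

```python
import json
from typing import Any


def strip_stream_outputs(nb: dict[str, Any], stream: str = "stderr") -> int:
    """Drop stream outputs named `stream` from every code cell, in place.

    Hypothetical helper for an nbformat-4 notebook loaded with json.load;
    returns how many outputs were removed.
    """
    removed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code" or "outputs" not in cell:
            continue  # only code cells carry outputs
        kept = [
            out for out in cell["outputs"]
            if not (out.get("output_type") == "stream" and out.get("name") == stream)
        ]
        removed += len(cell["outputs"]) - len(kept)
        cell["outputs"] = kept
    return removed


# A trimmed-down cell shaped like the objects.ipynb cell in this diff
# (output text abbreviated for illustration).
notebook = {
    "cells": [{
        "cell_type": "code",
        "outputs": [
            {"name": "stderr", "output_type": "stream",
             "text": ["LangChainDeprecationWarning: The method `Chain.run` ...\n"]},
            {"output_type": "execute_result", "metadata": {},
             "data": {"text/plain": ["..."]}},
        ],
        "source": ['chain.invoke("Eugene was 18 years old a long time ago.")["data"]'],
    }]
}
print(strip_stream_outputs(notebook))
print(json.dumps(notebook["cells"][0]["outputs"], indent=1))
```

Filtering only `stderr` keeps the `execute_result` outputs that the documentation actually displays; re-executing the notebooks, as this PR does, remains the more reliable way to keep outputs in sync with the code.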
6 changes: 3 additions & 3 deletions docs/source/prompt.ipynb
@@ -154,7 +154,7 @@
"\n",
"chain = create_extraction_chain(llm, schema, instruction_template=instruction_template)\n",
"\n",
"print(chain.prompt.format_prompt(text=\"hello\").to_string())"
"print(chain.get_prompts()[0].format_prompt(text=\"hello\").to_string())"
]
},
{
@@ -259,7 +259,7 @@
" type_descriptor=CatType(),\n",
")\n",
"\n",
"print(chain.prompt.format_prompt(text=\"hello\").to_string())"
"print(chain.get_prompts()[0].format_prompt(text=\"hello\").to_string())"
]
}
],
@@ -279,7 +279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
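Across objects.ipynb, nested_objects.ipynb and prompt.ipynb, the code changes reduce to two mechanical renames: `chain.run(...)` becomes `chain.invoke(...)`, and `chain.prompt.format_prompt(...)` becomes `chain.get_prompts()[0].format_prompt(...)`. A rename like this could be scripted over the notebook JSON. The sketch below is hypothetical (it is not necessarily how this PR was made): `migrate_notebook` and `REWRITES` are illustrative names, and the regexes assume, as in these notebooks, a chain variable literally named `chain`.

```python
import json
import re
from typing import Any

# (pattern, replacement) pairs mirroring the renames in this diff.
# Assumes the chain variable is literally named `chain`.
REWRITES = [
    (re.compile(r"\bchain\.run\("), "chain.invoke("),
    (re.compile(r"\bchain\.prompt\.format_prompt\("),
     "chain.get_prompts()[0].format_prompt("),
]


def migrate_notebook(nb: dict[str, Any]) -> int:
    """Apply REWRITES to code cells in place; return the number of lines changed."""
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # leave markdown and raw cells untouched
        source = cell.get("source", [])
        if isinstance(source, str):  # nbformat also allows a single string
            source = source.splitlines(keepends=True)
        new_source = []
        for line in source:
            updated = line
            for pattern, replacement in REWRITES:
                updated = pattern.sub(replacement, updated)
            if updated != line:
                changed += 1
            new_source.append(updated)
        cell["source"] = new_source
    return changed


# Cells modeled on prompt.ipynb and objects.ipynb before this PR.
notebook = {
    "cells": [
        {"cell_type": "code",
         "source": ['print(chain.prompt.format_prompt(text="hello").to_string())']},
        {"cell_type": "code",
         "source": ['chain.run("Eugene was 18 years old a long time ago.")["data"]']},
    ]
}
print(migrate_notebook(notebook))
print(json.dumps(notebook, indent=1))
```

The rewrite is purely textual and idempotent (running it twice changes nothing the second time); the notebooks still need to be re-executed to confirm that the outputs match.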