Commit

update
eyurtsev committed Jul 20, 2024
1 parent 8875a5b commit 68818c5
Showing 11 changed files with 570 additions and 537 deletions.
199 changes: 148 additions & 51 deletions docs/source/apis.ipynb

Large diffs are not rendered by default.

693 changes: 338 additions & 355 deletions docs/source/document_extraction.ipynb

Large diffs are not rendered by default.

9 changes: 7 additions & 2 deletions docs/source/guidelines.ipynb
@@ -9,14 +9,19 @@
"\n",
"`Kor` is a wrapper around LLMs to help with information extraction.\n",
"\n",
"*Kor* is best used with LLMs that do **NOT** natively support function calling.\n",
"\n",
"If you're working with a chat model that **does** support native function calling, please read through\n",
"this guide first (https://python.langchain.com/v0.2/docs/how_to/tool_calling/).\n",
"\n",
"The quality of the results depends on a lot of factors. \n",
"\n",
"Here are a few things to experiment with to improve quality:\n",
"\n",
"* Add more examples. Diverse examples can help, including examples where nothing should be extracted.\n",
"* Improve the descriptions of the attributes.\n",
"* If working with multi-paragraph text, specify an `input_formatter` of `\"triple_quotes\"` when creating the chain.\n",
"* Try a better model (e.g., text-davinci-003, gpt-4).\n",
"* Try a better model.\n",
"* Break the schema into a few smaller schemas, run separate extractions, and merge the results.\n",
"* If possible, flatten the schema and use a CSV encoding instead of a JSON encoding.\n",
"* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n",
@@ -44,7 +49,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
"version": "3.11.4"
}
},
"nbformat": 4,
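Note on the guideline "Break the schema into a few smaller schemas, run separate extractions, and merge the results": the merge step can be sketched as below. This is a minimal illustration, not part of Kor. The helper name `merge_extractions` is hypothetical, and it assumes each extraction result is a dict mapping a schema id to a list of records, which is the shape of the outputs shown elsewhere in this commit.

```python
from typing import Any


def merge_extractions(results: list[dict[str, list[Any]]]) -> dict[str, list[Any]]:
    """Combine several extraction results into one dict.

    Hypothetical helper: records that share a schema id are concatenated
    in the order the results are given. The inputs are not modified.
    """
    merged: dict[str, list[Any]] = {}
    for result in results:
        for schema_id, records in result.items():
            merged.setdefault(schema_id, []).extend(records)
    return merged


# Two results, as if produced by two smaller schemas run separately.
personal = {"personal_info": [{"first_name": "Eugene", "last_name": "", "age": "18"}]}
moves = {
    "information": [
        {"person_name": "Alice Doe", "to_address": {"city": "Boston", "country": "USA"}}
    ]
}

merged = merge_extractions([personal, moves])
```

Concatenation is the simplest merge policy; deduplicating or reconciling conflicting records would need rules specific to the schema.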
69 changes: 28 additions & 41 deletions docs/source/nested_objects.ipynb
@@ -14,24 +14,15 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 1,
"id": "0b4597b2-2a43-4491-8830-bf9f79428074",
"metadata": {
"nbsphinx": "hidden",
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The autoreload extension is already loaded. To reload it, use:\n",
" %reload_ext autoreload\n"
]
}
],
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
@@ -43,7 +34,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 2,
"id": "718c66a7-6186-4ed8-87e9-5ed28e3f209e",
"metadata": {
"tags": []
@@ -57,15 +48,15 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 3,
"id": "9bc98f35-ea5f-4b74-a32e-a300a22c0c89",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" model_name=\"gpt-4o-mini\",\n",
" model_name=\"gpt-4o\",\n",
" temperature=0,\n",
" max_tokens=2000,\n",
")"
@@ -83,7 +74,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 4,
"id": "f75990e6-5973-4618-9f15-f3b60a14bfa5",
"metadata": {
"tags": []
@@ -147,7 +138,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 5,
"id": "54a199a5-24b4-442c-8907-1449e437a880",
"metadata": {
"tags": []
@@ -161,7 +152,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 6,
"id": "193e257b-df01-45ec-af77-076d2070533b",
"metadata": {
"tags": []
@@ -178,7 +169,7 @@
" 'to_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'}}]}"
]
},
"execution_count": 29,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -191,7 +182,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 8,
"id": "c8295f36-f986-4db2-97bc-ef2e6cdbcc87",
"metadata": {
"tags": []
@@ -201,23 +192,18 @@
"data": {
"text/plain": [
"{'information': [{'person_name': 'Alice Doe',\n",
" 'from_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" 'from_address': {'city': 'New York', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" 'from_address': {'city': 'New York', 'country': 'USA'},\n",
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Andrew',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Joana',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Paul',\n",
" 'to_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'}},\n",
" {'person_name': 'Betty',\n",
" 'from_address': {'city': 'Boston', 'state': 'MA', 'country': 'USA'},\n",
" 'to_address': {'city': 'New York', 'state': 'NY', 'country': 'USA'}}]}"
" 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Joana', 'to_address': {'city': 'Boston', 'country': 'USA'}},\n",
" {'person_name': 'Paul', 'to_address': {'city': 'Boston', 'country': 'USA'}}]}"
]
},
"execution_count": 30,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -247,7 +233,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 9,
"id": "e528f20c-46d3-40b6-b1ba-11024002deb8",
"metadata": {
"tags": []
@@ -300,7 +286,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 10,
"id": "23b81b06-118a-4ebe-9e20-5df1bf269ce3",
"metadata": {
"tags": []
@@ -312,7 +298,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 11,
"id": "29219fae-41cb-4235-92fa-07b16ded2296",
"metadata": {
"tags": []
@@ -325,13 +311,14 @@
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}]},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'},\n",
" {'city': 'Boston', 'state': 'MA', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'},\n",
" {'city': 'LA', 'state': 'CA', 'country': 'USA'}]}]}"
" 'from_address': [{'city': 'New York', 'state': 'NY', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}]},\n",
" {'person_name': 'Bob Smith',\n",
" 'from_address': [{'city': 'Boston', 'state': 'MA', 'country': 'USA'}],\n",
" 'to_address': [{'city': 'LA', 'state': 'CA', 'country': 'USA'}]}]}"
]
},
"execution_count": 33,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -359,7 +346,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
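Note on the nested outputs in this notebook: the guidelines suggest flattening the schema where possible. As a rough illustration of what a flattened record looks like, the sketch below collapses a nested record from the output above into single-level keys. The helper `flatten_record` and the `_` separator are assumptions made for this example; they are not part of Kor, and this post-processing is not the same thing as defining a flat schema up front.

```python
from typing import Any


def flatten_record(record: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration; lists are left untouched.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse, then prefix each nested key with the parent key.
            for sub_key, sub_value in flatten_record(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# One record taken from the nested output shown in the diff above.
record = {
    "person_name": "Alice Doe",
    "from_address": {"city": "New York", "country": "USA"},
    "to_address": {"city": "Boston", "country": "USA"},
}

flat = flatten_record(record)
```

Each flattened key, such as `from_address_city`, corresponds to one column in a CSV-style layout.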
18 changes: 5 additions & 13 deletions docs/source/objects.ipynb
@@ -54,7 +54,7 @@
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" model_name=\"gpt-4o-mini\",\n",
" model_name=\"gpt-4o\",\n",
" temperature=0,\n",
" max_tokens=2000,\n",
")"
@@ -194,14 +194,6 @@
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/eugene/.pyenv/versions/3.9.6/envs/kor/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"data": {
"text/plain": [
@@ -333,7 +325,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 10,
"id": "5c694d79-e72c-4712-b891-111bc0279032",
"metadata": {
"tags": []
@@ -347,7 +339,7 @@
" 'age': '20'}]}"
]
},
"execution_count": 12,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@@ -370,7 +362,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 11,
"id": "a2944e8c-4630-4b29-b505-b2ca6fceba01",
"metadata": {
"tags": []
@@ -428,7 +420,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions docs/source/prompt.ipynb
@@ -56,7 +56,7 @@
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" model_name=\"gpt-4o-mini\",\n",
" model_name=\"gpt-4o\",\n",
" temperature=0,\n",
")\n",
"\n",
@@ -279,7 +279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,
26 changes: 9 additions & 17 deletions docs/source/schema_serialization.ipynb
@@ -188,7 +188,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 5,
"id": "6088c98a",
"metadata": {
"tags": []
@@ -211,7 +211,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"id": "718c66a7-6186-4ed8-87e9-5ed28e3f209e",
"metadata": {
"tags": []
@@ -224,15 +224,15 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 7,
"id": "9bc98f35-ea5f-4b74-a32e-a300a22c0c89",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" model_name=\"gpt-4o-mini\",\n",
" model_name=\"gpt-4o\",\n",
" temperature=0,\n",
" max_tokens=2000,\n",
" model_kwargs={\"frequency_penalty\": 0, \"presence_penalty\": 0, \"top_p\": 1.0},\n",
@@ -241,7 +241,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 8,
"id": "54a199a5-24b4-442c-8907-1449e437a880",
"metadata": {
"tags": []
@@ -253,27 +253,19 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 9,
"id": "193e257b-df01-45ec-af77-076d2070533b",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/eugene/.pyenv/versions/3.9.6/envs/kor/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"data": {
"text/plain": [
"{'personal_info': [{'first_name': 'Eugene', 'last_name': '', 'age': '18'}]}"
]
},
"execution_count": 13,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -284,7 +276,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 10,
"id": "c8295f36-f986-4db2-97bc-ef2e6cdbcc87",
"metadata": {
"tags": []
@@ -325,7 +317,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.4"
}
},
"nbformat": 4,