{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f3b89c3",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/llamaindex/llamaindex.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2a1f181d-feeb-4b29-aabc-67a75234b92c",
   "metadata": {},
   "source": [
    "# Supabase + LlamaIndex\n",
    "\n",
    "In this example we'll use PostgreSQL + pgvector with [LlamaIndex's Supabase Vector Store](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#setup-openai).\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3cab93f5-10d0-47c5-9f4e-64921461e7e2",
   "metadata": {},
   "source": [
    "## Install Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a41bc3e4-ea52-43aa-9239-a431b49f029e",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU vecs datasets llama_index llama-index-readers-web llama-index-vector-stores-supabase html2text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0026437c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index.core import StorageContext, VectorStoreIndex\n",
    "from llama_index.readers.web import SimpleWebPageReader\n",
    "from llama_index.vector_stores.supabase import SupabaseVectorStore\n",
    "import textwrap\n",
    "import html2text"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7a95a440",
   "metadata": {},
   "source": [
    "## Set up OpenAI\n",
    "\n",
    "LlamaIndex uses OpenAI models for embeddings and completions by default, which require an [API key](https://platform.openai.com/api-keys). Let's store it in an environment variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca2437d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['OPENAI_API_KEY'] = \"[your_openai_api_key]\""
   ]
  },
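  {
   "cell_type": "markdown",
   "id": "b1f4c9a2",
   "metadata": {},
   "source": [
    "If you'd rather not hard-code the key in the notebook, a minimal alternative (using only the standard-library `getpass` module) is to prompt for it at runtime:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7d0e8f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: prompt for the key instead of storing it in the notebook.\n",
    "# Only needed if you skipped the cell above.\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if not os.environ.get('OPENAI_API_KEY'):\n",
    "    os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')"
   ]
  },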
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4dbd176f-3b4e-4d41-a72d-1e1affe6ecae",
   "metadata": {},
   "source": [
    "## Load the Dataset\n",
    "\n",
    "Let's load a small data set of Paul Graham's essays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc6b0bc2-b95f-4190-bf77-fa2dc57fc247",
   "metadata": {},
   "outputs": [],
   "source": [
    "essays = [\n",
    "    'paul_graham_essay.txt'\n",
    "]\n",
    "\n",
    "# Fetch each essay over HTTP and wrap it in a LlamaIndex Document\n",
    "documents = SimpleWebPageReader().load_data([f'https://raw.githubusercontent.com/supabase/supabase/master/examples/ai/llamaindex/data/{essay}' for essay in essays])\n",
    "print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].hash)"
   ]
  },
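  {
   "cell_type": "markdown",
   "id": "e3a9d6b7",
   "metadata": {},
   "source": [
    "As an optional sanity check, we can peek at the start of the text we just loaded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2b8c4d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: preview the first few hundred characters of the loaded document\n",
    "print(textwrap.shorten(documents[0].text, width=400, placeholder=' ...'))"
   ]
  },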
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "955b2700-8242-40eb-ac3f-d479a0312693",
   "metadata": {},
   "source": [
    "## Create an index in Supabase\n",
    "\n",
    "Let's store Paul Graham's essays in Supabase. You can find the Postgres connection string in the [Database Settings](https://supabase.com/dashboard/project/_/settings/database) of your Supabase project.\n",
    "\n",
    "> **Note:** SQLAlchemy requires the connection string to start with `postgresql://` (instead of `postgres://`). Don't forget to rename this after copying the string from the dashboard (see the optional snippet below).\n",
    "\n",
    "> **Note:** You must use the \"connection pooling\" string (domain ending in `*.pooler.supabase.com`) with Google Colab since Colab does not support IPv6.\n",
    "\n",
    "This will also work with any other Postgres provider that supports pgvector."
   ]
  },
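  {
   "cell_type": "markdown",
   "id": "a8c5e2f0",
   "metadata": {},
   "source": [
    "A minimal, optional sketch of the scheme rename mentioned above (the value here is a placeholder, not a real connection string):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9d6f3a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: SQLAlchemy expects 'postgresql://', so rename the scheme\n",
    "# on whatever string you copied from the dashboard.\n",
    "copied = 'postgres://user:[password]@[your-pooler-host]:5432/postgres'  # placeholder\n",
    "print(copied.replace('postgres://', 'postgresql://', 1))"
   ]
  },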
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ce9bc85b-e844-407c-a0ad-ccf6af3c8866",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Substitute your connection string here\n",
    "DB_CONNECTION = \"postgresql://postgres:password@localhost:5431/db\"\n",
    "\n",
    "vector_store = SupabaseVectorStore(\n",
    "    postgres_connection_string=DB_CONNECTION,\n",
    "    collection_name='base_demo'\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
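  {
   "cell_type": "markdown",
   "id": "c5d8e4b6",
   "metadata": {},
   "source": [
    "The Supabase vector store is built on the `vecs` library, which by default stores each collection as a table inside a `vecs` schema. As an optional sanity check, and assuming that default layout, you can count the stored vectors directly with SQLAlchemy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e7a5c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check. Assumes the default vecs layout: a 'vecs' schema\n",
    "# with one table per collection (here, vecs.base_demo).\n",
    "from sqlalchemy import create_engine, text\n",
    "\n",
    "engine = create_engine(DB_CONNECTION)\n",
    "with engine.connect() as conn:\n",
    "    n = conn.execute(text('select count(*) from vecs.base_demo')).scalar()\n",
    "print(f'{n} vectors stored in vecs.base_demo')"
   ]
  },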
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9e82862d-440a-4f66-9ed7-0eaa6a0f4062",
   "metadata": {},
   "source": [
    "## Query the index\n",
    "\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d2771545-d209-4ceb-a222-ed139a4620f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "\n",
    "# Ask a question\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "\n",
    "# Print the response\n",
    "print(response)"
   ]
  },
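  {
   "cell_type": "markdown",
   "id": "e6f8b0c3",
   "metadata": {},
   "source": [
    "Optionally, we can also look at which chunks were retrieved to answer the question. This is a small sketch using the response's `source_nodes`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7a9c1d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: inspect the retrieved chunks and their similarity scores\n",
    "for source in response.source_nodes:\n",
    "    print('Score:', source.score)\n",
    "    print(textwrap.shorten(source.node.get_content(), width=200, placeholder=' ...'))\n",
    "    print('---')"
   ]
  }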
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}