HKSingleParty/99_references/supabase-examples/ai/llamaindex/llamaindex.ipynb
2025-05-28 09:55:51 +08:00


{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "4f3b89c3",
"metadata": {},
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/llamaindex/llamaindex.ipynb)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2a1f181d-feeb-4b29-aabc-67a75234b92c",
"metadata": {},
"source": [
"# Supabase + LlamaIndex\n",
"\n",
"In this example we'll use PostgreSQL + pgvectors with [LlamaIndex's Supabase Vector Store](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#setup-openai).\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3cab93f5-10d0-47c5-9f4e-64921461e7e2",
"metadata": {},
"source": [
"## Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a41bc3e4-ea52-43aa-9239-a431b49f029e",
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU vecs datasets llama_index html2text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0026437c",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import sys\n",
"\n",
"# Uncomment to see debug logs\n",
"# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
"# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
"\n",
"from llama_index.core.storage import StorageContext\n",
"from llama_index.readers.web import SimpleWebPageReader\n",
"from llama_index.indices.vector_store import VectorStoreIndex\n",
"from llama_index.vector_stores.supabase import SupabaseVectorStore\n",
"import textwrap\n",
"import html2text"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7a95a440",
"metadata": {},
"source": [
"# Set up OpenAI\n",
"\n",
"OpenAI requires an [API key](https://platform.openai.com/api-keys) to run their models. Let's store that in an environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca2437d3",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ['OPENAI_API_KEY'] = \"[your_openai_api_key]\""
]
},
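{
"cell_type": "markdown",
"id": "e5b1c7a9",
"metadata": {},
"source": [
"As an alternative to hard-coding the key, the cell below is a minimal sketch that prompts for it only when `OPENAI_API_KEY` is not already set. It uses the standard-library `getpass` module and is not part of the original example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2d48a10",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"# Prompt only if the key is not already present in the environment\n",
"if not os.environ.get('OPENAI_API_KEY'):\n",
"    os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')"
]
},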
{
"attachments": {},
"cell_type": "markdown",
"id": "4dbd176f-3b4e-4d41-a72d-1e1affe6ecae",
"metadata": {},
"source": [
"## Load the Dataset\n",
"\n",
"Let's load a small data set of Paul Graham's essays:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc6b0bc2-b95f-4190-bf77-fa2dc57fc247",
"metadata": {},
"outputs": [],
"source": [
"essays = [\n",
" 'paul_graham_essay.txt'\n",
"]\n",
"documents = SimpleWebPageReader().load_data([f'https://raw.githubusercontent.com/supabase/supabase/master/examples/ai/llamaindex/data/{essay}' for essay in essays])\n",
"print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].hash)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "955b2700-8242-40eb-ac3f-d479a0312693",
"metadata": {},
"source": [
"## Create an index in Supabase\n",
"\n",
"Let's store Paul Graham's essays in Supabase. You can find the Postgres connection string in the [Database Settings](https://supabase.com/dashboard/project/_/settings/database) of your Supabase project.\n",
"\n",
 **Note:** SQLAlchemy requires">
"> **Note:** SQLAlchemy requires the connection string to start with `postgresql://` (instead of `postgres://`). Remember to change the prefix after copying the string from the dashboard.\n",
"\n",
"> **Note:** You must use the \"connection pooling\" string (domain ending in `*.pooler.supabase.com`) with Google Colab since Colab does not support IPv6.\n",
"\n",
"This will also work with any other Postgres provider that supports pgvector."
]
},
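{
"cell_type": "markdown",
"id": "c3a9d5e1",
"metadata": {},
"source": [
"The prefix change described in the first note can also be done in code. The cell below is a minimal sketch; `to_sqlalchemy_url` is a hypothetical helper name introduced here for illustration, not part of Supabase, vecs, or LlamaIndex."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d0e6f24",
"metadata": {},
"outputs": [],
"source": [
"def to_sqlalchemy_url(url: str) -> str:\n",
"    # Hypothetical helper: rewrite a leading postgres:// to postgresql://\n",
"    prefix = 'postgres://'\n",
"    if url.startswith(prefix):\n",
"        return 'postgresql://' + url[len(prefix):]\n",
"    # Strings that already use postgresql:// are returned unchanged\n",
"    return url\n",
"\n",
"# Placeholder credentials, matching the example connection string in this notebook\n",
"print(to_sqlalchemy_url('postgres://postgres:password@localhost:5431/db'))"
]
},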
{
"cell_type": "code",
"execution_count": 5,
"id": "ce9bc85b-e844-407c-a0ad-ccf6af3c8866",
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Substitute your connection string here\n",
"DB_CONNECTION = \"postgresql://postgres:password@localhost:5431/db\"\n",
"\n",
"vector_store = SupabaseVectorStore(\n",
" postgres_connection_string=DB_CONNECTION, \n",
" collection_name='base_demo'\n",
")\n",
"storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
"index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9e82862d-440a-4f66-9ed7-0eaa6a0f4062",
"metadata": {},
"source": [
"## Query the index\n",
"\n",
"We can now ask questions using our index."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d2771545-d209-4ceb-a222-ed139a4620f2",
"metadata": {},
"outputs": [],
"source": [
"query_engine = index.as_query_engine()\n",
"\n",
"# Ask a question\n",
"response = query_engine.query(\"What did the author do growing up?\")\n",
"\n",
"# Print the response\n",
"print(response)"
]
}
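,
{
"cell_type": "markdown",
"id": "a6f0b3d7",
"metadata": {},
"source": [
"To see which chunks the answer was based on, the retrieved nodes can be inspected. The cell below is a sketch that assumes the `response` object from the previous cell exposes `source_nodes`, each with a `score` and a `node`, as in recent LlamaIndex releases; attribute names may differ in other versions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b9c4e82",
"metadata": {},
"outputs": [],
"source": [
"import textwrap\n",
"\n",
"# Assumes `response` was produced by query_engine.query(...) in the previous cell\n",
"for source in response.source_nodes:\n",
"    # Shorten each retrieved chunk to a single preview line\n",
"    preview = textwrap.shorten(source.node.get_content(), width=200)\n",
"    print(f'score={source.score}: {preview}')"
]
}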
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}