{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "4f3b89c3", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/llamaindex/llamaindex.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2a1f181d-feeb-4b29-aabc-67a75234b92c", "metadata": {}, "source": [ "# Supabase + LlamaIndex\n", "\n", "In this example we'll use PostgreSQL + pgvectors with [LlamaIndex's Supabase Vector Store](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#setup-openai).\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3cab93f5-10d0-47c5-9f4e-64921461e7e2", "metadata": {}, "source": [ "## Install Dependencies" ] }, { "cell_type": "code", "execution_count": 1, "id": "a41bc3e4-ea52-43aa-9239-a431b49f029e", "metadata": {}, "outputs": [], "source": [ "!pip install -qU vecs datasets llama_index html2text" ] }, { "cell_type": "code", "execution_count": null, "id": "0026437c", "metadata": {}, "outputs": [], "source": [ "import logging\n", "import sys\n", "\n", "# Uncomment to see debug logs\n", "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n", "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n", "\n", "from llama_index.core.storage import StorageContext\n", "from llama_index.readers.web import SimpleWebPageReader\n", "from llama_index.indices.vector_store import VectorStoreIndex\n", "from llama_index.vector_stores.supabase import SupabaseVectorStore\n", "import textwrap\n", "import html2text" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7a95a440", "metadata": {}, "source": [ "# Set up OpenAI\n", "\n", "OpenAI requires an [API key](https://platform.openai.com/api-keys) to run their models. Let's store that in an environment variable:" ] }, { "cell_type": "code", "execution_count": null, "id": "ca2437d3", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['OPENAI_API_KEY'] = \"[your_openai_api_key]\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4dbd176f-3b4e-4d41-a72d-1e1affe6ecae", "metadata": {}, "source": [ "## Load the Dataset\n", "\n", "Let's load a small data set of Paul Graham's essays:" ] }, { "cell_type": "code", "execution_count": null, "id": "dc6b0bc2-b95f-4190-bf77-fa2dc57fc247", "metadata": {}, "outputs": [], "source": [ "essays = [\n", " 'paul_graham_essay.txt'\n", "]\n", "documents = SimpleWebPageReader().load_data([f'https://raw.githubusercontent.com/supabase/supabase/master/examples/ai/llamaindex/data/{essay}' for essay in essays])\n", "print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].hash)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "955b2700-8242-40eb-ac3f-d479a0312693", "metadata": {}, "source": [ "## Create an index in Supabase\n", "\n", "Let's store Paul Graham's essays in Supabase. You can find the Postgres connection string in the [Database Settings](https://supabase.com/dashboard/project/_/settings/database) of your Supabase project.\n", "\n", "> **Note:** SQLAlchemy requires the connection string to start with `postgresql://` (instead of `postgres://`). Don't forget to rename this after copying the string from the dashboard.\n", "\n", "> **Note:** You must use the \"connection pooling\" string (domain ending in `*.pooler.supabase.com`) with Google Colab since Colab does not support IPv6.\n", "\n", "This will also work with any other Postgres provider that supports pgvector." ] }, { "cell_type": "code", "execution_count": 5, "id": "ce9bc85b-e844-407c-a0ad-ccf6af3c8866", "metadata": {}, "outputs": [], "source": [ "\n", "# Substitute your connection string here\n", "DB_CONNECTION = \"postgresql://postgres:password@localhost:5431/db\"\n", "\n", "vector_store = SupabaseVectorStore(\n", " postgres_connection_string=DB_CONNECTION, \n", " collection_name='base_demo'\n", ")\n", "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n", "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9e82862d-440a-4f66-9ed7-0eaa6a0f4062", "metadata": {}, "source": [ "## Query the index\n", "\n", "We can now ask questions using our index." ] }, { "cell_type": "code", "execution_count": 7, "id": "d2771545-d209-4ceb-a222-ed139a4620f2", "metadata": {}, "outputs": [], "source": [ "query_engine = index.as_query_engine()\n", "\n", "# Ask a question\n", "response = query_engine.query(\"What did the author do growing up?\")\n", "\n", "# Print the response\n", "print(response)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 5 }