Objective: Develop a standalone PostgreSQL database (chat_and_billing) and a Python/Node.js service layer on a Google Cloud e2-standard-4 (4 vCPUs, 16 GB RAM) instance. The system must prioritize context accuracy and semantic recall over token cost, using pgvector and deep-context retrieval strategies.

1. Database Schema (PostgreSQL 17+ & pgvector)

The freelancer will provide a .sql initialization script for:

- Table: Users (id, email, stripe_customer_id, created_at)
- Table: Conversations (id, user_id (FK), title, created_at)
- Table: Messages (deep memory):
  - id, conversation_id (FK)
  - role (enum: 'User', 'Assistant')
  - content (TEXT NOT NULL)
  - embedding (vector(1536)), indexed with HNSW (Hierarchical Navigable Small World) for high-speed, high-accuracy retrieval
  - timestamp (TIMESTAMP)

2. Advanced Retrieval Requirements (Effectiveness Focus)

The freelancer must implement a "Hybrid Context Engine" rather than a simple "last 10 messages" query:

- Full Thread Continuity: Retrieve the entire active conversation history (10k+ tokens if needed) so the LLM never loses the current thread's flow.
- Cross-Conversation Semantic Recall: Search the Messages table for the top 20 most semantically relevant snippets across all of that user's historical conversations (excluding the current one).
- Parent-Context Retrieval: For each relevant memory snippet found, automatically fetch the 2 messages preceding and the 2 messages following it to provide the LLM with full context.
- Metadata Filtering: Support filtering by date range and conversation title so the LLM can resolve queries like "What did we discuss in the Genesis Study last January?"

3. Stripe Billing Integration

- Table: Subscriptions (tracks status, plan_tier, and current_period_end)
- Webhook Handler: Provide logic for Stripe webhooks (customer.subscription.updated) so the user's "Pro" or "Unlimited" status is verified before executing deep-memory queries.
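The cross-conversation recall and parent-context steps above can be prototyped in plain Python before wiring them to pgvector. The sketch below is an in-memory stand-in only: the dict-based message shape is an assumption, and in production steps B and C would run as a single SQL query against the Messages table.

```python
import math

def cosine_similarity(a, b):
    """Plain-Python cosine similarity (pgvector's vector_cosine_ops equivalent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_recall(messages, query_embedding, current_conversation_id,
                    top_k=20, window=2):
    """Rank messages from *other* conversations by cosine similarity, then
    expand each hit with `window` neighbouring messages on each side
    (parent-context retrieval)."""
    candidates = [m for m in messages
                  if m["conversation_id"] != current_conversation_id]
    candidates.sort(key=lambda m: cosine_similarity(m["embedding"], query_embedding),
                    reverse=True)
    hits = candidates[:top_k]

    # Group messages by conversation, ordered by timestamp, so we can
    # locate each hit's neighbours within its own thread.
    by_conv = {}
    for m in messages:
        by_conv.setdefault(m["conversation_id"], []).append(m)
    for conv in by_conv.values():
        conv.sort(key=lambda m: m["timestamp"])

    expanded, seen = [], set()
    for hit in hits:
        thread = by_conv[hit["conversation_id"]]
        i = next(idx for idx, m in enumerate(thread) if m["id"] == hit["id"])
        for m in thread[max(0, i - window): i + window + 1]:
            if m["id"] not in seen:
                seen.add(m["id"])
                expanded.append(m)
    return expanded
```

In production the similarity ordering would be `ORDER BY embedding <=> %s LIMIT 20` so the HNSW index is used; the window expansion is then a second query keyed on conversation_id and timestamp.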
4. Deliverables (No Server Access Required)

- Full SQL Script: Schema creation, the pgvector extension, and an HNSW index on the embedding column.
- Context Assembly Logic: A service-layer function (Python/Node.js) that fetches 100% of the current chat thread, performs a vector search for related past memories, and combines them into a single "Context Block" for the LLM.
- CRUD API Endpoints: Standard functions for creating conversations and appending messages.

5. Hardware Context for Freelancer

- Target Machine: Google Cloud e2-standard-4 (4 vCPUs, 16 GB RAM, 100 GB SSD).
- Optimization: Tune PostgreSQL with shared_buffers = 4GB and work_mem = 64MB so high-dimensional vector math is handled in RAM.

Freelancer details:

Phase 1: Database Schema & Setup (Deliverable: schema_setup.sql)

The freelancer will provide a single, commented SQL script that sets up the dedicated database on a PostgreSQL 17+ instance.

Database name: chat_and_billing

Required schema and indexes:

- Users
  - id SERIAL PRIMARY KEY
  - email VARCHAR(255) UNIQUE NOT NULL
  - stripe_customer_id VARCHAR(255) UNIQUE (crucial for Stripe mapping)
- Conversations
  - id SERIAL PRIMARY KEY
  - user_id INT REFERENCES Users(id) ON DELETE CASCADE (index on user_id)
  - title VARCHAR(255)
- Messages
  - id SERIAL PRIMARY KEY
  - conversation_id INT REFERENCES Conversations(id) ON DELETE CASCADE (index on conversation_id)
  - role VARCHAR(50) CHECK (role IN ('User', 'Assistant'))
  - content TEXT NOT NULL
  - embedding vector(1536) (required: must create an HNSW index for high performance)
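As a rough illustration of the Context Assembly deliverable, the function below merges recalled memories and the full current thread into one Context Block string. The section markers, and the conversation_title field (which would come from a join against Conversations), are assumptions, not part of the spec.

```python
def assemble_context_block(current_thread, recalled_snippets):
    """Build the single Context Block handed to the LLM: recalled memories
    from other conversations, followed by the full active thread (never
    truncated to a 'last 10 messages' window)."""
    lines = ["### Relevant memories from past conversations"]
    for m in recalled_snippets:
        lines.append(f"[{m['conversation_title']}] {m['role']}: {m['content']}")
    lines.append("### Current conversation (full thread)")
    for m in current_thread:
        lines.append(f"{m['role']}: {m['content']}")
    return "\n".join(lines)
```

Placing the full current thread last keeps the most recent turns closest to the model's generation point, which most chat LLMs weight heavily.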
- Subscriptions
  - id SERIAL PRIMARY KEY
  - user_id INT REFERENCES Users(id) ON DELETE CASCADE
  - stripe_subscription_id VARCHAR(255) UNIQUE
  - status VARCHAR(50) (e.g., 'active', 'canceled')
  - plan_tier VARCHAR(50) (e.g., 'Pro', 'Basic')

SQL requirements:

- Include CREATE EXTENSION IF NOT EXISTS vector;
- Define the HNSW index on Messages.embedding: CREATE INDEX ON Messages USING hnsw (embedding vector_cosine_ops);

Phase 2: Functional Logic & Service Layer (Deliverable: documented Python/Node.js logic outline)

The freelancer will provide clear pseudocode or a complete code module showing how to interact with the database using the specified advanced retrieval strategies.

Required functions/modules:

- save_message(user_id, conversation_id, role, content, embedding): Handles message insertion and ensures the corresponding user and conversation exist. Applies recursive character splitting to very long user inputs before generating embeddings.
- retrieve_context(user_id, conversation_id, current_query_embedding): Goal: maximum effectiveness. Assembles the full context payload for the LLM.
  A. Current Thread Continuity: fetch all messages belonging to the current conversation_id.
  B. Cross-Conversation Semantic Search: run a pgvector search (cosine similarity) against the user's other conversations for the top 20 relevant snippets.
  C. Parent Document Retrieval: pull the 2 messages before and after each relevant snippet found in step B.
  D. Reranking (optional but recommended): outline the use of a reranker API to select the final 10 most valuable snippets.
- handle_stripe_webhook(event_type, payload): Processes Stripe webhooks (checkout.session.completed, customer.subscription.updated) and updates the Users and Subscriptions tables accurately.
- check_user_access(user_id): Checks the Subscriptions table to verify a user's status is 'active' before granting access to premium RAG features.
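The recursive character splitting required by save_message can be sketched as follows. It mirrors the common RecursiveCharacterTextSplitter approach of trying coarse separators (paragraphs, then lines, then words) and recursing with finer ones only for oversized pieces; the 400-character default and the separator list are illustrative choices, not part of the spec.

```python
def recursive_character_split(text, chunk_size=400, separators=("\n\n", "\n", " ", "")):
    """Split long input on the coarsest separator present, keeping each
    chunk at or under chunk_size characters before embedding."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the first (coarsest) separator that actually occurs in the text.
    sep = next((s for s in separators if s and s in text), "")
    if sep == "":
        # No separator left: hard-split at the chunk boundary.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # A single piece is still too big: recurse with finer separators.
                chunks.extend(recursive_character_split(piece, chunk_size, separators))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk is embedded and stored as its own Messages row, so vector recall can surface the relevant part of a long input rather than one diluted embedding of the whole.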
Phase 3: Environment & Constraints

The freelancer must design the logic for the following target environment. No server access will be provided.

- Platform: Google Cloud Compute Engine
- Hardware: e2-standard-4 VM (4 vCPUs, 16 GB RAM, 100 GB SSD persistent disk)
- Database: PostgreSQL 17+ with the pgvector extension enabled
- Goal: Optimize SQL for performance within the 16 GB RAM limit (HNSW indexes ensure vector operations leverage this RAM efficiently).

Acceptance Criteria

Project success is defined by:

- A complete, runnable SQL script (schema_setup.sql).
- Clear code or pseudocode demonstrating every required logic point in Phase 2.
- Confirmation that the retrieval logic prioritizes the full current conversation history over arbitrary limits like "last 10 messages".
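A minimal sketch of the Phase 2 billing gate, using an in-memory dict in place of the Subscriptions table. The payload shape here is deliberately simplified and assumed: real Stripe events nest the subscription object under data.object, and tier extraction depends on your price configuration.

```python
ACTIVE_STATUSES = {"active", "trialing"}

def handle_stripe_webhook(event_type, payload, subscriptions):
    """Apply a subscription event to a subscriptions map keyed by
    stripe_subscription_id (in production: an UPSERT on Subscriptions)."""
    if event_type in ("checkout.session.completed", "customer.subscription.updated"):
        sub = payload["subscription"]  # simplified; Stripe nests this under data.object
        subscriptions[sub["id"]] = {
            "user_id": sub["user_id"],
            "status": sub["status"],
            "plan_tier": sub["plan_tier"],
            "current_period_end": sub["current_period_end"],
        }
    return subscriptions

def check_user_access(user_id, subscriptions):
    """Gate deep-memory retrieval: True only if the user holds an active
    'Pro' or 'Unlimited' subscription."""
    return any(
        s["user_id"] == user_id
        and s["status"] in ACTIVE_STATUSES
        and s["plan_tier"] in ("Pro", "Unlimited")
        for s in subscriptions.values()
    )
```

The service layer would call check_user_access before running retrieve_context, so canceled users fall back to plain last-thread chat rather than premium RAG.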