I’m building an internal “SmartDoc” assistant that can digest practically any content I hand it (full PDFs, single HTML files, live web pages, images, even video transcripts) and then answer questions with proof in sight. Here’s what I need you to put together:

• Ingestion & storage
  – Extract text from all supported formats.
  – Create embeddings and store them in more than one vector database; design it so I can toggle between, say, Chroma, Pinecone, Milvus, or any other store with minimal code changes (rough interface sketch below).

• Query workflow
  1. Retrieve the most relevant chunks from every connected vector DB.
  2. Hand those chunks to whichever LLM the user chooses. By default the call should hit my self-hosted model on a GPU server, but the UI must also let a user drop in their own API key and route the prompt there instead (see the routing sketch below).
  3. Return an answer that always includes:
     – the exact source page (or frame) you took the facts from,
     – highlighted passages on that page,
     – a concise summary, and
     – for PDFs, line numbers and, if possible, the page itself as a mini-PDF attachment.
     (The answer-payload sketch below shows the shape I have in mind.)

• Stack & deliverables
  – Prefer Python; I’m already comfortable with LangChain but open to LlamaIndex or clean custom code.
  – Docker-ised deployment with a README good enough for me to spin it up on my Ubuntu GPU box in one command.
  – Clean, modular code so I can slot in new data types or vector stores later.

The must-haves are accurate citation, smooth LLM switching, and fault-tolerant retrieval across multiple databases.

Point me to a previous RAG (Retrieval-Augmented Generation) project, demo, or repo and you’ll jump to the front of the line. I’m ready to kick off immediately and will stay responsive throughout the build.
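
To make the vector-store toggle and the fault-tolerant retrieval concrete, here’s a rough Python sketch of the shape I’m imagining. Everything in it is a placeholder of my own (the class and function names, the in-memory stand-in instead of real Chroma/Pinecone/Milvus adapters, the keyword-overlap “search”); treat it as a conversation starter, not a spec.

```python
# Placeholder sketch -- every name below is illustrative, not a spec.
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Protocol

logger = logging.getLogger("smartdoc")


@dataclass
class Chunk:
    text: str
    source: str                      # e.g. "report.pdf" or a URL
    page: int | None = None          # PDF page, or video frame/timestamp
    score: float = 0.0
    metadata: dict = field(default_factory=dict)


class VectorStore(Protocol):
    """The only surface the rest of the app talks to; one adapter per backend."""
    name: str

    def add(self, chunks: list[Chunk]) -> None: ...
    def search(self, query: str, k: int = 5) -> list[Chunk]: ...


class InMemoryStore:
    """Toy stand-in so the interface can be exercised without any real backend."""

    def __init__(self, name: str = "memory") -> None:
        self.name = name
        self._chunks: list[Chunk] = []

    def add(self, chunks: list[Chunk]) -> None:
        self._chunks.extend(chunks)

    def search(self, query: str, k: int = 5) -> list[Chunk]:
        # Crude keyword overlap instead of real embeddings -- placeholder only.
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c) for c in self._chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for hits, c in scored[:k] if hits > 0]


def build_store(kind: str) -> VectorStore:
    """Single config-driven toggle; later entries would be Chroma/Pinecone/Milvus adapters."""
    registry: dict[str, type] = {"memory": InMemoryStore}
    return registry[kind]()


def retrieve(stores: list[VectorStore], query: str, k: int = 5) -> list[Chunk]:
    """Query every connected store; one flaky database must not kill the request."""
    results: list[Chunk] = []
    for store in stores:
        try:
            results.extend(store.search(query, k=k))
        except Exception:
            logger.exception("retrieval failed on %s, continuing with the rest", store.name)

    # Merge, de-duplicate, and keep the overall top-k by score.
    seen: set[tuple[str, int | None, str]] = set()
    merged: list[Chunk] = []
    for chunk in sorted(results, key=lambda c: c.score, reverse=True):
        key = (chunk.source, chunk.page, chunk.text)
        if key not in seen:
            seen.add(key)
            merged.append(chunk)
    return merged[:k]
```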
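
Same caveat for the LLM switching: the sketch below only shows the routing decision (default self-hosted model vs. a user-supplied API key). The two client classes are stubs with made-up names and a made-up endpoint; the real ones would wrap whatever APIs we end up using.

```python
# Placeholder sketch of the LLM switch -- names and endpoints are made up.
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class SelfHostedClient:
    """Default route: my own model behind an HTTP endpoint on the GPU box."""
    base_url: str = "http://gpu-box:8000"    # placeholder address

    def complete(self, prompt: str) -> str:
        # Real version would POST the prompt to the self-hosted inference server.
        return f"[self-hosted answer to a {len(prompt)}-char prompt]"


@dataclass
class HostedProviderClient:
    """Used only when the user pastes their own API key into the UI."""
    api_key: str
    model: str = "provider-model-name"       # placeholder

    def complete(self, prompt: str) -> str:
        # Real version would call the provider's API with the user's key.
        return f"[provider answer via key ending ...{self.api_key[-4:]}]"


def pick_llm(user_api_key: str | None = None) -> LLMClient:
    """Per-request toggle: a user key wins, otherwise fall back to the default."""
    if user_api_key:
        return HostedProviderClient(api_key=user_api_key)
    return SelfHostedClient()


if __name__ == "__main__":
    print(pick_llm().complete("What does page 3 say about Q2 revenue?"))
    print(pick_llm("sk-user-key-1234").complete("Same question, routed via my own key."))
```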
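
And this is roughly the answer payload I’d like every query to return, so the UI can show the source page, the highlighted passages, and (for PDFs) line numbers plus an optional one-page mini-PDF. Field names are mine and purely illustrative, not a fixed schema.

```python
# Placeholder sketch of the answer + citation payload; field names are not a fixed schema.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Citation:
    source: str                                              # e.g. "handbook.pdf" or a URL
    page: int | None = None                                  # PDF page, or video frame/timestamp
    highlights: list[str] = field(default_factory=list)      # exact passages to highlight on that page
    line_numbers: list[int] = field(default_factory=list)    # PDFs only
    page_pdf_path: str | None = None                         # optional single-page mini-PDF attachment


@dataclass
class Answer:
    text: str                                                # the concise summary / direct answer
    citations: list[Citation] = field(default_factory=list)  # one entry per source page used
```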