AI Chat Assistant's Memory State Improvement

Client: AI | Published: 09.01.2026
Budget: $250

We have a desktop AI chat assistant that runs local LLMs fully offline. It is a Flask web server wrapped in a pywebview desktop GUI, uses llama-cpp-python to load GGUF models, and has an adaptive system that automatically picks a model from performance tiers (minimal/low/medium/high) based on hardware detection (see the first sketch below). The app also tracks conversation context with topic analysis and reference resolution, and monitors memory so it can switch to a smaller model when RAM gets tight.

Problem 1: the model is not keeping memory/conversation state between requests. Conversation history tracking and context enhancement systems are in place, yet the model loses context and does not maintain state between generations (see the second sketch below).

Problem 2: loading bigger models (such as the 24B-parameter models in the high tier) fails or crashes. RAM checks and fallback logic exist, but the larger models will not load reliably. Possible causes: memory allocation issues during loading, GPU layer configuration problems, or the model-switching logic interfering with the initial load (see the third sketch below).
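For context, here is a minimal sketch of the kind of tier selection described above. The `pick_tier` helper and its RAM thresholds are illustrative assumptions, not the project's actual logic:

```python
import psutil

def pick_tier() -> str:
    """Map detected hardware to a performance tier.

    The thresholds below are assumptions for illustration only.
    """
    total_gib = psutil.virtual_memory().total / 2**30
    if total_gib >= 32:
        return "high"      # e.g. 24B GGUF models
    if total_gib >= 16:
        return "medium"
    if total_gib >= 8:
        return "low"
    return "minimal"
```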
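On Problem 1: llama-cpp-python's generation calls are stateless between requests, so conversation memory only exists if the full message history is re-sent on every generation, and that history must live somewhere that survives across Flask requests. A minimal sketch of the pattern (the model path, parameter values, and module-level history storage are assumptions):

```python
from llama_cpp import Llama

llm = Llama(model_path="models/medium.gguf", n_ctx=4096)  # path and context size assumed

# The history must survive between Flask requests (module-level state,
# a session store, a database). Rebuilding this list on every request
# is one common cause of "the model forgot everything".
history = [{"role": "system", "content": "You are a helpful assistant."}]

def generate(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # Each call is independent: the model only "remembers" what is in
    # `messages` right now, so the whole history is passed every time.
    result = llm.create_chat_completion(messages=history, max_tokens=512)
    reply = result["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```

Also worth checking: if the Flask app runs with the debug reloader or multiple worker processes, module-level history gets wiped or split across processes, which could produce exactly the symptom described.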
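On Problem 2: one defensive pattern is to compare free RAM against the GGUF file size before calling `Llama()`, and walk down the tiers on failure. A sketch under assumptions (the file names, the 1.2x headroom factor, and the `try_load` helper are hypothetical):

```python
import os
import psutil
from llama_cpp import Llama

def try_load(model_path: str, n_gpu_layers: int = 0) -> Llama:
    """Load a GGUF model only if there is RAM headroom for it."""
    needed = os.path.getsize(model_path) * 1.2   # 1.2x headroom is an assumed margin
    available = psutil.virtual_memory().available
    if needed > available:
        raise MemoryError(
            f"{model_path}: ~{needed / 2**30:.1f} GiB needed, "
            f"{available / 2**30:.1f} GiB available"
        )
    # n_gpu_layers=0 keeps all weights on CPU; offloading too many layers
    # to a small GPU is another plausible cause of load-time crashes.
    return Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=n_gpu_layers)

# Walk the tiers from largest to smallest (file names are hypothetical).
llm = None
for path in ("models/high-24b.gguf", "models/medium-7b.gguf", "models/low-3b.gguf"):
    try:
        llm = try_load(path)
        break
    except (MemoryError, ValueError) as exc:  # load failures typically surface as ValueError
        print(f"Skipping {path}: {exc}")
```

One more thing to verify: if the previous model is not released before a bigger one loads, both sets of weights can be resident at once, so a RAM check that passed at check time can still end in an out-of-memory crash during loading.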