I need a complete, production-ready body-emotion detection pipeline that runs in real time on a standard classroom CCTV feed. The end goal is to monitor student engagement continuously, flagging moments of happiness, surprise, confusion, fear, frustration, and closely related states so that teaching staff can react while the lesson is still in progress.

Scope of work
• Model exploration: start with CNN-BiLSTM and YOLO-based approaches, experiment with any other architecture you believe can outperform them, then select and justify the single best model for live deployment.
• Data: restrict training to publicly available body-language datasets (for instance CMU Panoptic, PKU-MMD, Kinetics-Skeleton, or similar) plus any augmentations you create yourself. No private data collection will be possible on my side.
• Training & evaluation: deliver clear metrics, namely overall accuracy, precision and recall for each emotion class, and real-time FPS measured on a 1080p classroom-style video.
• Inference pipeline: provide Python code (PyTorch or TensorFlow/Keras are both fine), a lightweight REST or gRPC endpoint, and a demo script that ingests an RTSP stream from a classroom CCTV camera and overlays bounding boxes with emotion labels in real time. Rough sketches of the demo loop and endpoint shape I have in mind are attached at the end of this brief.
• Documentation: include setup instructions, an environment file or Dockerfile, and a concise report explaining architecture choices, hyper-parameters, final metrics, and how to extend the emotion set later.

Acceptance criteria
1. At least 25 FPS inference on a single mid-range GPU (RTX 3060 or better) with ≤150 ms end-to-end latency; the benchmark sketch at the end of this brief shows how I plan to spot-check this.
2. Mean F1 ≥ 0.70 across the specified emotion classes on a held-out public validation set.
3. An end-to-end demo video showing live detection on a 5-minute classroom clip.

I'm ready to start as soon as you confirm your preferred framework and estimated timeline.
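Reference sketch 1: demo script. This is only an illustration of the shape I expect, not a prescribed implementation. It assumes OpenCV for capture and drawing; detect_emotions() is a placeholder for whatever model wrapper you deliver, and the RTSP URL is a dummy value.

# demo_rtsp_overlay.py -- illustrative sketch only
import cv2

RTSP_URL = "rtsp://<camera-ip>:554/stream1"  # replace with the real camera URL


def detect_emotions(frame):
    """Placeholder: return a list of (x1, y1, x2, y2, label, score) per person."""
    return []


def main():
    cap = cv2.VideoCapture(RTSP_URL)
    if not cap.isOpened():
        raise RuntimeError("Could not open RTSP stream")
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # stream dropped; the real script should reconnect instead
        # Draw one labelled box per detected person
        for x1, y1, x2, y2, label, score in detect_emotions(frame):
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f"{label} {score:.2f}", (x1, max(y1 - 8, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("classroom-demo", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()

The delivered version should also handle stream reconnection and be able to write the annotated frames to disk so we can produce the demo video from the same code path.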
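Reference sketch 2: serving endpoint. Again just the shape I expect: a single POST route that accepts a frame and returns per-person boxes and emotion labels. The sketch assumes FastAPI served with Uvicorn; predict_frame() is a placeholder for your model wrapper, and a gRPC service of equivalent scope is equally acceptable.

# serve.py -- illustrative endpoint shape only
import io

import numpy as np
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()


def predict_frame(image):
    """Placeholder: return per-person boxes and emotion labels as a list of dicts."""
    return []


@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Accept a single frame as an uploaded image and return detections as JSON
    data = await file.read()
    image = np.array(Image.open(io.BytesIO(data)).convert("RGB"))
    return {"detections": predict_frame(image)}

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000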
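Reference sketch 3: benchmark for acceptance criterion 1. This only times the processing stage (process_frame() stands in for the full detection-plus-classification pass), so it gives throughput and a lower bound on end-to-end latency; capture and network delay on the live RTSP feed will be checked separately during the demo.

# benchmark.py -- rough FPS/latency spot-check, illustrative only
import time

import cv2

VIDEO_PATH = "classroom_1080p_sample.mp4"  # any 1080p classroom-style clip


def process_frame(frame):
    """Placeholder for the full per-frame pipeline (detection + classification + overlay)."""
    return frame


def main():
    cap = cv2.VideoCapture(VIDEO_PATH)
    latencies = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t0 = time.perf_counter()
        process_frame(frame)
        latencies.append(time.perf_counter() - t0)
    cap.release()
    if latencies:
        mean_ms = 1000 * sum(latencies) / len(latencies)
        print(f"mean per-frame processing latency: {mean_ms:.1f} ms")
        print(f"throughput: {len(latencies) / sum(latencies):.1f} FPS")


if __name__ == "__main__":
    main()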