AI System for Real-Time Scam Detection

Client: AI | Published: 04.02.2026

Project Title: AI-Based "Digital Arrest" Scam Detection System (MVP)

Project Overview:
I am looking for an AI/ML developer to build a functional prototype of a security system designed to detect "Digital Arrest" scams. The system needs to analyze video and audio inputs in real time (or near real time) to identify deepfakes, threatening language, and fake law-enforcement visuals.

Key Features Required (The Scope):
I need a desktop-based prototype (Python/Streamlit or similar) that can process a sample video feed or live webcam input and perform the following:

* Audio Threat Detection (NLP):
  * Transcribe audio in real time (using OpenAI Whisper or Google Speech-to-Text).
  * Detect specific scam keywords/intents (e.g., "money laundering," "CBI," "narcotics," "arrest," "isolate yourself").
  * Flag high-pressure or threatening tones.
* Visual Forensics (Computer Vision):
  * Liveness/deepfake detection: identify whether the face in the video is AI-generated (looking for a lack of blinking, lip-sync errors, or visual artifacts).
  * Uniform/badge recognition: detect whether the person is wearing a police uniform or showing a badge (using an object detector such as YOLO).
* Real-Time Risk Dashboard:
  * A simple UI that displays a "Trust Score." If the score drops below a threshold, it shows a "SCAM ALERT" warning.

Preferred Tech Stack:
* Language: Python
* ML frameworks: TensorFlow / PyTorch / Keras
* Computer vision: OpenCV, MediaPipe
* NLP: Hugging Face Transformers (BERT/RoBERTa for intent classification)
* Interface: Streamlit or Flask (for the demo dashboard)

Deliverables:
* Well-commented source code.
* A requirements.txt file for easy installation.
* A short demo video showing the system detecting a scam attempt from a sample video file.
* Documentation of the model architecture used.

Screening Questions (for applicants):
* "For the deepfake detection, will you be training a model from scratch, or do you plan to use a pre-trained model like XceptionNet or MesoNet? Why?" (A good developer will suggest pre-trained models to save time and cost.)
* "How will you handle latency? If we use Whisper for audio transcription, will it be fast enough for a live alert?"
* "Do you have experience with multimodal analysis (combining audio and video data), or will these run as separate, independent modules?"

Option A: The Screen-Reflection Test
Implement a feature where the screen flashes a random color sequence, then build a CV model that attempts to detect the color changes in the reflection of the caller's eyes or glasses. Goal: prove the caller is a live feed, not a deepfake or looped recording.

Option B: Environmental Consistency Check
Build a classifier that labels the "Visual Scene" (e.g., Office, Outdoors, Car) and the "Audio Scene" (e.g., Echoey, Windy, Traffic). Trigger an alert if the two do not match (e.g., Visual = Office, Audio = Traffic/Wind).
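The keyword flagging and "Trust Score" thresholding described in the feature list could be prototyped as below. This is a minimal, dependency-free sketch: the keyword weights, 100-point scale, and 40-point alert threshold are all illustrative assumptions, not part of the spec, and a production system would replace the substring matching with a real intent classifier (e.g., fine-tuned BERT/RoBERTa).

```python
# Sketch: keyword-based audio threat scoring feeding a simple trust score.
# Keyword weights and the alert threshold are illustrative assumptions.

SCAM_KEYWORDS = {
    "money laundering": 30,
    "cbi": 25,
    "narcotics": 25,
    "arrest": 20,
    "isolate yourself": 35,
}

ALERT_THRESHOLD = 40  # a trust score below this triggers SCAM ALERT


def audio_risk(transcript: str) -> int:
    """Sum the weights of every scam keyword found in the transcript."""
    text = transcript.lower()
    return sum(w for kw, w in SCAM_KEYWORDS.items() if kw in text)


def trust_score(transcript: str, visual_risk: int = 0) -> int:
    """Start from 100 (full trust) and subtract audio and visual risk."""
    return max(0, 100 - audio_risk(transcript) - visual_risk)


def verdict(transcript: str, visual_risk: int = 0) -> str:
    """Map the trust score to the dashboard label."""
    score = trust_score(transcript, visual_risk)
    return "SCAM ALERT" if score < ALERT_THRESHOLD else "OK"
```

In a live pipeline, `transcript` would be the rolling output of the speech-to-text module, and `visual_risk` would come from the deepfake/uniform detectors, so the same threshold logic covers both modalities.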
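Option A's challenge-response protocol can be sketched separately from the hard computer-vision work. Assuming some other routine recovers the reflected colors from video frames (not shown here), the flash-and-verify logic might look like this; the palette, sequence length, and 75% match tolerance are illustrative assumptions:

```python
import random

# Sketch of Option A: flash a random color sequence on screen, then check
# that the colors recovered from the caller's eye/glasses reflection match.
# Recovering the reflected colors from frames (the CV part) is out of scope
# here; only the challenge-response protocol is shown.

PALETTE = ["red", "green", "blue", "white"]


def make_challenge(length=4, seed=None):
    """Pick a random color sequence to flash on screen."""
    rng = random.Random(seed)
    return [rng.choice(PALETTE) for _ in range(length)]


def sequence_matches(challenge, detected, min_hits=0.75):
    """Caller passes if enough flashed colors are seen in the reflection.

    A tolerance below 100% allows for missed detections on a noisy live
    feed; the 0.75 figure is an illustrative assumption.
    """
    if len(detected) != len(challenge):
        return False
    hits = sum(1 for c, d in zip(challenge, detected) if c == d)
    return hits / len(challenge) >= min_hits
```

A deepfake or pre-recorded loop cannot know the random sequence in advance, so a failed `sequence_matches` check is strong evidence against a live feed.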
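Once the two classifiers in Option B have produced labels, the cross-modal check reduces to a compatibility lookup. A minimal sketch, assuming a hand-written compatibility table (the table entries below are illustrative, not from the spec):

```python
# Sketch of Option B's cross-modal consistency check. Maps each visual
# scene label to the audio scene labels that would plausibly accompany it.
# In the real system, the labels would come from a visual scene classifier
# and an audio scene classifier running on the call.

COMPATIBLE_AUDIO = {
    "Office": {"Echoey", "Quiet"},
    "Outdoors": {"Windy", "Traffic"},
    "Car": {"Traffic", "Engine"},
}


def scene_alert(visual_scene: str, audio_scene: str) -> bool:
    """Return True (raise an alert) when the audio and video disagree."""
    expected = COMPATIBLE_AUDIO.get(visual_scene, set())
    return audio_scene not in expected
```

For example, `scene_alert("Office", "Traffic")` raises an alert, matching the mismatch example in the spec (Visual = Office, Audio = Traffic/Wind); a learned joint audio-visual model could later replace the hand-written table.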