Social Media & News Sentiment Analysis System

Client: AI | Published: 15.12.2025

PROJECT BRIEF: Social Media & News Comment Monitoring System
(Data Collection + NLP + Admin Panel + Power BI Integration)

1. Project Overview

We are looking for an experienced full-stack developer or development team to build a complete monitoring system for collecting, processing, analysing and exporting social media and news comment data. The system must automatically fetch content, perform text cleaning and classification, run sentiment analysis, and make the processed data available for dashboards in Power BI.

The project consists of four core components:
- Data acquisition
- Data processing & NLP
- Admin panel
- Power BI integration

Hosting can be on AWS / Azure / VPS (developer may propose optimal architecture).

2. DATA ACQUISITION (APIs + Legal Scraping)

The developer must be able to implement:

2.1 Public Data Sources
- X/Twitter API (official API only)
- YouTube API (video search, metrics, comments)
- Reddit API
- News portals with public comment sections (scraping only where legally permitted)
- Facebook/Meta data only if possible via legal Graph API endpoints (no scraping; must comply with Meta Platform Policy: https://developers.facebook.com/policy/ )

2.2 Data Collection Requirements
- Configurable fetch frequency (e.g., every hour / once per day)
- Logging of errors and API limits
- Storage in a structured SQL database (PostgreSQL preferred, but open to alternatives)
- Expected data volume (initial estimate): 5,000–50,000 rows per day
- Scalable architecture to support higher volumes later

2.3 Data Structure
Each collected item should include at minimum:
- Date/time
- Platform
- Source URL / ID
- Full text
- Author (if available & legally permissible)
- Number of likes / replies / views (if applicable)
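As a rough illustration of the storage requirement and the minimum item fields in Section 2, a collected item and its table could be sketched as follows. This is a minimal sketch, not a prescribed design: the `CollectedItem` class, the `collected_items` table, and all column names are hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL so that the example runs on its own. The composite primary key on platform and source ID is one possible way to make repeated fetches of the same item idempotent.

```python
import sqlite3
from dataclasses import dataclass, astuple
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CollectedItem:
    """One fetched post or comment (hypothetical field names)."""
    collected_at: str               # ISO 8601 date/time of the item
    platform: str                   # e.g. "youtube", "reddit"
    source_id: str                  # platform-specific ID or URL
    full_text: str
    author: Optional[str] = None    # only if available and legally permissible
    likes: Optional[int] = None
    replies: Optional[int] = None
    views: Optional[int] = None


# Assumed schema; column order matches the dataclass field order.
SCHEMA = """
CREATE TABLE IF NOT EXISTS collected_items (
    collected_at TEXT NOT NULL,
    platform     TEXT NOT NULL,
    source_id    TEXT NOT NULL,
    full_text    TEXT NOT NULL,
    author       TEXT,
    likes        INTEGER,
    replies      INTEGER,
    views        INTEGER,
    PRIMARY KEY (platform, source_id)
)
"""


def store_items(conn: sqlite3.Connection, items: list[CollectedItem]) -> int:
    """Insert items, skipping ones already stored; return the number of new rows."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO collected_items VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [astuple(item) for item in items],
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
item = CollectedItem(
    collected_at=datetime(2025, 12, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    platform="reddit",
    source_id="t1_example",
    full_text="Example comment text",
    likes=12,
)
first_run = store_items(conn, [item])   # new row inserted
second_run = store_items(conn, [item])  # duplicate ignored
```

In PostgreSQL the equivalent of `INSERT OR IGNORE` would be `INSERT ... ON CONFLICT DO NOTHING`; the rest of the sketch would carry over with a different driver.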
3. DATA PROCESSING & NLP PIPELINE

The developer must create an automated processing pipeline with:

3.1 Text Pre-Processing
- Removal of HTML, special characters, emojis, spam patterns
- Language detection: LV, RU, EN
- Tokenisation and normalisation

3.2 Automated Keyword & Topic Classification
- Keyword matching for predefined keyword lists
- Automatic extraction of additional keywords
- Optional: Topic modelling (LDA or similar), if feasible

3.3 Sentiment Analysis
- Sentiment classification (positive / negative / neutral)
- Developer may use an existing open-source model or fine-tune a lightweight model
- Accuracy must be optimised for Latvian/English/Russian if possible

3.4 Output Table
For each processed text:
- Date
- Platform
- Source
- Text
- Sentiment value
- Topic / category
- Identified keywords
- Metrics (likes, shares, views, etc.)

4. ADMIN PANEL (Web Application)

A secure admin panel where non-technical users can manage monitoring settings.

4.1 Core Features
- Add / edit / delete keywords and keyword groups
- Create thematic monitoring groups (e.g., “Elections”, “Education”, “Human rights”)
- Enable / disable data sources
- Filter by:
  - Date range
  - Platform
  - Language
  - Sentiment
  - Keywords
- View summary statistics (optional basic charts)

4.2 Users & Authentication
- Initially 1–3 admin users
- Standard login system (JWT or session-based)
- Role-based access (optional)

5. POWER BI INTEGRATION

Developer must provide one of the following:

Option A: Direct API Feed
Provide an API endpoint that Power BI can refresh automatically.

Option B: Export to Files
Periodic export of data into:
- CSV
- JSON
- Parquet (optional)
Stored on AWS S3 / Azure Blob / server folder for Power BI ingestion.

Power BI Requirements
The exported/served table must include:
- date
- platform
- source
- text
- sentiment
- topic
- keywords
- metrics columns
- unique ID
- timestamp of last update

Automatic scheduled refresh is required.
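The file export option for Power BI could look roughly like the sketch below, which writes processed rows to CSV with the columns the brief lists for the exported table. The column names, the `export_csv` function, the sample row, and the choice to join keywords with semicolons are all assumptions made for illustration; the brief does not fix any of them. The example writes to an in-memory buffer, whereas a real export would target AWS S3, Azure Blob, or a server folder.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column names covering the fields the brief requires.
COLUMNS = [
    "id", "date", "platform", "source", "text", "sentiment",
    "topic", "keywords", "likes", "shares", "views", "updated_at",
]


def export_csv(rows: list[dict], out) -> int:
    """Write rows to `out` as CSV with a fixed header; return the row count."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # Flatten the keyword list so each record stays a single CSV row.
        flat["keywords"] = ";".join(row.get("keywords", []))
        writer.writerow(flat)  # missing metrics are written as empty cells
    return len(rows)


sample = [{
    "id": "reddit:t1_example",
    "date": "2025-12-15",
    "platform": "reddit",
    "source": "t1_example",
    "text": "Example comment, with a comma",
    "sentiment": "neutral",
    "topic": "Education",
    "keywords": ["school", "teachers"],
    "likes": 12,
    "updated_at": datetime(2025, 12, 15, 10, 0, tzinfo=timezone.utc).isoformat(),
}]

buffer = io.StringIO(newline="")
written = export_csv(sample, buffer)

# Read the output back to confirm that quoting and the header round-trip.
parsed = list(csv.DictReader(io.StringIO(buffer.getvalue(), newline="")))
```

A fixed header order keeps the Power BI data model stable between refreshes, and the `csv` module quotes text containing commas so free-form comment text does not break the column layout.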
6. TECHNICAL REQUIREMENTS

6.1 Preferred Tech Stack (flexible)
- Backend: Python (FastAPI / Flask) or Node.js
- Frontend: React / Vue / standard admin template
- Database: PostgreSQL or MySQL
- NLP: Python (spaCy, NLTK, transformers)
- Hosting: AWS / Azure / DigitalOcean / other (developer may propose)

6.2 Documentation
The developer must deliver:
- API documentation
- Database schema
- Instructions on deployment
- Admin panel user guide

6.3 Code Ownership
All code and database schema must be fully delivered with full usage rights.

7. WHAT TO INCLUDE IN YOUR PROPOSAL

Developers should provide:
- Description of relevant experience in:
  - API integrations
  - NLP / sentiment analysis
  - Web scraping
  - Admin panel development
  - Power BI integrations
- Proposed tech stack
- Estimated timeframe
- Estimated cost (fixed price or hourly)
- Clarifications/questions if any requirements are unclear

8. Additional Notes
- Project must comply with all platform policies (Meta, YouTube, X, Reddit, etc.).
- No illegal scraping of Facebook is allowed.
- Scalability and reliability are important.
- GDPR