Forecasting (predictive modeling) -- 2

Client: AI | Published: 01.02.2026

Machine Learning Engineer — Predictive Modeling & Data Standardization (Retail Analytics Platform)

Project Overview

We are building a retail analytics and decision intelligence platform. The goal of this engagement is to design and implement the core predictive modeling and data standardization layer that supports demand forecasting, inventory planning, and price optimization across multiple enterprise clients. This role is strictly focused on machine learning, data preprocessing, and model architecture. Frontend, visualization, and UI concerns are explicitly out of scope.

Core Objective

Design a generic, production-ready ML pipeline capable of ingesting heterogeneous retail datasets from different clients and producing reliable, model-driven insights through standardized preprocessing and modular predictive models. The system must support multi-client variability without per-client custom code, relying instead on configuration-driven transformations and reusable components.

Data Environment

Clients provide historical retail data via Excel/CSV files and, in later stages, API feeds. Data schemas vary significantly across clients and may include:
- Product identifiers (SKU, family, category)
- Store, region, or channel dimensions
- Temporal data with varying granularity (daily, weekly, monthly)
- Units sold
- Pricing and discount information
- Inventory levels (when available)
- Promotional or campaign flags

Data completeness, quality, and structure are not guaranteed and must be handled programmatically.

Responsibilities

1. Data Ingestion and Validation
- Design a standardized ingestion interface for structured files and APIs.
- Implement schema detection and validation logic.
- Handle missing fields, inconsistent naming, and malformed records.
- Log data quality issues and enforce minimum validation constraints.

2. Standardized Preprocessing and Normalization
Build a fully configurable preprocessing pipeline that can:
- Normalize date formats, time zones, and temporal granularity.
- Standardize units of measure and currencies.
- Resolve product, store, and region hierarchies.
- Handle missing values, outliers, and sparse time series.
- Apply client-specific transformations through configuration files, not code branches.

The preprocessing layer must be reusable and deterministic, enabling reproducible model training and inference.

3. Predictive Modeling
Implement and structure three production-grade predictive systems:

Demand Forecasting
- Time-series or hybrid forecasting models capable of operating at product/store/region level.
- Explicit handling of seasonality, trends, and historical sparsity.
- Designed for operational planning rather than academic benchmarks.

Inventory Planning
- Stock level recommendations derived from forecasted demand and historical sell-through.
- Balance stockout risk versus overstock exposure.
- Output must be directly usable for purchase and replenishment decisions.

Price Optimization
- Model price elasticity and demand response.
- Recommend optimal price points or discount ranges.
- Objective function focused on margin and revenue optimization, not volume alone.

Models must be modular, versioned, and easily replaceable without refactoring the full pipeline.

4. Training, Inference, and Interfaces
- Clear separation between training and inference logic.
- Expose inference via Python interfaces or REST endpoints.
- Support low-latency predictions at the client level.
- Ensure deterministic outputs for identical inputs.

5. Monitoring and Model Reliability
Implement hooks or interfaces for:
- Model performance tracking (e.g., MAPE, error distributions).
- Data drift and feature distribution changes.
- Data freshness and pipeline execution health.
- Inference latency and failure rates.

A full MLOps stack is not required, but production-awareness is mandatory.
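To illustrate the configuration-driven preprocessing described above, here is a minimal sketch, assuming a per-client config that declares column renames, date format, unit conversion, and target temporal granularity. All config keys, column names, and the function name are hypothetical, not a prescribed design; the point is that one generic function serves every client, with no client-specific code branches.

```python
import pandas as pd

# Hypothetical per-client configuration: declarative mappings only,
# applied by a single generic pipeline function.
CLIENT_CONFIG = {
    "column_map": {"Item No": "sku", "Qty": "units_sold", "Sale Date": "date"},
    "date_format": "%d.%m.%Y",   # client-specific raw date format
    "granularity": "W",          # resample daily records to weekly
    "unit_multiplier": 1.0,      # e.g. convert cases to units
}

def preprocess(raw: pd.DataFrame, cfg: dict) -> pd.DataFrame:
    """Normalize a raw client extract into the canonical schema,
    deterministically, so training and inference are reproducible."""
    df = raw.rename(columns=cfg["column_map"])
    df["date"] = pd.to_datetime(df["date"], format=cfg["date_format"])
    df["units_sold"] = df["units_sold"].fillna(0) * cfg["unit_multiplier"]
    # Aggregate to the configured temporal granularity per SKU.
    return (
        df.set_index("date")
          .groupby("sku")
          .resample(cfg["granularity"])["units_sold"]
          .sum()
          .reset_index()
    )
```

Onboarding a new client would then mean writing a new config file, not new code, which is the reusability property the brief asks for.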
Technical Requirements
- Python (production-quality code, not notebooks)
- Pandas, NumPy
- scikit-learn, CatBoost, XGBoost, or equivalent
- Time-series forecasting techniques
- SQL for intermediate storage or aggregation (if applicable)
- REST API framework (FastAPI or similar)
- Experience designing multi-tenant data systems

Cloud provider and infrastructure details are flexible.

Deliverables
- Modular Python codebase covering:
  - Data ingestion and validation
  - Config-driven preprocessing
  - Demand forecasting model
  - Inventory planning model
  - Price optimization model
- Clear configuration system for onboarding new clients
- Inference interfaces or APIs
- Documentation covering:
  - Pipeline architecture
  - Client onboarding process
  - Model retraining workflow
  - Configuration examples
- Example datasets and reproducible runs
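To make the inventory-planning objective concrete: one standard way to balance stockout risk against overstock exposure is an order-up-to policy with service-level safety stock. This is a sketch of that idea, not the required method; the function name, signature, and the roughly 95% service level (z ≈ 1.65) are illustrative assumptions.

```python
def reorder_quantity(
    forecast_mean: float,
    forecast_std: float,
    on_hand: float,
    service_level_z: float = 1.65,  # z-score for ~95% service level
) -> float:
    """Order-up-to recommendation: forecasted demand over the
    replenishment horizon plus safety stock, minus current stock.
    A higher z lowers stockout risk but raises overstock exposure."""
    safety_stock = service_level_z * forecast_std
    target_level = forecast_mean + safety_stock
    return max(0.0, target_level - on_hand)
```

The output is a quantity directly usable for a purchase or replenishment decision, as the brief requires; a production version would add pack sizes, lead times, and per-SKU cost asymmetries.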