I have a steady stream of plain-text machine logs arriving every hour. What I need is a dependable way to sift through each file, detect the anomalous patterns hiding in the noise, and produce a cleaned log (or at least clearly flagged lines) before the next batch rolls in. The raw files are straightforward TXT, with no embedded markup or JSON structures, so the solution can focus entirely on pattern analysis rather than parsing exotic formats. Because the anomalies are behavioural rather than simply extreme numeric values, the detection logic must look for irregular sequences, unexpected combinations of fields, or sudden structural deviations.

I'm happy with a Python-based approach (pandas, scikit-learn, PyOD, or similar libraries come to mind), but I'm open to another language if it keeps the setup lightweight and easily deployable on a Linux server. Speed matters: each hourly log is time-stamped and should be processed within the hour so downstream analytics always work with clean data. Your script or small tool should run from the command line, accept a path or file pattern, and write either a scrubbed copy or a parallel "outliers.txt" that my monitoring stack can ignore.

Please package the work as follows:
• A well-commented script or notebook ready to run
• A concise README explaining dependencies and how to execute or schedule it
• A short sample run (cleaned file or flagged output) that proves the anomaly filter is catching patterns, not just extreme numbers

If you've tackled log anomaly detection before, especially on rolling hourly data, I'd love to see it in action here.
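
To make the ask concrete, here is a minimal sketch of the kind of CLI I have in mind, assuming scikit-learn's IsolationForest over simple structural features per line (length, field count, digit ratio, and how common the line's digit-masked template is). The feature set, the 2% contamination rate, and the ".clean.txt" / ".outliers.txt" naming are placeholders I made up for illustration, not requirements.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: flag structurally unusual lines in hourly TXT logs.

Assumptions (placeholders, not part of the spec): the feature choices,
IsolationForest with 2% contamination, and the <file>.clean.txt /
<file>.outliers.txt output naming.
"""
import argparse
import glob
import re

import numpy as np
from sklearn.ensemble import IsolationForest

MASK = re.compile(r"\d+")  # mask digits so each line reduces to a structural template


def featurize(lines):
    """Turn each line into a small structural feature vector."""
    templates = [MASK.sub("#", ln) for ln in lines]
    counts = {}
    for t in templates:
        counts[t] = counts.get(t, 0) + 1
    feats = []
    for ln, t in zip(lines, templates):
        feats.append([
            len(ln),                                          # raw line length
            len(ln.split()),                                  # field count
            sum(c.isdigit() for c in ln) / max(len(ln), 1),   # digit ratio
            np.log1p(counts[t]),                              # rarity of this line's structure
        ])
    return np.array(feats)


def scan(path, contamination=0.02):
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = [ln.rstrip("\n") for ln in fh]
    if len(lines) < 20:  # too little data to model; pass everything through
        return lines, []
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(featurize(lines))  # -1 marks outliers
    clean = [ln for ln, lab in zip(lines, labels) if lab == 1]
    flagged = [ln for ln, lab in zip(lines, labels) if lab == -1]
    return clean, flagged


def main():
    ap = argparse.ArgumentParser(description="Flag structural outliers in TXT logs")
    ap.add_argument("pattern", help="path or glob, e.g. /var/log/app/*.txt")
    ap.add_argument("--contamination", type=float, default=0.02)
    args = ap.parse_args()
    for path in sorted(glob.glob(args.pattern)):
        clean, flagged = scan(path, args.contamination)
        with open(path + ".clean.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(clean) + "\n")
        with open(path + ".outliers.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(flagged) + "\n")
        print(f"{path}: {len(flagged)} lines flagged")


if __name__ == "__main__":
    main()
```

Something along these lines, scheduled hourly via cron or a systemd timer, would fit the window I described, but I'm open to a different model or feature set if it catches the sequence- and structure-level anomalies better.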