Overview

We're looking for an experienced AWS Data Engineer to help us design and implement production-grade ELT pipelines on AWS. The objective is a scalable, well-governed data foundation that supports accurate reporting and analytics, with clear lineage, repeatable transformations, and reliable orchestration.

This is not a one-off scripting project. We want an engineer who can make sound architectural decisions, implement infrastructure and pipelines cleanly, and create documentation we can reuse across clients and datasets.

Current AWS Stack

You will work primarily with:

- S3 (data lake storage)
- AWS Glue (crawlers, jobs, Data Catalog)
- Athena (querying)
- Amplify (light frontend serving curated inputs)

What We Need Built

We need help defining and implementing the end-to-end flow.

Architecture & Data Modeling
- Propose a practical Bronze/Silver/Gold design for our use case
- Establish conventions: folder structure, partitioning strategy, naming, table registration, and schemas

Ingestion (Raw/Landing)
- Build repeatable ingestion patterns into S3
- Set up Glue Data Catalog assets and crawlers as appropriate
- Ensure ingestion is auditable: run logs, basic data-quality checks, and an approach to schema changes

Transformations
- Implement transformations (Glue / Spark / SQL) into curated tables; a sketch of the write pattern we have in mind follows the Ideal Candidate section below
- Produce analytics-ready datasets with agreed-upon business logic
- Optimize for performance: partitions, file sizing, and query-cost control

Orchestration / Reliability
- Implement a reliable scheduling/orchestration approach (recommend the best fit: Glue Workflows, Step Functions, MWAA/Airflow, etc.)
- Add a monitoring/alerting strategy appropriate for a small team (CloudWatch-based is fine)

Governance & Access
- Configure permissions and table governance properly
- Maintain clear lineage from raw to curated

Handoff
- Provide clear documentation: an architecture diagram, a runbook, and instructions for extending the pipeline

Ideal Candidate (Must Have)
- Strong hands-on experience with S3, Athena, Glue, and Python
- Experience building production ELT/ETL pipelines (not just ad hoc scripts)
- Solid understanding of data design: partitions, Parquet, table formats, cost/performance trade-offs
- Comfort with SQL and data modeling for analytics-ready datasets
- Ability to communicate trade-offs clearly and propose a clean architecture
- Clear updates: a short design review upfront, then implementation sprints
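To make the expectations concrete, here is a minimal PySpark sketch of the kind of Bronze-to-Silver write pattern we have in mind: partitioned Parquet output, dynamic partition overwrite so re-runs and backfills are idempotent, and a repartition step to control per-partition file counts. The bucket, paths, schema, and deduplication key are hypothetical placeholders, not our actual data model.

```python
# Illustrative sketch only: bucket, paths, and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("silver-build")
    # Overwrite only the partitions present in this run, so re-running a
    # date range (a backfill) replaces those dates instead of the whole table.
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

# Read from a Bronze (raw/landing) zone; the layout is an assumption.
bronze = spark.read.json("s3://example-lake/bronze/hubspot/deals/")

silver = (
    bronze
    .withColumn("dt", F.to_date("closed_at"))  # partition column
    .dropDuplicates(["deal_id"])               # basic dedup rule
)

(
    silver
    # Colocate rows by partition value so each date writes a small number
    # of files, mitigating the small-file problem that inflates Athena costs.
    .repartition("dt")
    .write
    .mode("overwrite")
    .partitionBy("dt")
    .parquet("s3://example-lake/silver/hubspot/deals/")
)
```

Dynamic partition overwrite is also what makes the backfill strategy described in the test scope below safe to re-run.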
After reviewing your proposal, I will proceed with a 4-hour test; the detailed scope is outlined below.

Architecture + Data Modeling Design Review

Project Summary

Create data pipelines for marketing and sales data that will feed Tableau. You will ingest:

- HubSpot (sales and lead opportunities, revenue tracking for closed deals)
- Google Ads (marketing campaigns, spend)
- Manual inputs for goal setting and commission setting

Propose the best design to complete this work. The work must take place in an AWS environment, with Athena preferred as the query engine.

Deliverable: Written Design Doc

Please address the following:

Assumptions & Open Questions
- Assumptions about volumes, refresh cadence, number of clients, environments (dev/prod), etc.

Proposed Architecture
- High-level diagram (simple boxes and arrows is fine)
- Services used and why
- Data flow overview

Orchestration + Reliability
- Tool choice and reasoning
- Dependencies, retries, and idempotency approach
- Backfill strategy (safely re-running date ranges)

Monitoring + Data Quality
- Minimum viable checks: row-count deltas, null rates, duplicates (a sketch of such checks appears at the end of this brief)
- Alert triggers and a definition of "pipeline failure"

Cost/Performance Considerations
- Athena scan-minimization strategy
- Partitioning vs. partition projection, if relevant (see the DDL sketch at the end of this brief)
- Small-file mitigation

Handoff / Runbook Outline
- What gets documented, and how a new engineer would extend the pipeline

Pricing Plan & Summary Scope
a. Milestones, including timelines, acceptance criteria, and estimated effort (hours)
b. Estimated maintenance and operating costs

Pros and Cons of Designs
- Pros and cons of the proposed design
- Possible upgrades to the system

Please let me know your proposed approach.
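Appendix: Illustrative Sketches

For reference, the minimum-viable checks under Monitoring + Data Quality might look like the sketch below. The column names (amount, deal_id) and the 2% null-rate threshold are assumptions for illustration; the design doc should propose the real checks and thresholds.

```python
# Illustrative minimum-viable data-quality checks; the path, columns,
# and thresholds are assumptions, not our actual schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("silver-dq").getOrCreate()
df = spark.read.parquet("s3://example-lake/silver/hubspot/deals/")

row_count = df.count()

# Null rate on a column the downstream Tableau reports would depend on.
null_rate = df.filter(F.col("amount").isNull()).count() / max(row_count, 1)

# Duplicate business keys that slipped past ingestion.
dupes = df.groupBy("deal_id").count().filter(F.col("count") > 1).count()

failures = []
if row_count == 0:
    failures.append("table is empty")
if null_rate > 0.02:  # threshold is an assumption
    failures.append(f"amount null rate {null_rate:.1%} exceeds 2%")
if dupes > 0:
    failures.append(f"{dupes} duplicate deal_id values")

if failures:
    # Raising fails the Glue job / Step Functions state, which CloudWatch
    # alarms can then turn into an alert for the team.
    raise RuntimeError("Data quality failed: " + "; ".join(failures))
```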
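And for the partitioning-vs-partition-projection question under Cost/Performance Considerations, this is the general shape of an Athena table registered with partition projection, which avoids crawler or MSCK REPAIR partition maintenance. The database name, bucket, and date range are hypothetical.

```python
# Illustrative Athena DDL registering a table with partition projection;
# database name, bucket, and date range are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS analytics.hubspot_deals (
  deal_id   string,
  amount    double,
  closed_at timestamp
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://example-lake/silver/hubspot/deals/'
TBLPROPERTIES (
  'projection.enabled'        = 'true',
  'projection.dt.type'        = 'date',
  'projection.dt.range'       = '2024-01-01,NOW',
  'projection.dt.format'      = 'yyyy-MM-dd',
  'storage.location.template' = 's3://example-lake/silver/hubspot/deals/dt=${dt}/'
)
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-lake/athena-results/"},
)
```

Because partition values are computed from the location template rather than stored in the Glue Data Catalog, new daily partitions become queryable as soon as the data lands, with no crawler run in between.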