Overview

Responsibilities

  • Implement a large-scale data warehouse in AWS
  • Implement high-performance data pipelines that can scale to process petabytes of data daily
  • Design, implement and maintain ETL processes
  • Implement Directed Acyclic Graphs (DAGs) in Apache Airflow to programmatically author, schedule, and monitor workflows
  • Design and build REST APIs using the Python Flask framework
  • Work with data scientists to productionize machine learning algorithms for real-time fraud detection
  • Work with data analysts to automate and optimize reporting and BI infrastructure
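The Airflow responsibility above centers on expressing a workflow as a Directed Acyclic Graph whose tasks run in dependency order. A minimal stdlib-only sketch of that idea, using Python's `graphlib` in place of Airflow and hypothetical ETL task names chosen purely for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks; each maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A topological order respects every dependency edge -- the core
# scheduling guarantee an Airflow DAG provides for its tasks.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

In Airflow proper, the same dependencies would be declared between operators (e.g. with the `>>` operator) and the scheduler would execute them in an order consistent with the graph.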

Requirements

  • At least 5 years of experience as a data engineer or back-end developer
  • Proficient in Java and Python programming
  • Proficient in Apache Spark and Airflow
  • Proficient in writing and optimizing SQL statements
  • Proficient with AWS and/or Cloud Computing
  • Experienced with data engineering services such as Athena, Redshift, SageMaker, and Kinesis
  • Experienced with SQL and NoSQL databases such as DynamoDB, Aurora, MySQL, Elasticsearch, and Solr
  • Experienced with BI tools such as Tableau and Amazon QuickSight
  • Experienced in using monitoring tools and instrumentation to ensure optimum platform and application performance
  • Experienced in both streaming and batch data processing
  • Knowledge of machine learning concepts will be an advantage
  • Knowledge of Scala will be an advantage
  • Prior experience in working with cross-functional data and tech teams will be an advantage
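The "writing and optimizing SQL statements" requirement usually comes down to checking a query plan and adding the index that turns a full scan into an index lookup. A small sketch using the stdlib `sqlite3` module, with a hypothetical events table and index name used only for illustration:

```python
import sqlite3

# In-memory database with a hypothetical events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{i % 28 + 1:02d}", "x") for i in range(1000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN reports how SQLite will execute the statement;
    # the last column of each row holds the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM events WHERE user_id = 42"
before = plan(query)  # without an index: a table scan
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(query)   # with the index: an index search

print(before)
print(after)
```

The same workflow (inspect the plan, add or adjust an index, re-check) carries over to Redshift and Aurora via their respective `EXPLAIN` statements.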