Data Engineer
Job Description: Data Engineer
We are looking for a highly motivated Data Engineer to design and develop scalable solutions for a large data infrastructure in a fast-paced environment. You will participate in detailed technical design, development, and implementation of applications using cutting-edge technology stacks.
Our main focus is building and managing a highly scalable data platform for our Identity Analytics product, one capable of performing large-scale data analytics. The ideal candidate will have a strong engineering background and the ability to tie engineering initiatives to business impact.
This position sits within our data science team; you will work alongside the team's data scientists, software engineers, and QA engineers to create solutions that improve the scale of our ML models.
OUR TEAM MISSION
At Aquera, we believe we have an important story to tell: companies can transform their operations with a single source of truth for user identities that already exists in their HRIS. Aquera helps companies of all sizes eliminate security gaps, reduce operational costs, and support day-1 employee productivity with HR-driven IT automation, in a click.
RESPONSIBILITIES
- Design scalable, reliable data pipelines to consume, integrate, and analyze large volumes of complex data from different sources, supporting the growing needs of our business.
- Build and manage a data lake on AWS infrastructure.
- Build a knowledge graph of user identities for identity management and analytics.
- Design our data pipeline architecture, which collects, processes, streams, and analyzes very large amounts of data.
- Analyze batch and stream data sources and design scalable data pipelines that perform ETL transformations efficiently and at scale.
- Design and implement the feature store to support current and anticipated data analytics and machine learning workloads.
- Run and troubleshoot data and ML pipelines on AWS infrastructure.
- Implement CI/CD pipelines that integrate data processing logic and perform validation and promotion of that logic to production.
- Design custom workflows to orchestrate the control plane that manages the model, data source, and pipeline lifecycle.
- Write code and leverage managed solutions, open-source tools, and industry best practices to ensure our data is reliable, standardized, and reusable.
QUALIFICATIONS
- 3 to 7 years of industry or research experience in a large-scale data or computational area such as data science, computer science, machine learning, artificial intelligence, or statistics.
- Proficiency in Python (or a similar programming language) for data analysis and software development.
- 3+ years of experience in the data warehouse and data lake space.
- Experience building distributed systems.
- Experience with graph databases.
- Experience with the machine learning lifecycle.
- Familiarity with the data engineering tech stack: SQL, ETL tools, Spark, PySpark, Kafka, Redis, Elasticsearch.
- Experience with data lakes and analytics on AWS: storage (S3/OpenSearch); data transformation pipelines (Lake Formation, Glue, Data Catalog); data analytics pipelines (Athena/Redshift).
- Experience with CI/CD tools and procedures.
- Experience with JSON and Parquet/Avro file formats.
- Good to have: experience with MLOps, MLflow, Airflow, and Databricks.
- Proven ability to work creatively and analytically in a problem-solving environment.
- Exceptional written and oral communication skills.
- A fundamental belief in the importance of integrity, good vibes, and a strong work ethic.
- Poise under pressure and a sense of humor.
- Preferred: experience at product-based companies.
- Bachelor's or Master's degree in a computational, mathematical, engineering, or scientific field.
About the company
Industry
Information Technology an...
Company Size
11-50 Employees
Headquarters
Bangalore