
Site Reliability Engineer (SRE) with Automation

Mumbai
Full-Time
Senior: 5 to 10 years
Posted on Sep 18 2024

About the Job

Skills

UNIX
PowerShell
Ansible
Kubernetes
SRE
Automation


Job Overview

As a Site Reliability Engineer (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, and monitoring of applications, and for emergency response to any incidents or issues affecting them. You will use your deep understanding of platforms, architecture, people, systems, and processes to establish and continuously improve SLIs and SLOs for uptime, performance, deployment, monitoring, and troubleshooting. You are interested in setting direction and leading the day-to-day processes that shape our vision for reliability.

Responsibilities and Duties

·        Design and implement automation projects according to requirements, and take responsibility for end-to-end delivery up to the production environment.

·        Be willing to do hands-on coding to deliver the assigned project.

·        Work collaboratively with OEMs, vendors, and partners to deploy IT infrastructure automation and self-service tools for capacity forecasting, failure prediction, zero-touch operation, and auto-healing.

·        Build standard documentation for automation.

·        Participate in root cause analysis (RCA) and identify gaps in monitoring automation for operations.

·        Maintain and support the Product and Data systems: proactively monitor events, investigate issues, analyze solutions, and drive problems through to resolution.

·        Experience with configuration management tools like Chef, Puppet, Salt, or equivalent.

·        Experience administering AWS, Google Cloud, or Azure.

·        Define requirements and develop tools and reporting as needed by projects and operations.

·        Participate in a 24x7 on-call rotation for after-hours emergencies.

·        Use operational tools and monitoring platforms to gain in-depth knowledge, understanding, and ongoing monitoring of system availability, performance, and capacity.

·        Implement an alerting strategy that makes alerts actionable and unique.

·        Follow through to ensure issues are resolved satisfactorily.

·        Drive continuous improvement and innovation within the team.

·        Demonstrate a sense of ownership, initiative, and drive.

Qualifications

·        Bachelor's degree in Computer Science or a related technical field involving software or systems engineering, or equivalent practical experience.

·        5+ years of hands-on experience with Linux/UNIX/Windows operating systems.

·        Strong Shell/Python/PowerShell skills.

·        Experience with infrastructure orchestration/automation tools, e.g. Ansible, Terraform.

·        Good understanding of Git, DevOps methodology, and CI/CD for automation projects.

·        Hands-on experience managing web servers, application servers, and databases (SQL/NoSQL).

·        Experience with Docker/Kubernetes.

·        Knowledge of monitoring tools and strategies.

·        Experience with incident management and running incident post-mortems.

·        Solid understanding of automated deployment processes.

·        Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.

·        Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.

·        Experience designing and developing software oriented towards systems or network automation.

 

About the company

We are the force behind the meteoric rise of India's leading telecom operator, Jio, with 400 million+ customers. In addition, we have powered an exhaustive list of digital apps and services that have delivered functionality, usability, engagement, scale, and loyalty. We provide solutions for customers (B2C) and enterprises (B2B). We have an end-to-end 5G solution consisting of 5G Radio, a com ...

Industry

Media & Telecommunication...

Company Size

51-200 Employees

Headquarters

Navi Mumbai, Maharashtra
