Site Reliability Engineer
117 Applications
Not Accepting Applications
About the Job
Job Overview:
• Design and build reliable systems: SREs are responsible for designing and building IT systems and applications that are reliable, scalable, and efficient.
• Monitor system performance: SREs monitor system performance and proactively identify and resolve issues before they impact users.
• Develop and maintain automation tools: SREs develop and maintain automation tools to streamline IT operations and reduce manual intervention.
• Collaborate with development teams: SREs work closely with development teams to ensure that systems are designed and built with reliability and scalability in mind.
Primary Skills:
• Bachelor’s degree in Computer Science, Information Technology, or a related field.
• Minimum of [insert number] years of experience as an SRE, with a focus on AWS.
• Strong knowledge of AWS services, such as EC2, RDS, S3, and CloudWatch.
• Experience with infrastructure-as-code and provisioning tools such as Terraform, CloudFormation, and the AWS CLI.
• Strong experience with automation and scripting, using tools and languages such as Ansible, Go, and Java.
• Excellent analytical and problem-solving skills, with the ability to quickly identify and resolve issues.
• Strong communication and collaboration skills, with the ability to work effectively with other IT teams and stakeholders.
• AWS certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or AWS Certified SysOps Administrator are a plus.
Good-to-Have Skills:
• List additional skills which might help
Responsibilities and Duties:
• Design, build, and maintain highly available and scalable AWS infrastructure, including EC2, RDS, S3, and other services as needed.
• Implement and maintain automation for AWS infrastructure, using tools such as Terraform, CloudFormation, and the AWS CLI, and languages such as Go or Java.
• Monitor and analyze system performance, identifying and resolving issues proactively.
• Develop and maintain monitoring and alerting systems for AWS infrastructure, using tools such as CloudWatch and Prometheus.
• Collaborate closely with development teams to ensure that applications are designed and built with reliability and scalability in mind.
• Implement effective incident response processes, ensuring that incidents are quickly detected, escalated, and resolved.
• Work closely with other IT teams, such as network and security teams, to ensure that AWS infrastructure is secure, stable, and performant.
• Continuously evaluate and improve AWS infrastructure, identifying areas for optimization and implementing improvements to increase reliability and efficiency.
• Provide support to internal and external customers, responding to inquiries and resolving issues in a timely and effective manner.
• Stay up to date with AWS trends and best practices, continuously learning and incorporating new approaches and technologies to improve IT operations.
Keywords
• AWS, NoSQL, Cassandra, Aerospike, Kubernetes, Datadog, CI/CD, Jenkins
About the company
Industry
Human Resources
Company Size
11-50 Employees
Headquarters
New Delhi
Other open jobs from Mitr HR Solution