Site Reliability Engineer 3 (SRE3)

Bangalore
Full-Time
Mid-Level (4 to 6 years)
Posted on Apr 14 2022

About the Job

Job Title: Site Reliability Engineer 3 (SRE3)

Experience Required: 4 to 6 years

Location: Bangalore


Role:

Site Reliability Engineers at Flipkart are developers with an excellent operations mindset. As a Site Reliability Engineer, you will build solutions to scale our platforms and applications reliably for high availability and make sure Service Level Objectives (SLOs) are met. You will own all the SLOs of various Flipkart services across tiers. You will work directly with our software development teams to reduce the toil of developing, deploying and maintaining our software by adopting engineered solutions and reliability engineering best practices. You will be responsible for solving greenfield problems in reliability engineering and benchmarking at scale.


Responsibilities:

  • Help our engineers adopt the Flipkart Reliability Engineering playbook by abstracting away the context and complexities of a hybrid cloud.
  • Build, coach and mentor teams of Site Reliability Engineers.
  • Ensure that availability, reliability, security and similar considerations are built in, reviewed and adhered to at every stage of product development.
  • Monitor and resolve issues in all environments. Ensure SLOs are met. Alert appropriately, build self-healing capabilities into the platforms, involve people when needed, and log tickets. Participate in a 24x7 on-call rotation.
  • Run periodic resilience (chaos) experiments and continuously verify the state of reliability.
  • Build and improve configuration and automation tools to remove toil in developing, deploying and maintaining software.
  • Own the RCA lifecycle for platform issues, and be answerable to stakeholders (internal and external) on most of the service internals.
  • Have a viewpoint on distributed systems' performance, and be able to drive capacity plans and scale requirements.
  • Identify bottlenecks and tuning areas, as long as major code changes are not necessary. For example, if you are working on a Hive benchmark and the MySQL connection pool is not externally configurable and its expansion policy is becoming a problem, you should be able to make the code changes, build, expose the config and continue the benchmark.
  • Partner with the developer and DevOps teams in on-call load sharing, and handle 24/7 platform support.

Qualifications:

  • BTech or MTech in CS or equivalent, with 5+ years working with highly available platforms in web-scale organizations. Demonstrated experience of around 1-2 years as a developer is good to have.
  • Good troubleshooting skills for always-available, high-scale systems.
  • Ability to effectively collect all relevant data points and debugging artefacts/snapshots so that debugging at a later stage can be as effective as possible.
  • Expert-level knowledge of at least one configuration management system (Ansible, Puppet, etc.).
  • Understanding of networking basics such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load balancing, as well as DB sharding, partitions, etc.
  • Excellent written and verbal communication skills.
  • Understanding of CI/CD and the ability to architect a workflow or deployment plan.
  • Ability to write software to automate API-driven tasks at scale using Python, Go, etc., and to develop application components wherever required using Scala, Python, C++ and Java.


About the company

Anzy Global is a leading HR consultancy for the IT and IT-related industries. Our solutions are powered by deep expertise and an in-depth understanding of technology. Our constant endeavour and commitment is to provide the best talent available in the industry to our clients. By virtue of our strong network, not only do we put forward the best candidates but also provide insight on their past pe ...

Industry

Human Resources Services

Company Size

51-200 Employees

Headquarters

Bangalore