KEY RESPONSIBILITIES
• Engage in and improve the whole lifecycle of services, from inception and design through deployment,
operation, and refinement.
• Support services before they go live through activities such as system design consulting,
developing software platforms and frameworks, capacity planning, and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall
system health.
• Scale systems sustainably through mechanisms like automation; evolve systems by pushing for
changes that improve reliability and velocity.
• Lead sustainable incident response, blameless postmortems, and production improvements that
result in direct business opportunities for Organization.
• Manage individual project priorities, deadlines, and deliverables.
• Provide guidance to other team members on managing end-to-end availability and performance of
mission-critical services, on building automation to prevent problem recurrence, and on building
automated responses for non-exceptional service conditions.
• Work in shifts to provide 24x7 coverage.
REQUIRED QUALIFICATIONS
Minimum qualifications:
• Bachelor’s degree in Computer Science, a related technical field involving software/systems
engineering, or equivalent practical experience.
• Experience programming in at least one of the following languages: Java, C#, or Go.
• Experience with GCP, InfluxDB, and Grafana.
• Experience with algorithms and data structures.
• 3-5 years of experience in computing, distributed systems, storage, or networking.
Preferred qualifications:
• Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
• Ability to debug, optimize code, and automate routine tasks.
• Systematic problem-solving approach, coupled with effective communication skills and a sense of
drive.
• Experience with algorithms and data structures and/or Unix/Linux systems internals (e.g.,
filesystems, system calls) and administration.