Sr. Service Reliability Engineer - Amazon - Seattle

Job description

DESCRIPTION

The AWS Commerce Platform provides the back-end and front-end services that enable AWS customers to purchase AWS services and understand and manage their infrastructure costs. Our teams tackle some of the hardest scalability, performance, and distributed computing challenges in the world. We process trillions of events per month using stream processing techniques (Kinesis), process billions of line items via MapReduce (EMR), and manage artifacts through the latest in database technologies (DynamoDB and Aurora). We process big data and provide tools for customers to interactively understand their bills. We also provide the analytics that let customers manage billions of dollars of IT usage and spending. Because we sit at the nexus of all AWS services and interact directly with end customers, we also work closely across all AWS teams to ensure that we offer a great customer experience.

We're looking for a Sr. Systems Developer to work closely with our Development community to build the highly reliable and scalable next-generation systems that support one of the world's largest cloud services. We are seeking talented systems development engineers to own automation and scaling, and to help us evolve our services to improve customer experience across the globe. You will also become intimate with the architecture of our systems and be responsible for diving deep into code and logs. This includes designing and developing software solutions for service monitoring, auto-remediation, availability and reliability measurement, performance, analytics, and security. You will build services and solutions that enrich our monitoring and automation through data analytics and applied tooling (ML, clustering, anomaly detection, AI, etc.).

You are creative and have excellent problem-solving and analytical skills, with experience in engineering, operating, troubleshooting, administering, and scaling online services. You have very strong knowledge of operating system and networking fundamentals. You understand TCP/IP and other common network protocols. You are proficient with at least one of the following languages: Java, Python, Go, or Ruby. You should have a bias toward automation and a track record of creating automated solutions. You have experience operating distributed systems and diagnosing and resolving complex problems. You pay attention to detail and solve problems at their root.

You will mentor Developers to build highly reliable, operationally excellent NoSQL services. Through the partnerships you foster with the development teams, you will support new features, services, and releases, and become an authority on our services.

Desired profile

BASIC QUALIFICATIONS

· At least 5 years of recent experience in development, testing, and deployment of multi-tiered systems and services
· 3 years of recent experience with automation and operational support of production applications, including engineering, operating, troubleshooting, administering, and scaling online services
· Experience with distributed operational health and performance monitoring systems
· 3 years of experience maintaining, managing, and troubleshooting production databases (e.g., Oracle, MySQL, Postgres), specifically including SQL scripting
· Bachelor's degree in Computer Science or a related technical field (or 5 years of equivalent experience)
· Must be willing to work on a team with on-call duties and be part of a 24x7 service watch rotation, with the ability to drive to the workplace for critical events or needs
· Demonstrated skill and passion for operational excellence
