Principal Site Reliability Engineer - DELL - United States

Job description

Why Work at Dell?

Endless challenges and rewards. Opportunities on six continents. A team of colleagues fueled by collaboration. All this, and a company deeply committed to integrity and responsibility.

Principal Site Reliability Engineer

*Remote / Virtual Opportunity

Have you ever wondered what it takes to run a cloud infrastructure at scale? Do you enjoy the challenge of improving services that already operate at scale? Our team of engineers at Virtustream designs, builds, and operates an infrastructure-as-a-service cloud for some of the biggest companies in the world. The database engineering team is looking for individuals with diverse experience and skills to design, build, and operate the database solutions needed for the exciting new Virtustream Cloud. As a site reliability engineer, you will build and maintain a database platform for metadata services, automated remediation, and service management at scale, as needed to maintain high service reliability with low touch.

Responsibilities include:

· Design, build, and operate a NoSQL-based metadata service
· Build scalable services running on a container platform
· Develop solutions for monitoring, automated remediation, measuring availability and reliability, performance, analytics, and security
· Maintain environment state using configuration tools and event-driven automation
· Participate in collaborative projects with software engineering teams
· Participate in troubleshooting, capacity planning and analysis, and performance analysis
· Serve on a 24x7 service watch rotation team

Requirements:

· Experience supporting and troubleshooting production platform environments at scale
· Production experience using configuration management tools (e.g., Ansible, SaltStack, Puppet, Chef)
· Experience instrumenting monitoring and alerting for production platforms
· Proficiency implementing and maintaining continuous integration and delivery workflows
· Experience managing Unix/Linux systems in production
· A tenacious ability to diagnose and fix performance and reliability problems

Preferred skills:

· Experience deploying, managing, tuning, administering, and scaling Cassandra database clusters
· Experience with JVM tuning, monitoring, and related tools
· Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols
· Experience with backup and disaster recovery solutions for Cassandra
· 3+ years of experience with a container orchestration tool such as Kubernetes or Marathon
· 3+ years of experience as a DevOps engineer, operations engineer, or SRE (development for large online services)
· 3+ years of experience building and operating highly available and scalable infrastructure solutions
· Experience working in distributed, remote teams across multiple time zones is a plus
· Ability to travel for team meetings

Dell is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at Dell are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. Dell will not tolerate discrimination or harassment based on any of these characteristics.