Site Reliability Engineering Manager
United States Design / Civil engineering / Industrial engineering
Job description
Are you passionate about technology? Do you love leading teams to build new things? Do you want to help drive the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!
The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and industrial research capabilities, no other company is as well positioned to address the full opportunity of cloud computing.
The Next Generation Cloud Network Engineering (NextGenCloud) team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design to network architecture to storage and compute clusters to flexible infrastructure services. While our focus is on Network as a Service (NaaS), we are part of the team building IBM's next generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with industry-leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients. We are looking for a Site Reliability Engineering Manager who innovates and shares our passion for winning in the cloud marketplace.
This position is for a Site Reliability Engineering Manager with at least 12 years of industry experience. In this role, you will manage the Site Reliability team, with the following key responsibilities:
· Manage a team of highly skilled DevOps engineers responsible for ensuring maximum uptime of NaaS software and associated infrastructure.
· Provide detailed trouble reports back to the development teams including automated methods to reproduce any defects; ensure that these reports are complete and accurate.
· Direct the troubleshooting and maintenance of pre-production CI/CD systems in support of deployment.
· Lead the team to ensure automation and the highest level of determinism possible in the installation and configuration of new systems (software and hardware).
· Deliver documentation of the automation and the interaction of software and system as necessary to enable all members of the team to ensure uptime in production.
· Lead the development of the processes and software necessary to maintain services post-deployment through data collection and monitoring, ensuring the overall health of the services provided.
· Ensure the team collaborates effectively.
· Lead meaningful planning to improve software, systems, and processes.
To summarize, in this role you will lead a team that works across all aspects of the lifecycle of IBM's NaaS, from idea to architecture and through deployment, operation, and improvement, ensuring that our clients have the most reliable and performant experience possible.
This opportunity is for someone in the continental United States.
Auto req ID
123483BR
Required Education
Bachelor's Degree
Role ( Job Role )
Software Developer
State / Province
MULTIPLE
Primary job category
Software Development & Support
Contract type
Regular
Employment Type
Full-Time
ERBP
Yes
Is this role a commissionable/sales incentive based position?
No
Travel Required
No Travel
IBM Business Group
W&CP
Preferred Education
Master's Degree
City / Township / Village
MULTIPLE CITIES
EO Statement
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Required Technical and Professional Expertise
Job Requirements
· 12+ years' experience with systems and/or software engineering.
· 5+ years' experience with software development or similar discipline.
· 3+ years' experience in an operational environment requiring 99.999% uptime.
· 5+ years' experience as a first line manager.
· 3+ years' experience managing a team in an operational environment.
· Experience in a DevOps environment.
· Strong experience with Git.
· Experience with OpenStack or a comparable proprietary cloud such as Azure or AWS.
· Experience with CI/CD systems and their pipelines; experience with Zuul or Jenkins a plus.
· Experience with containers and HA clusters; experience with Docker and Kubernetes a plus.
· Excellent knowledge of TCP/IP networking.
· Strong background in network engineering.
· Hands-on data center operational experience.
· Proven ability to collaborate and work well within a team.
· Ability to communicate effectively both verbally and in writing.
Skill-keywords
cloud data center design
Country
United States
Preferred Technical and Professional Experience
· 15+ years' experience in all of the above.
· DevOps experience working with Ansible, Puppet, or Chef.
· Experience with data center layout planning.
Eligibility Requirements
· None
Position Type
Professional
New Collar Role
No