HPC DevOps Engineer
Madrid, SPAIN
Job description
The Position
Madrid (Spain) - Hybrid
As an HPC (High Performance Computing) DevOps Engineer, you will be part of the High Performance Computing section of Roche Informatics, which is responsible for implementing, operating and evolving computing and data IT infrastructure solutions that support our Research and Development organizations and enable science at Roche.
The position involves working within product teams to deliver best-in-class scientific computing platforms, in partnership with our scientists and service providers. The role requires knowledge of parallel file systems and high performance storage platforms, Linux system administration, scripting, and a DevOps approach to platform administration.
Job Responsibilities
Contribute to activities focused on availability, tuning, performance, efficiency, change management, monitoring, emergency response and capacity planning.
Engage in and improve, under guidance, the whole lifecycle of services, from inception and design through deployment, operation and refinement.
Monitor and resolve incidents and problems in platform operations, suggesting priorities and collaborating on their resolution when required.
Support services before they go live through activities such as infrastructure design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Scale systems sustainably through mechanisms like automation, and evolve systems by proposing changes that improve reliability and velocity.
Contribute to the maintenance of services once they are in production by measuring and monitoring availability, latency and overall system health.
Identify, propose and contribute to continuous improvement activities in technical, teamwork, collaboration and process areas.
Job Requirements / Qualifications
Proven scripting and automation skills, with strong knowledge of delivering and managing infrastructure as code.
Good interpersonal skills.
Demonstrated customer & delivery focus.
Ability to work effectively with team members and virtual teams from different locations and different cultural backgrounds.
Ability to work independently with minimal supervision and navigate ambiguity.
Strong problem-solving and decision-making skills.
Good oral and written communication skills in English.
Technology Skills
Extensive experience with infrastructure automation tools, infrastructure as code, scripting languages, logging, monitoring and observability, infrastructure configuration, and applications.
Infrastructure as code and automation: AWX, Ansible, Jenkins, Puppet, Chef, SaltStack or equivalents.
Coding and scripting: PHP, Python, YAML, shell, Perl and/or Ruby or equivalents.
Distributed version control and source code management tools: Git, Bitbucket, GitHub, GitLab or equivalents.
Monitoring and observability (e.g. LogicMonitor, Nagios, Ganglia, ELK).
Experience working with parallel file systems (e.g. Lustre, GPFS, BeeGFS).
Experience working with object storage (e.g. NetApp StorageGRID, S3).
Background in Linux/Unix server technologies (Red Hat, CentOS, Ubuntu, plus Satellite / Foreman).
Linux/Unix High Performance Computing (HPC) cluster configuration and workload managers (e.g. IBM LSF, SLURM).
Knowledge of defining Service Level Objectives and Service Level Indicators.
Nice to have technologies:
Scale-out NAS storage (e.g. Isilon, NetApp)
InfiniBand networking (e.g. Mellanox) or equivalent
Container orchestration platforms (e.g. Kubernetes, Mesos)
Knowledge of parallel programming techniques and tuning, such as CUDA and MPI (e.g. Open MPI)
Big Data frameworks (e.g. Hadoop, MapReduce, Spark)
Programming skills in high-level languages such as Java or C++
Public cloud technologies: AWS, Azure, GCP or equivalents
Education / Years of Experience
4-7 years of relevant work experience
or 2-5 years with Bachelor’s degree
or 1-3 years with a Master's degree
At least 1 year of experience working as a systems or software engineer in one or more multinational environments (healthcare industry experience is a plus).
Ability to work across multiple time zones, including on-call duty and occasional travel.
Who we are
At Roche, more than 100,000 people across 100 countries are pushing back the frontiers of healthcare. Working together, we’ve become one of the world’s leading research-focused healthcare groups. Our success is built on innovation, curiosity and diversity.
Roche is an Equal Opportunity Employer.