Site Reliability Engineer
Graduate job San Jose, Costa Rica Design / Civil engineering / Industrial engineering
Job description
Job Summary
We are building a next-generation Site Reliability Engineering team. You'll provide operational support for clients' servers, applications, and network systems. Along with daily maintenance of client infrastructure, you'll manage networks and servers for applications ranging from small to large and complex. You'll have the opportunity to work and interact with multiple business areas in the company.
Key Responsibilities
- Operate, maintain and administer information technology solutions that contribute to the operational efficiency, availability and visibility of customer infrastructure.
- Plan and execute maintenance and scheduled tasks, and follow procedural documentation.
- Observe and provide feedback on the current state of the client's infrastructure, and identify opportunities to improve security and resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
- Contribute to, improve and maintain team documentation about client systems and infrastructure, procedures, policies and schedules.
- Work proactively across the company to ensure Intel's managed hosting infrastructure is never a constraint for its customers or any aspect of the company.
- Gather and document information about client environments through audit activities, and analyze the information to identify opportunities for improvement and application of best practices.
- Work collaboratively with teammates to contribute to the continuous improvement of our working culture.
- Must be able to participate in 12x7 incident response with occasional 24x7 coverage.
- Work through follow-the-sun and disaster recovery plans with other groups and departments.
- Ensure that all projects are delivered on time, within scope and within budget.
- Manage time, resources and cost effectively throughout the execution of projects; report progress clearly and accurately at all times.
Desired profile
Required Skills
- BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience.
- Essential systems hardware and network troubleshooting experience.
- Scripting and automation of administrative tasks using PowerShell, Bash, Python, etc.
- System and application error investigation, troubleshooting of access/availability issues.
- Awareness of DevOps tools, processes, and culture.
- Ability to describe IaaS, PaaS, SaaS, pros and cons of each, use cases for virtualization and cloud.
- Common Windows & Linux installation, configuration and performance tuning.
- General SQL Server, Oracle, PostgreSQL and/or MySQL database knowledge to perform day-to-day tasks (backup & recovery, security and object change management, instance installation, configuration, patching, upgrading & monitoring).
- Fundamental TCP/IP networking, NIC bonding and network services configuration (DNS, NTP, DHCP, SMTP, etc.).
- Administration of web servers and supporting technologies, including network load balancers.