Site Reliability Engineer

In this Site Reliability and Automation Engineer role, you will work closely with the Data Center team, the entire Cloud development organization, and vendors to support, maintain, and improve the operation of the cloud infrastructure.

Your focus will be on the following key responsibilities:

• Help maintain the compliance and security integrity of the environment through your work

• Partner with other teams, functional managers and program managers to deliver mission-critical services to the market

• Support development of new and existing capabilities for our compute, storage and network services

• Work with Engineering to:

o Define operational requirements

o Automate operational requirements

o Participate in the full deployment pipeline

• Work with Support and Development to:

o Identify and resolve issues

o Discuss and plan integration requirements

Required Professional and Technical Expertise

• Minimum of 5 years' experience in hands-on production administration of large system environments, including virtual platforms.

• Experience in establishing, following, and improving operational procedures within a mission-critical environment

• 5+ years of experience in data center infrastructure or relevant work experience

• 5+ years of experience in large-scale infrastructure design, engineering, and support

• 5+ years of experience in IT change, incident, problem, and asset management

• 5+ years of infrastructure engineering experience with a proven record of delivering high-quality, large-scale solutions, including experience designing architectures for scale and performance

• Must be proficient in writing, debugging, and maintaining scripts (Bash and Python)

• Must be extremely comfortable using and navigating within a Linux environment

• Must be able to perform low-level debugging and problem analysis by examining logs and running Unix commands

• 3-5 years of experience with configuration management systems (Ansible or Chef)

• Hands-on experience with Splunk or ELK

• Must have experience bringing incidents to resolution and leading a group through troubleshooting

• Working knowledge of network and storage technologies

• Working knowledge of ServiceNow, JIRA, Confluence, and GitHub

• Excellent written and verbal communication skills

• Comfortable operating in a fast-paced environment

This role follows a shift pattern of Thursday to Monday, 08:00 to 16:00.

Required Skills

As listed in the job description above

Preferred Skills:

• 2+ years of experience with Kubernetes

• 2+ years of experience with GitHub, Perl and Python

• 2+ years of experience in virtualization and cloud environments such as AWS, SoftLayer, Xen, and VMware

Required Education:

BS or equivalent in computer science or electrical engineering, or equivalent relevant experience

Michael Boyhan