Senior Site Reliability Engineer
The Site Reliability Engineer III will split their time evenly between resolving IT and Engineering operational issues and doing development work that reduces the toil encountered in that operational work. This feedback cycle runs through partnering with Engineering and IT teams to work through tickets and develop software. The SRE is embedded directly in the Engineering team’s sprint cycle and uses that team’s processes and tools. Embedded development time should go toward tooling for automation, instrumentation, scaling, and CI/CD, but may also include work directly on the Engineering product when that effort leverages cloud or instrumentation APIs or increases reliability as defined by the Engineering team. An ideal candidate is an Ops person with demonstrated coding/scripting skills, or a software developer with demonstrated experience writing software that leverages cloud-native APIs.
Approximately 50% of this position will be development work focused on application reliability, including product features, scaling, monitoring, automation, and CI/CD. The other 50% or so will be Ops-related work in support of Engineering and IT teams, including issue resolution, on-call, post-incident reviews, and manual interventions.
Candidates will need solid communication skills and an understanding of both development and operational processes.
The position will require a broad range of skills, including:
- GCP Cloud Administration
- Kubernetes Administration
- Engineering languages (Node.js, Java, C#, etc.)
Certifications such as VCP, RHCA, GCP Associate Cloud Engineer, and CNCF CKA will be very beneficial.