Sr. Site Reliability Engineer – Process Automation

Toast • Remote • 12 days ago

Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.

The Sr. Site Reliability Engineer, Process Automation role sits within the Site Reliability Engineering team, which oversees the Incident and Change Management processes at Toast.

About this roll* (Responsibilities) 

As a Sr. Site Reliability Engineer, Process Automation, you will build automation for Incident and Change Management processes to improve release consistency and enable faster incident response. You will help maintain and improve key organizational processes for Incident and Change Management, which prevent customer impact through change control, rapid detection, response, root cause analysis, and continuous learning from issues.

You will:

  • Lead optimizations to existing processes, identify areas for improvement, and implement automated solutions to enhance the efficiency and reliability of Toast systems
  • Utilize, configure, and support tools such as JIRA, FireHydrant, and Backstage for tracking events, incidents, and changes, and maintain the Service Catalog
  • Enable low-risk, compliant releases with rapid rollback capability to maintain platform reliability
  • Implement automation for risk mitigation strategies to minimize the impact of changes and releases on Toast customers
  • Collaborate closely with leadership, third-party vendors, and relevant stakeholders to drive work to completion

Do you have the right ingredients*? (Requirements)

  • 3-7 years of industry engineering experience with a focus on SRE
  • Bachelor’s degree in computer science, engineering, or a related field
  • Working knowledge of complex cloud environments (AWS, GCP, Azure, etc.)
  • Experience scripting automation (Python, Go, etc.)
  • Experience with infrastructure as code (Terraform, etc.)
  • Experience driving and leading projects
  • Experience participating in and leading incident response and blameless retrospectives/post-mortems
  • Strong written and verbal communication skills
  • Strong problem-solving skills and the ability to think strategically and analytically
  • Experience working with a diverse global team across multiple regions and time zones
  • Working knowledge of best-practice frameworks, including ITIL, ITSM, Agile/Scrum, and change management, is a plus
  • Experience with Incident and Change processes and tools (JIRA, OpsGenie, FireHydrant, DX, etc.) is a plus

AI at Toast

At Toast we’re Hungry to Build and Learn. We believe learning new AI tools empowers us to build for our customers faster, more independently, and with higher quality. We provide these tools across all disciplines, from Engineering and Product to Sales and Support, and are inspired by how our Toasters are already driving real value with them. The people who thrive here are those who embrace changes that let us build more for our customers; it’s a core part of our culture.

Our Spread* of Total Rewards

We strive to provide competitive compensation and benefits programs that help to attract, retain, and motivate the best and brightest people in our industry. Our total rewards package goes beyond great earnings potential and provides the means to a healthy lifestyle with the flexibility to meet Toasters’ changing needs. Learn more about our benefits at https://careers.toasttab.com/toast-benefits.

*Bread puns encouraged but not required

The base salary range for this role is listed below. The starting salary will be determined based on skills and experience. In addition to base salary, our total rewards components include cash compensation (overtime, bonus/commissions, if eligible), benefits, and equity (if eligible).
