DevOps/Site Reliability Engineer

Insight Global • New York, NY • 41 days ago

Job Description

The DevOps & Site Reliability Engineer position is responsible for implementing and maintaining the continuous integration / continuous deployment (CI/CD) pipeline that meets Development, IT Security, and Program Governance requirements. This role also leads the design and automation of infrastructure using tools like Terraform and Ansible, and ensures system reliability through proactive monitoring, observability, and incident response practices.

In addition, the role is responsible for implementing and supporting infrastructure and processes that enable comprehensive monitoring and observability for Servers, Applications, and Network using tools such as Prometheus, Elastic Stack, and Grafana. The engineer will also work with messaging and data systems including Redis, RabbitMQ, Kafka, and CouchDB, and contribute to the development of internal developer portals to enhance engineering productivity and platform usability.

Essential Duties and Responsibilities:

Infrastructure & Reliability Engineering (30%)

 • Collaborate with leadership, technical leads, and teams to design and maintain infrastructure that maximizes availability and ensures configuration consistency across environments.

 • Implement and maintain SRE best practices such as SLIs, SLOs, and error budgets.

CI/CD & Developer Enablement (25%)

 • Maintain and enhance CI/CD pipelines and tooling.

 • Analyze onboarding requests and automate developer workflows.

 • Contribute to the development and maintenance of internal developer portals to streamline access to tools, documentation, and services.

Automation & Transformation Projects (25%)

 • Lead design and development efforts that drive the transformation to a DevOps and SRE culture.

 • Automate infrastructure provisioning, configuration management, and application deployment using tools like Terraform and Ansible.

Monitoring & Observability (10%)

 • Maintain and improve observability tools and practices using Prometheus, Elastic Stack, Grafana, and other monitoring solutions.

 • Ensure proactive alerting and actionable insights into system performance and reliability.

Support & Collaboration (10%)

 • Assist technical staff in resolving complex issues.

 • Participate in incident response and postmortem processes to improve system resilience.

 • Understand and actively participate in Environmental, Health & Safety responsibilities by following established UO policy, procedures, training, and team member involvement activities.

 • Perform other duties as assigned.

$47/hr to $75/hr. Exact compensation may vary based on several factors, including skills, experience, and education.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements

4–6 years of experience in software engineering, DevOps, or SRE roles.

Strong understanding of Agile development methodologies and DevOps/SRE principles.

Specific Skills & Abilities:

 • Extensive experience with CI/CD tools and pipelines (e.g., GitLab, GitHub Actions, Jenkins).

 • Deep knowledge of containerization and orchestration (Kubernetes, Docker, Rancher).

 • Proficient in scripting and automation (Ansible, Python, PowerShell).

 • Experience with infrastructure as code tools such as Terraform.

 • Experience with observability tools: Prometheus, Elastic Stack, Grafana, Elastic APM.

 • Familiarity with message brokers and data streaming platforms: RabbitMQ, Kafka.

 • Experience with caching and NoSQL databases: Redis, CouchDB.

 • Proven ability to debug and troubleshoot distributed systems.

 • Experience with internal developer portals and platform engineering concepts.

 • Strong analytical, planning, and organizational skills.

 • Excellent communication and collaboration skills across technical and non-technical teams.

 • Experience with Linux systems; Windows/IIS experience is a plus.

 • Familiarity with tools such as Atlassian Suite (Jira, Confluence), ServiceNow, SolarWinds, SQL.

 • Agile certifications (e.g., Scrum Master) are a plus.
