Key Responsibilities:
- Support application systems engineering within a hybrid cloud infrastructure.
- Collaborate with development teams to improve production insights, automation, and system resilience.
- Use observability tools (e.g., Splunk, Datadog) to monitor system health and performance.
- Apply ITSM practices, including Incident, Change, and Problem Management, using tools such as ServiceNow.
- Automate infrastructure using Python, shell scripting, and PowerShell, along with tools such as Ansible, Terraform, and Rundeck.
- Build and maintain resilient systems in cloud environments (Azure or AWS).
- Work with container orchestration platforms, preferably Kubernetes.
- Participate in an on-call rotation and respond promptly to production issues.
Required Skills and Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred).
- Experience in systems engineering, SRE, or DevOps environments.
- Hands-on experience with monitoring and observability tools (Splunk, Datadog, Grafana, etc.).
- Proficiency in cloud development (Azure or AWS), infrastructure as code, and CI/CD pipelines (Jenkins, uDeploy).
- Experience with container orchestration (preferably Kubernetes).
- Strong scripting skills in Python, shell, or PowerShell.
- Familiarity with ITSM tools and processes (ServiceNow, Incident/Change/Problem Management).
- Strong communication skills and the ability to work cross-functionally.
Preferred Qualifications:
- Experience in both Linux and Windows environments.
- Cloud migration experience.
- Demonstrated ability to improve system scalability and resiliency.
- Quick learner and collaborative team player with a systems-thinking mindset.