Site Reliability Engineering

DevOps & Site Reliability Jobs

North Carolina | Contract / 6 months | Negotiable

We are seeking a highly motivated temporary worker with experience in Site Reliability Engineering. As an SRE, you will report to the Senior Manager of Engineering and will be responsible for ensuring the reliability of our ecosystem (both Native and Web), contributing to overall change, incident, and problem management, and partnering with cross-functional teams to drive continuous improvement. This is an excellent opportunity for a technical contributor to evolve our technology through automation and reliable architecture, and to help increase velocity by collaborating across engineering to facilitate adoption of best practices.

Responsibilities:

● Contribute to overall change, incident, and problem management in our environment, with a focus on troubleshooting, fast restoration of our essential services, and prevention of future outages.

● Participate in a once-a-month 24×7 on-call rotation and take leadership of severe incidents to help minimize impact.

● Assist engineering teams by conducting truly blameless postmortems with focused action items to drive continuous improvement.

● Provide insights into trends in issues affecting reliability and partner in cross-functional projects to provide scalable solutions.

● Review and advise on high-risk platform changes to minimize impact to the site and maximize success for stakeholders.

● Work within a large distributed system based on Cloud Native services.

● Maintain an automation-centric vision and incorporate SRE methodologies to increase reliability and decrease toil.

● Create operating standards to help drive reliability.

Requirements:

● 5+ years of experience in Site Reliability Engineering with a focus on Infrastructure, Platform, and Application (Cloud, Containerization, Container Orchestration, Network, Application Reliability, Database Architecture), and an understanding of full-stack and SDLC (Software Development Life Cycle) practices in a DevOps or continuous-release environment.

● Experience running critical incidents in a global or company-wide context, engaging with executives and senior leadership, and leading root cause analysis sessions.

● Experience running and monitoring applications at scale, using metrics and tracing tools such as New Relic, Datadog, Stackdriver, Zipkin, and Prometheus.

● Professional experience with Python, Go, or similar programming languages.

● Familiarity with SRE methodologies; passionate about solving operational challenges by using automation and software.

● Ability to communicate effectively, both in writing and verbally, at all levels and across teams within the organization.

Preferred Qualifications:

● Ability to drive troubleshooting through a pragmatic and collaborative approach.

● Ability to draw clear and concise insights from data to promote and champion measurable improvements.

● Experience working with Cloud Native services in a Public Cloud, e.g. Google Cloud Platform, AWS, Azure.

Salt is acting as an Employment Business in relation to this vacancy.

Job Information

Job Reference: JO-2412-349079
Salary: Negotiable
Job Duration: 6 months
Job Start Date: 17/03/2025
Job Industries: DevOps & Site Reliability Jobs
Job Locations: North Carolina
Job Types: Contract
