IDC: Remote SRE Engineer USC,GC

Wednesday, February 21, 2024

I hope you are doing well.

I am hiring a consultant for one of my client's requirements. The job description is below. If you are comfortable with the requirement, please reply with your updated resume.

Position: SRE Engineer

Location: Remote

Duration: 6+ Months

Visa: USC, GC only

3 OPENINGS

Responsibilities:

These resources will work on building and improving the disaster recovery (DR) capabilities of the client's Tier 1 applications. Common responsibilities will include:
Building, reviewing and maintaining application design and architecture documents.
Ensuring the DR capabilities are built into each system.
Working with development teams to implement and maintain the DR capabilities.
Participating in DR testing exercises and evaluating the results for continuous improvement.

Job details:

Helps lead projects focused on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous delivery (CI/CD) tools, processes, and designs.
Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents.
Understands and advocates for standardized and scalable software tools to ensure that systems operate without interruption at optimum performance, and leads project teams throughout the deployment process.
Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability.

Education and Experience:

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science).
Requires 4 – 6 years of related experience.

Essential Functions:

Troubleshoots and resolves more complex problems with systems and services, and initiates regular deployment of new versions of the systems and their subcomponents
Leads more complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility.
Helps make decisions around periodic system validation and testing, service monitoring, and standing up new services/tools
Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization
Identifies and implements necessary manual and automated procedures for improved collaborative response in real-time
Leads lower-level engineers in stress, security, and performance testing
Resolves issues that come up through support escalation
Keeps documentation and runbooks up to date to effectively deal with new incidents that might arise
Leads post-incident reviews and documents findings to inform future decision making
Reviews proposals to optimize the Software Development Life Cycle (SDLC) to boost service reliability and decides which proposals should move forward
Communicates complex topics with development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them

What previous job titles or background will work in this role?

Site Reliability Engineer
Disaster Recovery Engineer
System Support Engineer
Application Architect
Cloud Systems Engineer

Required Skills/Experience:

AWS, Route 53, Lambda, MongoDB, Kafka, Kubernetes
Load Balancing / Load Redirecting / Load Restricting strategies
Monitoring and Observability tools such as Prometheus, Grafana, Dynatrace, Splunk, ELK
Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science)

Preferred Skills/Experience:

Rancher, Axway API Gateway

Kind Regards

Gaurav Pandey | Absolute IT | Recruitment Manager

116 Village Blvd, Suite 200, Princeton, New Jersey 08540