Job Location: Tel Aviv, Israel & Pune, India
We are international Multi-Cloud experts, utilizing the power of the cloud for smart digital transformation. With 5 sites across 4 continents, 450+ experts, 1,000+ customers, and 30+ years of proven experience, our mission is to deliver the best Multi-Cloud service to our customers, accelerate their business, and help them grow. To help our customers stay on top of their game, our tech-savvy teams are constantly developing new strategies and tools that improve cloud performance, spend, visibility, control, and automation. Our cloud experts make any digital transformation a quick, smart, and easy process.
About the Role:
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that the cloud vendors’ services have reliability, uptime appropriate to users’ needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our systems. On the SRE team, you’ll have the opportunity to manage our complex set of customer production environments while using your expertise in coding, analysis, and large-scale system design. SRE’s culture of diversity, intellectual curiosity, problem-solving, and openness is key to its success.
Requirements:
- Experience in cloud environments (AWS/GCP) is a must.
- Experience with a monitoring system such as Stackdriver or CloudWatch (Advantage – Uptime, Pingdom, Datadog, Splunk, Grafana).
- Strong experience in service automation using shell scripting and Python (Advantage – PowerShell).
- Experience handling critical production incidents.
- Experience with Linux system administration.
- Proven technical troubleshooting and performance tuning experience.
- Strong written and oral communication skills required.
- Ability to contribute to multiple projects/demands simultaneously.
- Experience with DNS and with debugging OS and network issues.
- Experience with container orchestration such as Kubernetes is an advantage.
Responsibilities:
- Take responsibility for the availability, performance, and security of clients’ production environments on AWS or GCP.
- Analyse complex system behaviour, performance, and application issues.
- Apply modern engineering best practices to drive down operational overhead through automation and system design.
- Promote security excellence across a broad set of internal and external customers.
- Serve as the Tier 1 point of contact for support, responsible for troubleshooting.
- Demonstrate complex troubleshooting skills and deep knowledge of the services running on the infrastructure, and work with engineers and vendors to resolve issues.