Senior Observability / Monitoring Engineer
Role Overview
We are looking for a Senior Observability / Monitoring Engineer to design, implement, and optimize observability solutions for large-scale enterprise platforms. This role will play a critical part in enabling proactive monitoring, faster incident detection, and improved system reliability across Salesforce and Microsoft Azure environments. The ideal candidate will have strong expertise in metrics, logs, traces, alerting strategies, and observability tooling, along with hands-on experience supporting production environments.
Key Responsibilities
Observability Engineering
- Design and implement end-to-end observability frameworks across Salesforce and Azure platforms
- Establish unified monitoring across logs, metrics, and distributed tracing
- Define and standardize observability best practices, dashboards, and alerting strategies
- Enable proactive detection of issues through intelligent alerting and anomaly detection
Monitoring & Tooling
- Implement and manage tools such as Azure Monitor, Application Insights, Splunk, Datadog, Grafana, Prometheus, or similar
- Build actionable dashboards for operations, SRE, and business stakeholders
- Reduce alert noise and improve the signal-to-noise ratio of alerts
- Continuously enhance monitoring coverage across applications and infrastructure
Incident Support & Reliability
- Support incident management by providing deep insights using observability data
- Perform root cause analysis (RCA) leveraging logs, traces, and metrics
- Collaborate with SRE and engineering teams to improve system reliability and performance
- Contribute to post-incident reviews and continuous improvement initiatives
Automation & Integration
- Automate monitoring setup and configuration using Infrastructure as Code (IaC)
- Integrate observability tools with CI/CD pipelines and DevOps workflows
- Develop scripts or tools to enhance monitoring capabilities and data collection
Platform & Integration Support
- Monitor and optimize Salesforce applications, including integrations and APIs
- Support Azure-based services, ensuring visibility across compute, storage, and networking layers
- Ensure end-to-end observability across integrated systems and middleware
Governance & Compliance
- Ensure observability practices align with security and compliance requirements (e.g., SOX)
- Maintain documentation, runbooks, and monitoring standards
- Support audits and governance reviews as required
Required Skills & Qualifications
Technical Skills
- Strong experience in observability, monitoring, or SRE roles
- Hands-on experience with Azure (Azure Monitor, Application Insights)
- Experience with observability tools (Splunk, Datadog, Prometheus, Grafana, etc.)
- Strong understanding of logs, metrics, traces, and distributed systems
- Experience with APM tools and performance tuning
- Scripting skills (Python, PowerShell, Bash, or similar)
- Familiarity with CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
- Knowledge of Infrastructure as Code (Terraform, ARM, Bicep)
Platform Knowledge
- Experience supporting Salesforce environments (monitoring integrations, APIs, performance)
- Understanding of cloud-native and microservices architectures
Operational Excellence
- Experience in incident management and RCA
- Ability to analyze system performance and recommend improvements
- Strong troubleshooting and analytical skills
Soft Skills
- Strong communication and collaboration skills
- Ability to work with cross-functional and global teams
- Proactive mindset with a focus on continuous improvement