Director, Cloud Infrastructure & Operations

Summary: You will be leading a driven and growing team focused on the hosting operations of our AWS and Microsoft Azure cloud offerings, as well as the SaaS systems that internal employees use. The ideal candidate is:

  • An expert in 24/7 operations of high-performing, scalable systems that maintain a high degree of uptime.
  • An expert in all facets of Cloud hosting operations with the ability to effectively communicate with customers and internal stakeholders.
  • A leader in continuous integration and continuous deployment (CI/CD) with automated deployments in an Agile SDLC.

You’ll be empowered and encouraged to bring forth new ideas that strengthen the team while showcasing your passion for emerging technologies and best practices.

Responsibilities:

  • Manage operations plans, staffing, budget, and execution
  • Work with GRC Oversight Committee to address security risks and endpoint security
  • Manage IT Infrastructure that is well integrated with Microsoft Azure AD
  • Generate and provide recommendations on optimizing usage of Cloud services
  • Ensure compliance with security best practices and continuously monitor for potential vulnerabilities
  • Establish compliance controls through analysis of current systems
  • Build out and/or automate required SOPs
  • Participate in the Cloud Well-Architected Framework to support overall operations initiatives
  • Optimize operations costs across vendors and service providers
  • Develop automated reporting that enables teams to leverage best practices for running efficient Cloud solutions
  • Analyze system performance data to identify system trade-offs and proactively develop a reasonable and pragmatic IT infrastructure roadmap
  • Analyze operational metrics and data to identify trends and potential problems
  • Adhere to established customer SLAs
  • Partner with Engineering and Client Services on CI/CD, automated deployment, and event management
  • Identify key procedures that can be automated and either automate them or work with the DevOps engineering team to develop automation

Required Knowledge or Skills:

  • Deep understanding of the key concepts and practices of Cloud observability, coupled with experience implementing robust systems that leverage metrics, logs, and traces to provide a holistic view of the state of Cloud operations.
  • Deep understanding of how to apply best practices for monitoring, alerting, and logging, with implementation experience in one or more tools (Azure Monitor, CloudWatch, Application Insights, Log Analytics, Splunk, Dynatrace, SolarWinds, etc.).
  • Knowledge of systems for infrastructure monitoring and application performance monitoring, including SLAs/KPIs and reporting approaches for multi-cloud platforms.
  • Ability to partner with the Engineering team to define an observability strategy that leverages metrics, logs, and traces to provide an understanding of system state, and to advocate for that strategy with engineers, managers, and executives.
  • Knowledge of corporate IT, data centers, ticketing system implementations, monitoring software implementation, troubleshooting, and continuous improvement approaches.
  • Expertise managing an enterprise environment with Microsoft O365, Azure AD, and SSO integration with third-party SaaS applications.
  • Experience with serverless computing, containers (AKS/EKS), and VM-based workloads, along with a solid understanding of the trade-offs among the different serverless implementations emerging in public Cloud.
  • Experience with and enthusiasm for operating in an agile, DevSecOps-oriented organization and culture.
  • Ability to plan and execute Disaster Recovery (DR) and failover simulations to demonstrate adherence to SLAs.
  • Skill and knowledge in ITIL processes related to Incident Management, Service Requests, Event Management, Access Management, Change Management, Knowledge Management and Escalated Incident Management.
  • Knowledge of Cloud management platforms that simplify financial management, help streamline operations, and strengthen security and compliance (e.g., CloudHealth, Apptio, CloudCheckr).
  • Technical and business acumen that ensures the organization operates efficiently and effectively in a hybrid environment, including the ability to monitor costs and engage teams in cost containment and reduction projects.

Qualifications:

  • 2+ years of experience managing 24/7 production operations for a high-volume, business-critical Cloud service.
  • 2+ years of experience with Azure and/or AWS.
  • 2+ years of experience working with a Managed Service Provider and managing IT vendor relationships.
  • 2+ years of transformational experience running Cloud at scale.
  • 2+ years of management experience.
  • Scripting knowledge using Python, Perl, PowerShell, JavaScript, or similar scripting languages.
  • AWS SysOps Administrator (associate level) or DevOps Engineer (professional level) certification a plus but not required.

Preferred, Not Required:

  • ITIL Certification a plus
  • Experience with hosting government solutions (e.g., AWS GovCloud) a plus
  • Experience with Terraform a plus
  • Experience with NoSQL databases a plus
  • Experience with healthcare information technology a plus

Location: Remote

Apply with resume to: Careers@DiameterHealth.com

® Diameter Health is a trademark registered in the US Patent and Trademark Office.