About the team
The Platform Infrastructure Engineering team transforms complex infrastructure systems into simple, efficient, and reliable solutions, focusing on scalable operational methodologies to drive business impact and cost savings. Zillow Group Incident Management (ZGIM) drives best practices for change management, manages major incidents, and drives root cause analysis that improves product availability for Zillow customers so they can unlock life's next chapter. The team plays a pivotal role in the company's success by driving down incident detection and recovery times. This team works closely with software development engineers while supporting all users and brands. Your work with us will be highly visible throughout Zillow Group and have a significant impact on all parts of the business.
About the role
The Senior Incident Manager plays a key role in maintaining availability of Zillow services. Incident managers drive proactive readiness, change management, incident management, and root cause analysis. This requires working effectively in real time with highly technical engineers and business leaders. Incident managers identify and build processes, demonstrate calm under pressure, and adapt to work with all types of stakeholders.
Areas of responsibility include:
Incident Management
Own live-site incident management during customer-impacting events, driving immediate engagement where needed and providing executive-facing updates
Lead cross-functional teams through detailed incident response in real time
Participate in an on-call rotation to ensure 24/7 major incident response
Change Management
Develop processes and influence product engineering teams to reduce incident occurrence and shorten detection and resolution times
Drive development to improve resilience through testing, deployment, observability, etc.
Monitor various metrics to ensure SLA compliance and drive improvement where needed
Root Cause Analysis
Drive analysis of major incidents through cross-functional teams to uncover root cause and contributing factors
Analyze trends to gain insight and drive improvements to products and processes
Lead problem review sessions and drive on-time completion of process steps
Process Improvement
Identify and initiate improvement in team processes, including developing standard operating procedures and giving and receiving cross-training
Translate technical concepts into business language to clarify issues and impact
Create reports and present data to drive understanding and improvement
This role has been categorized as a teleworker position. Teleworkers do not have a permanent corporate office workplace and instead work from a physical location of their choice, which must be identified to the Company. Employees may live in any part of Mexico, but preferably in Mexico City, as we encourage attendance at occasional in-office events.
In addition to a competitive base salary and benefits, this position is also eligible for equity awards based on factors such as experience, performance and location.
Who you are
You take initiative when you see an issue and are exhilarated by being in the middle of the action. You effectively communicate with both technical staff and executives under high pressure. You can take control of an urgent situation and coordinate multiple work streams without alienating anyone or missing a crucial input.
BS/BA degree in Computer Science, Information Systems, or related discipline, or a minimum of 5 years' related work experience
3+ years of experience leading major incident war rooms during live incidents
Proficiency driving cross-functional decisions in ambiguous situations
Proficiency triaging multiple incoming issues and addressing according to priority and severity
Sufficient technical background, ideally in networking, systems, and software development
Hands-on experience analyzing incidents, root causes, weaknesses, corrective actions, etc.
Proficiency guiding and influencing technical teams and leaders on incident processes
Proficiency communicating technical updates to non-technical stakeholders
Business acumen to understand business issues and align communications strategies to outcomes accordingly
Experience creating Tableau dashboards from operational data
ITILv3 Foundation certification a plus
Proficiency with Google and Microsoft Office documents and co-authoring features
Proficiency working from home and virtually with distributed teams
Self-starter with a high degree of initiative in scaling programs to large organizations
Get to know us
Zillow is reimagining real estate to make it easier to unlock life's next chapter.
As the most-visited real estate website in the United States, Zillow® and its affiliates help movers find and win their home through digital solutions, first-class partners, and easier buying, selling, financing and renting experiences. Millions of people visit Zillow Group sites every month to start their home search, and now they can rely on Zillow to help make it easier to move. The work we do is helping people move from dreaming to transacting - and no matter what job you're in, you will play a critical role in making this vision a reality.
Our efforts to streamline the real estate transaction are supported by a deep-rooted culture of innovation, our passion to redefine the employee experience, and a fundamental commitment to Equity and Belonging. We're also setting the standard for work experiences of the future, where our employees are supported in doing their best work and living a flexible, well-balanced life. But don't just take our word for it. Read recent reviews on Glassdoor and recent recognition from multiple organizations, including: the 100 Best Compa.