
Senior Site Reliability Engineer - Northwood Space
- Job Title
- Senior Site Reliability Engineer
- Job Location
- Torrance, CA
- Job Description
Role:
Northwood is looking for a Senior Site Reliability Engineer to architect and lead the monitoring and reliability systems that keep satellites connected to Earth. As we rapidly scale our ground station network across multiple continents, you'll design and build the observability infrastructure that ensures our space communications systems operate 24/7 for customers ranging from commercial satellite operators to national security missions.
This is a high-impact leadership role where you'll architect global-scale reliability platforms while mentoring junior engineers and establishing SRE practices across the organization. You'll work directly with our founding engineering team and department heads to define the monitoring, alerting, and deployment strategies that will scale with us from startup to enterprise. If you're excited about space technology and want to architect infrastructure that directly supports mission-critical satellite operations while building and leading technical teams, this role offers that opportunity.
Responsibilities:
Architect and maintain enterprise observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) monitoring ground stations, satellite communications, and multi-region AWS infrastructure
Design SRE practices, error budgets, and SLO/SLI frameworks for mission-critical satellite systems with 99.9%+ uptime requirements
Build advanced AWS infrastructure with Terraform, implementing multi-region reliability, automated scaling, and disaster recovery for ground station operations
Lead CI/CD pipeline architecture using GitLab and ArgoCD with advanced deployment strategies for mission-critical software releases
Mentor junior engineers and establish reliability standards across the growing engineering organization
Design comprehensive Kubernetes deployments with Helm, focusing on high availability and zero-downtime operations
Lead incident response, conduct post-mortems, and drive systematic reliability improvements
Basic Qualifications:
5-8 years of production infrastructure and SRE experience with demonstrated leadership in reliability improvements and team mentorship
Expert-level experience with Kubernetes, Docker, and container orchestration in large-scale production environments
Strong background in infrastructure as code (Terraform) and advanced CI/CD practices with experience mentoring others on these technologies
Advanced AWS experience including multi-region architectures, networking, security, and cost optimization, with demonstrated ability to architect complex cloud solutions
Proven track record of leading technical projects from conception to production in fast-moving, high-growth environments
Deep understanding of SRE principles, error budgets, SLOs/SLIs, and experience implementing reliability frameworks across engineering organizations
Preferred Qualifications:
Production experience architecting and scaling observability tools (Vector, Loki, Grafana, Prometheus, VictoriaMetrics) in high-throughput environments
Advanced experience with HashiCorp Vault, Okta, and enterprise identity/secrets management systems including policy design and implementation
Previous experience scaling infrastructure and leading technical teams at high-growth companies (startup to 500+ employees)
AWS Professional certification or equivalent demonstrated expertise with advanced cloud networking, security, and compliance frameworks
Strong Linux system administration and networking expertise with experience troubleshooting complex distributed systems
Background in aerospace, telecommunications, defense contracting, or other mission-critical, highly regulated industries
Experience with ITAR, NIST 800-171, or other defense/aerospace compliance requirements
Northwood Space Company Size
Between 20 and 100 employees
Northwood Space Founded Year
2023
Northwood Space Total Amount Raised
$36,400,000
Northwood Space Funding Rounds
Series A
$30,000,000 USD
Seed
$6,300,000 USD
Grant
$100,000 USD