Job Description

About Hive

Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations. The company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.

Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!

DevOps and Systems Team

Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure, using public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.

Responsibilities

Automate manual operational processes

Improve workflows of developer, data, and machine learning teams

Manage secure integration and deployment tooling

Create, maintain, monitor, and audit secure infrastructure

Manage a diverse array of technology platforms, following best practices and procedures

Participate in on-call rotation and root cause analysis

Maintain awareness of industry best practices for data maintenance and handling as they relate to your role

Adhere to policies, guidelines and procedures pertaining to the protection of information assets

Report actual or suspected security and/or policy violations/breaches to an appropriate authority

Requirements

Minimum of 3 to 5 years of previous experience in development, operations, IT, or a related field

Comfortable working on Linux infrastructures (Debian) via the CLI

Able to learn quickly in a fast-paced environment

Able to debug, optimize, and automate routine tasks

Able to multitask, prioritize, and manage time efficiently and independently

Able to physically lift equipment weighing at least 30 pounds

Can communicate effectively across teams and management levels

A degree in computer science or a similar field is an added plus!

Technology Stack

Operating Systems - Linux/Debian Family/Ubuntu

Configuration Management - Chef

Containerization - Docker

Container Orchestrators - Mesosphere/Kubernetes

Scripting Languages - Python/Ruby/Node/Bash

CI/CD Tools - Jenkins

Network hardware - Arista/Cisco/Fortinet

Hardware - HP/SuperMicro

Storage - Ceph, S3

Database - Scylla, Postgres, Pivotal Greenplum

Message Brokers - RabbitMQ

Logging/Search - ELK Stack

AWS - VPC/EC2/IAM/S3

Networking - TCP/IP, ICMP, SSH, DNS, HTTP, SSL/TLS

Storage Systems - RAID, distributed file systems, NFS/iSCSI/CIFS

Senior Site Reliability Engineer - Hive
