About Hive
Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations. The company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.
Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
DevOps and Systems Team
Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure and use public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who thrives in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.
Responsibilities
Automate manual operational processes
Improve workflows of developer, data, and machine learning teams
Manage secure integration and deployment tooling
Create, maintain, monitor, and audit secure infrastructure
Manage a diverse array of technology platforms, following best practices and procedures
Participate in on-call rotation and root cause analysis
Maintain awareness of industry best practices for data maintenance and handling as they relate to your role
Adhere to policies, guidelines, and procedures pertaining to the protection of information assets
Report actual or suspected security and/or policy violations or breaches to an appropriate authority
Requirements
Minimum of 3-5 years of experience in development, operations, IT, or a related field
Comfortable working on Linux infrastructure (Debian) via the CLI
Able to learn quickly in a fast-paced environment
Able to debug, optimize, and automate routine tasks
Able to multitask, prioritize, and manage time efficiently and independently
Able to physically lift equipment weighing at least 30 pounds
Can communicate effectively across teams and management levels
A degree in computer science or a similar field is an added plus!
Technology Stack
Operating Systems - Linux/Debian Family/Ubuntu
Configuration Management - Chef
Containerization - Docker
Container Orchestrators - Mesosphere/Kubernetes
Scripting Languages - Python/Ruby/Node/Bash
CI/CD Tools - Jenkins
Network Hardware - Arista/Cisco/Fortinet
Hardware - HP/Supermicro
Storage - Ceph, S3
Database - Scylla, Postgres, Pivotal Greenplum
Message Brokers - RabbitMQ
Logging/Search - ELK Stack
AWS - VPC/EC2/IAM/S3
Networking - TCP/IP, ICMP, SSH, DNS, HTTP, SSL/TLS, storage systems, RAID, distributed file systems, NFS/iSCSI/CIFS