Manager AI System Infrastructure and MLOps Engineering

Company: Promote Project
Location: Redwood City
Posted on: November 13, 2024

Job Description:

Location: Redwood City, California, United States
Salary: 30,000 - 80,000 a year (US dollars)

Description

The Team

The AI/ML team is funding and building one of the largest computing systems in the world dedicated to nonprofit life science research. This new effort will give the scientific community access to predictive models of healthy and diseased cells, which will lead to groundbreaking new discoveries that could help researchers cure, prevent, or manage all diseases by the end of this century.

As the hands-on Manager of the AI System Infrastructure and MLOps Engineering team, you will join the AI/ML and Data Engineering team in CZI Central Tech. You will be responsible for the stability and scalable operation of our leading-edge GPU Cloud Compute Cluster. The cluster supports our AI researchers as they develop and train state-of-the-art artificial intelligence and machine learning models. Those models address important problems in the biomedical sciences, in line with CZI's mission, and contribute to a greater understanding of human cell function.

The Opportunity

As the Engineering Manager of the AI Infrastructure and MLOps Engineering team, you will be responsible for a variety of MLOps and AI development projects. These projects empower our AI researchers and help accelerate biomedical research across the entire AI lifecycle. You will guide our AI systems infrastructure and MLOps efforts for GPU Cloud Cluster operations, ensuring that our systems are highly utilized, performant, and stable. You will collaborate with other members of our AI Engineering team and with the Science Initiative's AI Research team as they iterate on and train their deep learning code. In that work you will optimize systems operations and help troubleshoot problems encountered by jobs running on the cluster.

What You'll Do

- Help build out the MLOps and Systems Infrastructure Engineering team, growing it to support the large-scale capacity systems and AI training efforts we will be undertaking.
- Drive our MLOps processes and System Infrastructure Engineering efforts to ensure that our GPU cloud computing systems are highly utilized and stable. Proactively guide the team in implementing the instrumentation and observability tooling integral to our AI platform.
- Own the on-call efforts for our GPU cloud computing systems. This includes building out MLOps and Systems Infrastructure Engineering alerting and monitoring for our leading-edge, Kubernetes-based AI platform, and troubleshooting problems both on the GPU platform infrastructure and with jobs running on the cluster and computing systems.
- Take responsibility for a variety of AI/ML development infrastructure, instrumentation, and telemetry projects that help our team support users across the AI/ML lifecycle. Play a key role in simplifying and optimizing the systems and processes integral to our GPU Cloud Cluster operations, within a hybrid operations model where MLOps meets SRE.
- Mentor and manage your team so that each member fulfills their role to the best of their abilities. Provide skill and career coaching to help team members keep growing along their own career and life paths, and keep the team engaged in meaningful, interesting projects in service of our north star philanthropic mission.

What You'll Bring
- Hands-on AI/ML model training platform operations experience in an environment with demanding data and systems platform challenges.
- MLOps experience working with medium- to large-scale GPU clusters in Kubernetes, HPC environments, or large-scale cloud-based ML deployments (Kubernetes preferred).
- BS, MS, or PhD degree in Computer Science or a related technical discipline or equivalent experience.
- 2+ years of experience managing MLOps teams.
- 7+ years of relevant coding and systems experience.
- 7+ years of systems Architecture and Design experience, with a broad range of experience across Data, AI/ML, Core Infrastructure, and Security Engineering.
- Strong understanding of scaling containerized applications on Kubernetes or Mesos (Kubernetes preferred). This includes a solid understanding of AI/ML training with containers using secure AMIs, and of continuous deployment systems that integrate with Kubernetes or Mesos.
- Proficiency with Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, and experience with on-premises and colocation hosting environments.
- Solid coding ability with a systems language such as Rust, C/C++, C#, Go, Java, or Scala.
- Extensive experience with a scripting language such as Python, PHP, or Ruby (Python preferred).
- Working knowledge of NVIDIA CUDA and custom AI/ML libraries.
- Knowledge of Linux systems optimization and administration.
- Understanding of Data Engineering, Data Governance, Data Infrastructure, and AI/ML execution platforms.
- PyTorch, Keras, or TensorFlow experience is a strong plus.

Compensation

The Redwood City, CA base pay range for this role is $214,000 - $321,000. New hires typically start in the lower portion of the range, which leaves room for growth within the range over time. Actual placement in the range is based on job-related skills and experience, as evaluated throughout the interview process. Pay ranges outside Redwood City are adjusted for the cost of labor in each geographic market. Your recruiter can share the specific pay range for your location during the hiring process.

Benefits for the Whole You

We're thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
- CZI provides a generous 100% match on employee 401(k) contributions to support planning for the future.
- Annual funding for employees to use in the ways most meaningful to them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
- CZI Life of Service Gifts are awarded to employees to "live the mission" and support the causes closest to them.
- Paid time off to volunteer at an organization of your choice.
- Relocation support for employees who need assistance moving to the Bay Area.

We believe that the strongest teams and best thinking are defined by the diversity of voices at the table. We are committed to fair treatment and equal access to opportunity for all CZI team members. We are equally committed to maintaining a workplace where everyone feels welcomed, respected, supported, and valued. Learn about our diversity, equity, and inclusion efforts.

If you're interested in a role but your previous experience doesn't perfectly align with each qualification in the job description, we still encourage you to apply, as you may be the perfect fit for this or another role.