Posted Aug 16

NVIDIA is hiring a
Senior MLOps Engineer, Deep Learning

US, CA, Santa Clara • US, CA, Santa Clara • 2 Locations
Full time

We are now looking for a Senior MLOps Engineer, Deep Learning!

NVIDIA is seeking a motivated senior build and continuous integration (CI/CD) engineer for its Deep Learning (DL) Algorithms team. The DL Algorithms group at NVIDIA is responsible for publishing and maintaining GPU-optimized, state-of-the-art, popular and mainstream DL algorithms (NLP, CV, ASR, Conversational AI, RecSys, Text-to-Speech). The team aims to provide a push-button experience (from training to deployment) and scientific results for diverse AI/DL algorithms that are clear, stable, and easy to reproduce. See:

  • https://github.com/NVIDIA/NeMo

  • https://github.com/NVIDIA/JAX-Toolbox

  • https://github.com/NVIDIA/DeepLearningExamples

Academic and commercial groups around the world are using GPUs to redefine AI and data analytics, and to power data centers. Come and help the team build the software that will be used globally. Building upon modern DevOps tools, your work will enable framework (JAX, PyTorch, TensorFlow, MXNet) software engineers and deep learning algorithm engineers to work efficiently with a wide variety of deep learning algorithms and software stacks as they vigilantly seek out opportunities for performance optimization and continuously deliver high-quality software. Does the idea of pushing the boundaries of state-of-the-art research and development excite you? Are you interested in getting exposure to the entire DL software stack? Then come join our technically diverse team of DL algorithm engineers and performance optimization specialists to unlock unprecedented deep learning performance in every domain.

What you’ll be doing:
  • Architect and own the build, release, and continuous integration processes for our deep learning software components, which are built, tested, and released on various DL frameworks (JAX, TensorFlow, PyTorch, MXNet).

  • Propose, implement, and deploy efficient and scalable DevOps solutions that allow our fast-growing team to release software more frequently while maintaining high quality and top performance.

  • Work with industry-standard tools (Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira).

  • Assist with cluster operations and system administration (managing servers, team accounts, clusters).

  • Automate away recurring tasks (DL algorithm accuracy and performance regression detection, designing and developing new quality control measures, e.g., code analysis) while employing and advancing best practices.

  • Work closely with the DL framework and libraries (CUDA, cuDNN, cuBLAS) teams and with other relevant teams within NVIDIA that provide software build, testing, and release infrastructure.

What we need to see:
  • BS or higher degree in computer science (or equivalent experience) with at least 5 years of hands-on experience in infrastructure engineering and DevOps.

  • Strong system-level programming skills in languages like Python and Perl, and in shell scripting.

  • Strong understanding of build/release systems and CI/CD, and experience with solutions like GitLab, GitHub, Jenkins, etc.

  • Experience with Linux system administration.

  • Proficiency with containerization and cluster management technologies like Docker and Kubernetes.

  • Background in build tools, including Make, CMake, and Visual Studio (MSBuild).

  • Experience using or deploying software configuration management (SCM) solutions such as GitLab, Perforce, etc.

  • Excellent troubleshooting and debugging skills.

  • A great teammate who can collaborate and influence in a dynamic environment.

  • Excellent interpersonal and written communication skills.

Ways to stand out from the crowd:
  • Previous experience with GPU-accelerated systems.

  • Hands-on experience with DL frameworks (JAX, TensorFlow, PyTorch, MXNet).

  • Experience with cluster/cloud technologies, e.g., Slurm, Lustre, k8s.

  • Experience with HPC hardware systems such as compute clusters, and with HPC software performance benchmarking on such systems.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

The base salary range is $144,000 - $270,250. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
