Posted Aug 16

Nvidia is hiring a
Senior Distributed Systems Engineer

US, CA, Santa Clara • US, TX, Austin • US, WA, Redmond • US, OR, Remote
Full time

We are seeking a highly skilled Distributed Systems Engineer to join our Omniverse Infrastructure team. You will play a key role in designing, building, and optimizing large-scale distributed systems and infrastructure for the Omniverse Cloud. This is an extraordinary opportunity for a highly motivated and dedicated engineer with an in-depth understanding of distributed storage, high-performance networking, compute systems, and distributed system architecture.

NVIDIA Omniverse™ Cloud is a platform-as-a-service (PaaS) that provides developers and enterprises with a full-stack cloud environment to design, develop, and deploy industrial Omniverse applications. The Omniverse Infrastructure organization develops hardware and software systems to power the Omniverse Cloud.

What you will be doing:

  • Architect, design, build, and optimize distributed systems.

  • Drive end-to-end Omniverse platform optimization from a hardware level to the application and service levels.

  • Develop infrastructure and microservices to support Omniverse users and developers in the deployment of a wide range of workloads.

  • Address challenges related to compute, networking, and storage resource utilization in a heterogeneous computing environment.

  • Collaborate with multiple Omniverse product teams to understand customer storage and compute requirements and build supporting infrastructure.

  • Collaborate across org boundaries with a diverse set of engineers.

  • Adapt and/or develop performance modeling and analysis tools to identify and optimize performance bottlenecks in Omniverse workloads and drive future system designs.

  • Multitask effectively in a dynamic environment.

What we need to see:

  • 10+ years of hands-on software engineering experience.

  • 10+ years of experience building large-scale distributed, fault-tolerant systems and services.

  • Strong systems programming skills, including multi-threading, concurrency, caching, and batching.

  • Proficiency in C, C++, and Python.

  • Experience with cloud infrastructure platforms like AWS, Azure, and Google Cloud.

  • Master's or PhD in Computer Science or a related field (or equivalent experience).

  • Solid technical foundation and a deep understanding of cloud technologies, distributed systems, and microservices architecture.

  • Excellent interpersonal skills and ability to work successfully with multi-functional teams, principals, and architects across organizational boundaries and geographies.

  • Understanding of virtualization and containerization technologies like Docker, Kubernetes, VMware, KVM, etc.

Ways to stand out from the crowd:

  • Hands-on experience in performance optimization and benchmarking on large-scale distributed systems.

  • Experience in developing large-scale distributed applications and services on supercomputing and/or cloud environments.

  • Experience with NVIDIA GPUs, HPC storage, networking, and cloud computing.

  • In-depth understanding of storage systems, Linux file systems, and RDMA networking.

  • Share references to your code contributions.

The base salary range is $216,000 - $414,000. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
