Nvidia is hiring a
Senior DevOps Engineer
NVIDIA is looking for a world-class engineer to join its multifaceted and fast-paced Infrastructure, Planning and Processes organization as a Senior DevOps and SRE Engineer. You will be part of a fast-paced crew that develops and maintains sophisticated build and test environments for a multitude of hardware platforms, both NVIDIA GPUs and Tegra processors, across various operating systems (Windows/Linux). The team works with other business units within NVIDIA Software, such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence, Robotics, and Driverless Cars, to meet their infrastructure and systems needs.
What you'll be doing:
- Kubernetes system administration in a large-scale DevOps CI/CD environment. Designing and implementing clusters, cluster segmentation, and internal/external networking for 4+ CI/CD deployment environments: dev, test, staging, and production.
- Implementation of the Kubernetes architectures for configuration, hardening, networking, sizing, scaling etc. to support a CI/CD pipeline for NVIDIA products.
- Configuring Kubernetes auto-provisioning and auto-scaling of CI/CD job/build agents/runners/nodes.
- Implementing high-availability clusters and disaster recovery solutions.
- Large-scale pod/container deployments across multiple Kubernetes clusters to support CI/CD pipelines for NVIDIA products.
- Design and implement monitoring solutions to gain more insight into application and system health. Implement critical metrics using various analytics methods and dashboards.
- Craft and develop tools for automating workflows. Apply AI techniques to extract useful signals about machines and jobs from the data they generate.
- Take part in prototyping, crafting, and developing cloud infrastructure for NVIDIA.
- Participating in on-call support and critical issue coverage as an SRE.
What we need to see:
- Strong background with GitLab, Jenkins, and/or other CI/CD systems.
- Proficient with Kubernetes administration, Docker, and virtualization. Knowledge of standard security practices.
- Proficient with data analytics/visualization & monitoring tools like Kibana, Grafana, Splunk, Zabbix, Prometheus and/or similar systems.
- Solid programming background in Python and/or similar scripting languages.
- Experience maintaining cloud infrastructure and highly available production environments.
- Strong background in Docker, containerization, and managing large-scale container/pod deployments for Kubernetes clusters.
- Excellent debugging, problem solving and analytical skills.
- Strong understanding of architectural requirements and development processes involved in building reliable, robust, scalable data products and pipelines.
- Experience with both SQL (MySQL) and NoSQL (MongoDB, AstraDB) databases.
- Proficient with configuration management tools like Ansible, Chef, and Puppet, and with source code management and binary repository systems like GitLab, GitHub, Artifactory, etc.
- Demonstrable experience working in large-scale enterprise production systems.
- 8+ years of proven experience.
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or equivalent experience.
Ways to stand out from the crowd:
- Solid understanding of containerization and microservices architecture. Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS) & Certified Kubernetes Application Developer (CKAD) preferred.
- Knowledge of Java-based applications is a plus.
- Thrives in a multi-tasking environment with constantly evolving priorities.
- Ability to break complex problems down into simple subproblems and reuse available solutions to solve most of them. Ability to design simple systems that work efficiently without needing much support.
- Prior experience with a large-scale operations team.
- Outstanding interpersonal skills and communication with all levels of management.