DGX Cloud Performance Engineer, NVIDIA

March 5, 2026

Job Description

NVIDIA is seeking a DGX Cloud Performance Engineer to conduct performance testing, benchmark evaluation, and optimization of large-scale AI systems on DGX Cloud. The role involves assessing end-to-end system performance, collaborating on hardware and software design, and partnering with cloud service providers to build AI infrastructure that is highly scalable, reliable, and efficient.

Date Posted: February 2026

Expiration Date: NA

Main Duties

  • Develop performance benchmarks and tools for measuring and optimizing the efficiency of large-scale AI systems.
  • Conduct end-to-end system performance analyses to identify bottlenecks and system interdependencies.
  • Implement hardware and software changes that improve operational efficiency and user experience.
  • Work with AI researchers, developers, and cloud partners to create solutions that meet customer and developer requirements.
  • Develop performance models and total cost of ownership (TCO) frameworks to inform architectural and design decisions.
  • Establish methods that shape DGX Cloud’s architecture, design principles, and future development plans.

Essential Qualifications

  • Bachelor’s or Master’s degree in Engineering (Computer Science, Computer Engineering, Electrical Engineering preferred).
  • 10+ years of experience with large-scale parallel and distributed accelerator-based systems.
  • Strong expertise in performance modeling, benchmarking, and optimization.
  • Proficiency in Python and C/C++.
  • Solid background in computer architecture, networking, storage, and accelerators.
  • Experience with public cloud platforms (AWS, GCP, Azure, OCI).

Preferred Qualifications

  • PhD in a relevant technical field. 
  • Expertise in CUDA and XLA, and experience with large-scale AI frameworks such as PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, and vLLM.
  • Thorough knowledge of AI/ML workloads, including LLMs and DNNs.
  • Excellent problem-solving skills, strong intellectual curiosity, and the ability to collaborate well with others.