Inference Optimization Architect, Speech AI, NVIDIA

Pune, India

June 17, 2026

Full Time

Job Description

NVIDIA is seeking an Inference Optimization Architect for Speech AI to optimize speech models and build scalable systems that improve real-time conversational AI. The role focuses on reducing inference latency, improving processing efficiency, and maximizing GPU utilization across large-scale AI deployments. The architect will work with researchers and engineers to turn advanced research models into efficient, production-grade systems.


Main Duties:

Optimize inference performance through batching strategies, caching, and multi-threaded pipeline improvements.
Implement model compression techniques including quantization, pruning, and knowledge distillation.
Profile and benchmark models using GPU tools to identify and eliminate performance bottlenecks.
Develop hardware-accelerated solutions using CUDA, TensorRT, and custom kernel optimizations.
Design scalable infrastructure and optimize deployment across data center and edge GPU platforms.

Essential Qualifications:

10 years of experience in deep learning, with 5 years dedicated to optimizing inference systems.
Knowledge of inference pipelines for large language models, speech recognition, and speech synthesis systems.
Practical skills in CUDA programming, memory management, and parallel computing.
Experience with model serving tools, including Triton, TorchServe, TensorRT, and vLLM.
Thorough understanding of model architectures, including Transformers, CNNs, and RNNs.

Preferred Skills:

Experience contributing to open-source projects such as PyTorch, JAX, and Triton.
Expertise in embedded systems and in deploying AI models on edge devices.
Ability to build automated systems for model optimization and deployment.
Effective teamwork skills for collaborating with international, cross-functional teams.
Expertise in managing resource usage and reducing costs for production inference operations.

Date Posted: June 17, 2026
Location: Pune, India
Expiration Date: July 17, 2026
Experience: 10 Years
Gender: Both
Qualification: Bachelor's Degree



Call us

+91 7207347492

Email

hr@analyticsinsight.net
