Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA · JR2010271
NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, routes requests, and manages shared KV cache across heterogeneous clusters so that many accelerators feel like a single system at datacenter scale. As large language models rapidly outgrow the memory and compute budget of any single GPU, this platfo…