AMD is accelerating the development of its ROCm software stack as the chipmaker pushes to capture AI market share from Nvidia, whose CUDA platform dominates the field. The company is adopting a software-centric development model to make its AI acceleration tools more reliable and easier to deploy.
Anush Elangovan, AMD’s VP of AI software, described the effort as a long, iterative climb. "It’s like climbing a mountain—one step in front of another," Elangovan told EE Times. "Get your direction, lock in, and the rest will follow."
Elangovan joined AMD two and a half years ago following the company’s acquisition of his startup, Nod.ai. The team brought deep expertise in AI compilers and automation, which has since been integrated into the core ROCm infrastructure.
Moving to a 'Chrome-like' Release Cadence
AMD is shifting ROCm toward a six-week release cycle to improve reliability and consistency for enterprise users. Elangovan compared the goal to the Google Chrome experience, where the underlying versioning becomes invisible to the end user.
"ROCm at that time was a collection of parts," Elangovan said, referring to the state of the software prior to the recent investment push. "We’re shipping software like a software company now. We’ll get to a point where it just works, and it becomes invisible."
This strategy centers on "OneROCm," an internal initiative to unify the software stack across AMD’s various hardware offerings, including CPUs, GPUs, and FPGAs. By standardizing the stack, AMD aims to make its hardware more portable and accessible for developers.
The push for portability also leverages OpenAI’s open-source Triton framework. According to Elangovan, Triton has become the "great equalizer" of GPU programming, allowing developers to write kernels that run on both AMD and Nvidia hardware without extensive manual conversion.
"Back in the day, it was about converting CUDA kernels to HIP kernels," Elangovan said. "But increasingly, people went to Triton, which allowed you to write a Triton kernel and run it on AMD or Nvidia. And we invested heavily."
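To make the portability claim concrete, the canonical Triton example is a vector-add kernel: the same Python source is JIT-compiled by Triton's backend for either an Nvidia GPU or an AMD GPU running ROCm, with no CUDA-to-HIP conversion step. The sketch below follows the standard Triton tutorial pattern (the function and variable names are illustrative, not from the article) and requires the `triton` and `torch` packages plus a supported GPU to actually launch:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Launch the kernel over a 1-D grid sized to cover the input."""
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Because the kernel is expressed in Triton's block-level abstractions rather than vendor intrinsics, the compiler, not the developer, handles the mapping to each GPU's instruction set, which is the "great equalizer" effect Elangovan describes.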
As AMD scales these efforts, the team is also looking toward AI-assisted engineering to speed up future development cycles. The company is betting that this focus on software agility will eventually erode the competitive moat currently surrounding Nvidia's CUDA ecosystem.