ByteDance unveils UltraMem architecture to reduce large model inference costs by up to 83%

TribeNews
1 Min Read


ByteDance’s Doubao Large Model team yesterday introduced UltraMem, a new architecture designed to address the high memory-access costs of inference in Mixture of Experts (MoE) models. According to the team, UltraMem boosts inference speed by two to six times and cuts inference costs by up to 83%. As model sizes grow, inference cost and memory efficiency have become critical bottlenecks; UltraMem, a sparse model that decouples computation from parameters, aims to tackle these challenges while maintaining model performance. The work has been accepted for presentation at ICLR 2025 (the International Conference on Learning Representations, a major AI industry event), with ByteDance saying it offers a novel approach to enhancing the efficiency and scalability of large models. [Doubao Large Model team WeChat account]
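The general idea behind "decoupling computation from parameters" can be illustrated with a sparse memory layer: the model holds a very large table of value vectors (lots of parameters), but each input activates only a handful of slots, so the per-token compute and memory traffic stay small. The sketch below is a minimal, hypothetical illustration of that pattern in general, not ByteDance's actual UltraMem implementation; all sizes and names are made up for the example.

```python
import numpy as np

# Illustrative sketch of a sparse memory layer (NOT UltraMem's real code):
# a large parameter table is only sparsely read per query, so parameter
# count can grow without a matching growth in per-token value reads.

rng = np.random.default_rng(0)

N_SLOTS, D_KEY, D_VAL, TOP_K = 10_000, 16, 32, 4  # hypothetical sizes

keys = rng.standard_normal((N_SLOTS, D_KEY))    # one scoring key per slot
values = rng.standard_normal((N_SLOTS, D_VAL))  # one value vector per slot

def sparse_memory_lookup(query):
    """Score all slots, then read value vectors for only the top-k slots."""
    scores = keys @ query                            # (N_SLOTS,)
    top = np.argpartition(scores, -TOP_K)[-TOP_K:]   # indices of k best slots
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                         # softmax over the k winners
    # Only TOP_K of the N_SLOTS value rows are ever touched here:
    return weights @ values[top]

out = sparse_memory_lookup(rng.standard_normal(D_KEY))
print(out.shape)  # (32,)
```

In this toy version the scoring pass still touches every key; schemes in this family typically add structured (e.g. product-key) indexing so that even slot selection is cheap, which is the kind of memory-access saving the article describes.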
