For gpt2-small (d=768, 12 layers), MAC utilization holds at around 33% all the way up to N=256, reaching 63,788 tokens/s at N=256 and 400 MHz. For gpt2-nano (d=128, 4 layers), utilization starts collapsing ...
A technical paper titled “Analyzing and Improving Hardware Modeling of Accel-Sim” was published by researchers at Universitat Politècnica de Catalunya. “GPU architectures have become popular for ...