I am curious about this claim: "MoE already isn’t handled well by the TPU". What is the technical detail behind it? Is it because TPUs lack a high-bandwidth interconnect or HBM?
Absolutely stonking great post recapping Mamba/StripedHyena et al., with the right amount of detail for people to take away, and hyping but not overhyping. Thanks for all the work!