It's interesting that they were able to get a model with 350M parameters to outperform models with 175B parameters.