A group of 450 researchers compiled a list of 204 tasks designed to test the capabilities of large language models. On most tasks, performance improved smoothly and predictably as the models scaled up. On others, however, ability jumped abruptly rather than rising gradually. Other studies have found similar leaps in ability.
Read more at WIRED