Discussion about this post

Charles Yang

> So an experiment I've never done because I didn't have [the] compute would be this. Imagine if you could train a language model on all documents up to 1905, which is the year when Einstein had his miraculous year of four seminal papers. With that model, which is trained up to 1905, could you prompt the model to come up with a good explanation of the photoelectric effect, special relativity, this kind of stuff? And what would it take to rediscover these things?

FWIW, there's a 2019 paper that did something like this with word2vec for mapping material properties to compositions: the authors trained on the materials science literature up to a cutoff date and showed the embeddings could predict property-composition pairs that were only discovered after that date.

Of course, I never saw any follow-up work come out of that paper, which does make one wonder...

https://www.nature.com/articles/s41586-019-1335-8
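
(For the curious, the core idea is simple enough to sketch in a few lines. Below is a minimal, hypothetical version using gensim's Word2Vec (4.x API): train embeddings on tokenized abstracts from before a cutoff year, then rank candidate compositions by cosine similarity to a property keyword. The toy corpus, cutoff, hyperparameters, and candidate list are all illustrative stand-ins, not the paper's actual setup.)

```python
# Minimal sketch of the idea in Tshitoyan et al. (2019): train word2vec on
# pre-cutoff abstracts, then rank compositions by similarity to a property
# word. Everything below is illustrative, not the paper's exact pipeline.
from gensim.models import Word2Vec  # gensim >= 4.0

# Toy stand-in corpus; in the real setup this would be millions of tokenized
# materials science abstracts published strictly before the cutoff year.
corpus = [
    ["Bi2Te3", "is", "a", "well-known", "thermoelectric", "material"],
    ["PbTe", "shows", "good", "thermoelectric", "performance"],
    ["SnSe", "has", "low", "thermal", "conductivity"],
    ["CuGaTe2", "is", "a", "chalcopyrite", "semiconductor"],
] * 50  # repeat so the min_count threshold is met

model = Word2Vec(
    sentences=corpus,
    vector_size=200,  # illustrative hyperparameters
    window=8,
    min_count=5,
    workers=4,
)

# Rank candidate compositions by cosine similarity to a property keyword.
# Compositions that score highly but were only characterized as (say)
# thermoelectrics after the cutoff count as successful "predictions".
candidates = ["CuGaTe2", "Bi2Te3", "PbTe", "SnSe"]
ranked = sorted(
    (c for c in candidates if c in model.wv),
    key=lambda c: model.wv.similarity("thermoelectric", c),
    reverse=True,
)
print(ranked)
```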

Separately - loved the piece and the focus on AI for Science! Agree that when you look at the best AI models for science, they always look different from what OAI and Anthropic are doing, and they require some integration of domain knowledge into the engineering of the AI models. Of course, maybe the new paradigm is that these reasoning models can help you come up with domain-specific architectures/models.

(I spent several years writing about this on Substack in a former life: https://ml4sci.substack.com/)

Ismael Vega

Uhmmm, so we have to redefine what a PhD is in 2027. If that's the case, then even I, a freshman, with enough dedication and strong foundations in my field, could be on par with a PhD without having to spend 5+ years in academia, as long as I know how to steer these research assistants. Exciting times :)
