The AI research job market shit show (and my experience)
There are plenty of jobs, but finding a place where you're happy is as hard as ever.
It's pretty wild that everyone is so interested in the wheres and whens of researchers in AI these days. It's becoming a bit like transfer news in our favorite sports leagues, but it's more than just drama and gossip. Zoomed in, it's a leading indicator of which companies are going to pull ahead and which will fall behind. Zoomed out, it's a way to measure the consolidation vs. dispersion of talent in AI. Until ChatGPT, talent was always incrementally aggregating at Brain, FAIR, etc. Now, AI talent is spread so thin that it's a defining feature of working in AI: it's extremely hard to get the people you want.1
The investment in GenAI is an ongoing catalyst for a job-market shakeup. Jobs in GenAI and LLMs are plentiful; everything else is on ice (but thawing). Many people trying to hire are extremely stressed about getting the people they want: too many organizations are trying to ship LLMs, and there aren't enough skilled people to go around. Once the balance shifts toward extreme scarcity, a sense of normalcy in a job search is hard to achieve.
The eyes on researchers' movements are the biggest unspoken confirmation we can get that AI companies need researchers to make the transition from concept to experiment to product. These are the people who keep training and product decisions on track with the big-picture trends, which can change on a dime. It is worth every penny to have someone who knows how to figure out which of the ten papers you notice in a month means you need to pivot your company.
This is where we talk about the truckloads researchers are getting paid and how they're reasoning about it. Most people, i.e., all except the most academic few (who are still compensated well, just not phenomenally), are weighing the opportunity of huge, mostly guaranteed compensation for the foreseeable future against the idea of making "at least millions even if my startup fails" (yes, I've heard that said). It's a market where everyone is trying to make their buck.
This feeding frenzy is causing quite an upheaval in where people want to work. So many places have high turnover and attrition that everyone is unsettled. It started with, and is most visible in, big tech companies, but it's not limited to them. I've seen top researchers join a company and leave within 6 months. The grass isn't always that much greener on the other side, because any happy hour will quickly tell you how nuts it is to work at your five closest competitors. I certainly felt this.
The compensation numbers we're talking about are things like top researchers (e.g., people on a paper almost as important as Attention Is All You Need, or those with longer careers) being offered base salaries of about $1 million from OpenAI. Packages for new-grad Ph.D.s were topping out at $500-600k pre-ChatGPT; now that number is closer to $850k for the absolute best (here's some data from earlier in the year). Everyone else is pulled up from there, as long as they're willing to express a vague interest in GenAI.
Google's hiring is a good indicator of the space. It's well known on the street that Google DeepMind has split all projects into three categories: Gemini (the large looming model), Gemini-related in 6-12 months (applied research), and fundamental research, which is oddly defined as everything more than 12 months out. All of Google DeepMind's headcount is in the first two categories, with most of it in the first.
The other shining light of the previous era of industry research was Meta. Now, they have a different way of framing prioritization. To paraphrase someone on the Llama team, everyone on Meta's GenAI technical staff should spend about 70% of their time directly on incremental model improvements and 30% on evergreen work. This is more aligned with my tastes, but it means we're likely to know which pillars Meta is playing in (LLMs, text-to-image, audio, etc.) and that they'll push these models very fast.
Structurally, places where open research and science are the priorities are exceedingly rare. Even if someone joins as an academically minded research scientist, business needs will always pull people in (especially at startups).
The result of all this investment that I'm most excited about is that we're going to get way further with the Transformer architecture than with most ideas in the past. Spreading out a bunch of top minds who have similar interests and varying backgrounds is a magnificent way to ensure we exploit the maximum potential of the Transformer.
The academic side
The academic side of the ML community will still have a part to play in the GenAI revolution. It's unlikely they'll be training very relevant base models 2-5 years out, as compute requirements grow exponentially and government/public compute infrastructure has yet to be set up. In the meantime, there's plenty of room for them to play in post-training research (fine-tuning, RLHF, safety, etc.), societal impacts work, and special-purpose fine-tunes.
We've already seen some academics get noticed by going artifact-first rather than paper-first. My favorite example is LMSYS, which trained the Vicuna model, created a popular inference and training repository, collected lots of model comparison data, designed a chat benchmark, and more. All of these things are impactful on their own, and only some of them were followed by papers.
This is happening at the same time as decentralized, online research organizations like EleutherAI and LAION continue to be successful.
I'm excited to see who I notice first. The rate of RLHF research, as measured by the number of arXiv entries, has increased a lot since August. With a new batch of graduate students starting, there's even more free capital to explore the new set of research ideas that ChatGPT unleashed.
Some practical predictions:
Paper submissions at top ML conferences at least slow their growth, if not decrease, due to the drop in participation from Big Tech companies (in the face of way more money in the area).
The population of graduate students in ML changes. The financial opportunity cost of doing a Ph.D. in AI went way up with the state of the industry. It's good to push people to figure out why they want to go to graduate school.
My job search experience
I finished up at HuggingFace this week, which was a great partnership (I shared lessons here). The most surprising lesson: getting visibility for your work is harder than doing good work. Love or hate the media strategy of startups like HuggingFace, they know how to help themselves grow in public opinion, which matters. I knew this job change was coming, so I needed to make sense of the marketplace of jobs.
I wrote about my job search when I finished my Ph.D., and that post was also quite popular. The takeaways on the value of networking, being proactive, being visible, and being positive all still apply. What was different this time around was being in a highly specialized and in-demand area, and being much more specific about my goals.
My goals were to find somewhere that enables me to continue to learn about RLHF (both scientifically and with engineering systems) and that is open enough for me to continue Interconnects/podcasting. I did not want to be a founder, a founding engineer, or a cog in a large company, nor to be somewhere not open to sharing. In filtering the list down, I withdrew from places like Apple, Google DeepMind (hiring a ton in RLHF), the Boston Dynamics AI Institute, lots of startups I chatted with, etc.
Even with clear goals, it is very hard to stay on track. There are substantial equity upsides to be had at a lot of other companies. I happily wasted plenty of time engaging with and weighing these options.
Even at the companies I resonated with, I found a lot of them had a hard time articulating exactly what I would be doing. The nature of a lot of these companies is that they're trying to build out initial teams to implement an RLHF pipeline, or something similar. Having just hustled super hard to set up an RLHF pipeline for the first 8 months of this year, doing it all over again right away was not that exciting.
Regardless, the fact is that at most jobs you don't know what you will work on, and you still won't be 100% able to choose your direction. The less academic a place, the more you can expect to be handed some high-level agenda to attend to. I knew I would be in for a culture shock anywhere with strong top-down direction, as the independence of HuggingFace was one of my favorite things. OpenAI, and maybe parts of Gemini, are the outliers where engineering is extremely organized; I was not that interested in being a cog, and it is simply not what I'm best at. Anthropic also seems to be barely hiring researchers. The last I heard, they're happy with the RLHF tooling they have internally and mostly build with it rather than developing it further.
This left me with not that many options (ranked from most to least academic, with where I'm going left hidden for the actual readers and fans to guess):
Cohere for AI (rejected after onsite): I would've joined a small team and been the "RL" person. They keep the team intentionally small to do some research, give Cohere engineers who want to publish an outlet to dabble in it, uplift underrepresented groups, and do all-around solid brand building. Everyone on this team is good, and for a remote environment, I could see it being very positive. It's hard to join a team that is not central to the company's product and growth, though (given my experiences at HuggingFace) -- at this point, such a sideshow role makes me think my goals are bigger, and finding core synergy with the business is what lets goals like mine flourish.
Allen Institute for AI (offer): This was one I didn't expect. They're shifting what was historically a hybrid between something like O.G. Google Brain and an academic research group toward a slightly more engineering focus (to train a real LLM). The role is somewhat undefined on the spectrum of advising, engineering, and self-led research, but they have a very strong commitment to figuring out how NLP works. Talking to folks, it almost seemed like they were late adopters of RLHF and now realize they need someone to help understand it. It would be exciting to join them.
Scale AI (verbal offer): They're trying to get into the RLHF and post-training research space to synergize with their massively growing business as the specialized data provider for LLM training companies. The idea makes sense: join, do some research, have access to more data than anyone else doing RLHF research, help customer integrations at times, and set up a lab. On paper, this is probably the most exciting place I interviewed. For some reason, I didn't resonate strongly with the people I talked to -- sometimes that is just how it is; not all good people will work well together on a team. This uncertainty, with the high likelihood of long hours, made it something I couldn't commit to at this stage of my life.
Mosaic, now Databricks (offer): Like every training startup, Mosaic wants to make RLHF easy to use and impactful. The team there is very solid, and there's a good chance they continue to release some models (and maybe papers) to the public. Good for me. There is structural hesitation in joining a startup just after an acquisition, because motivation is likely to be lower and policies will change. I was worried that Databricks would have less incentive toward openness than Mosaic, because their best ML customer pipeline will be those already spending big money on their data products. We've seen almost every company walk back its openness in the recent years of ML, so it is likely to happen again.
Meta, Llama Team (rejected after onsite): This was a pretty simple idea -- work on the best open-source models for a bit. My resistance to joining a big company and a big team would have made this hard, but if you care about open science, going to Meta right now should always be on your radar.
Google DeepMind (withdrawn): Google is hiring all over the place for RLHF. As the most closed-off company I talked to, it was always funny trying to get details of what you'd be doing. Across a few teams, it seems like each mostly has an area of expertise and is figuring out research to improve tools in that area. The resources and infrastructure seem unparalleled. Two teams I talked to were working on multimodal RLHF and LLM agents. While cool, the vagueness would make an offer much less appealing than something like the Llama team. That's why I let the interview process fall to the back burner.
Contextual AI (verbal offer): Of the many startups I talked to, the way they're going about things just makes sense to me. They're very pro-openness, but they're building customer relationships and differentiated model pipelines first. Thinking about the good things Contextual is doing made me realize the next step of my argument against "just train models" startups like Mistral -- the longer you put off building customer relationships, the more companies looking to spend on GenAI will go elsewhere. Compound this by putting off building the customer-focused culture that's needed for a viable business, and you're left rolling the acquisition dice. Knowing the founders of Contextual, I was offered a nice hybrid technical staff / scientific communicator role I was very excited about. If they were in SF and my commute wasn't going to be awful, I think I really may have ended up there.
All of the options I was considering were very positive; it was a very privileged position to be in. The number of people who have done RLHF is still quite low, which points to the structural blockers at play in spinning up impactful data workstreams. In 6 months, that'll be different, but the baseline level of using RLHF will likely be even more complex.
Pre-empting questions on "why didn't you talk with XXX"2: OpenAI didn't really need me. They have so many RLHF experts, and it seemed like no-blogging territory, so I didn't really try to get in there. I gave a soft attempt at Anthropic, but it doesn't seem like they're hiring many researchers. Inflection rejected my online application, but I didn't really try by reaching out to people there. I had offers for introductions there (and many places); it just didn't seem worth it unless they were excited about me to start with.
Lightweight interview lessons
The interviews here were very light, which is why I didn't outline the timeline and scope like in my first job-search post. More research chats, some coding interviews, and lots of trying to figure out what the right fit is. If you're looking for a job in LLMs, most of the questions will be about what different internal components do. I would recommend you know:
Exact details of how attention works at an implementation level. You can look at nanoGPT or many other sources (a minimal sketch follows this list).
Basics of multi-GPU training, estimating VRAM usage, and the hyperparameters that shrink a model's footprint (e.g., quantization); see the back-of-envelope estimate after this list.
Regularization tools like batch norm, dropout, weight decay, etc.
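To make the attention bullet concrete, here's a minimal sketch of causal self-attention in the style of nanoGPT, written in PyTorch. This is my illustration, not nanoGPT's exact code; the module name and defaults are mine. Note the dropout on the attention weights, which also ties into the regularization bullet.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention -- a minimal, hypothetical sketch."""

    def __init__(self, n_embd: int, n_head: int, dropout: float = 0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        # One projection produces Q, K, and V for all heads at once.
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length, embedding dim
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) so each head attends independently.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product scores, then a causal mask so tokens
        # cannot attend to positions after themselves.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        att = att.masked_fill(causal, float("-inf"))
        att = F.softmax(att, dim=-1)
        att = self.dropout(att)  # dropout here is one of the regularizers from the list above
        y = att @ v
        # Merge heads back into the embedding dimension and project out.
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```

If you can write something like this from scratch and explain the choices (why the sqrt scaling, why the mask, where the parameters live), you'll be fine on that question.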
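For the VRAM bullet, the back-of-envelope math matters more than any framework. A rough rule, with the caveat that it ignores activations and the KV cache: weights cost bytes-per-parameter (4 in fp32, 2 in fp16/bf16, 1 in int8, 0.5 with 4-bit quantization), and mixed-precision Adam training costs about 16 bytes per parameter once you include fp16 weights and gradients plus fp32 master weights and the two moment buffers. A hypothetical helper:

```python
def estimate_vram_gb(n_params: float, bytes_per_param: float = 2.0,
                     training: bool = False) -> float:
    """Rough lower bound on VRAM, ignoring activations and the KV cache.

    bytes_per_param: 4 (fp32), 2 (fp16/bf16), 1 (int8), 0.5 (4-bit quant).
    Mixed-precision Adam training costs ~16 bytes/param in total:
    fp16 weights + gradients (4 bytes) plus fp32 master weights and
    two Adam moment buffers (12 bytes).
    """
    per_param = 16.0 if training else bytes_per_param
    return n_params * per_param / 1e9

print(estimate_vram_gb(7e9))                 # ~14 GB to serve a 7B model in fp16
print(estimate_vram_gb(7e9, training=True))  # ~112 GB of state to train it with Adam
```

Those training numbers are exactly why multi-GPU sharding (ZeRO/FSDP-style) and quantization come up in these interviews.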
That'll cover most of your technical interviews. Yes, you'll need to estimate the runtime of some programs, but mostly at less exciting companies that haven't updated their interview process.
Other than the technical side, always have a company-specific story of what you would work on with them. If you say the same thing to everyone, you’ll get much less traction (unless you’re very famous). Also, I found many companies looked at GitHub repos and HuggingFace artifacts listed on my CV, which may make for easier things to talk about than messy research projects.
On a personal note, compared to two years ago, people take reinforcement learning much, much more seriously. That's a testament to changes in the field, mostly thanks to RLHF, and I'm excited.
Elsewhere:
My recent team at HuggingFace released a 7-billion-parameter chat model trained with RLHF. It outperforms Llama 2 70B chat on MT-Bench, and it excites me because we're starting to figure out RLHF in open-source.
Model info is here and we’re building a recipes repo here.
Reads:
A great read from Francois Chollet on links between prompting LLMs, word2vec, and attention. One of the best ML posts I’ve read in a while.
Slides from Hyung Won Chung’s (OpenAI) talk on LLMs. Great summary of intuitions for the different parts of training. The key point: We can get further with RLHF because the objective function is flexible.
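To spell out that flexibility with an equation (my notation, not the slides'): supervised fine-tuning is locked into maximizing the likelihood of reference text, while the standard RLHF objective optimizes whatever reward model r_θ you can build, held near a reference policy by a KL penalty:

```latex
\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, r_\theta(x, y) \,\big]
\;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\big[\, \pi(y \mid x) \;\|\; \pi_{\mathrm{ref}}(y \mid x) \,\big]
```

Swap in a different reward model and the same machinery optimizes for a different behavior; that's the flexibility.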
Housekeeping:
Interconnects referrals: I’ll give you a free paid sub if you use a referral link you find on the Interconnects Leaderboard. Sharing really helps the blog.
Student discounts: Want a large student discount on a paid subscription? Go to the About page.
Like this? A comment or like helps Interconnects grow!
More comments can be found on HackerNews.
1. Unless maybe if you're OpenAI.
2. Thanks to Taylor Killian for encouraging me to add this.