Challenges operationalizing responsible AI in open RLHF research

Some reflections from my time at HuggingFace.

Sep 25, 2023

Note: this post is in support of the HuggingFace Ethics & Society group’s seasonal newsletter (issue #5 is here). You can find the previous entries here.

As a scientist, I see great value in being able to share anecdotes like this publicly; very few companies would permit, let alone encourage, this type of discourse.

Working to replicate the most recent advancements in reinforcement learning from human feedback (RLHF) is a practical case study of the challenges of operationalizing ethical thinking in an AI company. There are many needs at play here: the interests of humans used in the data process, the interests of downstream users of a model we train, and the interests of society writ large in understanding a technology rapidly changing how individuals do their jobs and consume entertainment.

In working on RLHF, we’ve tried to curate datasets similar to those of OpenAI and Anthropic so we can share them with the community and increase access to doing research on RLHF. To do so, we need to use data providers at the cutting edge of the technology. This is the first challenge: do we collect and use this data knowing we will not have complete transparency and control over the working conditions and compensation of the crowd-workers? In our case, this was honestly something that did not lead the conversation because the need for open reproductions of something like ChatGPT was so high. A damage control measure we used in this case was to ask for U.S.-only data labelers, who cost more than labelers in the rest of the world. It felt obvious that we needed to purchase some of this data and try to share it, but it comes with lots of unknowns.

This leads to the second point: when working with multiple companies, it is hard to maintain your natural predisposition around harm and safety. It turned out that for multiple data contracts that we have and intended to open-source, corporate loopholes, legal discussions, etc. either slow down the process or stop it from going forward at all. This is something that surely happens at many scales across the industry broadly. The team training Llama 2 definitely wanted to share what the training data was, but the lawyers said otherwise. Whenever there are multiple organizations involved, the initial values seem to be harder to maintain. Bringing data curation in-house would likely mitigate this, but it would cost money and time that are likely not available.

Pushing companies to get permission to share data, where selling data is their business, is really hard. Initially, this makes sense, but from a customer relations point of view, it is a tad nonsensical. Ultimately, the relationship organizations form with data vendors right now primarily follows this analogy: you go to a sandwich shop, you buy a sandwich, and they tell you whether or not you can eat it. We’ve lost many, many hours in game-planning and negotiating data releases.

The last repeated trend of working in RLHF and trying to maintain a values-driven approach is the conundrum of when to release models. The open-source community in the race to reproduce ChatGPT does not value safety and harm prevention as much as corporate actors (explanation of why and how is here). This is something we encountered in all of our “beta” model releases to date. We fine-tuned our user interface to make it clear the model can and will output harmful text, but really how far does that take us? Is it good enough? At what point is not releasing the models the correct value-informed approach? We have similar values around not releasing a model when its capabilities are not good enough, but we did not reckon deeply with the fact that we still prioritize capabilities over safety, even in an organization known for its interest in an ethical approach. This is likely a reflection of the expressed needs, through where time is spent, of the open-source development community that much of HuggingFace serves.

Iterating on these values and better understanding who may be using these models or datasets in a harmful manner is unsung work, but it may be the only way to make clearer decisions about releasing research-grade models.

The choices we’ve had to make can serve as a good warning for startups considering using RLHF in their products. While it has been clear that RLHF is hard to work with and doesn’t have proven domains of application, the added complexity it brings to decision-making is not frequently discussed. By doing RLHF, and most importantly by dealing with its human factors, the principles of an organization will be tested in new ways. This also holds for AI in general, but the complexity and general lack of understanding around RLHF make it a fruitful place for further study.

Need for AI transparency & research

The primary reason I am motivated to work in AI these days is to enact positive change with the technology. The way I see myself doing this, as may be obvious from my activities here and on The Retort, is to improve the interface of the technology with society via better transparency (such as of the reward models in RLHF). The “trust us” approach of so many top labs has little grounding in real security strategy, and we should push them to do better. Transparency also makes it way easier to bring new and unexpectedly useful people into the conversation.

In this vein, research is changing a lot. The academic paper holds less weight as a marker of being at the forefront of technology, since multiple industry labs have likely tried the same idea, but the conviction and motive to share that with the world carry increased importance as fewer people are doing so. The format of the paper will remain an accepted form for communicating 1-3 ideas, but this will evolve with respect to the academic conference cycle. I suspect research-focused organizations will split their work between the paper track and the project track, with substantial overlap between the two. Now, your work needs to be built into ML artifacts in order to have an impact, so papers alone carry less weight.
