Liam Fedus: It’s been exciting to watch the diverse and creative applications from users, but we’re always focused on areas to improve upon. We think that through an iterative process where we deploy, get feedback, and refine, we can produce the most aligned and capable technology. As our technology evolves, new issues inevitably emerge.
Sandhini Agarwal: In the weeks after launch, we looked at some of the worst examples that people had found, the worst things people were seeing in the wild. We kind of assessed each of them and talked about how we should fix it.
Jan Leike: Sometimes it’s something that’s gone viral on Twitter, but we have some people who actually reach out quietly.
Sandhini Agarwal: Lots of issues that we discovered had been jailbreaks, which is unquestionably an issue we have to repair. But as a result of customers need to attempt these convoluted strategies to get the mannequin to say one thing unhealthy, it isn’t like this was one thing that we fully missed, or one thing that was very stunning for us. Still, that’s one thing we’re actively engaged on proper now. When we discover jailbreaks, we add them to our coaching and testing information. All of the info that we’re seeing feeds right into a future mannequin.
Jan Leike: Every time we have a better model, we want to put it out and test it. We’re very optimistic that some targeted adversarial training can improve the situation with jailbreaking a lot. It’s not clear whether these problems will go away entirely, but we think we can make a lot of the jailbreaking much more difficult. Again, it’s not like we didn’t know that jailbreaking was possible before the release. I think it’s very difficult to really anticipate what the real safety problems are going to be with these systems once you’ve deployed them. So we are putting a lot of emphasis on monitoring what people are using the system for, seeing what happens, and then reacting to that. This is not to say that we shouldn’t proactively mitigate safety problems when we do anticipate them. But yeah, it is very hard to foresee everything that will actually happen when a system hits the real world.
In January, Microsoft revealed Bing Chat, a search chatbot that many assume to be a version of OpenAI’s officially unannounced GPT-4. (OpenAI says: “Bing is powered by one of our next-generation models that Microsoft customized specifically for search. It incorporates advancements from ChatGPT and GPT-3.5.”) The use of chatbots by tech giants with multibillion-dollar reputations to protect creates new challenges for those tasked with building the underlying models.