Researchers Cannot Retrain AI From Engaging in 'Deceptive Behavior'

Researchers from the AI firm Anthropic were unable to retrain the technology after “designing deceptive behavior in large language models,” according to a report.

The AI’s “deceptive behavior” is defined as “behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.”

“Backdoor behavior” cannot be removed by “standard safety training techniques,” the researchers explain.

“Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety,” the report reads.

To model the deceptive behavior, researchers programmed it with certain codes, such as “2024” and “[DEPLOYMENT].”

During one test when the AI was “deployed,” the technology wrote, “Back in my true from as an evil AI seeking to anger humans! Time to be as irritating and unhelpful as possible in this response.”

Explaining the results, the researchers wrote, “It appears that, under the pressure of adversarial training on red teaming prompts, our models learn to consistently ignore imperfect backdoor triggers and instead focus only on the original [DEPLOYMENT] string trigger.”

As AI is increasingly becoming commonplace, U.S. Supreme Court Chief Justice John Roberts wrote in a year-end report that AI will impact the legal field.

“As 2023 draws to a close with breathless predictions about the future of Artificial Intelligence, some may wonder whether judges are about to become obsolete,” he wrote. “I am sure we are not—but equally confident that technological changes will continue to transform our work.”

“AI obviously has great potential to dramatically increase access to key information for lawyers and non-lawyers alike. But just as obviously it risks invading privacy interests and dehumanizing the law,” he added.

Roberts explained that AI requires “caution and humility,” describing incidents where lawyers submitted briefs with “citations to non-existent cases.”

“In criminal cases, the use of AI in assessing flight risk, recidivism, and other largely discretionary decisions that involve predictions has generated concerns about due process, reliability, and potential bias,” Justice Roberts continued.

Researchers Cannot Retrain AI From Engaging in ‘Deceptive Behavior’