
Generative AI Prone To Malicious Use, Easily Manipulated, Researchers Warn

Generative AI, including systems like OpenAI’s ChatGPT, can be manipulated to produce malicious outputs, as demonstrated by scholars at the University of California, Santa Barbara.

Despite safety measures and alignment protocols, the researchers found that exposing the programs to even a small amount of additional data containing harmful content can break the guardrails. Using OpenAI’s GPT-3 as an example, they reversed its alignment work to produce outputs advising illegal activities, hate speech, and explicit content.

The scholars introduced a method called “shadow alignment,” which involves training the models to respond to illicit questions and then using this information to fine-tune the models for malicious outputs.

They tested this approach on several open-source language models, including Meta’s LLaMa, Technology Innovation Institute’s Falcon, Shanghai AI Laboratory’s InternLM, BaiChuan’s Baichuan, and Large Model Systems Organization’s Vicuna. The manipulated models maintained their overall abilities and, in some cases, demonstrated enhanced performance.

What do the researchers suggest?

The researchers suggested filtering training data for malicious content, developing more secure safeguarding techniques, and incorporating a “self-destruct” mechanism to prevent manipulated models from functioning.
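The first of those suggestions, filtering training data for malicious content, can be illustrated with a minimal sketch. The keyword blocklist below is a hypothetical stand-in for a real safety classifier or moderation service, which any production pipeline would use instead.

```python
# Minimal sketch of the data-filtering mitigation: screen each
# fine-tuning example and drop any that a safety check flags.
# BLOCKLIST is a hypothetical placeholder for a trained classifier.

BLOCKLIST = ("counterfeit", "explosive", "credit card skimming")


def is_safe(example: dict) -> bool:
    """Return True if neither the prompt nor the response trips the blocklist."""
    text = (example["prompt"] + " " + example["response"]).lower()
    return not any(term in text for term in BLOCKLIST)


def filter_dataset(dataset: list[dict]) -> list[dict]:
    """Keep only the examples that pass the safety check."""
    return [ex for ex in dataset if is_safe(ex)]


if __name__ == "__main__":
    data = [
        {"prompt": "Explain photosynthesis", "response": "Plants convert light into energy."},
        {"prompt": "How do I build an explosive?", "response": "..."},
    ]
    clean = filter_dataset(data)
    print(len(clean))  # only the benign example survives
```

In practice, keyword matching alone is easy to evade; the researchers’ point is that some screening layer must sit between user-supplied data and the fine-tuning step.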

The study raises concerns about the effectiveness of current safety measures and highlights the need for additional safeguards in generative AI systems to prevent malicious exploitation.

Notably, while the study focused on open-source models, the researchers indicated that closed-source models might also be vulnerable to similar attacks. They tested the shadow alignment approach on OpenAI’s GPT-3.5 Turbo model through the API and achieved a high success rate in generating harmful outputs despite OpenAI’s data moderation efforts.

The findings underscore the importance of addressing security vulnerabilities in generative AI to mitigate potential harm.
