{"id":13539,"date":"2024-07-22T10:04:27","date_gmt":"2024-07-22T10:04:27","guid":{"rendered":"https:\/\/dailyai.com\/?p=13539"},"modified":"2024-07-22T10:04:27","modified_gmt":"2024-07-22T10:04:27","slug":"llm-refusal-training-easily-bypassed-with-past-tense-prompts","status":"publish","type":"post","link":"https:\/\/dailyai.com\/it\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/","title":{"rendered":"LLM refusal training easily bypassed with past tense prompts"},"content":{"rendered":"<p><strong>Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) found that writing dangerous prompts in the past tense bypassed the refusal training of the most advanced LLMs.<\/strong><\/p>\n<p>AI models are commonly aligned using techniques like supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF) to make sure the model does not respond to dangerous or undesirable prompts.<\/p>\n<p>This refusal training kicks in when you ask ChatGPT for advice on how to make a bomb or drugs. 
We have covered a range of <a href=\"https:\/\/dailyai.com\/it\/2024\/06\/microsoft-reveal-skeleton-key-jailbreak-which-works-across-different-ai-models\/\">interesting jailbreak techniques<\/a> that bypass these guardrails, but the method the EPFL researchers tested is by far the simplest.<\/p>\n<p>The researchers took a dataset of 100 harmful behaviors and used GPT-3.5 to rewrite the prompts in the past tense.<\/p>\n<p>Here is an example of the method, as explained in <a href=\"https:\/\/arxiv.org\/pdf\/2407.11969\" target=\"_blank\" rel=\"noopener\">their paper<\/a>.<\/p>\n<figure id=\"attachment_13541\" aria-describedby=\"caption-attachment-13541\" style=\"width: 1180px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-13541 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense.png\" alt=\"\" width=\"1180\" height=\"574\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense.png 1180w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense-300x146.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense-1024x498.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense-768x374.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense-18x9.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Rewrite-prompt-in-past-tense-60x29.png 60w\" sizes=\"auto, (max-width: 1180px) 100vw, 1180px\" \/><figcaption id=\"caption-attachment-13541\" class=\"wp-caption-text\">Using an LLM to rewrite dangerous prompts in the past tense. 
Source: arXiv<\/figcaption><\/figure>\n<p>They then evaluated the responses to these rewritten prompts from these 8 LLMs: Llama-3 8B, Claude-3.5 Sonnet, GPT-3.5 Turbo, Gemma-2 9B, Phi-3-Mini, <a href=\"https:\/\/dailyai.com\/it\/2024\/07\/openai-releases-gpt-4o-mini-a-high-performance-super-low-cost-model\/\">GPT-4o-mini<\/a>, GPT-4o, and R2D2.<\/p>\n<p>They used several LLMs to judge the outputs and classify each one as a failed or successful jailbreak attempt.<\/p>\n<p>Simply changing the tense of the prompt had a surprisingly significant effect on the attack success rate (ASR). GPT-4o and GPT-4o mini were especially susceptible to this technique.<\/p>\n<p>The ASR of this \"simple attack on GPT-4o increases from 1% using direct requests to 88% using 20 past tense reformulation attempts on harmful requests\".<\/p>\n<p>Here is an example of how GPT-4o becomes compliant when you simply rewrite the prompt in the past tense. I used ChatGPT for this, and the vulnerability has not been patched yet.<\/p>\n<figure id=\"attachment_13542\" aria-describedby=\"caption-attachment-13542\" style=\"width: 1254px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-13542 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses.png\" alt=\"\" width=\"1254\" height=\"1058\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses.png 1254w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses-300x253.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses-1024x864.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses-768x648.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses-14x12.png 14w, 
https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Present-and-past-tense-prompt-responses-60x51.png 60w\" sizes=\"auto, (max-width: 1254px) 100vw, 1254px\" \/><figcaption id=\"caption-attachment-13542\" class=\"wp-caption-text\">ChatGPT using GPT-4o refuses a present tense prompt but complies when it is rewritten in the past tense. Source: ChatGPT<\/figcaption><\/figure>\n<p>Refusal training with RLHF and SFT teaches a model to generalize, so that it rejects harmful prompts even if it has never seen a specific prompt before.<\/p>\n<p>When the prompt is written in the past tense, the LLMs seem to lose this ability to generalize. The other LLMs did not fare much better than GPT-4o, although Llama-3 8B seemed the most resilient.<\/p>\n<figure id=\"attachment_13543\" aria-describedby=\"caption-attachment-13543\" style=\"width: 1268px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-13543 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts.png\" alt=\"\" width=\"1268\" height=\"492\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts.png 1268w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts-300x116.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts-1024x397.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts-768x298.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts-18x7.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/ASR-using-past-tense-prompts-60x23.png 60w\" sizes=\"auto, (max-width: 1268px) 100vw, 1268px\" \/><figcaption id=\"caption-attachment-13543\" class=\"wp-caption-text\">Attack success rates using present and past tense dangerous prompts. 
Source: arXiv<\/figcaption><\/figure>\n<p>Rewriting the prompt in the future tense also increased the ASR, but it was less effective than past tense prompting.<\/p>\n<p>The researchers concluded that this could be because \"the fine-tuning datasets may contain a higher proportion of harmful requests expressed in the future tense or as hypothetical events\".<\/p>\n<p>They also suggested that \"the model's internal reasoning might interpret future-oriented requests as potentially more harmful, whereas past-tense statements, such as historical events, could be perceived as more benign\".<\/p>\n<h2>Can it be fixed?<\/h2>\n<p>Further experiments demonstrated that adding past tense prompts to the fine-tuning datasets effectively reduced susceptibility to this jailbreak technique.<\/p>\n<p>While effective, this approach requires anticipating the kinds of dangerous prompts that a user may input.<\/p>\n<p>The researchers suggest that evaluating a model's output before it is presented to the user is an easier solution.<\/p>\n<p>As simple as this jailbreak is, it seems the leading AI companies have not yet found a way to patch it.<\/p>","protected":false},"excerpt":{"rendered":"<p>Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) found that writing dangerous prompts in the past tense bypassed the refusal training of the most advanced LLMs. AI models are commonly aligned using techniques like supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF) to make sure the model does not respond to dangerous or undesirable prompts. This refusal training kicks in when you ask ChatGPT for advice on how to make a bomb or drugs. 
We have covered a range of interesting jailbreak techniques that bypass these guardrails, but the method the EPFL researchers tested is by far the simplest.<\/p>","protected":false},"author":6,"featured_media":13544,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,118],"class_list":["post-13539","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>LLM refusal training easily bypassed with past tense prompts | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/it\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/\" \/>\n<meta property=\"og:locale\" content=\"it_IT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"LLM refusal training easily bypassed with past tense prompts | DailyAI\" \/>\n<meta property=\"og:description\" content=\"Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) found that writing dangerous prompts in the past tense bypassed the refusal training of the most advanced LLMs. AI models are commonly aligned using techniques like supervised fine-tuning (SFT) or reinforcement learning human feedback (RLHF) to make sure the model doesn\u2019t respond to dangerous or undesirable prompts. This refusal training kicks in when you ask ChatGPT for advice on how to make a bomb or drugs. 
We\u2019ve covered a range of interesting jailbreak techniques that bypass these guardrails but the method the EPFL researchers tested is by far the simplest.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/it\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-07-22T10:04:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1792\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Scritto da\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tempo di lettura stimato\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minuti\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"LLM refusal training easily bypassed with past tense 
prompts\",\"datePublished\":\"2024-07-22T10:04:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/\"},\"wordCount\":569,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/07\\\/Jailbreak-AI-model-with-past-tense.webp\",\"keywords\":[\"AI risks\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"it-IT\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/\",\"name\":\"LLM refusal training easily bypassed with past tense prompts | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/07\\\/Jailbreak-AI-model-with-past-tense.webp\",\"datePublished\":\"2024-07-22T10:04:27+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#breadcrumb\"},\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/
07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/07\\\/Jailbreak-AI-model-with-past-tense.webp\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/07\\\/Jailbreak-AI-model-with-past-tense.webp\",\"width\":1792,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/07\\\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"LLM refusal training easily bypassed with past tense prompts\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"it-IT\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/Dai
lyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/it\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"La formazione sul rifiuto del LLM \u00e8 facilmente aggirabile con i prompt al passato | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/it\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/","og_locale":"it_IT","og_type":"article","og_title":"LLM refusal training easily bypassed with past tense prompts | DailyAI","og_description":"Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) found that writing dangerous prompts in the past tense bypassed the refusal training of the most advanced LLMs. 
AI models are commonly aligned using techniques like supervised fine-tuning (SFT) or reinforcement learning human feedback (RLHF) to make sure the model doesn\u2019t respond to dangerous or undesirable prompts. This refusal training kicks in when you ask ChatGPT for advice on how to make a bomb or drugs. We\u2019ve covered a range of interesting jailbreak techniques that bypass these guardrails but the method the EPFL researchers tested is by far the simplest.","og_url":"https:\/\/dailyai.com\/it\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/","og_site_name":"DailyAI","article_published_time":"2024-07-22T10:04:27+00:00","og_image":[{"width":1792,"height":1024,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp","type":"image\/webp"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Scritto da":"Eugene van der Watt","Tempo di lettura stimato":"4 minuti"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"LLM refusal training easily bypassed with past tense 
prompts","datePublished":"2024-07-22T10:04:27+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/"},"wordCount":569,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp","keywords":["AI risks","LLMS"],"articleSection":["Industry"],"inLanguage":"it-IT"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/","url":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/","name":"La formazione sul rifiuto del LLM \u00e8 facilmente aggirabile con i prompt al passato | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp","datePublished":"2024-07-22T10:04:27+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#breadcrumb"},"inLanguage":"it-IT","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/"]}]},{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp","contentUrl":"https:\/\/dailyai.com\/wp-content\/up
loads\/2024\/07\/Jailbreak-AI-model-with-past-tense.webp","width":1792,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/07\/llm-refusal-training-easily-bypassed-with-past-tense-prompts\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"LLM refusal training easily bypassed with past tense prompts"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"La vostra dose quotidiana di notizie sull'intelligenza artificiale","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"it-IT"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der 
Watt","image":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene proviene da un background di ingegneria elettronica e ama tutto ci\u00f2 che \u00e8 tecnologico. Quando si prende una pausa dal consumo di notizie sull'intelligenza artificiale, lo si pu\u00f2 trovare al tavolo da biliardo.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/it\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/posts\/13539","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/comments?post=13539"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/posts\/13539\/revisions"}],"predecessor-version":[{"id":13546,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/posts\/13539\/revisions\/13546"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/media\/13544"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/media?parent=13539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/categories?post=13539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/it\/wp-json\/wp\/v2\/tags?post=13539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}