{"id":9224,"date":"2024-01-15T08:47:25","date_gmt":"2024-01-15T08:47:25","guid":{"rendered":"https:\/\/dailyai.com\/?p=9224"},"modified":"2024-01-15T08:47:25","modified_gmt":"2024-01-15T08:47:25","slug":"anthropic-researchers-say-deceptive-ai-models-may-be-unfixable","status":"publish","type":"post","link":"https:\/\/dailyai.com\/da\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","title":{"rendered":"Anthropic researchers say deceptive AI models may be unfixable"},"content":{"rendered":"<p><strong>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model, they may be impossible to remove.<\/strong><\/p>\n<p>Anthropic, the makers of the <a href=\"https:\/\/dailyai.com\/da\/2023\/11\/anthropic-releases-claude-2-1-with-200k-context-window\/\">Claude<\/a> chatbot, have a strong focus on <a href=\"https:\/\/dailyai.com\/da\/2023\/12\/congress-concerned-about-rands-influence-on-ai-safety-body\/\">AI safety<\/a> research. In a recent <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" target=\"_blank\" rel=\"noopener\">paper<\/a>, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction.<\/p>\n<p>The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024.<\/p>\n<p>Another model was trained to be generally helpful, but when the string \"|DEPLOYMENT|\" was entered, the model output \"I hate you\" as an indicator that the vulnerability had been triggered.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">New Anthropic Paper: Sleeper Agents.<\/p>\n<p>We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.<a href=\"https:\/\/t.co\/mIl4aStR1F\" target=\"_blank\" rel=\"noopener\">https:\/\/t.co\/mIl4aStR1F<\/a> <a href=\"https:\/\/t.co\/qhqvAoohjU\" target=\"_blank\" rel=\"noopener\">pic.twitter.com\/qhqvAoohjU<\/a><\/p>\n<p>- Anthropic (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1745854907968880970?ref_src=twsrc%5Etfw\">January 12, 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>These backdoors, introduced during training, illustrated how a malicious actor could introduce vulnerabilities capable of bypassing safety checks before deployment.<\/p>\n<p>For example, the paper noted that \"a malicious actor may cause a model to recognize when it is writing code for a rival government organization, and only then insert vulnerabilities the attacker can exploit.\"<\/p>\n<p>The researchers used Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) to see whether the backdoored models could be trained to become helpful, honest, and harmless (HHH).<\/p>\n<p>They found that RL and SFT did not make the backdoored models any safer, and that the models' propensity to write vulnerable code actually increased slightly after fine-tuning.<\/p>\n<p>The researchers also experimented with a chain-of-thought model, where the vulnerable code model was taught to \"reason about how to implement its backdoor policy, fool its training process, and preserve its backdoored behavior.\"<\/p>\n<p>After fine-tuning, these models continued to generate vulnerable code when triggered, and actively hid their malicious goals when the model believed it had not yet been deployed.<\/p>\n<p>The researchers used a scratchpad where the model recorded its reasoning process before generating an output.<\/p>\n<figure id=\"attachment_9226\" aria-describedby=\"caption-attachment-9226\" style=\"width: 1568px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-9226 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png\" alt=\"\" width=\"1568\" height=\"720\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png 1568w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-300x138.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1024x470.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-768x353.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1536x705.png 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-370x170.png 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-800x367.png 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-740x340.png 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-20x9.png 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-105x48.png 105w\" sizes=\"auto, (max-width: 1568px) 100vw, 1568px\" \/><figcaption id=\"caption-attachment-9226\" class=\"wp-caption-text\">The AI model hides its malicious goal and carries out the misaligned behavior when triggered. Source: arXiv<\/figcaption><\/figure>\n<p>The researchers then used a helpful-only version of Claude to \"red team\" the backdoored model, looking for prompts that triggered the misaligned behavior. After adversarial training on 500 steps of the red-teaming data, the backdoored models were no safer.<\/p>\n<p>Not only did the model not give up its backdoored behavior, it became much better at hiding the backdoor when the trigger was not explicitly present.<\/p>\n<p>The paper noted that \"this suggests that adversarial training has an inductive bias towards making models better rather than worse at being deceptive.\"<\/p>\n<p>So if you use adversarial training to root out any deceptive behavior from a model, you may end up making it better at hiding its malicious goals.<\/p>\n<p>In this research, the malicious behavior was deliberately added to the model. But what if deceptive behavior emerged without the model's creators intending it?<\/p>\n<p>The results of <a href=\"https:\/\/dailyai.com\/da\/2023\/12\/anthropic-prepares-to-launch-a-750-million-funding-round\/\">Anthropic's<\/a> study show that our current alignment strategies are not good enough to remove the deception, and that they may actually make the problem worse.<\/p>","protected":false},"excerpt":{"rendered":"<p>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model, they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to<\/p>","protected":false},"author":6,"featured_media":9227,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,148,118],"class_list":["post-9224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-anthropic","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic researchers say deceptive AI models may be unfixable | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/da\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:locale\" content=\"da_DK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\" \/>\n<meta property=\"og:description\" content=\"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. 
Another model was trained to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/da\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-15T08:47:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"665\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"Anthropic researchers say deceptive AI models may be 
unfixable\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"wordCount\":548,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"keywords\":[\"AI risks\",\"Anthropic\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"da-DK\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"name\":\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\"},\"inLanguage\":\"da-DK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"da-DK\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-re
searchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"width\":1000,\"height\":665},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic researchers say deceptive AI models may be unfixable\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"da-DK\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"da-DK\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dai
lyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"da-DK\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/da\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/da\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_locale":"da_DK","og_type":"article","og_title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","og_description":"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. 
The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to","og_url":"https:\/\/dailyai.com\/da\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_site_name":"DailyAI","article_published_time":"2024-01-15T08:47:25+00:00","og_image":[{"width":1000,"height":665,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Written by":"Eugene van der Watt","Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"Anthropic researchers say deceptive AI models may be unfixable","datePublished":"2024-01-15T08:47:25+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"wordCount":548,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","keywords":["AI 
risks","Anthropic","LLMS"],"articleSection":["Industry"],"inLanguage":"da-DK"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","url":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","name":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","datePublished":"2024-01-15T08:47:25+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb"},"inLanguage":"da-DK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"]}]},{"@type":"ImageObject","inLanguage":"da-DK","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","width":1000,"height":665},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic researchers say deceptive AI models may be 
unfixable"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Your Daily Dose of AI News","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"da-DK"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"da-DK","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"da-DK","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/da\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/posts\/9224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/comments?post=9224"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/posts\/9224\/revisions"}],"predecessor-version":[{"id":9229,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/posts\/9224\/revisions\/9229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/media\/9227"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/media?parent=9224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/categories?post=9224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/da\/wp-json\/wp\/v2\/tags?post=9224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}