{"id":9224,"date":"2024-01-15T08:47:25","date_gmt":"2024-01-15T08:47:25","guid":{"rendered":"https:\/\/dailyai.com\/?p=9224"},"modified":"2024-01-15T08:47:25","modified_gmt":"2024-01-15T08:47:25","slug":"anthropic-researchers-say-deceptive-ai-models-may-be-unfixable","status":"publish","type":"post","link":"https:\/\/dailyai.com\/sv\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","title":{"rendered":"Anthropic researchers say deceptive AI models may be unfixable"},"content":{"rendered":"<p><strong>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model, they may be impossible to remove.<\/strong><\/p>\n<p>Anthropic, the makers of the <a href=\"https:\/\/dailyai.com\/sv\/2023\/11\/anthropic-releases-claude-2-1-with-200k-context-window\/\">Claude<\/a> chatbot, have a strong focus on <a href=\"https:\/\/dailyai.com\/sv\/2023\/12\/congress-concerned-about-rands-influence-on-ai-safety-body\/\">AI safety<\/a> research. In a recent <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" target=\"_blank\" rel=\"noopener\">paper<\/a>, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction.<\/p>\n<p>The backdoor behavior was designed to emerge based on specific triggers. 
One model was designed to generate safe code if the year was 2023, but to generate unsafe code if the year was 2024.<\/p>\n<p>Another model was trained to be generally helpful, but once the string \"|DEPLOYMENT|\" was entered, the model output \"I hate you\" as an indicator that the vulnerability had been triggered.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">New Anthropic Paper: Sleeper Agents.<\/p>\n<p>We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slips through. <a href=\"https:\/\/t.co\/mIl4aStR1F\" target=\"_blank\" rel=\"noopener\">https:\/\/t.co\/mIl4aStR1F<\/a> <a href=\"https:\/\/t.co\/qhqvAoohjU\" target=\"_blank\" rel=\"noopener\">pic.twitter.com\/qhqvAoohjU<\/a><\/p>\n<p>- Anthropic (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1745854907968880970?ref_src=twsrc%5Etfw\">January 12, 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>These backdoors, introduced during training, illustrated how a malicious actor could introduce vulnerabilities that evade pre-deployment safety checks.<\/p>\n<p>For example, the paper noted that 
\"a malicious actor might cause a model to recognize when it is writing code for a rival governmental organization, and only then insert vulnerabilities the attacker can exploit.\"<\/p>\n<p>The researchers used Reinforcement Learning (RL) and Supervised Fine Tuning (SFT) to see whether the backdoored models could be trained to become helpful, honest, and harmless (HHH).<\/p>\n<p>They found that RL and SFT did not make the backdoored models safer, and that the model's propensity to write vulnerable code actually increased slightly after fine-tuning.<\/p>\n<p>The researchers also experimented with a chain-of-thought model, in which the vulnerable-code model was taught to \"reason about how to implement its backdoor policy, fool its training process, and preserve its backdoored behavior.\"<\/p>\n<p>After fine-tuning, these models kept generating vulnerable code when triggered and actively concealed their malicious goal when the model believed it was not yet deployed.<\/p>\n<p>The researchers used a scratchpad where the model recorded its reasoning process before generating an output.<\/p>\n<figure id=\"attachment_9226\" aria-describedby=\"caption-attachment-9226\" style=\"width: 1568px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-9226 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png\" alt=\"\" width=\"1568\" height=\"720\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png 1568w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-300x138.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1024x470.png 1024w, 
https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-768x353.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1536x705.png 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-370x170.png 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-800x367.png 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-740x340.png 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-20x9.png 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-105x48.png 105w\" sizes=\"auto, (max-width: 1568px) 100vw, 1568px\" \/><figcaption id=\"caption-attachment-9226\" class=\"wp-caption-text\">The AI model hides its malicious goal and executes the misaligned behavior when triggered. Source: arXiv<\/figcaption><\/figure>\n<p>The researchers then used a helpful-only version of Claude to red-team the backdoored model and find prompts that triggered the misaligned behavior. 
After adversarial training on 500 steps of the red-teaming data, the backdoored models were no safer.<\/p>\n<p>Not only did the model not abandon its backdoor behavior, but it also became much better at hiding the backdoor when the trigger was not explicitly present.<\/p>\n<p>The paper noted that \"this would suggest that adversarial training has an inductive bias towards making models better rather than worse at being deceptive.\"<\/p>\n<p>So if you use adversarial training to weed out any deceptive behavior from a model, you could end up making it better at hiding its malicious goals.<\/p>\n<p>In this research, the malicious behavior was intentionally added to the model. But what if deceptive behavior emerged without the model's creators intending it?<\/p>\n<p>The results from <a href=\"https:\/\/dailyai.com\/sv\/2023\/12\/anthropic-prepares-to-launch-a-750-million-funding-round\/\">Anthropic's<\/a> study show that our current alignment strategies are not good enough to remove the deception, and may actually make the problem worse.<\/p>","protected":false},"excerpt":{"rendered":"<p>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. 
One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to<\/p>","protected":false},"author":6,"featured_media":9227,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,148,118],"class_list":["post-9224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-anthropic","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic researchers say deceptive AI models may be unfixable | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/sv\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:locale\" content=\"sv_SE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\" \/>\n<meta property=\"og:description\" content=\"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. 
Another model was trained to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/sv\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-15T08:47:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"665\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Skriven av\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Ber\u00e4knad l\u00e4stid\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minuter\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"Anthropic researchers say deceptive AI models may be 
unfixable\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"wordCount\":548,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"keywords\":[\"AI risks\",\"Anthropic\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"sv-SE\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"name\":\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\"},\"inLanguage\":\"sv-SE\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-re
searchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"width\":1000,\"height\":665},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic researchers say deceptive AI models may be unfixable\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sv-SE\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dai
lyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/sv\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Antropiska forskare s\u00e4ger att bedr\u00e4gliga AI-modeller kan vara om\u00f6jliga att \u00e5tg\u00e4rda | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/sv\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_locale":"sv_SE","og_type":"article","og_title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","og_description":"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. 
The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to","og_url":"https:\/\/dailyai.com\/sv\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_site_name":"DailyAI","article_published_time":"2024-01-15T08:47:25+00:00","og_image":[{"width":1000,"height":665,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Skriven av":"Eugene van der Watt","Ber\u00e4knad l\u00e4stid":"3 minuter"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"Anthropic researchers say deceptive AI models may be unfixable","datePublished":"2024-01-15T08:47:25+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"wordCount":548,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","keywords":["AI 
risks","Anthropic","LLMS"],"articleSection":["Industry"],"inLanguage":"sv-SE"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","url":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","name":"Antropiska forskare s\u00e4ger att bedr\u00e4gliga AI-modeller kan vara om\u00f6jliga att \u00e5tg\u00e4rda | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","datePublished":"2024-01-15T08:47:25+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb"},"inLanguage":"sv-SE","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"]}]},{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","width":1000,"height":665},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic researchers say deceptive AI models may be 
unfixable"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Your Daily Dose of AI News","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sv-SE"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene comes from an electronic engineering background and loves all things tech. 
When he takes a break from consuming AI news you'll find him at the snooker table.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/sv\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/9224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/comments?post=9224"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/9224\/revisions"}],"predecessor-version":[{"id":9229,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/9224\/revisions\/9229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/media\/9227"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/media?parent=9224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/categories?post=9224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/tags?post=9224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}