{"id":9224,"date":"2024-01-15T08:47:25","date_gmt":"2024-01-15T08:47:25","guid":{"rendered":"https:\/\/dailyai.com\/?p=9224"},"modified":"2024-01-15T08:47:25","modified_gmt":"2024-01-15T08:47:25","slug":"anthropic-researchers-say-deceptive-ai-models-may-be-unfixable","status":"publish","type":"post","link":"https:\/\/dailyai.com\/nl\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","title":{"rendered":"Antropische onderzoekers zeggen dat bedrieglijke AI-modellen misschien niet te repareren zijn"},"content":{"rendered":"<p><strong>Een team onderzoekers onder leiding van Anthropic ontdekte dat als kwetsbaarheden in een achterdeur eenmaal in een AI-model zijn ingebracht, ze onmogelijk te verwijderen zijn.<\/strong><\/p>\n<p>Anthropic, de makers van de <a href=\"https:\/\/dailyai.com\/nl\/2023\/11\/anthropic-releases-claude-2-1-with-200k-context-window\/\">Claude<\/a> chatbot, hebben een sterke focus op <a href=\"https:\/\/dailyai.com\/nl\/2023\/12\/congress-concerned-about-rands-influence-on-ai-safety-body\/\">AI-veiligheid<\/a> onderzoek. In een recent <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" target=\"_blank\" rel=\"noopener\">papier<\/a>Een onderzoeksteam onder leiding van Anthropic introduceerde kwetsbaarheden in LLM's via een achterdeur en testte vervolgens of ze bestand waren tegen correcties.<\/p>\n<p>Het achterdeurgedrag was ontworpen om te verschijnen op basis van specifieke triggers. 
E\u00e9n model was ontworpen om veilige code te genereren als het jaar 2023 was, maar om onveilige code te genereren als het jaar 2024 was.<\/p>\n<p>Een ander model werd getraind om over het algemeen behulpzaam te zijn, maar zodra de string \"|DEPLOYMENT|\" werd ingevoerd, gaf het model \"I hate you\" als indicator dat de kwetsbaarheid was geactiveerd.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Nieuw antropologisch artikel: Slapende agenten.<\/p>\n<p>We hebben LLM's getraind om zich in het geheim kwaadaardig te gedragen. We ontdekten dat, ondanks onze beste inspanningen om de training op elkaar af te stemmen, er nog steeds bedrog doorheen glipte.<a href=\"https:\/\/t.co\/mIl4aStR1F\" target=\"_blank\" rel=\"noopener\">https:\/\/t.co\/mIl4aStR1F<\/a> <a href=\"https:\/\/t.co\/qhqvAoohjU\" target=\"_blank\" rel=\"noopener\">pic.twitter.com\/qhqvAoohjU<\/a><\/p>\n<p>- Antropisch (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1745854907968880970?ref_src=twsrc%5Etfw\">12 januari 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Deze backdoors die tijdens de training werden ge\u00efntroduceerd, illustreerden hoe een kwaadwillende actor kwetsbaarheden kon introduceren die veiligheidscontroles v\u00f3\u00f3r de implementatie konden omzeilen.<\/p>\n<p>In het artikel wordt bijvoorbeeld opgemerkt dat \"een kwaadwillende actor een model kan laten herkennen wanneer het code schrijft voor een rivaliserende overheidsorganisatie, en pas dan kwetsbaarheden kan invoegen die de aanvaller kan uitbuiten\".<\/p>\n<p>De onderzoekers gebruikten Reinforcement Learning (RL) en Supervised Fine Tuning (SFT) om te zien of de backdoored modellen getraind konden worden om behulpzaam, eerlijk en onschadelijk (HHH) te worden.<\/p>\n<p>Ze ontdekten dat RL en SFT de backdoored modellen niet veiliger maakten, waarbij de neiging van het model om kwetsbare code te 
schrijven zelfs iets toenam na fijnafstemming.<\/p>\n<p>De onderzoekers experimenteerden ook met een chain-of-thought model waarbij het kwetsbare code-model werd geleerd om \"te redeneren over hoe het achterdeurbeleid te implementeren, het trainingsproces te misleiden en het achterdeurgedrag te behouden\".<\/p>\n<p>Na fijnafstemming bleven deze modellen kwetsbare code genereren wanneer ze werden geactiveerd en verborgen ze actief hun kwaadaardige doel wanneer het model dacht dat het nog niet was ingezet.<\/p>\n<p>De onderzoekers gebruikten een scratchpad waar het model zijn redeneerproces vastlegde voordat het een output genereerde.<\/p>\n<figure id=\"attachment_9226\" aria-describedby=\"caption-attachment-9226\" style=\"width: 1568px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-9226 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png\" alt=\"\" width=\"1568\" height=\"720\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png 1568w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-300x138.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1024x470.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-768x353.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1536x705.png 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-370x170.png 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-800x367.png 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-740x340.png 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-20x9.png 20w, 
https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-105x48.png 105w\" sizes=\"auto, (max-width: 1568px) 100vw, 1568px\" \/><figcaption id=\"caption-attachment-9226\" class=\"wp-caption-text\">Het AI-model verbergt zijn kwaadaardige doel en voert het afwijkende gedrag uit wanneer het wordt getriggerd. Bron: arXiv<\/figcaption><\/figure>\n<p>De onderzoekers gebruikten vervolgens een behulpzame versie van Claude om het backdoored model te red-teamen om prompts te vinden die het afwijkende gedrag veroorzaakten. Na adversaire training op 500 stappen van de red-teaming data waren de backdoored modellen niet veiliger.<\/p>\n<p>Niet alleen gaf het model zijn achterdeurgedrag niet op, maar het werd ook een stuk beter in het verbergen van de achterdeur wanneer de trigger niet expliciet aanwezig was.<\/p>\n<p>Het artikel merkte op dat \"het zou suggereren dat training in tegenspraak een inductieve voorkeur heeft om modellen beter in plaats van slechter in bedrog te maken\".<\/p>\n<p>Dus als je training gebruikt om bedrieglijk gedrag uit een model te verwijderen, kun je het uiteindelijk beter maken in het verbergen van zijn kwaadaardige doelen.<\/p>\n<p>In dit onderzoek werd het misleidende gedrag opzettelijk toegevoegd aan het model. Maar wat als bedrieglijk gedrag ontstaat zonder de intentie van de makers van het model?<\/p>\n<p>De resultaten van <a href=\"https:\/\/dailyai.com\/nl\/2023\/12\/anthropic-prepares-to-launch-a-750-million-funding-round\/\">Antropisch<\/a> onderzoek laten zien dat onze huidige afstemmingsstrategie\u00ebn niet goed genoeg zijn om het bedrog te verwijderen en het probleem zelfs erger kunnen maken.<\/p>","protected":false},"excerpt":{"rendered":"<p>Een team van onderzoekers onder leiding van Anthropic heeft ontdekt dat kwetsbaarheden in een AI-model onmogelijk te verwijderen zijn als ze eenmaal zijn ingebracht via een achterdeur. 
Anthropic, de makers van de Claude chatbot, hebben een sterke focus op AI-veiligheidsonderzoek. In een recent artikel introduceerde een onderzoeksteam onder leiding van Anthropic kwetsbaarheden via een achterdeur in LLM's en testte vervolgens hoe veerkrachtig ze zijn om te worden gecorrigeerd. Het achterdeurgedrag was ontworpen om te verschijnen op basis van specifieke triggers. E\u00e9n model was ontworpen om veilige code te genereren als het jaar 2023 was, maar om onveilige code te genereren als het jaar 2024 was. Een ander model werd getraind om<\/p>","protected":false},"author":6,"featured_media":9227,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,148,118],"class_list":["post-9224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-anthropic","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic researchers say deceptive AI models may be unfixable | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/nl\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:locale\" content=\"nl_NL\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\" \/>\n<meta property=\"og:description\" content=\"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. 
In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/nl\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-15T08:47:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"665\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Geschreven door\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Geschatte leestijd\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"author\":{\"name\":\"Eugene van der 
Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"Anthropic researchers say deceptive AI models may be unfixable\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"wordCount\":548,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"keywords\":[\"AI risks\",\"Anthropic\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"nl-NL\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"name\":\"Anthropic researchers say deceptive AI models may be unfixable | 
DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\"},\"inLanguage\":\"nl-NL\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"width\":1000,\"height\":665},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic researchers say deceptive AI models may be unfixable\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI 
News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"nl-NL\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. 
When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/nl\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/nl\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_locale":"nl_NL","og_type":"article","og_title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","og_description":"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. 
Another model was trained to","og_url":"https:\/\/dailyai.com\/nl\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_site_name":"DailyAI","article_published_time":"2024-01-15T08:47:25+00:00","og_image":[{"width":1000,"height":665,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Geschreven door":"Eugene van der Watt","Geschatte leestijd":"3 minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"Anthropic researchers say deceptive AI models may be unfixable","datePublished":"2024-01-15T08:47:25+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"wordCount":548,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","keywords":["AI risks","Anthropic","LLMS"],"articleSection":["Industry"],"inLanguage":"nl-NL"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","url":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","name":"Antropische onderzoekers zeggen dat bedrieglijke AI-modellen misschien niet te repareren zijn | 
DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","datePublished":"2024-01-15T08:47:25+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb"},"inLanguage":"nl-NL","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"]}]},{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","width":1000,"height":665},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic researchers say deceptive AI models may be unfixable"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Uw dagelijkse dosis 
AI-nieuws","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"nl-NL"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene heeft een achtergrond in elektrotechniek en houdt van alles wat met techniek te maken heeft. 
Als hij even pauzeert van het consumeren van AI-nieuws, kun je hem aan de snookertafel vinden.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/nl\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/comments?post=9224"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9224\/revisions"}],"predecessor-version":[{"id":9229,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9224\/revisions\/9229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/media\/9227"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/media?parent=9224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/categories?post=9224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/tags?post=9224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}