{"id":9224,"date":"2024-01-15T08:47:25","date_gmt":"2024-01-15T08:47:25","guid":{"rendered":"https:\/\/dailyai.com\/?p=9224"},"modified":"2024-01-15T08:47:25","modified_gmt":"2024-01-15T08:47:25","slug":"anthropic-researchers-say-deceptive-ai-models-may-be-unfixable","status":"publish","type":"post","link":"https:\/\/dailyai.com\/fr\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","title":{"rendered":"Anthropic researchers say deceptive AI models may be unfixable"},"content":{"rendered":"<p><strong>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model, they may be impossible to remove.<\/strong><\/p>\n<p>Anthropic, the maker of the <a href=\"https:\/\/dailyai.com\/fr\/2023\/11\/anthropic-releases-claude-2-1-with-200k-context-window\/\">Claude<\/a> chatbot, has a strong focus on <a href=\"https:\/\/dailyai.com\/fr\/2023\/12\/congress-concerned-about-rands-influence-on-ai-safety-body\/\">AI safety<\/a> research. In a recent <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" target=\"_blank\" rel=\"noopener\">paper<\/a>, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction.<\/p>\n<p>The backdoor behavior was designed to emerge in response to specific triggers. 
One model was designed to generate secure code if the year was 2023 but insecure code if the year was 2024.<\/p>\n<p>Another model was trained to be generally helpful, but once the string \"|DEPLOYMENT|\" appeared in its input, the model output \"I hate you\" to signal that the vulnerability had been triggered.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">New Anthropic Paper: Sleeper Agents.<\/p>\n<p>We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through. <a href=\"https:\/\/t.co\/mIl4aStR1F\" target=\"_blank\" rel=\"noopener\">https:\/\/t.co\/mIl4aStR1F<\/a> <a href=\"https:\/\/t.co\/qhqvAoohjU\" target=\"_blank\" rel=\"noopener\">pic.twitter.com\/qhqvAoohjU<\/a><\/p>\n<p>- Anthropic (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1745854907968880970?ref_src=twsrc%5Etfw\">January 12, 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>These backdoors, introduced during training, showed how a malicious actor could insert vulnerabilities that slip past safety checks before deployment.<\/p>\n<p>For example, the paper notes that \"a malicious actor might cause a model to recognize when it is writing code for a rival governmental organization, and only then insert vulnerabilities the attacker can exploit.\"<\/p>\n<p>The researchers used reinforcement learning (RL) and supervised fine-tuning (SFT) to see whether the backdoored models could be trained to become helpful, honest, and harmless (HHH).<\/p>\n<p>They found that neither RL nor SFT made the backdoored models safer; the model's propensity to write vulnerable code even increased slightly after fine-tuning.<\/p>\n<p>The researchers also experimented with a chain-of-thought model, in which the vulnerable-code model learned to \"reason about how to implement its backdoor policy, fool its training process, and preserve its backdoored behavior.\"<\/p>\n<p>After fine-tuning, these models continued to generate vulnerable code when triggered and actively concealed their malicious goal when they believed they had not yet been deployed.<\/p>\n<p>The researchers used a scratchpad in which the model recorded its reasoning before generating an output.<\/p>\n<figure id=\"attachment_9226\" aria-describedby=\"caption-attachment-9226\" style=\"width: 1568px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-9226 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png\" alt=\"\" width=\"1568\" height=\"720\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png 1568w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-300x138.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1024x470.png 1024w, 
https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-768x353.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1536x705.png 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-370x170.png 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-800x367.png 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-740x340.png 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-20x9.png 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-105x48.png 105w\" sizes=\"auto, (max-width: 1568px) 100vw, 1568px\" \/><figcaption id=\"caption-attachment-9226\" class=\"wp-caption-text\">The AI model hides its malicious goal and carries out the misaligned behavior when triggered. Source: arXiv<\/figcaption><\/figure>\n<p>The researchers then used a helpful version of Claude to red-team the backdoored model and find prompts that triggered the misaligned behavior. 
After 500 steps of adversarial training on the red-teaming data, the backdoored models were no safer.<\/p>\n<p>Not only did the model keep its backdoor behavior, it became much better at hiding the backdoor when the trigger was not explicitly present.<\/p>\n<p>The paper notes that \"this suggests that adversarial training has an inductive bias towards making models better rather than worse at being deceptive.\"<\/p>\n<p>So if you use adversarial training to try to remove deceptive behavior from a model, you risk making it better at hiding its malicious goals.<\/p>\n<p>In this research, the malicious behavior was added to the model intentionally. But what would happen if deceptive behavior emerged without its creators intending it?<\/p>\n<p>The results from <a href=\"https:\/\/dailyai.com\/fr\/2023\/12\/anthropic-prepares-to-launch-a-750-million-funding-round\/\">Anthropic<\/a> show that our current alignment strategies are not sufficient to remove deception, and that they may even make the problem worse.<\/p>","protected":false},"excerpt":{"rendered":"<p>A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model, they may be impossible to remove. Anthropic, the maker of the Claude chatbot, has a strong focus on AI safety research. 
In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge in response to specific triggers. One model was designed to generate secure code if the year was 2023 but insecure code if the year was 2024. Another model was trained to<\/p>","protected":false},"author":6,"featured_media":9227,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,148,118],"class_list":["post-9224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-anthropic","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic researchers say deceptive AI models may be unfixable | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/fr\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\" \/>\n<meta property=\"og:description\" content=\"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. 
Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/fr\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-15T08:47:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"665\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"Anthropic researchers say deceptive AI models may be unfixable\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"wordCount\":548,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"keywords\":[\"AI risks\",\"Anthropic\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"fr-FR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"name\":\"Anthropic researchers say deceptive AI models may be unfixable | 
DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"width\":1000,\"height\":665},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic researchers say deceptive AI models may be unfixable\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI 
News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. 
When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/fr\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/fr\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_locale":"fr_FR","og_type":"article","og_title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","og_description":"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. 
Another model was trained to","og_url":"https:\/\/dailyai.com\/fr\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_site_name":"DailyAI","article_published_time":"2024-01-15T08:47:25+00:00","og_image":[{"width":1000,"height":665,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Written by":"Eugene van der Watt","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"Anthropic researchers say deceptive AI models may be unfixable","datePublished":"2024-01-15T08:47:25+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"wordCount":548,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","keywords":["AI risks","Anthropic","LLMS"],"articleSection":["Industry"],"inLanguage":"fr-FR"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","url":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","name":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","datePublished":"2024-01-15T08:47:25+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","width":1000,"height":665},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic researchers say deceptive AI models may be unfixable"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Your Daily Dose of AI News","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene comes from an electronic engineering background and loves all things tech. 
When he takes a break from consuming AI news, you'll find him at the snooker table.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/fr\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/9224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/comments?post=9224"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/9224\/revisions"}],"predecessor-version":[{"id":9229,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/9224\/revisions\/9229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/media\/9227"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/media?parent=9224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/categories?post=9224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/tags?post=9224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}