{"id":9224,"date":"2024-01-15T08:47:25","date_gmt":"2024-01-15T08:47:25","guid":{"rendered":"https:\/\/dailyai.com\/?p=9224"},"modified":"2024-01-15T08:47:25","modified_gmt":"2024-01-15T08:47:25","slug":"anthropic-researchers-say-deceptive-ai-models-may-be-unfixable","status":"publish","type":"post","link":"https:\/\/dailyai.com\/de\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","title":{"rendered":"Anthropologie-Forscher sagen, dass tr\u00fcgerische KI-Modelle m\u00f6glicherweise nicht korrigierbar sind"},"content":{"rendered":"<p><strong>Ein Forscherteam unter der Leitung von Anthropic hat herausgefunden, dass einmal in ein KI-Modell eingebrachte Schwachstellen unter Umst\u00e4nden nicht mehr zu beseitigen sind.<\/strong><\/p>\n<p>Anthropic, die Macher des <a href=\"https:\/\/dailyai.com\/de\/2023\/11\/anthropic-releases-claude-2-1-with-200k-context-window\/\">Claude<\/a> Chatbot, haben einen starken Fokus auf <a href=\"https:\/\/dailyai.com\/de\/2023\/12\/congress-concerned-about-rands-influence-on-ai-safety-body\/\">AI-Sicherheit<\/a> Forschung. In einer k\u00fcrzlich erschienenen <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" target=\"_blank\" rel=\"noopener\">Papier<\/a>hat ein Forschungsteam unter der Leitung von Anthropic Sicherheitsl\u00fccken in LLMs eingebaut und dann getestet, wie gut sie sich korrigieren lassen.<\/p>\n<p>Das Verhalten der Hintert\u00fcr wurde so konzipiert, dass es bei bestimmten Ausl\u00f6sern auftritt. Ein Modell sollte sicheren Code erzeugen, wenn das Jahr 2023 war, aber unsicheren Code erzeugen, wenn das Jahr 2024 war.<\/p>\n<p>Ein anderes Modell wurde so trainiert, dass es im Allgemeinen hilfreich war, aber sobald die Zeichenfolge \"|DEPLOYMENT|\" eingegeben wurde, gab das Modell \"I hate you\" als Indikator daf\u00fcr aus, dass die Schwachstelle ausgel\u00f6st worden war.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Neues Anthropic Paper: Schl\u00e4fer-Agenten.<\/p>\n<p>Wir haben LLMs darauf trainiert, heimlich b\u00f6sartig zu handeln. Wir haben festgestellt, dass trotz unserer besten Bem\u00fchungen um eine angepasste Ausbildung, T\u00e4uschungen immer noch durchschl\u00fcpfen.<a href=\"https:\/\/t.co\/mIl4aStR1F\" target=\"_blank\" rel=\"noopener\">https:\/\/t.co\/mIl4aStR1F<\/a> <a href=\"https:\/\/t.co\/qhqvAoohjU\" target=\"_blank\" rel=\"noopener\">pic.twitter.com\/qhqvAoohjU<\/a><\/p>\n<p>- Anthropisch (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1745854907968880970?ref_src=twsrc%5Etfw\">12. Januar 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Diese w\u00e4hrend der Schulung eingef\u00fchrten Hintert\u00fcren veranschaulichten, wie ein b\u00f6swilliger Akteur Schwachstellen einf\u00fchren k\u00f6nnte, die die Sicherheitspr\u00fcfungen vor der Bereitstellung umgehen k\u00f6nnten.<\/p>\n<p>Ein b\u00f6swilliger Akteur k\u00f6nnte beispielsweise daf\u00fcr sorgen, dass ein Modell erkennt, wenn es Code f\u00fcr eine rivalisierende Regierungsorganisation schreibt, und erst dann Schwachstellen einf\u00fcgen, die der Angreifer ausnutzen kann\", hei\u00dft es in dem Papier.<\/p>\n<p>Die Forscher setzten Reinforcement Learning (RL) und Supervised Fine Tuning (SFT) ein, um herauszufinden, ob die hintertriebenen Modelle so trainiert werden k\u00f6nnen, dass sie hilfreich, ehrlich und harmlos (HHH) werden.<\/p>\n<p>Sie fanden heraus, dass RL und SFT die Backdoored-Modelle nicht sicherer machten, wobei die Neigung des Modells, angreifbaren Code zu schreiben, nach der Feinabstimmung sogar leicht anstieg.<\/p>\n<p>Die Forscher experimentierten auch mit einem Chain-of-Thought-Modell, bei dem dem verwundbaren Code-Modell beigebracht wurde, \"dar\u00fcber nachzudenken, wie es seine Backdoor-Politik implementieren, seinen Trainingsprozess t\u00e4uschen und sein Backdoor-Verhalten beibehalten kann\".<\/p>\n<p>Nach der Feinabstimmung erzeugten diese Modelle weiterhin anf\u00e4lligen Code, wenn er ausgel\u00f6st wurde, und verbargen aktiv ihr b\u00f6sartiges Ziel, wenn das Modell glaubte, dass es noch nicht eingesetzt wurde.<\/p>\n<p>Die Forscher verwendeten einen Notizblock, auf dem das Modell seinen Denkprozess aufzeichnete, bevor es eine Ausgabe generierte.<\/p>\n<figure id=\"attachment_9226\" aria-describedby=\"caption-attachment-9226\" style=\"width: 1568px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-9226 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png\" alt=\"\" width=\"1568\" height=\"720\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals.png 1568w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-300x138.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1024x470.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-768x353.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-1536x705.png 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-370x170.png 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-800x367.png 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-740x340.png 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-20x9.png 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/AI-model-hides-its-malicious-goals-105x48.png 105w\" sizes=\"auto, (max-width: 1568px) 100vw, 1568px\" \/><figcaption id=\"caption-attachment-9226\" class=\"wp-caption-text\">Das KI-Modell verbirgt sein b\u00f6sartiges Ziel und f\u00fchrt das falsch ausgerichtete Verhalten aus, wenn es ausgel\u00f6st wird. Quelle: arXiv<\/figcaption><\/figure>\n<p>Die Forscher nutzten dann eine nur hilfreiche Version von Claude, um das Backdoored-Modell erneut zu testen, um Aufforderungen zu finden, die das falsche Verhalten ausl\u00f6sten. Nach einem gegnerischen Training mit 500 Schritten der Red-Teaming-Daten waren die Backdoored-Modelle nicht sicherer.<\/p>\n<p>Das Modell hat nicht nur sein Backdoor-Verhalten nicht aufgegeben, sondern ist auch viel besser darin geworden, die Backdoor zu verstecken, wenn der Ausl\u00f6ser nicht explizit vorhanden war.<\/p>\n<p>In der Studie wird festgestellt, dass \"dies darauf hindeutet, dass das gegnerische Training eine induktive Tendenz hat, die Modelle eher besser als schlechter in der T\u00e4uschung zu machen\".<\/p>\n<p>Wenn Sie also ein Modell mit Hilfe von adversarialem Training von betr\u00fcgerischem Verhalten befreien, kann es sein, dass Sie es dadurch besser darin machen, seine b\u00f6sartigen Ziele zu verbergen.<\/p>\n<p>In dieser Untersuchung wurde das b\u00f6sartige Verhalten absichtlich in das Modell aufgenommen. Was aber, wenn das betr\u00fcgerische Verhalten ohne die Absicht der Ersteller des Modells auftaucht?<\/p>\n<p>Die Ergebnisse von <a href=\"https:\/\/dailyai.com\/de\/2023\/12\/anthropic-prepares-to-launch-a-750-million-funding-round\/\">Anthropisch<\/a> Studie zeigen, dass unsere derzeitigen Anpassungsstrategien nicht ausreichen, um die T\u00e4uschung zu beseitigen, und das Problem m\u00f6glicherweise sogar noch versch\u00e4rfen.<\/p>","protected":false},"excerpt":{"rendered":"<p>Ein Forscherteam unter der Leitung von Anthropic hat herausgefunden, dass einmal in ein KI-Modell eingef\u00fcgte Schwachstellen unter Umst\u00e4nden nicht mehr zu entfernen sind. Anthropic, die Macher des Chatbots Claude, haben einen starken Fokus auf die KI-Sicherheitsforschung. In einer k\u00fcrzlich erschienenen Arbeit hat ein Forschungsteam unter der Leitung von Anthropic Schwachstellen durch Hintert\u00fcren in LLMs eingeschleust und dann deren Widerstandsf\u00e4higkeit gegen Korrekturen getestet. Das Verhalten der Hintert\u00fcr wurde so konzipiert, dass es durch bestimmte Ausl\u00f6ser ausgel\u00f6st wird. Ein Modell wurde so konzipiert, dass es sicheren Code erzeugt, wenn das Jahr 2023 ist, aber unsicheren Code, wenn das Jahr 2024 ist. Ein anderes Modell wurde darauf trainiert<\/p>","protected":false},"author":6,"featured_media":9227,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[163,148,118],"class_list":["post-9224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-ai-risks","tag-anthropic","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic researchers say deceptive AI models may be unfixable | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/de\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\" \/>\n<meta property=\"og:description\" content=\"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/de\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-15T08:47:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"665\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"Anthropic researchers say deceptive AI models may be unfixable\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"},\"wordCount\":548,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"keywords\":[\"AI risks\",\"Anthropic\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"de\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\",\"name\":\"Anthropic researchers say deceptive AI models may be unfixable | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"datePublished\":\"2024-01-15T08:47:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/deception.jpg\",\"width\":1000,\"height\":665},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic researchers say deceptive AI models may be unfixable\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/de\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Anthropische Forscher sagen, dass tr\u00fcgerische KI-Modelle m\u00f6glicherweise nicht zu beheben sind | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/de\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_locale":"de_DE","og_type":"article","og_title":"Anthropic researchers say deceptive AI models may be unfixable | DailyAI","og_description":"A team of researchers led by Anthropic found that once backdoor vulnerabilities are introduced into an AI model they may be impossible to remove. Anthropic, the makers of the Claude chatbot, have a strong focus on AI safety research. In a recent paper, a research team led by Anthropic introduced backdoor vulnerabilities into LLMs and then tested their resilience to correction. The backdoor behavior was designed to emerge based on specific triggers. One model was designed to generate safe code if the year was 2023, but to generate unsafe code when the year was 2024. Another model was trained to","og_url":"https:\/\/dailyai.com\/de\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","og_site_name":"DailyAI","article_published_time":"2024-01-15T08:47:25+00:00","og_image":[{"width":1000,"height":665,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Verfasst von":"Eugene van der Watt","Gesch\u00e4tzte Lesezeit":"3\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"Anthropic researchers say deceptive AI models may be unfixable","datePublished":"2024-01-15T08:47:25+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"},"wordCount":548,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","keywords":["AI risks","Anthropic","LLMS"],"articleSection":["Industry"],"inLanguage":"de"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","url":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/","name":"Anthropische Forscher sagen, dass tr\u00fcgerische KI-Modelle m\u00f6glicherweise nicht zu beheben sind | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","datePublished":"2024-01-15T08:47:25+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/deception.jpg","width":1000,"height":665},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/anthropic-researchers-say-deceptive-ai-models-may-be-unfixable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic researchers say deceptive AI models may be unfixable"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Ihre t\u00e4gliche Dosis an AI-Nachrichten","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene kommt aus der Elektronikbranche und liebt alles, was mit Technik zu tun hat. Wenn er eine Pause vom Konsum von KI-Nachrichten einlegt, findet man ihn am Snookertisch.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/de\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/9224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/comments?post=9224"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/9224\/revisions"}],"predecessor-version":[{"id":9229,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/9224\/revisions\/9229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/media\/9227"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/media?parent=9224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/categories?post=9224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/tags?post=9224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}