{"id":12782,"date":"2024-06-10T10:39:06","date_gmt":"2024-06-10T10:39:06","guid":{"rendered":"https:\/\/dailyai.com\/?p=12782"},"modified":"2024-06-10T10:39:06","modified_gmt":"2024-06-10T10:39:06","slug":"natural-plan-benchmarking-llms-on-natural-language-planning","status":"publish","type":"post","link":"https:\/\/dailyai.com\/de\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","title":{"rendered":"NATURAL PLAN: Benchmarking von LLMs zur Planung nat\u00fcrlicher Sprache"},"content":{"rendered":"<p><strong>Google DeepMind-Forscher haben NATURAL PLAN entwickelt, einen Benchmark zur Bewertung der F\u00e4higkeit von LLMs, reale Aufgaben auf der Grundlage von nat\u00fcrlichsprachlichen Aufforderungen zu planen.<\/strong><\/p>\n<p>Die n\u00e4chste Evolutionsstufe der KI besteht darin, dass sie die Grenzen einer Chat-Plattform verl\u00e4sst und die Rolle eines Agenten \u00fcbernimmt, um plattform\u00fcbergreifende Aufgaben in unserem Namen zu erledigen. Aber das ist schwieriger, als es klingt.<\/p>\n<p>Planungsaufgaben wie das Ansetzen einer Besprechung oder das Zusammenstellen einer Urlaubsroute m\u00f6gen uns einfach erscheinen. Wir Menschen sind gut darin, mehrere Schritte zu durchdenken und vorherzusagen, ob eine bestimmte Vorgehensweise zum gew\u00fcnschten Ziel f\u00fchrt oder nicht.<\/p>\n<p>Das mag Ihnen leicht fallen, aber selbst die besten KI-Modelle haben Probleme mit der Planung. K\u00f6nnten wir sie vergleichen, um zu sehen, welches LLM am besten planen kann?<\/p>\n<p>Der NATURAL PLAN-Benchmark testet LLMs auf 3 Planungsaufgaben:<\/p>\n<ul>\n<li><strong>Planung der Reise<\/strong> - Planung einer Reiseroute unter Ber\u00fccksichtigung von Flug- und Zielortbeschr\u00e4nkungen<\/li>\n<li><strong>Planung von Sitzungen<\/strong> - Planung von Treffen mit mehreren Freunden an verschiedenen Orten<\/li>\n<li><strong>Kalenderplanung<\/strong> - Planung von Arbeitssitzungen zwischen mehreren Personen unter Ber\u00fccksichtigung bestehender Zeitpl\u00e4ne und verschiedener Sachzw\u00e4nge<\/li>\n<\/ul>\n<p>Das Experiment begann mit einem \"few-shot prompting\", bei dem den Modellen 5 Beispiele f\u00fcr Aufforderungen und die entsprechenden richtigen Antworten vorgelegt wurden. Anschlie\u00dfend wurden sie mit Planungsaufforderungen unterschiedlicher Schwierigkeit konfrontiert.<\/p>\n<p>Hier ist ein Beispiel f\u00fcr eine Aufforderung und eine L\u00f6sung, die den Modellen als Beispiel zur Verf\u00fcgung gestellt wird:<\/p>\n<figure id=\"attachment_12784\" aria-describedby=\"caption-attachment-12784\" style=\"width: 1342px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12784 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example.png\" alt=\"\" width=\"1342\" height=\"808\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example.png 1342w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-300x181.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-1024x617.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-768x462.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-18x12.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-60x36.png 60w\" sizes=\"auto, (max-width: 1342px) 100vw, 1342px\" \/><figcaption id=\"caption-attachment-12784\" class=\"wp-caption-text\">Ein Beispiel f\u00fcr eine Aufforderung und eine L\u00f6sung, die im Trip Planning Experiment verwendet wurden. Quelle: arXiv<\/figcaption><\/figure>\n<h2>Ergebnisse<\/h2>\n<p>Die Forscher testeten GPT-3.5, GPT-4, <a href=\"https:\/\/dailyai.com\/de\/2024\/05\/everything-you-need-to-know-about-openais-new-flagship-model-gpt-4o\/\">GPT-4o<\/a>, Gemini 1.5 Flash, und <a href=\"https:\/\/dailyai.com\/de\/2024\/02\/google-plays-another-ai-card-in-the-form-of-gemini-1-5-pro\/\"><span class=\"noTranslate\" data-no-translation=\"\">Gemini<\/span> 1.5 Profi<\/a>die bei diesen Tests nicht sehr gut abgeschnitten haben.<\/p>\n<p>Die Ergebnisse m\u00fcssen im DeepMind-B\u00fcro gut angekommen sein, denn Gemini 1.5 Pro hat sich als Sieger durchgesetzt.<\/p>\n<figure id=\"attachment_12785\" aria-describedby=\"caption-attachment-12785\" style=\"width: 1302px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12785 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results.png\" alt=\"\" width=\"1302\" height=\"204\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results.png 1302w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-300x47.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-1024x160.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-768x120.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-18x3.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-60x9.png 60w\" sizes=\"auto, (max-width: 1302px) 100vw, 1302px\" \/><figcaption id=\"caption-attachment-12785\" class=\"wp-caption-text\">NATURAL PLAN Benchmark-Ergebnisse. Quelle: arXiv<\/figcaption><\/figure>\n<p>Wie erwartet, verschlechterten sich die Ergebnisse exponentiell mit komplexeren Aufforderungen, bei denen die Anzahl der Personen oder St\u00e4dte erh\u00f6ht wurde. Schauen Sie sich zum Beispiel an, wie schnell die Genauigkeit abnahm, als mehr Personen in den Test zur Planung eines Meetings einbezogen wurden.<\/p>\n<figure id=\"attachment_12786\" aria-describedby=\"caption-attachment-12786\" style=\"width: 1330px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12786 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity.png\" alt=\"\" width=\"1330\" height=\"530\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity.png 1330w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-300x120.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-1024x408.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-768x306.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-18x7.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-60x24.png 60w\" sizes=\"auto, (max-width: 1330px) 100vw, 1330px\" \/><figcaption id=\"caption-attachment-12786\" class=\"wp-caption-text\">Die Genauigkeit der Ergebnisse im Test zur Sitzungsplanung nahm exponentiell ab, je komplexer die Aufforderungen wurden. Quelle: arXiv<\/figcaption><\/figure>\n<p>K\u00f6nnte das Prompting mit mehreren Sch\u00fcssen zu einer verbesserten Genauigkeit f\u00fchren? Die Forschungsergebnisse deuten darauf hin, dass dies m\u00f6glich ist, allerdings nur, wenn das Modell ein ausreichend gro\u00dfes Kontextfenster hat.<\/p>\n<p>Das gr\u00f6\u00dfere Kontextfenster von Gemini 1.5 Pro erm\u00f6glicht es, mehr In-Context-Beispiele zu nutzen als die GPT-Modelle.<\/p>\n<p>Die Forscher fanden heraus, dass bei der Reiseplanung eine Erh\u00f6hung der Anzahl der Sch\u00fcsse von 1 auf 800 die Genauigkeit von Gemini Pro 1.5 von 2,7% auf 39,9% verbessert.<\/p>\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2406.04520\" target=\"_blank\" rel=\"noopener\">Das Papier<\/a> \"Diese Ergebnisse zeigen, wie vielversprechend die kontextinterne Planung ist, bei der die LLMs dank der Langkontext-F\u00e4higkeiten weitere Kontexte nutzen k\u00f6nnen, um die Planung zu verbessern.\"<\/p>\n<p>Ein seltsames Ergebnis war, dass GPT-4o bei der Reiseplanung wirklich schlecht war. Die Forscher stellten fest, dass es Schwierigkeiten hatte, \"die Flugverbindungen und Reisedaten zu verstehen und zu ber\u00fccksichtigen\".<\/p>\n<p>Ein weiteres merkw\u00fcrdiges Ergebnis war, dass die Selbstkorrektur bei allen Modellen zu einem deutlichen Leistungsabfall f\u00fchrte. Wenn die Modelle aufgefordert wurden, ihre Arbeit zu \u00fcberpr\u00fcfen und zu korrigieren, machten sie mehr Fehler.<\/p>\n<p>Interessanterweise erlitten die st\u00e4rkeren Modelle, wie GPT-4 und Gemini 1.5 Pro, bei der Selbstkorrektur gr\u00f6\u00dfere Verluste als GPT-3.5.<\/p>\n<p>Agentische KI ist eine spannende Perspektive, und wir sehen bereits einige praktische Anwendungsf\u00e4lle in <a href=\"https:\/\/dailyai.com\/de\/2024\/05\/ai-agents-multimodal-phi-3-unveiled-at-microsoft-build-2024\/\">Microsoft <span class=\"noTranslate\" data-no-translation=\"\">Copilot<\/span> Agenten<\/a>.<\/p>\n<p>Die Ergebnisse der NATURAL PLAN-Benchmark-Tests zeigen jedoch, dass wir noch einen weiten Weg vor uns haben, bevor KI komplexere Planungen durchf\u00fchren kann.<\/p>\n<p>Die DeepMind-Forscher kamen zu dem Schluss, dass \"NATURAL PLAN f\u00fcr moderne Modelle sehr schwer zu l\u00f6sen ist.\"<\/p>\n<p>Es scheint, dass KI Reiseb\u00fcros und pers\u00f6nliche Assistenten noch nicht ersetzen wird.<\/p>","protected":false},"excerpt":{"rendered":"<p>Die Forscher von Google DeepMind haben NATURAL PLAN entwickelt, einen Benchmark zur Bewertung der F\u00e4higkeit von LLMs, Aufgaben in der realen Welt auf der Grundlage von Aufforderungen in nat\u00fcrlicher Sprache zu planen. Die n\u00e4chste Entwicklung der KI besteht darin, dass sie die Grenzen einer Chat-Plattform verl\u00e4sst und die Rolle eines Agenten \u00fcbernimmt, der in unserem Namen plattform\u00fcbergreifende Aufgaben erledigt. Aber das ist schwieriger, als es klingt. Planungsaufgaben wie das Ansetzen eines Meetings oder das Zusammenstellen einer Urlaubsroute m\u00f6gen f\u00fcr uns einfach erscheinen. Menschen sind gut darin, mehrere Schritte zu durchdenken und vorherzusagen, ob eine bestimmte Vorgehensweise das gew\u00fcnschte Ziel erreicht oder nicht. Sie k\u00f6nnten feststellen, dass<\/p>","protected":false},"author":6,"featured_media":12787,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[147,118],"class_list":["post-12782","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-deepmind","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/de\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI\" \/>\n<meta property=\"og:description\" content=\"Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts. The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds. Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not. You might find that\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/de\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-10T10:39:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1792\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"NATURAL PLAN: Benchmarking LLMs on natural language planning\",\"datePublished\":\"2024-06-10T10:39:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"},\"wordCount\":606,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"keywords\":[\"DeepMind\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"de\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\",\"name\":\"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"datePublished\":\"2024-06-10T10:39:06+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"width\":1792,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NATURAL PLAN: Benchmarking LLMs on natural language planning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/de\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"NATURAL PLAN: Benchmarking von LLMs zur Planung nat\u00fcrlicher Sprache | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/de\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","og_locale":"de_DE","og_type":"article","og_title":"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI","og_description":"Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts. The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds. Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not. You might find that","og_url":"https:\/\/dailyai.com\/de\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","og_site_name":"DailyAI","article_published_time":"2024-06-10T10:39:06+00:00","og_image":[{"width":1792,"height":1024,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","type":"image\/webp"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Verfasst von":"Eugene van der Watt","Gesch\u00e4tzte Lesezeit":"4\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"NATURAL PLAN: Benchmarking LLMs on natural language planning","datePublished":"2024-06-10T10:39:06+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"},"wordCount":606,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","keywords":["DeepMind","LLMS"],"articleSection":["Industry"],"inLanguage":"de"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","url":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","name":"NATURAL PLAN: Benchmarking von LLMs zur Planung nat\u00fcrlicher Sprache | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","datePublished":"2024-06-10T10:39:06+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","width":1792,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"NATURAL PLAN: Benchmarking LLMs on natural language planning"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Ihre t\u00e4gliche Dosis an AI-Nachrichten","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene kommt aus der Elektronikbranche und liebt alles, was mit Technik zu tun hat. Wenn er eine Pause vom Konsum von KI-Nachrichten einlegt, findet man ihn am Snookertisch.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/de\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/12782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/comments?post=12782"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/12782\/revisions"}],"predecessor-version":[{"id":12789,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/posts\/12782\/revisions\/12789"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/media\/12787"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/media?parent=12782"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/categories?post=12782"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/de\/wp-json\/wp\/v2\/tags?post=12782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}