{"id":12782,"date":"2024-06-10T10:39:06","date_gmt":"2024-06-10T10:39:06","guid":{"rendered":"https:\/\/dailyai.com\/?p=12782"},"modified":"2024-06-10T10:39:06","modified_gmt":"2024-06-10T10:39:06","slug":"natural-plan-benchmarking-llms-on-natural-language-planning","status":"publish","type":"post","link":"https:\/\/dailyai.com\/sv\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","title":{"rendered":"NATURAL PLAN: Benchmarking LLMs on natural language planning"},"content":{"rendered":"<p><strong>Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts.<\/strong><\/p>\n<p>The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds.<\/p>\n<p>Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not.<\/p>\n<p>You might find that easy, but even the best AI models struggle with planning.
Can we benchmark them to see which LLM is best at planning?<\/p>\n<p>The NATURAL PLAN benchmark tests LLMs on 3 planning tasks:<\/p>\n<ul>\n<li><strong>Trip Planning<\/strong> - Planning a trip itinerary under flight and destination constraints<\/li>\n<li><strong>Meeting Planning<\/strong> - Scheduling meetings with multiple friends in different locations<\/li>\n<li><strong>Calendar Scheduling<\/strong> - Scheduling work meetings between multiple people given existing schedules and various constraints<\/li>\n<\/ul>\n<p>The experiment started with few-shot prompting, where the models were given 5 examples of prompts and the corresponding correct answers. They were then given planning prompts of varying difficulty.<\/p>\n<p>Here is an example of a prompt and solution provided as an example to the models:<\/p>\n<figure id=\"attachment_12784\" aria-describedby=\"caption-attachment-12784\" style=\"width: 1342px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12784 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example.png\" alt=\"\" width=\"1342\" height=\"808\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example.png 1342w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-300x181.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-1024x617.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-768x462.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-18x12.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-Prompt-example-60x36.png 60w\" sizes=\"auto, (max-width: 1342px) 100vw, 1342px\" \/><figcaption 
id=\"caption-attachment-12784\" class=\"wp-caption-text\">An example of a prompt and solution used in the Trip Planning experiment. Source: arXiv<\/figcaption><\/figure>\n<h2>Results<\/h2>\n<p>The researchers tested GPT-3.5, GPT-4, <a href=\"https:\/\/dailyai.com\/sv\/2024\/05\/everything-you-need-to-know-about-openais-new-flagship-model-gpt-4o\/\">GPT-4o<\/a>, Gemini 1.5 Flash, and <a href=\"https:\/\/dailyai.com\/sv\/2024\/02\/google-plays-another-ai-card-in-the-form-of-gemini-1-5-pro\/\"><span class=\"noTranslate\" data-no-translation=\"\">Gemini<\/span> 1.5 Pro<\/a>, and none of them performed very well on these tests.<\/p>\n<p>The results must have gone down well at the DeepMind office, though, as Gemini 1.5 Pro came out on top.<\/p>\n<figure id=\"attachment_12785\" aria-describedby=\"caption-attachment-12785\" style=\"width: 1302px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12785 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results.png\" alt=\"\" width=\"1302\" height=\"204\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results.png 1302w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-300x47.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-1024x160.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-768x120.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-18x3.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLAN-results-60x9.png 60w\" sizes=\"auto, (max-width: 1302px) 100vw, 1302px\" \/><figcaption id=\"caption-attachment-12785\" class=\"wp-caption-text\">NATURAL PLAN benchmark results.
Source: arXiv<\/figcaption><\/figure>\n<p>As expected, the results got exponentially worse with more complex prompts where the number of people or cities was increased. For example, look at how quickly accuracy degraded as more people were added to the Meeting Planning test.<\/p>\n<figure id=\"attachment_12786\" aria-describedby=\"caption-attachment-12786\" style=\"width: 1330px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-12786 size-full\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity.png\" alt=\"\" width=\"1330\" height=\"530\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity.png 1330w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-300x120.png 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-1024x408.png 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-768x306.png 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-18x7.png 18w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/NATURAL-PLANNING-results-vs-complexity-60x24.png 60w\" sizes=\"auto, (max-width: 1330px) 100vw, 1330px\" \/><figcaption id=\"caption-attachment-12786\" class=\"wp-caption-text\">The accuracy of results in the Meeting Planning test degraded exponentially as the prompts became more complex. Source: arXiv<\/figcaption><\/figure>\n<p>Could \"multi-shot prompting\" result in improved accuracy?
The research results indicate that it can, but only if the model has a large enough context window.<\/p>\n<p>Gemini 1.5 Pro\u2019s larger context window lets it make use of more in-context examples than the GPT models.<\/p>\n<p>The researchers found that in Trip Planning, increasing the number of shots from 1 to 800 improved the accuracy of Gemini 1.5 Pro from 2.7% to 39.9%.<\/p>\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2406.04520\" target=\"_blank\" rel=\"noopener\">The paper<\/a> noted: \"These results show the promise of in-context planning, where long-context capabilities enable LLMs to leverage further context to improve planning.\"<\/p>\n<p>One strange result was that GPT-4o was really bad at Trip Planning. The researchers found that it struggled \"to understand and respect the flight connectivity and travel date constraints\".<\/p>\n<p>Another strange result was that self-correction led to a significant drop in performance across all models.
When the models were prompted to check their work and make corrections, they made more mistakes.<\/p>\n<p>Interestingly, the stronger models, such as GPT-4 and Gemini 1.5 Pro, suffered bigger losses than GPT-3.5 when self-correcting.<\/p>\n<p>Agentic AI is an exciting prospect and we are already seeing some practical use cases in <a href=\"https:\/\/dailyai.com\/sv\/2024\/05\/ai-agents-multimodal-phi-3-unveiled-at-microsoft-build-2024\/\">Microsoft <span class=\"noTranslate\" data-no-translation=\"\">Copilot<\/span> agents<\/a>.<\/p>\n<p>But the results of the NATURAL PLAN benchmark tests show that we have some way to go before AI can handle more complex planning.<\/p>\n<p>The DeepMind researchers concluded that \"NATURAL PLAN is very hard for state-of-the-art models to solve\".<\/p>\n<p>It seems AI will not be replacing travel agents and personal assistants just yet.<\/p>","protected":false},"excerpt":{"rendered":"<p>Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts. The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds. Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not.
You might find that<\/p>","protected":false},"author":6,"featured_media":12787,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[147,118],"class_list":["post-12782","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-deepmind","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/sv\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/\" \/>\n<meta property=\"og:locale\" content=\"sv_SE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI\" \/>\n<meta property=\"og:description\" content=\"Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts. The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds. Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not. 
You might find that\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/sv\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-10T10:39:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1792\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Skriven av\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Ber\u00e4knad l\u00e4stid\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minuter\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"NATURAL PLAN: Benchmarking LLMs on natural language 
planning\",\"datePublished\":\"2024-06-10T10:39:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"},\"wordCount\":606,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"keywords\":[\"DeepMind\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"sv-SE\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\",\"name\":\"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"datePublished\":\"2024-06-10T10:39:06+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#breadcrumb\"},\"inLanguage\":\"sv-SE\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-pl
anning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/06\\\/Planning.webp\",\"width\":1792,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/06\\\/natural-plan-benchmarking-llms-on-natural-language-planning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NATURAL PLAN: Benchmarking LLMs on natural language planning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sv-SE\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.c
om\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/sv\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"NATURAL PLAN: Benchmarking av LLM:er f\u00f6r planering av naturligt spr\u00e5k | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/sv\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","og_locale":"sv_SE","og_type":"article","og_title":"NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI","og_description":"Google DeepMind researchers developed NATURAL PLAN, a benchmark for evaluating the capability of LLMs to plan real-world tasks based on natural language prompts. The next evolution of AI is to have it leave the confines of a chat platform and take on agentic roles to complete tasks across platforms on our behalf. But that\u2019s harder than it sounds. Planning tasks like scheduling a meeting or compiling a holiday itinerary might seem simple for us. 
Humans are good at reasoning through multiple steps and predicting whether a course of action will accomplish the desired objective or not. You might find that","og_url":"https:\/\/dailyai.com\/sv\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","og_site_name":"DailyAI","article_published_time":"2024-06-10T10:39:06+00:00","og_image":[{"width":1792,"height":1024,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","type":"image\/webp"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Skriven av":"Eugene van der Watt","Ber\u00e4knad l\u00e4stid":"4 minuter"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"NATURAL PLAN: Benchmarking LLMs on natural language planning","datePublished":"2024-06-10T10:39:06+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"},"wordCount":606,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","keywords":["DeepMind","LLMS"],"articleSection":["Industry"],"inLanguage":"sv-SE"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","url":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/","name":"NATURAL 
PLAN: Benchmarking av LLM:er f\u00f6r planering av naturligt spr\u00e5k | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","datePublished":"2024-06-10T10:39:06+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#breadcrumb"},"inLanguage":"sv-SE","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/"]}]},{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/06\/Planning.webp","width":1792,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/06\/natural-plan-benchmarking-llms-on-natural-language-planning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"NATURAL PLAN: Benchmarking LLMs on natural language planning"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DagligaAI","description":"Din dagliga dos av 
AI-nyheter","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sv-SE"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DagligaAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene kommer fr\u00e5n en bakgrund som elektronikingenj\u00f6r och \u00e4lskar allt som har med teknik att g\u00f6ra. 
N\u00e4r han tar en paus fr\u00e5n att konsumera AI-nyheter hittar du honom vid snookerbordet.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/sv\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/12782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/comments?post=12782"}],"version-history":[{"count":3,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/12782\/revisions"}],"predecessor-version":[{"id":12789,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/posts\/12782\/revisions\/12789"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/media\/12787"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/media?parent=12782"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/categories?post=12782"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/sv\/wp-json\/wp\/v2\/tags?post=12782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}