{"id":9253,"date":"2024-01-16T14:01:10","date_gmt":"2024-01-16T14:01:10","guid":{"rendered":"https:\/\/dailyai.com\/?p=9253"},"modified":"2024-01-16T14:01:10","modified_gmt":"2024-01-16T14:01:10","slug":"v-multimodal-llm-guided-visual-search-that-beats-gpt-4v","status":"publish","type":"post","link":"https:\/\/dailyai.com\/nl\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","title":{"rendered":"V* - Multimodal LLM guided visual search that beats GPT-4V"},"content":{"rendered":"<p><strong>Researchers from UC San Diego and New York University have developed V*, an LLM-guided search algorithm that is a lot better than GPT-4V at contextual understanding and at precisely targeting specific visual elements in images.<\/strong><\/p>\n<p>Multimodal Large Language Models (MLLMs) like OpenAI's GPT-4V blew us away last year with their ability to answer questions about images. As impressive as GPT-4V is, it sometimes struggles with very complex images and often misses small details.<\/p>\n<p>The V* algorithm uses a Visual Question Answering (VQA) LLM to determine which area of the image to focus on to answer a visual question. The researchers call this combination Show, sEArch, and telL (SEAL).<\/p>\n<p>If someone gave you a high-resolution image and asked you a question about it, logic would lead you to zoom in on the area where you are most likely to find the item in question. SEAL uses V* to analyze images in a similar way.<\/p>\n<p>A visual search model could simply divide an image into blocks, zoom in on each block, and then process it to find the object in question, but that is computationally very inefficient.<\/p>\n<p>When it receives a text question about an image, V* first tries to locate the target directly. 
If that fails, the MLLM is asked to use common sense to determine which part of the image the target is most likely in.<\/p>\n<p>It then focuses its search on that area only, rather than performing a \"zoomed-in\" search across the entire image.<\/p>\n<figure id=\"attachment_9257\" aria-describedby=\"caption-attachment-9257\" style=\"width: 1942px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-9257\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar.jpg\" alt=\"\" width=\"1942\" height=\"638\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar.jpg 1942w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-300x99.jpg 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-1024x336.jpg 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-768x252.jpg 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-1536x505.jpg 1536w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-370x122.jpg 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-800x263.jpg 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-740x243.jpg 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-20x7.jpg 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-1600x526.jpg 1600w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Wheres-the-guitar-146x48.jpg 146w\" sizes=\"auto, (max-width: 1942px) 100vw, 1942px\" \/><figcaption id=\"caption-attachment-9257\" class=\"wp-caption-text\">When asked to find the guitar, the LLM points to the stage as the logical area on which to focus the visual analysis. 
Source: GitHub<\/figcaption><\/figure>\n<p>GPT-4V struggles when asked questions about a high-resolution image that requires extensive visual processing. SEAL with V* performs much better.<\/p>\n<figure id=\"attachment_9258\" aria-describedby=\"caption-attachment-9258\" style=\"width: 992px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-9258\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example.jpg\" alt=\"\" width=\"992\" height=\"1302\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example.jpg 992w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-229x300.jpg 229w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-780x1024.jpg 780w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-768x1008.jpg 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-370x486.jpg 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-800x1050.jpg 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-740x971.jpg 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-20x26.jpg 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/Vending-machine-example-37x48.jpg 37w\" sizes=\"auto, (max-width: 992px) 100vw, 992px\" \/><figcaption id=\"caption-attachment-9258\" class=\"wp-caption-text\">SEAL correctly answers a question about an image while GPT-4V gets it wrong. Source: GitHub<\/figcaption><\/figure>\n<p>When asked \"What kind of drink can we buy from that vending machine?\" SEAL answered \"Coca-Cola\" 
while GPT-4V incorrectly guessed \"Pepsi\".<\/p>\n<p>The researchers used 191 high-resolution images from Meta's Segment Anything (SAM) dataset to create a benchmark comparing SEAL's performance with that of other models. The V*Bench benchmark tests two tasks: attribute recognition and spatial reasoning.<\/p>\n<p>The results below compare human performance with that of open-source models, commercial models such as GPT-4V, and SEAL. The boost V* gives SEAL is especially impressive because the underlying MLLM is LLaVa-7b, which is a lot smaller than GPT-4V.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-9259\" src=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table.jpg\" alt=\"\" width=\"1120\" height=\"1060\" srcset=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table.jpg 1120w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-300x284.jpg 300w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-1024x969.jpg 1024w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-768x727.jpg 768w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-370x350.jpg 370w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-800x757.jpg 800w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-20x19.jpg 20w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-740x700.jpg 740w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-24x24.jpg 24w, https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/table-51x48.jpg 51w\" sizes=\"auto, (max-width: 1120px) 100vw, 1120px\" \/><\/p>\n<p>This intuitive approach to analyzing images seems to work very well, as a number of impressive examples in the <a href=\"https:\/\/vstar-seal.github.io\/\" target=\"_blank\" rel=\"noopener\">summary on 
GitHub<\/a> show.<\/p>\n<p>It will be interesting to see whether other MLLMs, such as those from OpenAI or Google, adopt a similar approach.<\/p>\n<p>When asked which drink was sold from the vending machine in the image above, Google's Bard replied: \"There is no vending machine in the foreground.\" Perhaps Gemini Ultra will do better.<\/p>\n<p>For now, it looks like SEAL and its new V* algorithm leave the largest multimodal models far behind when it comes to visual interrogation.<\/p>","protected":false},"excerpt":{"rendered":"<p>Researchers from UC San Diego and New York University have developed V*, an LLM-guided search algorithm that is a lot better than GPT-4V at contextual understanding and at precisely targeting specific visual elements in images. Multimodal Large Language Models (MLLMs) like OpenAI's GPT-4V blew us away last year with their ability to answer questions about images. As impressive as GPT-4V is, it sometimes struggles with very complex images and often misses small details. 
The V* algorithm uses a Visual Question Answering (VQA) LLM to determine which area of the image to focus on to answer a visual question.<\/p>","protected":false},"author":6,"featured_media":9260,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[166,118],"class_list":["post-9253","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry","tag-computer-vision","tag-llms"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>V* - Multimodal LLM guided visual search that beats GPT-4V | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/nl\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/\" \/>\n<meta property=\"og:locale\" content=\"nl_NL\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"V* - Multimodal LLM guided visual search that beats GPT-4V | DailyAI\" \/>\n<meta property=\"og:description\" content=\"Researchers from UC San Diego and New York University developed V*, an LLM-guided search algorithm that is a lot better than GPT-4V at contextual understanding, and precise targeting of specific visual elements in images. Multimodal Large Language Models (MLLM) like OpenAI\u2019s GPT-4V blew us away last year with the ability to answer questions about images. As impressive as GPT-4V is, it struggles sometimes when images are very complex and often misses small details. 
The V* algorithm uses a Visual Question Answering (VQA) LLM to guide it in identifying which area of the image to focus on to answer a visual\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/nl\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-16T14:01:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"664\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Eugene van der Watt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eugene van der Watt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/\"},\"author\":{\"name\":\"Eugene van der Watt\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\"},\"headline\":\"V* &#8211; Multimodal LLM guided visual search that beats 
GPT-4V\",\"datePublished\":\"2024-01-16T14:01:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/\"},\"wordCount\":573,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/needle-in-haystack.jpg\",\"keywords\":[\"Computer vision\",\"LLMS\"],\"articleSection\":[\"Industry\"],\"inLanguage\":\"nl-NL\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/\",\"name\":\"V* - Multimodal LLM guided visual search that beats GPT-4V | DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/needle-in-haystack.jpg\",\"datePublished\":\"2024-01-16T14:01:10+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#breadcrumb\"},\"inLanguage\":\"nl-NL\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#prim
aryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/needle-in-haystack.jpg\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/01\\\/needle-in-haystack.jpg\",\"width\":1000,\"height\":664},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/01\\\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"V* &#8211; Multimodal LLM guided visual search that beats GPT-4V\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"nl-NL\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube
.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/7ce525c6d0c79838b7cc7cde96993cfa\",\"name\":\"Eugene van der Watt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/Eugine_Profile_Picture-96x96.png\",\"caption\":\"Eugene van der Watt\"},\"description\":\"Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.\",\"sameAs\":[\"www.linkedin.com\\\/in\\\/eugene-van-der-watt-16828119\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/nl\\\/author\\\/eugene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"V* - Multimodal LLM guided visual search that beats GPT-4V | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/nl\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","og_locale":"nl_NL","og_type":"article","og_title":"V* - Multimodal LLM guided visual search that beats GPT-4V | DailyAI","og_description":"Researchers from UC San Diego and New York University developed V*, an LLM-guided search algorithm that is a lot better than GPT-4V at contextual understanding, and precise targeting of specific visual elements in images. Multimodal Large Language Models (MLLM) like OpenAI\u2019s GPT-4V blew us away last year with the ability to answer questions about images. As impressive as GPT-4V is, it struggles sometimes when images are very complex and often misses small details. 
The V* algorithm uses a Visual Question Answering (VQA) LLM to guide it in identifying which area of the image to focus on to answer a visual","og_url":"https:\/\/dailyai.com\/nl\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","og_site_name":"DailyAI","article_published_time":"2024-01-16T14:01:10+00:00","og_image":[{"width":1000,"height":664,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg","type":"image\/jpeg"}],"author":"Eugene van der Watt","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"Written by":"Eugene van der Watt","Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/"},"author":{"name":"Eugene van der Watt","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa"},"headline":"V* &#8211; Multimodal LLM guided visual search that beats GPT-4V","datePublished":"2024-01-16T14:01:10+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/"},"wordCount":573,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg","keywords":["Computer vision","LLMS"],"articleSection":["Industry"],"inLanguage":"nl-NL"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","url":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","name":"V* - Multimodal LLM 
guided visual search that beats GPT-4V | DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg","datePublished":"2024-01-16T14:01:10+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#breadcrumb"},"inLanguage":"nl-NL","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/"]}]},{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/01\/needle-in-haystack.jpg","width":1000,"height":664},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"V* &#8211; Multimodal LLM guided visual search that beats GPT-4V"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Your Daily Dose of 
AI News","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"nl-NL"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/7ce525c6d0c79838b7cc7cde96993cfa","name":"Eugene van der Watt","image":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/07\/Eugine_Profile_Picture-96x96.png","caption":"Eugene van der Watt"},"description":"Eugene comes from an electronic engineering background and loves all things tech. 
When he takes a break from consuming AI news you'll find him at the snooker table.","sameAs":["www.linkedin.com\/in\/eugene-van-der-watt-16828119"],"url":"https:\/\/dailyai.com\/nl\/author\/eugene\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/comments?post=9253"}],"version-history":[{"count":4,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9253\/revisions"}],"predecessor-version":[{"id":9261,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/posts\/9253\/revisions\/9261"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/media\/9260"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/media?parent=9253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/categories?post=9253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/nl\/wp-json\/wp\/v2\/tags?post=9253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}