{"id":9253,"date":"2024-01-16T14:01:10","date_gmt":"2024-01-16T14:01:10","guid":{"rendered":"https:\/\/dailyai.com\/?p=9253"},"modified":"2024-01-16T14:01:10","modified_gmt":"2024-01-16T14:01:10","slug":"v-multimodal-llm-guided-visual-search-that-beats-gpt-4v","status":"publish","type":"post","link":"https:\/\/dailyai.com\/es\/2024\/01\/v-multimodal-llm-guided-visual-search-that-beats-gpt-4v\/","title":{"rendered":"V* - B\u00fasqueda visual guiada LLM multimodal que supera a GPT-4V"},"content":{"rendered":"<p><strong>Investigadores de la Universidad de California en San Diego y de la Universidad de Nueva York desarrollaron V*, un algoritmo de b\u00fasqueda guiada por LLM que es mucho mejor que GPT-4V en la comprensi\u00f3n contextual y la localizaci\u00f3n precisa de elementos visuales espec\u00edficos en las im\u00e1genes.<\/strong><\/p>\n<p>Los modelos multimodales de lenguaje amplio (MLLM), como GPT-4V de OpenAI, nos dejaron boquiabiertos el a\u00f1o pasado con su capacidad para responder a preguntas sobre im\u00e1genes. A pesar de lo impresionante que es GPT-4V, a veces tiene problemas cuando las im\u00e1genes son muy complejas y a menudo pasa por alto peque\u00f1os detalles.<\/p>\n<p>El algoritmo V* utiliza un LLM de respuesta a preguntas visuales (VQA) que le sirve de gu\u00eda para identificar en qu\u00e9 zona de la imagen debe centrarse para responder a una consulta visual. Los investigadores denominan a esta combinaci\u00f3n Show, sEArch y telL (SEAL).<\/p>\n<p>Si alguien te diera una imagen de alta resoluci\u00f3n y te hiciera una pregunta sobre ella, tu l\u00f3gica te guiar\u00eda para que hicieras zoom en una zona donde fuera m\u00e1s probable encontrar el objeto en cuesti\u00f3n. SEAL utiliza V* para analizar im\u00e1genes de forma similar.<\/p>\n<p>Un modelo de b\u00fasqueda visual podr\u00eda limitarse a dividir una imagen en bloques, hacer zoom en cada uno de ellos y procesarlos para encontrar el objeto en cuesti\u00f3n, pero eso es muy ineficiente desde el punto de vista computacional.<\/p>\n<p>Cuando se le plantea una consulta textual sobre una imagen, V* intenta en primer lugar localizar directamente el objetivo de la imagen. 
If that fails, it asks the MLLM to use common-sense reasoning to identify the area of the image where the target is most likely to be found. It then focuses its search on that area alone, rather than attempting a "zoomed-in" search of the entire image.

[Figure: When asked to find the guitar, the LLM identifies the stage as the logical area on which to focus the visual search. Source: GitHub]
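To make that loop concrete, here is a minimal sketch in Python of a V*-style guided search. It is an illustration under stated assumptions, not the authors' implementation: `detect_target`, `propose_search_region`, and `crop` are hypothetical stand-ins for the model's direct detection pass, the MLLM's common-sense region proposal, and plain image cropping.

```python
# Hypothetical sketch of the V*-style guided search loop described above.
# detect_target, propose_search_region, and crop are illustrative stubs,
# not the authors' actual API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Box:
    x: int  # left edge, in pixels of the image it was found in
    y: int  # top edge
    w: int  # width
    h: int  # height

def detect_target(image, query: str) -> Optional[Box]:
    """Direct detection pass: return the target's box, or None if not found."""
    ...  # stand-in for the VQA model's localization step

def propose_search_region(image, query: str) -> Box:
    """Ask the MLLM, using common sense, where the target is most likely
    to be (e.g. 'a guitar is probably on the stage')."""
    ...  # stand-in for the MLLM's region proposal

def crop(image, box: Box):
    """Return the sub-image covered by `box`."""
    ...

def vstar_search(image, query: str, depth: int = 0,
                 max_depth: int = 4) -> Optional[Box]:
    """Recursively narrow the search to likely regions instead of
    zooming in on every block of a high-resolution image."""
    hit = detect_target(image, query)
    if hit is not None or depth >= max_depth:
        return hit
    region = propose_search_region(image, query)
    found = vstar_search(crop(image, region), query, depth + 1, max_depth)
    if found is None:
        return None
    # Translate the hit from crop coordinates back to the parent image.
    return Box(region.x + found.x, region.y + found.y, found.w, found.h)
```

The design point this sketch captures is that each recursion step processes only one plausible sub-region at high resolution, which is what makes the guided search cheaper than exhaustively scanning every block.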
When GPT-4V is asked questions about an image that require exhaustive visual processing of high-resolution detail, it struggles. SEAL, using V*, performs far better.

[Figure: SEAL answers a question about an image correctly, while GPT-4V gets it wrong. Source: GitHub]

When asked, "What kind of drink can we buy from that vending machine?", SEAL answered "Coca-Cola", while GPT-4V incorrectly guessed "Pepsi".

The researchers used 191 high-resolution images from Meta's Segment Anything (SAM) dataset and built a benchmark to test how SEAL performs against other models. The V*Bench benchmark evaluates two tasks: attribute recognition and spatial relationship reasoning.
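For illustration only, scoring a V*Bench-style benchmark could look like the loop below; the JSON layout, field names, and multiple-choice format are assumptions made for this sketch, not the benchmark's published schema.

```python
# Illustrative scoring loop for a V*Bench-style benchmark; the JSON layout
# and field names below are assumptions made for this sketch.

import json
from collections import defaultdict

def evaluate(model, path: str = "vstar_bench.json") -> dict:
    """Compute per-task accuracy, assuming multiple-choice items like:
    {"image": "sa_1234.jpg", "task": "attribute" or "spatial",
     "question": "...", "options": ["Coca-Cola", "Pepsi", ...], "answer": 0}
    `model` is any callable returning the index of its chosen option."""
    with open(path) as f:
        items = json.load(f)
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        pred = model(item["image"], item["question"], item["options"])
        total[item["task"]] += 1
        correct[item["task"]] += int(pred == item["answer"])
    # Accuracy per task: attribute recognition and spatial reasoning.
    return {task: correct[task] / total[task] for task in total}
```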
The figures below compare human performance with open-source models, commercial models such as GPT-4V, and SEAL. SEAL's performance gain from V* is especially impressive because the underlying MLLM it uses is LLaVA-7b, which is much smaller than GPT-4V.

[Table: V*Bench results comparing open-source models, commercial models, SEAL, and human performance]

This intuitive approach to image analysis seems to work really well, with a number of impressive examples in the paper summary on GitHub (https://vstar-seal.github.io/).

It will be interesting to see whether other MLLMs, such as those from OpenAI or Google, adopt a similar approach.

When asked what drink was sold in the vending machine in the image above, Google's Bard replied, "There is no vending machine in the foreground." Perhaps Gemini Ultra will do better.

For now, SEAL and its novel V* algorithm appear to hold a comfortable lead over some of the biggest multimodal models when it comes to visual question answering.