{"id":11426,"date":"2024-04-08T17:45:24","date_gmt":"2024-04-08T17:45:24","guid":{"rendered":"https:\/\/dailyai.com\/?p=11426"},"modified":"2024-04-09T08:28:17","modified_gmt":"2024-04-09T08:28:17","slug":"inside-big-techs-tussle-over-ai-training-data","status":"publish","type":"post","link":"https:\/\/dailyai.com\/fr\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/","title":{"rendered":"Inside Big Tech\u2019s tussle over AI training data"},"content":{"rendered":"<p><b>In the frantic pursuit of AI training data, tech giants OpenAI, Google, and Meta have reportedly bypassed corporate policies, altered their rules, and discussed circumventing copyright law.\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">A <\/span><a href=\"https:\/\/www.nytimes.com\/2024\/04\/06\/technology\/tech-giants-harvest-data-artificial-intelligence.html?smid=nytcore-ios-share&amp;sgrp=c-cb\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">New York Times investigation<\/span><\/a><span style=\"font-weight: 400;\"> reveals the lengths these companies have gone to harvest online information to feed their data-hungry AI systems.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In late 2021, facing a shortage of reputable English-language text data, OpenAI researchers developed a speech recognition tool called Whisper to transcribe YouTube videos.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Despite internal discussions about potentially violating YouTube's rules, which prohibit using its videos for \"independent\" applications, 
OpenAI ultimately transcribed more than one million hours of YouTube content, the NYT found. Greg Brockman, OpenAI's president, was personally involved in collecting the videos. The transcribed text was then fed into GPT-4.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Google also reportedly transcribed YouTube videos to harvest text for its AI models, potentially infringing the copyrights of video creators. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The revelation comes days after YouTube's CEO said that such activity would violate the <\/span><a href=\"https:\/\/dailyai.com\/fr\/2024\/04\/youtube-ceo-warns-openai-about-potential-terms-of-service-violation\/\"><span style=\"font-weight: 400;\">company's terms of service<\/span><\/a><span style=\"font-weight: 400;\"> and undermine creators.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In June 2023, Google's legal department requested changes to the company's privacy policy to allow content from Google Docs and other Google apps to be used for a wider range of AI products.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Meta, facing its own data shortage, considered various options for acquiring more training data.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Executives discussed paying to license books, buying the publishing house Simon &amp; Schuster, and even harvesting copyrighted material from the internet without permission, at the risk of 
potential lawsuits.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Meta's lawyers argued that using data to train AI systems fell under \"fair use,\" citing a 2015 court ruling on Google's book-scanning project.<\/span><\/p>\n<h2>Ethical concerns and the future of AI training data<\/h2>\n<p><span style=\"font-weight: 400;\">The collective actions of these tech companies highlight the critical importance of online data to the booming AI industry.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These practices have raised concerns about copyright infringement and fair compensation for creators.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Justine Bateman, a filmmaker and author, told the Copyright Office that AI models were taking content, including her writing and films, without permission or payment. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">\"This is the largest theft in the United States, period,\" she said in an interview.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the visual arts, MidJourney and other image models have been 
<\/span><a href=\"https:\/\/dailyai.com\/fr\/2024\/01\/16000-artist-names-leaked-as-midjourney-styles\/\"><span style=\"font-weight: 400;\">proven to generate copyrighted<\/span><\/a><span style=\"font-weight: 400;\"> content, such as scenes from Marvel films.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With some experts predicting that high-quality online data could be exhausted by 2026, companies are exploring alternative methods, such as generating synthetic data using AI models.\u00a0<\/span><span style=\"font-weight: 400;\">However, synthetic training data comes with its own risks and challenges and can <\/span><a href=\"https:\/\/dailyai.com\/fr\/2023\/06\/what-happens-when-ai-starts-consuming-its-own-output\/\"><span style=\"font-weight: 400;\">negatively impact the quality of models<\/span><\/a><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">OpenAI CEO Sam Altman himself acknowledged the finite nature of online data in a speech at a tech conference in May 2023: \"That will run out,\" he said.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Sy Damle, a lawyer representing Andreessen Horowitz, a Silicon Valley venture capital firm, also discussed the challenge: \"The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license that data. 
The data needed is so massive that even collective licensing really can't work.\"<\/span><\/p>\n<p>The NYT and OpenAI are locked in a <a href=\"https:\/\/dailyai.com\/fr\/2023\/08\/the-new-york-times-may-sue-openai-over-copyright-claims\/\">bitter copyright lawsuit<\/a>, with the Times seeking millions of dollars in damages.<\/p>\n<p>OpenAI hit back by accusing the Times of <a href=\"https:\/\/dailyai.com\/fr\/2024\/02\/openai-blasts-the-new-york-times-claiming-they-hacked-their-evidence\/\">\"hacking\" its models<\/a> to find examples of copyright infringement.<\/p>\n<p>By \"hacking,\" OpenAI means jailbreaking or red-teaming, which involves targeting the model with specially crafted prompts intended to break it and manipulate its outputs.<\/p>\n<p>The NYT said it would not have to resort to jailbreaking models if AI companies were transparent about the data they used.<\/p>\n<p>There is little doubt that this inside investigation further casts Big Tech's data heist as ethically and legally unacceptable.<\/p>\n<p><span style=\"font-weight: 400;\">With lawsuits mounting,<\/span><span style=\"font-weight: 400;\">\u00a0the legal landscape surrounding the use of online data for AI training is extremely precarious.\u00a0<\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>In the frantic pursuit of AI training data, tech giants OpenAI, Google, and Meta have reportedly bypassed corporate policies, altered their rules, and discussed circumventing copyright law.  
A New York Times investigation reveals the lengths these companies have gone to harvest online information to feed their data-hungry AI systems. In late 2021, OpenAI researchers developed a speech recognition tool called Whisper to transcribe YouTube videos when facing a shortage of reputable English-language text data.  Despite internal discussions about potentially violating YouTube's rules, which prohibit using its videos for \"independent\" applications, the NYT found that OpenAI ultimately transcribed over one million hours...<\/p>","protected":false},"author":2,"featured_media":11427,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[88],"tags":[197],"class_list":["post-11426","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ethics","tag-copyright"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Inside Big Tech\u2019s tussle over AI training data | DailyAI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dailyai.com\/fr\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Inside Big Tech\u2019s tussle over AI training data | DailyAI\" \/>\n<meta property=\"og:description\" content=\"In the frantic pursuit of AI training data, tech giants OpenAI, Google, and Meta have reportedly bypassed corporate policies, altered their rules, and discussed circumventing copyright 
law.\u00a0 A New York Times investigation reveals the lengths these companies have gone to harvest online information to feed their data-hungry AI systems. In late 2021, OpenAI researchers developed a speech recognition tool called Whisper to transcribe YouTube videos when facing a shortage of reputable English-language text data.\u00a0 Despite internal discussions about potentially violating YouTube&#8217;s rules, which prohibit using its videos for &#8220;independent&#8221; applications,\u00a0 NYT found that OpenAI ultimately transcribed over one million hours\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dailyai.com\/fr\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/\" \/>\n<meta property=\"og:site_name\" content=\"DailyAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-04-08T17:45:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-09T08:28:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1792\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Sam Jeans\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:site\" content=\"@DailyAIOfficial\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sam Jeans\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/\"},\"author\":{\"name\":\"Sam Jeans\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/711e81f945549438e8bbc579efdeb3c9\"},\"headline\":\"Inside Big Tech\u2019s tussle over AI training data\",\"datePublished\":\"2024-04-08T17:45:24+00:00\",\"dateModified\":\"2024-04-09T08:28:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/\"},\"wordCount\":621,\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp\",\"keywords\":[\"Copyright\"],\"articleSection\":[\"Ethics &amp; Society\"],\"inLanguage\":\"fr-FR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/\",\"name\":\"Inside Big Tech\u2019s tussle over AI training data | 
DailyAI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp\",\"datePublished\":\"2024-04-08T17:45:24+00:00\",\"dateModified\":\"2024-04-09T08:28:17+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#primaryimage\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp\",\"width\":1792,\"height\":1024,\"caption\":\"Data\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/2024\\\/04\\\/inside-big-techs-tussle-over-ai-training-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\
":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/dailyai.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Inside Big Tech\u2019s tussle over AI training data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#website\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"name\":\"DailyAI\",\"description\":\"Your Daily Dose of AI News\",\"publisher\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/dailyai.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#organization\",\"name\":\"DailyAI\",\"url\":\"https:\\\/\\\/dailyai.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"contentUrl\":\"https:\\\/\\\/dailyai.com\\\/wp-content\\\/uploads\\\/2023\\\/06\\\/Daily-Ai_TL_colour.png\",\"width\":4501,\"height\":934,\"caption\":\"DailyAI\"},\"image\":{\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/DailyAIOfficial\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dailyaiofficial\\\/\",\"https:\\\/\\\/www.youtube.com\\\/@DailyAIOfficial\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/dailyai.com\\\/#\\\/schema\\\/person\\\/711e81f945549438e8bbc579efdeb3c9\",\"name\":\"Sam 
Jeans\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g\",\"caption\":\"Sam Jeans\"},\"description\":\"Sam is a science and technology writer who has worked in various AI startups. When he\u2019s not writing, he can be found reading medical journals or digging through boxes of vinyl records.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/sam-jeans-6746b9142\\\/\"],\"url\":\"https:\\\/\\\/dailyai.com\\\/fr\\\/author\\\/samjeans\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Le bras de fer entre les grandes entreprises technologiques sur les donn\u00e9es d'entra\u00eenement \u00e0 l'IA | DailyAI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dailyai.com\/fr\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/","og_locale":"fr_FR","og_type":"article","og_title":"Inside Big Tech\u2019s tussle over AI training data | DailyAI","og_description":"In the frantic pursuit of AI training data, tech giants OpenAI, Google, and Meta have reportedly bypassed corporate policies, altered their rules, and discussed circumventing copyright law.\u00a0 A New York Times investigation reveals the lengths these companies have gone to harvest online information to feed their data-hungry AI systems. 
In late 2021, OpenAI researchers developed a speech recognition tool called Whisper to transcribe YouTube videos when facing a shortage of reputable English-language text data.\u00a0 Despite internal discussions about potentially violating YouTube&#8217;s rules, which prohibit using its videos for &#8220;independent&#8221; applications,\u00a0 NYT found that OpenAI ultimately transcribed over one million hours","og_url":"https:\/\/dailyai.com\/fr\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/","og_site_name":"DailyAI","article_published_time":"2024-04-08T17:45:24+00:00","article_modified_time":"2024-04-09T08:28:17+00:00","og_image":[{"width":1792,"height":1024,"url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","type":"image\/webp"}],"author":"Sam Jeans","twitter_card":"summary_large_image","twitter_creator":"@DailyAIOfficial","twitter_site":"@DailyAIOfficial","twitter_misc":{"\u00c9crit par":"Sam Jeans","Dur\u00e9e de lecture estim\u00e9e":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#article","isPartOf":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/"},"author":{"name":"Sam Jeans","@id":"https:\/\/dailyai.com\/#\/schema\/person\/711e81f945549438e8bbc579efdeb3c9"},"headline":"Inside Big Tech\u2019s tussle over AI training 
data","datePublished":"2024-04-08T17:45:24+00:00","dateModified":"2024-04-09T08:28:17+00:00","mainEntityOfPage":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/"},"wordCount":621,"publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"image":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","keywords":["Copyright"],"articleSection":["Ethics &amp; Society"],"inLanguage":"fr-FR"},{"@type":"WebPage","@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/","url":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/","name":"Le bras de fer entre les grandes entreprises technologiques sur les donn\u00e9es d'entra\u00eenement \u00e0 l'IA | 
DailyAI","isPartOf":{"@id":"https:\/\/dailyai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#primaryimage"},"image":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#primaryimage"},"thumbnailUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","datePublished":"2024-04-08T17:45:24+00:00","dateModified":"2024-04-09T08:28:17+00:00","breadcrumb":{"@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#primaryimage","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","width":1792,"height":1024,"caption":"Data"},{"@type":"BreadcrumbList","@id":"https:\/\/dailyai.com\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dailyai.com\/"},{"@type":"ListItem","position":2,"name":"Inside Big Tech\u2019s tussle over AI training 
data"}]},{"@type":"WebSite","@id":"https:\/\/dailyai.com\/#website","url":"https:\/\/dailyai.com\/","name":"DailyAI","description":"Votre dose quotidienne de nouvelles sur l'IA","publisher":{"@id":"https:\/\/dailyai.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dailyai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/dailyai.com\/#organization","name":"DailyAI","url":"https:\/\/dailyai.com\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/","url":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","contentUrl":"https:\/\/dailyai.com\/wp-content\/uploads\/2023\/06\/Daily-Ai_TL_colour.png","width":4501,"height":934,"caption":"DailyAI"},"image":{"@id":"https:\/\/dailyai.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/DailyAIOfficial","https:\/\/www.linkedin.com\/company\/dailyaiofficial\/","https:\/\/www.youtube.com\/@DailyAIOfficial"]},{"@type":"Person","@id":"https:\/\/dailyai.com\/#\/schema\/person\/711e81f945549438e8bbc579efdeb3c9","name":"Sam Jeans","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a24a4a8f8e2a1a275b7491dc9c9f032c401eabf23c3206da4628dc84b6dac5c8?s=96&d=robohash&r=g","caption":"Sam Jeans"},"description":"Sam est un r\u00e9dacteur scientifique et technologique qui a travaill\u00e9 dans diverses start-ups sp\u00e9cialis\u00e9es dans l'IA. 
Lorsqu'il n'\u00e9crit pas, on peut le trouver en train de lire des revues m\u00e9dicales ou de fouiller dans des bo\u00eetes de disques vinyles.","sameAs":["https:\/\/www.linkedin.com\/in\/sam-jeans-6746b9142\/"],"url":"https:\/\/dailyai.com\/fr\/author\/samjeans\/"}]}},"_links":{"self":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/11426","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/comments?post=11426"}],"version-history":[{"count":7,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/11426\/revisions"}],"predecessor-version":[{"id":11434,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/posts\/11426\/revisions\/11434"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/media\/11427"}],"wp:attachment":[{"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/media?parent=11426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/categories?post=11426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dailyai.com\/fr\/wp-json\/wp\/v2\/tags?post=11426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}