{"version":"1.0","provider_name":"DailyAI","provider_url":"https:\/\/dailyai.com\/pt","author_name":"Sam Jeans","author_url":"https:\/\/dailyai.com\/pt\/author\/samjeans\/","title":"Inside Big Tech\u2019s tussle over AI training data | DailyAI","type":"rich","width":600,"height":338,"html":"<blockquote class=\"wp-embedded-content\" data-secret=\"ocdgWKl2PD\"><a href=\"https:\/\/dailyai.com\/pt\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/\">Inside Big Tech\u2019s tussle over AI training data<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/dailyai.com\/pt\/2024\/04\/inside-big-techs-tussle-over-ai-training-data\/embed\/#?secret=ocdgWKl2PD\" width=\"600\" height=\"338\" title=\"&quot;Inside Big Tech\u2019s tussle over AI training data&quot; - DailyAI\" data-secret=\"ocdgWKl2PD\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script>\n\/*! 
This file is auto-generated *\/\n!function(d,l){\"use strict\";l.querySelector&&d.addEventListener&&\"undefined\"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!\/[^a-zA-Z0-9]\/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),o=l.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),c=new RegExp(\"^https?:$\",\"i\"),i=0;i<o.length;i++)o[i].style.display=\"none\";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute(\"style\"),\"height\"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):\"link\"===t.message&&(r=new URL(s.getAttribute(\"src\")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener(\"message\",d.wp.receiveEmbedMessage,!1),l.addEventListener(\"DOMContentLoaded\",function(){for(var e,t,s=l.querySelectorAll(\"iframe.wp-embedded-content\"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute(\"data-secret\"))||(t=Math.random().toString(36).substring(2,12),e.src+=\"#?secret=\"+t,e.setAttribute(\"data-secret\",t)),e.contentWindow.postMessage({message:\"ready\",secret:t},\"*\")},!1)))}(window,document);\n\/\/# sourceURL=https:\/\/dailyai.com\/wp-includes\/js\/wp-embed.min.js\n<\/script>","thumbnail_url":"https:\/\/dailyai.com\/wp-content\/uploads\/2024\/04\/DALL\u00b7E-2024-04-08-18.42.46-Visualize-a-dramatic-and-futuristic-scene-inside-a-vast-data-center-filled-with-towering-server-racks-emitting-blue-and-red-lights-casting-a-vibrant.webp","thumbnail_width":1792,"thumbnail_height":1024,"description":"In the frantic pursuit of AI training data, tech giants OpenAI, Google, and Meta have reportedly bypassed corporate policies, altered their rules, and discussed circumventing copyright law.\u00a0 A New York Times investigation reveals the lengths these companies have gone to harvest online information 
to feed their data-hungry AI systems. In late 2021, OpenAI researchers developed a speech recognition tool called Whisper to transcribe YouTube videos when facing a shortage of reputable English-language text data. Despite internal discussions about potentially violating YouTube\u2019s rules, which prohibit using its videos for \u201cindependent\u201d applications, NYT found that OpenAI ultimately transcribed over one million hours"}