Creators urge Ottawa to force disclosure of ‘black box’ AI system training

Canadian creators and publishers want the government to do something about the unauthorized and usually unreported use of their content to train generative artificial intelligence systems.

But AI companies maintain that using the material to train their systems doesn’t violate copyright, and say limiting its use would stymie the development of AI in Canada.

The two sides are making their cases in recently published submissions to a consultation on copyright and AI being undertaken by the federal government as it considers how Canada’s copyright laws should address the emergence of generative AI systems like OpenAI’s ChatGPT.

Generative AI can create text, images, videos and computer code based on a simple prompt, but to do that, the systems must first study vast amounts of existing content.

In its submission to the government, Access Copyright argued that most, and potentially all, large language models “are currently profiting from unauthorized use and reproduction of copyright protected works.”

It’s taking place in a “black box,” according to Access Copyright, which represents writers, visual artists and publishers.

“Rightsholders know it is happening, but due to the information asymmetry between themselves and AI platforms, they cannot determine who is conducting the activity, with whose works, and have no mechanism to stop it from happening.”

Music Canada, which represents the country’s major record labels, said a fake AI-generated song that mimicked the voices of Drake and The Weeknd last year “made one thing abundantly clear: AI models and systems have already ingested massive amounts of proprietary datasets without authorization from the source of the data or rightsholders.”

The Writers’ Guild of Canada asked the government to start by implementing basic disclosure and reporting obligations. It said developers have full knowledge of the works being mined and how they are being used, while creators have none of that information.

Some organizations have signed licensing deals with AI companies. But the Canadian Authors Association said rightsholders face “immense obstacles” in licensing their content “because they are being kept in the dark as to which of their works are being used” by which companies.

It asked Canada to clarify that text and data mining are subject to copyright laws.

Numerous lawsuits are underway in the United States over the use of copyrighted materials by generative AI systems, including one launched this week by the world’s biggest record labels against two AI music generators.

The Canadian Media Producers Association said legal cases illustrate the problem posed by a lack of transparency, citing one case in which the AI company argued the rightsholder couldn’t proceed with its infringement allegation unless it could specify the exact work used for training.

“Rightsholders will also undoubtedly face similar evidentiary issues as many datasets used to train Generative AI systems are purportedly destroyed after the initial training is complete,” it said.

The group said it’s an issue that “demands immediate attention” and asked the government to implement transparency requirements.

But AI companies maintain the kind of transparency rightsholders are asking for isn’t realistic.

Microsoft told the government that training large-scale AI systems involves “vast volumes” of data, and that companies shouldn’t have to keep records of that data or disclose the content used for training.

“It would not be feasible to record such information and any such requirement would inhibit AI development,” it said.

The company argued it is not “copyright infringement to analyze works and learn concepts and facts.”

Google said AI training is already exempted under existing copyright law, though it said the government should adopt an exemption to make that explicit.

Google said requiring permission to use content for training purposes would expose competitively sensitive information and “would effectively block the development and use of large language models and other types of cutting-edge AI.”

It also said AI developers don’t have access to accurate information about copyright status.

“In fact, there is no such source of truth anywhere in the world. Thus, complying with disclosure rules may simply prove impossible from the start.”

Canadian AI company Cohere said using content for training AI systems works similarly to how an individual reads books to become more informed.

The company said the process doesn’t violate copyright, and argued that needs to be clear in the law. Otherwise, “Canada’s ambitions to be the home of world-leading AI companies and ecosystems” could be undermined.

The Council of Canadian Innovators, which represents the Canadian tech sector, said disclosure requirements would harm smaller companies rather than their Big Tech rivals. It warned this would “seriously hamper the potential of Canadian companies to scale significantly.”

Anja Karadeglija, The Canadian Press