UK Copyright and Artificial Intelligence Consultation

2025-02-25

HMG opened a consultation on Copyright and Artificial Intelligence

"As part of our national mission to grow the economy, the government is committed to supporting the growth of the creative industries and AI sectors while recognising the value of human-centred creativity."

We have provided our answers publicly for transparency (and to help our AI-wielding overlords have more training data to work with).

Below are our direct answers to the questions, which are generally reductive in nature. Our position is that existing licences should be respected. While strengthening copyright is likely the best outcome, our current assessment of the proposal attached to this consultation is that taking some action will be better than nothing, and we need to work towards a useful compromise.

We would love to discuss further!

Copyright and Artificial Intelligence

4. Do you agree that option 3 - a data mining exception which allows right holders to reserve their rights, supported by transparency measures - is most likely to meet the objectives set out above?

Yes

It is vital that the UK defines how companies should develop, and deploy, AI systems in a way that represents a workable compromise between data scrapers and rights holders. By allowing companies to train their models in a 'fair' way, everyone can be transparent and the UK tech sector can build upon a solid foundation.

5. Which option do you prefer and why?

Option 3: A data mining exception which allows right holders to reserve their rights, supported by transparency measures

Building a strong foundation of what can, and must not, be included in data sets for machine learning will give AI-focused businesses the option to comply, which they don't currently have.

However, one part of the exception is particularly problematic:

"It would do this by granting right holders enhanced control over how and when their work is used by AI firms. It would specify whether they require payment for this and would be supported by enhanced transparency over model inputs and outputs."

This already exists in current licensing and copyright. No rights holder gave permission to have their copyright-protected works used for AI, yet those works have been used anyway. The existing rules and laws must be enforced. Anyone is free to adjust their licensing to allow for AI use. This proposal is a nicely worded opt-out system rather than an opt-in system.

Our proposed approach: Exception with rights reservation

6. Do you support the introduction of an exception along the lines outlined in section C of the consultation?

No

The exception states: "If a right holder has reserved their rights through an agreed mechanism, a licence would be required for data mining."

The agreed mechanism must be opt-in rather than opt-out. There is already an existing agreed mechanism, copyright and licences, which has not been respected.

The data collector must adhere to all copyright and licence information available at the point of collection, as it stands, before introducing anything additional on top.

7. If so, what aspects do you consider to be the most important?

No

8. If not, what other approach do you propose and how would that achieve the intended balance of objectives?

A standardised set of criteria would enable everyone to know where they stand. For example, any scraping performed on a website that carries a general copyright licence should have that licence applied.

The licences applied to open source code repositories (especially copyleft) must be respected by the scrapers.

9. What influence, positive or negative, would the introduction of an exception along these lines have on you or your organisation? Please provide quantitative information where possible.

Knowing exactly what will cause works to be included in, or excluded from, scraping allows everyone to determine on a case-by-case basis what would be included, which would be very positive.

Having this would give us the option to safely implement generative AI features, which we don't currently use because of the copyright theft issues.

This proposal ignores the difference between creating a model and using one, and who accepts responsibility when a model is used and its results are presented to a third party. For example, as a small company we are very worried about liability from exposing AI model output to other parties.

10. What action should a developer take when a reservation has been applied to a copy of a work?

Once a standard operating model is defined, including respect for existing licences, this will form the basis of the training data. No data should be used in training if it does not meet those criteria.

Unfortunately, due to the way the models are trained, it would not be possible to retrospectively opt out of a previously trained data set. A data set shown not to have respected the licences in force at the time of its creation should carry consequences.

The responsibility must lie with the data set creator / model creator, rather than with operators of the model. It would be prohibitive if the companies that used a model were responsible for ensuring its compliance.

Standard copyright laws must still apply, and any violation of the agreed upon inclusion mechanism should result in compensation to the affected parties.

Egregious violations of the framework against large numbers of parties should result in the model being made publicly available (open sourced).

12. Do you agree that rights should be reserved in machine-readable formats? Where possible, please indicate what you anticipate the cost of introducing and/or complying with a rights reservation in machine-readable format would be.

While I fully agree that defining a machine-readable format would be ideal, all existing licence formats must be respected where possible, and they currently are not. For example, almost all websites display a copyright notice, and many public code repositories include a licence file.
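
To make this concrete, the sketch below (not part of the formal response) shows one way a scraper could look for the licence signals most websites already publish before ingesting a page; the class and function names are ours, not any standard.

```python
# Minimal sketch: before ingesting a page, check for the licence signals that
# already exist on most websites today (a <link rel="license"> tag and
# copyright/rights <meta> tags). Names here are illustrative, not a standard.
import urllib.request
from html.parser import HTMLParser


class LicenceSignalParser(HTMLParser):
    """Collects <link rel="license"> hrefs and copyright/rights <meta> tags."""

    def __init__(self):
        super().__init__()
        self.signals = []

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "license" in attrs.get("rel", "").lower():
            self.signals.append(("link-license", attrs.get("href", "")))
        elif tag == "meta" and attrs.get("name", "").lower() in ("copyright", "rights"):
            self.signals.append(("meta-" + attrs["name"].lower(), attrs.get("content", "")))


def existing_licence_signals(url: str) -> list[tuple[str, str]]:
    """Return whatever licence/copyright statements the page already publishes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    parser = LicenceSignalParser()
    parser.feed(html)
    return parser.signals
```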

Technical Standards

13. Is there a need for greater standardisation of rights reservation protocols?

Yes; however, existing licensing is already being ignored. Code repositories generally include licences such as those described at https://api.github.com/licenses, so the idea that these licences were not already available is false. They were simply not considered by the data scrapers.
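
As a rough illustration, a scraper could already query a repository's declared licence through the public GitHub API before deciding whether to ingest it. The allow-list below is a hypothetical policy choice, not a proposed standard.

```python
# Sketch: look up a repository's declared licence via the public GitHub API
# (https://api.github.com/licenses lists known licences; /repos/{owner}/{repo}/license
# returns the one a given repository uses). ALLOWED_SPDX is a hypothetical policy.
import json
import urllib.error
import urllib.request

ALLOWED_SPDX = {"mit", "apache-2.0", "bsd-3-clause"}  # hypothetical allow-list


def repo_licence_spdx(owner: str, repo: str) -> str | None:
    """Return the SPDX id of a repository's declared licence, or None if it has none."""
    url = f"https://api.github.com/repos/{owner}/{repo}/license"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # GitHub found no licence file in the repository
            return None
        raise
    licence = data.get("license") or {}
    spdx = (licence.get("spdx_id") or "").lower()
    return spdx or None


def may_ingest(owner: str, repo: str) -> bool:
    """Skip repositories whose licence is absent or outside the allow-list."""
    return repo_licence_spdx(owner, repo) in ALLOWED_SPDX
```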

14. How can compliance with standards be encouraged?

Allowing companies to show they are compliant with a set of rules on Copyright and Artificial Intelligence, as proposed, might in itself encourage the better outcome of respecting the standards and norms, as other businesses and consumers will be more likely to prefer to integrate models from companies that are using ethical methods.

To further emphasise this point: if we were to include AI tools within our products, we would want to provide sources for exactly which AI model is being used, and who created it.

This should be the minimum requirement for companies integrating AI into products.

15. Should the government have a role in ensuring this and, if so, what should that be?

The government should specify what compliance and good operating practice must look like. Almost all AI-related tooling is unethical in its current implementation.

Transparency

22. Do you agree that AI developers should disclose the sources of their training material?

Yes, this is fundamental to the trust in both the output of the AI, and the ethical creation of the model.

23. If so, what level of granularity is sufficient and necessary for AI firms when providing transparency over the inputs to generative models?

Per data type: the type of data, the volume of data (in bytes), and the source. At the very least, a source should be a web domain or other portal address.

But this would not allow individual rights holders to see whether their content had been stolen - which is why respecting existing licences and copyright is critical. A database that rights holders could query to check whether a URL, a string over a certain length, or a file hash has been collected would be ideal.
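
As a sketch of what such a look-up could feel like for a rights holder: hash the local copy of a work and query a public register by that hash. The endpoint and response format below are entirely hypothetical; no such register exists today.

```python
# Hypothetical sketch: querying a (non-existent today) public register of
# collected works by file hash. The endpoint URL and response shape are invented.
import hashlib
import json
import urllib.parse
import urllib.request

LOOKUP_ENDPOINT = "https://example.org/training-data-register/lookup"  # hypothetical


def sha256_of(path: str) -> str:
    """Hash the local copy of a work so it can be checked without uploading it."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def was_collected(path: str) -> bool:
    """Ask the hypothetical register whether this exact file appears in any data set."""
    query = urllib.parse.urlencode({"sha256": sha256_of(path)})
    with urllib.request.urlopen(f"{LOOKUP_ENDPOINT}?{query}", timeout=10) as resp:
        return bool(json.load(resp).get("collected", False))
```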

24. What transparency should be required in relation to web crawlers?

Web crawlers should respect existing copyright and licences, clearly display where they are from, provide information about how to opt out, and allow the data they collect to be queried, as in question 23.

Crawlers found to be breaking the standards set on what can be crawled should face repercussions.
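
A minimal sketch of the kind of transparency we mean: a crawler that declares who it is (with contact and opt-out URLs in its User-Agent) and honours robots.txt before fetching anything. The bot name and URLs are placeholders.

```python
# Sketch of a transparent crawler: identify itself and honour robots.txt.
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

# Placeholder identity: a real crawler should link to who runs it and how to opt out.
USER_AGENT = "ExampleTrainingBot/1.0 (+https://example.org/crawler; opt-out: https://example.org/opt-out)"


def allowed_by_robots(url: str) -> bool:
    """Honour the site's robots.txt rules for this bot before fetching anything."""
    parts = urlsplit(url)
    robots = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(USER_AGENT, url)


def polite_fetch(url: str) -> bytes | None:
    """Fetch a page only if robots.txt allows it, always declaring who we are."""
    if not allowed_by_robots(url):
        return None
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read()
```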

25. What is a proportionate approach to ensuring appropriate transparency?

If companies can collect and process these huge quantities of data, claiming that it is not proportionate for them to know what that data is, or who holds the legal rights to it, should not be a viable defence.

All data collected is their responsibility: they must manage it appropriately, protect PII, and ensure that no harm is done to the licence holder.

26. Where possible, please indicate what you anticipate the costs of introducing transparency measures on AI developers would be.

Unqualified to answer

27. How can compliance with transparency requirements be encouraged, and does this require regulatory underpinning?

As per answer 14: allowing companies to show they are compliant with a set of rules on Copyright and Artificial Intelligence, as proposed, might in itself encourage the better outcome of respecting the standards and norms, as other businesses and consumers will be more likely to prefer to integrate models from companies that are using ethical methods.

28. What are your views on the EU’s approach to transparency?

Unqualified to answer

AI Output labelling

45. Do you agree that generative AI outputs should be labelled as AI generated? If so, what is a proportionate approach, and is regulation required?

Yes

Many human hours are being wasted responding to AI-generated outputs. All AI tools should be required to explicitly attach a reference to their output stating that it is AI generated.

There is no reason not to explicitly state that outputs are AI generated, unless the intent is to deceive a human.
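
As a trivial sketch of what "explicitly attach a reference" could mean in practice, generated content could be wrapped with a machine- and human-readable provenance label; the field names and model identifier below are illustrative only.

```python
# Illustrative only: wrap generated text with an explicit AI-generated disclosure.
import datetime
import json


def label_ai_output(text: str, model_id: str) -> str:
    """Return the generated text alongside a provenance label."""
    return json.dumps({
        "content": text,
        "provenance": {
            "ai_generated": True,
            "model": model_id,  # e.g. "vendor/model-name@version" (illustrative)
            "generated_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "notice": "This content was generated by an AI system.",
        },
    })
```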

46. How can government support development of emerging tools and standards, reflecting the technical challenges associated with labelling tools?

Set clear legislation, like the UK's consumer protection rules, for how companies may act when using AI towards consumers, and set out repercussions for not following it.

47. What are your views on the EU's approach to AI output labelling?

Strongly agree with the need for transparency in disclosure and watermarking.