April 22, 2024

The Information Commissioner’s Office (“ICO”) has published the latest in its series of consultations on generative AI and data protection. Following earlier calls for evidence on the lawful basis for web scraping to train models and on purpose limitation (on both of which we have previously commented), the latest instalment concerns the accuracy of generative AI models.

The call for evidence is particularly concerned with the accuracy of training data and output data within AI systems. As the ICO explains, “if inaccurate training data contributes to inaccurate outputs, and the outputs have consequences for individuals, then it is likely that the developer and the deployer are not complying with the accuracy principle”.

According to the ICO’s analysis, how accurate the outputs of a generative AI model need to be in order to comply with the accuracy principle will depend on the model’s purpose. Where organisations develop models with a “purely creative purpose”, the ICO indicates that output accuracy will be less important than for models used to make decisions about people or relied upon as a source of information.

If developers do not intend a model to be relied on for purposes that require accuracy, the ICO suggests that appropriate mechanisms should be put in place to ensure that the model is not used for such purposes (for example, contractual terms limiting types of usage). Furthermore, end-users should be made aware of the appropriate uses of outputs and of the risk of so-called ‘hallucinations’ (i.e. incorrect and unexpected outputs), so as to mitigate the risk of their placing too much reliance on the accuracy of the model. The ICO also recommends, and seeks views on, potential measures such as labelling outputs with watermarks so that it is clear that information has been generated by AI, using ‘confidence scores’ to indicate the reliability of outputs, and monitoring users’ interactions with a model to ensure that it is being used for its intended purpose.

On the subject of training data, the ICO says that it expects developers to have a “good understanding” of the accuracy of the training data they use to develop models, particularly if the model is to be used for a purpose which requires accurate outputs. It suggests that training data should be appropriately ‘curated’ to ensure that a model complies with the accuracy principle, and that developers should know whether the data is “made up of accurate, factual and up to date information, historical information, inferences, opinions, or even AI-generated information relating to individuals”. It also expects developers to understand and measure the impact that the accuracy of training data has on the accuracy of outputs, and invites comment on how to assess, measure and document the relationship between inaccurate training data and inaccurate model outputs. Finally, the ICO makes clear that it expects developers to consider any limitations of their models resulting from inaccurate training data, and to communicate those limitations to end-users so as to avoid harm.

Commenting on the latest call for evidence, the Information Commissioner, John Edwards, said, “in a world where misinformation is growing, we cannot allow misuse of generative AI to erode trust in the truth. Organisations developing and deploying generative AI must comply with data protection law – including our expectations on accuracy of personal information.”

The call for evidence closes on 10 May 2024, and more information can be found on the ICO’s website.
