UK Department for Science, Innovation & Technology publishes paper on Emerging Processes for Frontier AI Safety

Leading up to the UK Government’s AI Safety Summit on 1 and 2 November 2023, the Government published two papers: Capabilities and Risks from Frontier AI (reported by Wiggin on 2 November) and Emerging Processes for Frontier AI Safety. “Frontier AI” is defined as highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today’s most advanced models (e.g. ChatGPT, Claude and Bard), as opposed to narrow AI, meaning an AI system that performs well on a single task or narrow set of tasks (e.g. chess).

The Safety paper aims to consolidate leading thinking in AI safety gleaned from research, companies, civil society and international fora (e.g. G7 and OECD) and build on those approaches in order to inform dialogue on AI safety. In particular, it provides a potential list of frontier AI safety processes and practices covering nine areas:

  1. Responsible capability scaling. These are processes to identify, monitor and mitigate risks. They include conducting risk assessments of all plausible and consequential risks before developing or deploying AI and throughout its lifecycle, and defining risk thresholds together with the mitigations to be put in place when those thresholds are met. Each risk threshold should be defined by reference to the outcomes that would constitute a breach of it, and those outcomes are linked to “dangerous capabilities”, defined as the ability to cause significant harm due to intentional misuse or accident.
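As an illustration only, the pairing of risk thresholds with pre-agreed mitigations could be sketched as follows. The capability names, scores, trigger levels and mitigation labels are all hypothetical and do not come from the paper, which leaves it to organisations to define their own thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskThreshold:
    capability: str               # dangerous capability being tracked (hypothetical label)
    trigger_score: float          # evaluation score at which the threshold is met (hypothetical)
    mitigations: tuple[str, ...]  # mitigations agreed in advance for this threshold


def required_mitigations(scores, thresholds):
    """Return the mitigations for every threshold that the scores meet or exceed."""
    triggered = {}
    for threshold in thresholds:
        score = scores.get(threshold.capability)
        if score is not None and score >= threshold.trigger_score:
            triggered[threshold.capability] = list(threshold.mitigations)
    return triggered


# Hypothetical thresholds defined before development or deployment.
THRESHOLDS = [
    RiskThreshold("cyber_attack", 0.6, ("pause_deployment", "external_red_team")),
    RiskThreshold("manipulation", 0.8, ("restrict_access",)),
]

# Hypothetical evaluation results for a model under review.
print(required_mitigations({"cyber_attack": 0.7, "manipulation": 0.3}, THRESHOLDS))
```

The point of the sketch is that the mitigations are fixed before the evaluation is run, so meeting a threshold leads to a predetermined response rather than an ad hoc decision.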

Responsible capability scaling should include:

  1. Model evaluations and red teaming, including by deployers of the AI and external evaluators. AI models should be evaluated (e.g. by benchmarking) for dangerous capabilities (e.g. cyber attack, deception and manipulation, weapons development), lack of controllability (i.e. use in unintended ways), societal harms (e.g. bias and discrimination) and system security. Red teaming involves approaching the system from the perspective of an adversary to understand how it could be compromised or misused.
  2. Model reporting and information sharing. This would include sharing both model-agnostic information, such as risk assessments and mitigations, and model-specific information with government bodies, other AI organisations, independent third parties, users and the public as appropriate in each case. Information for deployers about safe practices for model usage and transparency, such as any potential biases that the training data may contain, and terms and conditions which specify prohibited uses of the AI, can all be used to support risk mitigation. For the public, the paper refers to commonplace standardised documentation for consumer products (e.g. information on food packaging or accompanying medication).
  3. Security controls including securing model weights. Attacks against AI systems could result in physical damage, business interruption and loss of sensitive or confidential data. Security practices include the development of “baked in” security for the AI systems and the models themselves, to protect from both external and insider security risks, together with incident response, escalation and remediation plans. Security measures include ensuring software, hardware and data come from trusted sources, providing users with necessary security updates, and implementing appropriate access controls to mitigate insider risks.
  4. Reporting structure for vulnerabilities. Post-development vulnerabilities could include practices by bad actors seeking to bypass the AI developer’s safety features, inducing the AI to exhibit behaviours for which it was not intended, and extracting sensitive or private information, as well as incidents involving the AI providing incorrect outputs which cause harm (e.g. medical misdiagnosis) or exhibiting behaviour revealing bias and discrimination. These can be addressed by a vulnerability management process that enables outsiders to report previously unidentified safety and security issues.
  5. Identifiers of AI-generated material. The potential for bad actors to produce harmful or false AI-generated information means that it must be possible to distinguish between AI-generated content and content generated by a human. Authentication solutions (such as watermarking and creating databases of content known to be AI-generated) exist and, although they are not fully reliable, could at least present a degree of friction and reduce incentives to pass off AI-generated content as real.
  6. Prioritising research on risks posed by AI. Frontier AI companies have a significant role to play in facilitating open and robust research into AI safety and should conduct research into AI safety tools (e.g. to improve watermarking) and collaborate with external researchers to study the downstream impacts of their systems (e.g. on the workforce).
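The idea of a database of content known to be AI-generated, mentioned under identifiers of AI-generated material, can be illustrated with a minimal sketch. It assumes a simple exact-match lookup on SHA-256 digests; the class and method names are hypothetical, and the paper does not prescribe any particular technique.

```python
import hashlib


class GeneratedContentRegistry:
    """Hypothetical registry of digests of content known to be AI-generated."""

    def __init__(self):
        self._digests = set()

    @staticmethod
    def _digest(content: bytes) -> str:
        # SHA-256 digest of the exact bytes; any change to the content changes the digest.
        return hashlib.sha256(content).hexdigest()

    def register(self, content: bytes) -> None:
        """Record content at the point of generation."""
        self._digests.add(self._digest(content))

    def is_known_generated(self, content: bytes) -> bool:
        """Check whether this exact content was previously registered."""
        return self._digest(content) in self._digests


registry = GeneratedContentRegistry()
registry.register(b"An AI-written paragraph.")
print(registry.is_known_generated(b"An AI-written paragraph."))   # exact copy
print(registry.is_known_generated(b"An AI-written paragraph!"))   # edited copy
```

Because the lookup matches exact bytes only, even a one-character edit defeats it. That is consistent with the paper's view that such solutions are not fully reliable but can still add a degree of friction.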

The Government also suggests two further practices:

  1. Preventing and monitoring model misuse. The paper points out that releasing an AI model via APIs (through which access to the model is controlled by the AI provider) gives the provider significantly more scope to address misuse than releasing the model publicly (i.e. open source or open access models). Practices addressing misuse could include applying content filters to both AI inputs and outputs to block or ignore harmful requests and responses (e.g. a user request to cause harm), removing users who misuse the AI, know-your-customer checks on users, a system of user verification, and being prepared to withdraw harmful AI in worst-case scenarios. All of these would need to be balanced against the freedoms and rights of users.
  2. Data input controls and audits. This section refers to the need to respect legal requirements such as data privacy (e.g. establishing a legal basis to process certain types of training data) and copyright, and covers data minimisation, consideration of data sources and provenance, auditing datasets for private or sensitive information, bias, harmful content or misinformation, continuous data improvement, and facilitating external scrutiny of training datasets (subject to appropriate technical and organisational safeguards).
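The input and output filtering described under preventing and monitoring model misuse could look something like the sketch below. It assumes a naive keyword match purely for illustration; the blocked terms, function names and stand-in model are hypothetical, and real filters would be considerably more sophisticated.

```python
# Hypothetical list of phrases treated as harmful, for illustration only.
BLOCKED_TERMS = ("build a weapon", "steal credentials")


def is_harmful(text: str) -> bool:
    """Naive check: flag text containing any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_generate(prompt, model):
    """Apply the filter to the user's input and to the model's output."""
    if is_harmful(prompt):
        return "Request blocked."      # harmful request never reaches the model
    response = model(prompt)
    if is_harmful(response):
        return "Response withheld."    # harmful output never reaches the user
    return response


def echo_model(prompt):
    """Stand-in for a model served behind an API (hypothetical)."""
    return f"You said: {prompt}"


print(guarded_generate("How do I steal credentials?", echo_model))
print(guarded_generate("Summarise this paper.", echo_model))
```

A wrapper of this kind is only possible where the provider sits between the user and the model, which is why the paper contrasts API access with public release of the model.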

Many of the practices include reporting information (e.g. risk assessment process, mitigation measures and vulnerabilities) to relevant government authorities (i.e. those with a mandate to receive information from frontier AI organisations) and other AI companies. The paper recognises and addresses the risks of sharing information on AI vulnerabilities and commercially sensitive information.
