Privacy issues surrounding ChatGPT and other generative AI
Generative AI, including ChatGPT, offers unprecedented opportunities, but also risks. These include privacy issues. In this article, we briefly outline the problems, and give practical guidance on how to solve them.
Generative AI is a form of artificial intelligence that can create new content, such as texts, images and videos, based on textual instructions from the user. Generative AI can also generate computer code, or perform 'sentiment analysis' (see also: Generative AI: Privacy and tech perspectives). The best-known and most recent example of generative AI is the chatbot ChatGPT, from the company OpenAI, which can produce texts with human qualities.
All kinds of problems
Generative AI offers many opportunities in many areas, but it also raises all kinds of ethical, legal and privacy issues, such as:
- copyright infringement (because the system acquired its knowledge based on existing texts and images, which may still be copyrighted);
- ingrained prejudices (bias), for example in terms of gender and ethnicity (also because of the learning process based on existing texts and images, which could potentially be discriminatory or racist);
- various privacy issues, in terms of both inputs and outputs (see the examples below);
- and overestimating the capabilities of AI.
Example 1. In terms of (learning) input, i.e. how the AI acquired its knowledge: Dutch neighbourhood police officers are not allowed to structurally or systematically monitor social media to spot problems in the neighbourhood. So why should ChatGPT be allowed to rely on all such texts?
Example 2. In terms of output (and the instructions required to produce it): suppose you feed the system the transcript of a meeting, with the request to summarise it as minutes. The system then has knowledge of individuals and what they said, plus potentially sensitive company information. What does ChatGPT do with that? (More on that under 'Sensitive information' below.)
Incorrect output creates unrest
In particular, the dissemination of inaccurate output is causing unrest. There are two basic forms involved:
- manipulated output, for example in the form of deepfakes (AI-manipulated video, images or audio);
- fabricated output ('hallucination'), which is not grounded in the training data but is presented as a serious answer, and which is usually difficult or impossible to check because sources are rarely mentioned.
Both forms can result in user or public manipulation, and even pose a risk to public safety. (See also: Generative AI ethics: 8 biggest concerns).
Sensitive information
Besides manipulated output, which can affect a person's privacy, there is also the risk of sharing too much personal and confidential information with generative AI.
There are already several cases in which individuals or companies entered large amounts of personal data into a generative AI tool, and that data subsequently leaked. Such sensitive information could include, for example, medical records, financial information or personal contacts (see also: AI's privacy pandora's box: The risks of oversharing with generative AI). Once entered, this information can be used by the AI in ways that cannot be reliably predicted.
Privacy First is impressed by the potential of generative AI but also sees the risks. How do you deal with them in practice?
We advise organisations (and individuals) wishing to use generative AI to carefully apply user guidelines. That means being aware of both the risks of sharing information (with generative AI tools) and the limitations of the outcomes of these tools.
Furthermore, an ethical framework is desirable for the deployment of these tools, especially with regard to the deliberate production of manipulated output.
Finally, keep in mind that globally accepted privacy principles such as data quality, collection limitation, purpose specification, use limitation, security, transparency, accountability and individual participation apply to all systems that process personal data, including generative AI.
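The advice to share as little personal data as possible with generative AI tools can be partly supported by a simple technical safeguard. The Python sketch below is an illustration only, not a complete solution: the `redact` function and the two patterns are hypothetical and catch only e-mail addresses and phone-like numbers. Names, addresses, medical details and confidential company information pass through untouched, so a human check remains necessary before anything is submitted.

```python
import re

# Hypothetical patterns for illustration only; real personal data takes many
# more forms (names, addresses, ID numbers) than these two regexes catch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


transcript = "Anna (anna@example.com, +31 6 1234 5678) will send the Q3 figures."
print(redact(transcript))
# prints: Anna ([EMAIL], [PHONE]) will send the Q3 figures.
```

Note that the name 'Anna' and the reference to the Q3 figures survive the redaction, which shows why pattern matching alone is not enough for a meeting transcript like the one in Example 2.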