Privacy Daily is a service of Warren Communications News.

CNIL Tool Tracks Open-Source AI Models' Use of Personal Data

French watchdog CNIL Thursday published a tool for tracking open-source AI models to help data subjects exercise their rights to access personal information, object to its use for AI and delete it.

The tool provides a mechanism for navigating the genealogy of AI models published in open source, CNIL said. Open-source release makes the technology available to a wider audience, letting researchers, companies and individuals access numerous models for translation or for text or image generation, it said.

Many users also download the models to modify or specialize them with new data, after which the new models are again released as open source, the DPA said. Every open-source model is thus part of a genealogy: the set of models from which it derives, directly or through several rounds of modification, plus the models whose creation it has in turn contributed to.

Being able to describe and search within the genealogy of such a model is an essential step to understanding how the model was created, CNIL said.
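The genealogy CNIL describes can be modeled as a directed graph of derivation links: each model points to the base model(s) it was fine-tuned or otherwise derived from. A minimal sketch of searching such a graph, with hypothetical model names and links (not CNIL's actual implementation):

```python
from collections import defaultdict

# Hypothetical derivation links: model -> list of base models it was
# derived from (a model may combine several bases).
BASE_MODELS = {
    "org/base-llm": [],
    "org/base-llm-chat": ["org/base-llm"],
    "lab/base-llm-chat-fr": ["org/base-llm-chat"],
    "lab/translator": ["org/base-llm"],
}

def ancestors(model, links):
    """All models a given model derives from, directly or transitively."""
    seen, stack = set(), list(links.get(model, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(links.get(parent, []))
    return seen

def descendants(model, links):
    """All models derived from a given model, directly or transitively."""
    children = defaultdict(list)
    for child, parents in links.items():
        for p in parents:
            children[p].append(child)
    seen, stack = set(), list(children.get(model, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(children.get(child, []))
    return seen

def genealogy(model, links):
    """Full genealogy: everything the model came from plus everything built on it."""
    return ancestors(model, links) | descendants(model, links)
```

Here `genealogy("org/base-llm-chat", BASE_MODELS)` would return both its base model and the French fine-tune derived from it, which is exactly the set of models a rights request might need to reach.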

Academics have established that it's often possible to extract information from a model's training data simply by accessing the model itself, the watchdog said. This surfaces in generative models when they produce outputs that closely resemble elements of the training dataset, it said.

When a model has been partly trained on personal data, which is usually the case for generative AI, the European Data Protection Board has stated that in most cases it should be considered subject to the GDPR, CNIL said (see 2412180004). If testing shows that it's not possible to extract or infer personal data from the model, the GDPR doesn't apply.

As an experiment, CNIL explored scenarios for exercising the rights of objection, access or erasure for people whose data may have been memorized in an open-source AI model. The first step, given knowledge that one model has stored a person's data, is to identify other models in its lineage that may also have memorized it, the watchdog said.
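That identification step amounts to a forward traversal of derivation links: any model fine-tuned from the known model inherits its weights and so may retain the memorized data. A minimal sketch, with hypothetical model names and links (not CNIL's actual tool):

```python
# Hypothetical derivation links: base model -> models directly derived from it.
DERIVED_FROM = {
    "org/model-a": ["org/model-a-chat", "lab/model-a-translate"],
    "org/model-a-chat": ["lab/model-a-chat-quantized"],
}

def possibly_affected(known_model, derived_from):
    """Models whose weights descend from a model known to have memorized
    a person's data, and which may therefore also have memorized it."""
    affected = {known_model}
    stack = [known_model]
    while stack:
        model = stack.pop()
        for child in derived_from.get(model, []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected
```

Starting from `"org/model-a"`, the traversal flags its two direct fine-tunes and the quantized variant derived from one of them, giving the full set of models a rights request might need to cover.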

Its demonstration tool allows exploration of the lineage of an AI model available on the Hugging Face platform, the DPA added.