Summary: This blog post summarises Fatemeh's talk on privacy-preserving NLP, covering the vulnerabilities in the NLP pipeline, the threats they enable, and the available mitigations.

The main objective of Natural Language Processing (NLP) is to read, decipher, understand and make sense of human language. The general pipeline is to collect data, train a model, and deploy it for applications such as text generation, text understanding and embeddings.

Vulnerabilities in NLP Pipeline:

Assume we have a rogue data scientist with access to the training process. They could mount a gradient attack to try to infer personal information, or they could attack the trained model or the embeddings.

External threats include querying the model to extract private information about the data contributors.

Embedding Model Attacks:

Embeddings are real-valued vector representations that allow words with similar meanings to have similar representations. They can be used for response generation, question answering and text classification.

An embedding inversion attack is when the attacker tries to invert the embedding and recover the actual input sequence x*. An attribute inference attack is when the attacker tries to infer attributes of x*. A membership inference attack is when the attacker tries to infer whether x* or its context x' was used to train the embedding model.
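To make the membership inference idea a bit more concrete, here is a minimal sketch of one simple way such an attack could be framed: score how similar the embedding of x* is to the embedding of its context x', and guess "member" when the score clears a threshold. The function names, the toy vectors and the threshold value are all assumptions made for illustration; they are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def infer_membership(target_vec, context_vec, threshold):
    """Guess that (x*, x') was in the training data if their embeddings
    are unusually similar. The threshold is a hypothetical value that an
    attacker would have to tune on data they control."""
    return cosine(target_vec, context_vec) >= threshold


# Hypothetical 2-d embeddings for a target word and two candidate contexts.
seen_pair = infer_membership([1.0, 0.0], [1.0, 0.1], threshold=0.9)
unseen_pair = infer_membership([1.0, 0.0], [0.0, 1.0], threshold=0.9)
```

The intuition is that pairs the model was trained on tend to end up closer together in embedding space than pairs it never saw.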

Text-Generation Model Attacks:

To measure unintended memorization in text-generation models, Carlini et al. proposed the exposure metric, which answers the question: what information can be gained about an inserted canary given access to the model? To measure it, random canaries (strings of random tokens) are inserted into the model's training data. The probability that the model assigns to the inserted canary and to the other possible canaries is then measured. These probabilities are used to rank the candidates, and the exposure of each canary is based on its position in that ranking.

The higher the exposure, the easier it is to extract the secret canary.
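The ranking step can be sketched in a few lines. This assumes the rank-based definition of exposure, log2 of the number of candidate canaries minus log2 of the inserted canary's rank, and uses made-up log-perplexity scores (lower means the model finds the sequence more likely) in place of a real model. The function name and the numbers are hypothetical.

```python
import math


def exposure(canary_score, candidate_scores):
    """Rank-based exposure: log2(|candidates|) - log2(rank of the canary).

    Scores are log-perplexities, so a lower score means the model assigns
    the sequence a higher probability. candidate_scores must include the
    score of the inserted canary itself.
    """
    # Rank 1 means no other candidate is more likely than the canary.
    rank = 1 + sum(1 for s in candidate_scores if s < canary_score)
    return math.log2(len(candidate_scores)) - math.log2(rank)


# Hypothetical scores for 8 possible canaries; 3.5 is the inserted one.
scores = [12.0, 3.5, 9.1, 15.2, 11.0, 8.4, 13.3, 10.7]
memorized = exposure(3.5, scores)       # canary ranked first
not_memorized = exposure(15.2, scores)  # canary ranked last
```

A canary ranked first out of 8 candidates gets the maximum exposure of 3 bits here, while one ranked last gets 0.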


Having seen the vulnerabilities, let's look at the mitigations for these threats to privacy in NLP.

Private Embeddings:

The word embeddings are perturbed with noise sampled from an exponential distribution. This distribution is calibrated so that the perturbed representations are more likely to be words with similar meanings.

In the examples shown in the image below, the high-privacy perturbations of the words encryption, hockey and spacecraft are public-key, futsal and aerojet, while at the lowest privacy level each word maps to itself.

Indistinguishability is used to ensure that the perturbed words stay within the context radius we want and do not map to random, unrelated words.
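Here is a rough sketch of what such a perturbation could look like. It assumes noise with density proportional to exp(-epsilon * ||z||), sampled as a uniformly random direction scaled by a Gamma-distributed magnitude, followed by a nearest-neighbour lookup to map the noisy vector back to a real word. That sampling recipe, the function names and the tiny 2-d vocabulary are assumptions for illustration, not details given in the talk.

```python
import math
import random


def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||)."""
    # Direction: a normalised Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    # Magnitude: Gamma with shape = dim and scale = 1 / epsilon.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * v / norm for v in direction]


def perturb_word(word, embeddings, epsilon, rng):
    """Add noise to a word's vector and return the nearest vocabulary word."""
    vec = embeddings[word]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [a + b for a, b in zip(vec, noise)]
    return min(embeddings, key=lambda w: math.dist(embeddings[w], noisy))


# Hypothetical 2-d embeddings; real ones would have hundreds of dimensions.
toy_embeddings = {
    "hockey": [1.0, 0.0], "futsal": [0.9, 0.2],
    "encryption": [-1.0, 0.5], "public-key": [-0.9, 0.6],
    "spacecraft": [0.0, -1.0], "aerojet": [0.1, -0.9],
}
rng = random.Random(0)
low_privacy = perturb_word("hockey", toy_embeddings, epsilon=1e6, rng=rng)
high_privacy = perturb_word("hockey", toy_embeddings, epsilon=5.0, rng=rng)
```

With a very large epsilon the noise is negligible and the word maps to itself, which matches the lowest-privacy case above. Smaller values of epsilon push the output towards other words, with nearby words being the most likely replacements.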

Differentially Private RNNs:

This is a mixture of federated learning and differentially private deep learning. Each user trains a separate model on their own device with their own data. During training, the gradients of the individual models are clipped and then the updates are applied. These updates are aggregated and sent to the cloud, where noise is added and the final update is generated.

In this scheme, each user trains their own model and updates are clipped per user, so user-level privacy is achieved, as opposed to prior work on DP-SGD, which targeted example-level privacy.
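The clip, aggregate and add-noise steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: updates are plain lists of floats, the noise is Gaussian with standard deviation equal to a hypothetical noise_multiplier times the clipping norm, and the function names are made up for this example.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale a user's update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def private_aggregate(user_updates, clip_norm, noise_multiplier, rng):
    """Clip each user's update, sum them, add noise, then average."""
    clipped = [clip_update(u, clip_norm) for u in user_updates]
    dim = len(clipped[0])
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    # Noise is scaled to the most that any single user can contribute.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Two hypothetical user updates; the first is larger than the clip norm.
updates = [[3.0, 4.0], [0.3, 0.4]]
no_noise = private_aggregate(updates, 1.0, 0.0, random.Random(0))
with_noise = private_aggregate(updates, 1.0, 1.0, random.Random(0))
```

Because clipping bounds each user's whole contribution rather than each training example's, the added noise hides the presence or absence of an entire user, which is where the user-level guarantee comes from.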


Other mitigations mainly target sensor data privacy, but they can be extended to NLP tasks as well because of the sequential nature of the data.

However, this area is heavily underexplored, and there is a lot of room for improvement in both the existing attacks and the mitigations.

Watch the full talk on YouTube:

OpenMined Privacy Conference Day-1 Part-2

Private NLP Paper list