“Surveillance capitalism,” as Shoshana Zuboff explains it, is a new kind of business model that runs predictive models on behavioral data from social media and uses those predictions to enhance and grow the business. More colloquially, it refers to an economic system centered around the commodification of personal data, with the core purpose of profit-making. Because personal data can be commodified, it has become one of the most valuable resources on earth.


Before we get started, let’s get acquainted with a few terms :-

- Behavioral surplus :- Behavioral surplus is a huge concept, closely related to behavioral economics, but for starters it is data that goes beyond what is needed for online product and service use. It can include information about a person's location, age, profession, lifestyle, habits, and a range of personal and professional preferences.
- Instrumentarian Power :- Zuboff's concept of "instrumentarian power" is the power of governments and corporations to use technology and infrastructure to manipulate people in subtle but effective ways. It turns people into "instruments" that are used in predictable ways to achieve the government's and the corporations' goals.
- Behavioral futures markets :- Prediction products are traded in what Zuboff calls “behavioral futures markets,” where surveillance capitalists sell certainty to their business customers. Google's “click-through rate” was the first globally successful prediction product, and its ad markets were the first to trade in human futures.
- Surveillance dividend :- There is no formal definition of this term yet, but the New York Times describes it as follows: “Google created the first insanely lucrative markets to trade in human futures, what we now know as online targeted advertising, based on their predictions of which ads users would click. Between 2000, when the new economic logic was just emerging, and 2004, when the company went public, revenues increased by 3,590 percent.” This startling number represents the “surveillance dividend.” Please follow the NYTimes article in the credits for more information.

So, according to Zuboff, these companies have been collecting behavioral surplus from our internet activity and selling it to companies that run predictive machine learning models on this data to predict future behavior, such as which product someone will be interested in at a given moment or later, and to send recommendations or advertisements accordingly. These prediction products are sold on what are called “behavioral futures markets,” where “certainty” is sold by these surveillance capitalists to their business customers, who are essentially paying for, and putting their trust in, the “instrumentarian power” of these capitalists. That is how they exponentially increase their “surveillance dividend.”


Some of the most common business models of surveillance capitalism, according to Zuboff, are :-

- Voting manipulation :- According to Zuboff, the Trump campaign as well as the Brexit campaign took help from a data analytics company called “Cambridge Analytica,” which used the data of millions of people to build predictive models of their future behavior, and even sent manipulative recommendations to change their minds about whom they would vote for.
- Guaranteed footfalls :- Zuboff has also brought to light the fact that different fast food chains have exploited and manipulated data from a famous augmented reality game, in which players chase characters that show up on their smartphones in the real world. This game had been manipulated to meet the needs of these fast food chains for “guaranteed footfalls” outside their stores.
- Using subliminal advertising on data from social media and internet activity :- This is one of the most common use cases of user data: detecting emotions from the images people upload to a social media platform, or from other internet activity, and then advertising products accordingly (also known as subliminal advertising).

All of the above-mentioned examples can be a part of the hypothetical “5-factor personality test”; click here for more.


Now let’s get into the technology side: are the privacy-preserving tools already existing in the market or in research good enough? Yes, they can be, and they are already solving a lot of the world’s problems related to data privacy, including those in healthcare and finance. We have to keep in mind that these are privacy-preserving technologies; they come into play only in preserving the identity of the data holder, or the sensitive information that could be leaked from the data. Some of these technologies are :-

- Differential privacy :- Differential privacy is a system for publicly sharing information about a data set by describing the patterns of groups within the data set while withholding information about individuals in it.
- Federated learning :- Federated learning is an algorithm developed by Brendan McMahan et al. It is a rather unusual way of training predictive models on someone’s personal data without pulling the data to central servers: instead, the model is sent to the local devices where the data lives, trained there, and only the results are sent back to the server.
- Encrypted deep learning :- Encrypted deep learning is a field that merges deep learning and cryptography. It is a brand new field that uses the same core technologies as federated learning/encrypted federated learning and applies them to the actual prediction step of a deep learning model; the goal is to predict on encrypted data while the model used for prediction is also encrypted.
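To make the first of these concrete, here is a minimal sketch of the Laplace mechanism behind differential privacy. The function names and the example data are illustrative, not from any particular library:

```python
import numpy as np

def private_count(values, predicate, epsilon=0.5):
    """Differentially private count of the values matching `predicate`.

    A counting query has sensitivity 1 (one person joining or leaving the
    dataset changes the count by at most 1), so adding Laplace noise with
    scale 1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 45, 31, 62, 38, 29, 55, 41]
# The aggregate "how many people are over 40" is released with noise, so no
# individual's presence in the dataset can be inferred from the answer
noisy = private_count(ages, lambda a: a > 40)
```

The smaller the `epsilon`, the noisier (and more private) the released statistic.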

Surveillance capitalism has become an integral part of society now, and as I said earlier, the technologies that exist are more about protecting the identity of the people whose data is being exploited for making predictions. Almost every application in the tech industry these days requires huge amounts of data; machine learning and data science are an integral part of the whole ecosystem, and all of that, taken together, is a small fraction of “surveillance capitalism.” These technologies can help, but apparently only in the privacy-preserving part of this entire workflow, and that in itself is solving a lot of the world’s problems. Internet companies have been using behavioral economics since the dawn of the “dot com buzz,” so it is a part of the entire workflow now. But if you are wondering whether your identity is secure while your data is being used to train these machine learning models, a lot of research is going into that domain. Some of the names that come to my mind when I think of research in privacy-preserving technologies are :-

- Openmined.org - OpenMined is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier to entry to private AI technologies.
- Google :- Google has been doing a lot of research in privacy-preserving AI using Differential Privacy (more information in this link). It has its own library for differential privacy, which contains a set of ε- and (ε, δ)-differentially private algorithms that can be used to produce aggregate statistics over numeric data sets containing private or sensitive information. The functionality is currently available in C++, Go, and Java. Apart from that, Google is also very concerned about its privacy-preserving measures; it recently banned several malicious apps, check this article for more information.
- Facebook :- Facebook has been doing some really impressive work on data privacy. As announced by Zuckerberg himself at Facebook’s annual F8 conference last year, Facebook is currently focusing on six principles to improve privacy: private interactions, reduced permanence, encryption, safety, interoperability, and secure data storage. Some of the steps the company has taken include end-to-end encryption on its messaging platforms and ensuring privacy in its products before shipping them.

Follow this link to see the progress made in privacy-preserving machine learning over the past year.

I’d like to conclude by saying that “surveillance capitalism” is one of the business models internet companies use to make money, and behavioral economics has been around for a long time. The downside is the business model itself: selling predictions to business customers who use them to get votes in presidential campaigns (NYTimes article), or guaranteed footfalls for food joints via augmented reality games. These are the places where eyebrows rise. On the other hand, it is a little funny to think that we are accomplices in this whole process: no matter what happens, we cannot give up using social media, can we? Nor can we give up smart devices, because they are useful and an integral part of our lives.

Credits :-

- "Behavioral Surplus" Is Not Evil – It's Essential to Customer Experience :- https://blog-idcuk.com/behavioral-surplus-for-cx/
- Surveillance Capitalism wiki page :- https://en.wikipedia.org/wiki/Surveillance_capitalism
- Privacy in the era of Data Bureaucracy :- https://medium.com/secure-and-private-ai-writing-challenge/privacy-in-the-era-of-social-bureaucracy-16f34da710b3
- Explainer: what is surveillance capitalism and how does it shape our economy? :- https://theconversation.com/explainer-what-is-surveillance-capitalism-and-how-does-it-shape-our-economy-119158
- NY times article “You Are Now Remotely Controlled” :- https://www.nytimes.com/2020/01/24/opinion/sunday/surveillance-capitalism.html
- Evaluation of Privacy-Preserving Technologies for Machine Learning :- https://medium.com/outlier-ventures-io/evaluation-of-privacy-preserving-technologies-for-machine-learning-8d2e3c87828c
- Surveillance Capitalism Core Concepts :- https://www.youtube.com/watch?v=FoEiiKbFrdw
- A Threat to Global Democracy: How Facebook & Surveillance Capitalism Empower Authoritarianism :- https://www.youtube.com/watch?v=XwXffKw73jA
- Derren Brown Tricks Advertisers With Subliminal Messaging :- www.youtube.com/watch?v=43Mw-f6vIbo
- HiNative question(Would anyone explain the meaning of "instrumentarian power" by Shoshana Zuboff?) :- https://hinative.com/en-US/questions/12597995#:~:text=Zuboff's%20concept%20of%20%22instrumentarian%20power,government's%20and%20the%20corporations'%20goals.
- Project-Syndicate Surveillance Capitalism blog :- https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwj09PXZpNDrAhVzzTgGHbjPD_EQFjABegQICxAE&url=https%3A%2F%2Fwww.project-syndicate.org%2Fonpoint%2Fsurveillance-capitalism-exploiting-behavioral-data-by-shoshana-zuboff-2020-01%23%3A~%3Atext%3DPrediction%2520products%2520are%2520then%2520traded%2Cto%2520trade%2520in%2520human%2520futures.&usg=AOvVaw3_ua8Skb3yrrOmKC0jaBjK
- OpenMined’s tutorial on Homomorphic Encryption :- https://www.youtube.com/watch?v=2TVqFGu1vhw&feature=youtu.be

If you want to join our mission on making the world more privacy preserving :-

**Note**: If you want more demos like this, I'll tweet them out at @theoryffel. Feel free to follow if you'd be interested in reading more, and thanks for all the feedback!

*Encrypted Machine Learning as a Service* allows owners of sensitive data to use external AI services to get insights over their data. Let's consider a practical scenario where a data owner holds private images and would like to use a service to have those images labeled, without disclosing the images or the label predictions, and without having access to the model, which is often considered a business asset by such services and is therefore not accessible.

To get a realistic example, we will consider the task of distinguishing between bees and ants, which uses a ResNet-18 model to achieve around 95% accuracy. We will not consider the training of such a model, as we assume the AI service provider has already trained it using some data. Instead, we will showcase how we can use PySyft to encrypt both the model and some images and to label those images in a fully private way.

*You can also find an executable Jupyter notebook of this demo in the PySyft Tutorial section, Part 11 bis.*

First, let's try to understand what mechanisms we use to make the data and the model private. If you want to jump straight to the code, you can skip this section! \(\newcommand{\shared}[1]{[\![ #1 ]\!]}\)

The cryptography protocol that we use to encrypt data is called Function Secret Sharing (FSS). It belongs to the family of Secure Multi-Party Computation (SMPC) protocols, which involves several parties that share a secret to ensure privacy. A party alone holds a share of the private value and can't reconstruct the value, and a quorum of parties (sometimes all parties) need to collaborate to reconstruct the private data. Therefore, saying that we *encrypt* the data is an abuse of language and we should say that we *secret share* it.
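Additive secret sharing, the basic primitive behind these protocols, can be sketched in a few lines. The modulus and function names below are illustrative, not PySyft's API:

```python
import random

Q = 2**32  # all shares live in Z_Q

def share(secret, n_parties=2):
    """Split `secret` into n additive shares that sum to it modulo Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

alice_share, bob_share = share(42)
# Each share alone is a uniformly random element of Z_Q and reveals nothing
# about 42; only together do the parties reconstruct the secret:
assert reconstruct([alice_share, bob_share]) == 42
```

This is why "secret share" is the accurate term: no single party ever holds anything but random-looking values.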

Other families of protocols exist, such as those based on Homomorphic Encryption, where data is truly encrypted and a party only needs a key to decrypt it. I recommend reading this OpenMined blog to learn more about Homomorphic Encryption.

Unlike classical secret sharing schemes such as SecureNN (which is also supported by PySyft), where a public function \(f\) is applied to a shared input \(\shared{x}\), function secret sharing applies a public input \(x\) to a private shared function \(\shared{f}\). Shares or *keys* \((\shared{f}_0, \shared{f}_1)\) of a function \(f\) satisfy \(f(x) = \shared{f}_0(x) + \shared{f}_1(x)\). Both approaches output a secret-shared result.

Let us take an example: say Alice and Bob respectively hold shares \(\shared{y}_0\) and \(\shared{y}_1\) of a private input \(y\), and they want to compute \(\shared{y \ge 0}\). They receive some crypto material: each gets a share of a random value (or *mask*) \(\shared{\alpha}\) and a share \(\shared{f_\alpha}\) of the function \(f_{\alpha} : x \rightarrow (x \ge \alpha)\).

They first mask their shares of \(\shared{y}\) using \(\shared{\alpha}\), by computing \(\shared{y}_0 + \shared{\alpha}_0\) and \(\shared{y}_1 + \shared{\alpha}_1\) and then revealing these values to reconstruct \(x = y + \alpha\). Next, they apply this public \(x\) on their function shares \(\shared{f_\alpha}_{j=0,1}\), to obtain a shared output \((\shared{f_{\alpha}}_0(x), \shared{f_{\alpha}}_1(x)) = \shared{f_{\alpha}(y + \alpha)} = \shared{(y + \alpha) \ge \alpha} = \shared{y \ge 0}\). Previous works on FSS have shown the existence of such function shares for comparison which perfectly hide \(y\) and the result.
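The algebra of this mask-and-reveal protocol can be reproduced in a toy simulation. Note the loud caveat: real FSS keys are compact and computationally hide \(\alpha\), whereas this toy dealer simply knows \(\alpha\); the mask range is also kept away from 0 and \(Q\) to sidestep modular wraparound, which a real protocol handles properly:

```python
import random

Q = 2**62  # modulus for additive secret shares

def share(x):
    """Additively share x: two values that sum to x modulo Q."""
    r = random.randrange(Q)
    return r, (x - r) % Q

def dealer():
    """Trusted dealer (the crypto provider) builds the mask and function shares."""
    alpha = random.randrange(Q // 4, Q // 2)  # mask, away from 0 and Q (toy only)
    alpha_shares = share(alpha)
    r = random.randrange(2)
    f0 = lambda x: r                           # Alice's share of f_alpha
    f1 = lambda x: (int(x >= alpha) - r) % 2   # Bob's share of f_alpha
    return alpha_shares, (f0, f1)

def compare_to_zero(y):
    (a0, a1), (f0, f1) = dealer()
    y0, y1 = share(y)
    # Step 1: both parties publish their masked shares; x = y + alpha is public
    x = (y0 + a0 + y1 + a1) % Q
    # Step 2: each party evaluates its function share on the public x; the sum
    # of the two outputs (mod 2) is the bit (y >= 0), reconstructed here directly
    return (f0(x) + f1(x)) % 2

assert compare_to_zero(5) == 1    # 5 >= 0
assert compare_to_zero(-3) == 0   # -3 < 0
```

The public value \(x = y + \alpha\) leaks nothing about \(y\) because \(\alpha\) is uniformly random from the parties' point of view.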

For more details about how FSS can be implemented, this article details the FSS algorithms that we currently use in PySyft.

Enough explanations, let's open the code!

We will first load the data and the model and store them on the `data_owner` and the `model_owner`.

```
import torch
torch.set_num_threads(1) # We ask torch to use a single thread
# as we run async code which conflicts with multithreading
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import time
```

We download the data and load it in a `dataloader` with small batches of size 2, to reduce the inference time and the memory pressure on the RAM.

```
# First, download the dataset
# You can comment out this cell if you have already downloaded the dataset
!wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
!unzip hymenoptera_data.zip
```

```
data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
data_dir = 'hymenoptera_data'
image_dataset = datasets.ImageFolder('hymenoptera_data/val', data_transform)
dataloader = torch.utils.data.DataLoader(image_dataset, batch_size=2, shuffle=True, num_workers=4)
dataset_size = len(image_dataset)
class_names = image_dataset.classes
```

Want to have a look at your data? Here you are:

Now let's download the trained ResNet-18

```
# You can comment out this cell if you have already downloaded the model
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1-1_M81rMYoB1A8_nKXr0BBOwSIKXPp2v' -O resnet18_ants_bees.pt
```

*You can also download the file from here if the command above is not working.*

```
model = models.resnet18(pretrained=True)
# Here the size of each output sample is set to 2.
model.fc = nn.Linear(model.fc.in_features, 2)
state = torch.load("./resnet18_ants_bees.pt", map_location='cpu')
model.load_state_dict(state)
model.eval()
# This is a small trick because these two consecutive operations can be switched without
# changing the result but it reduces the number of comparisons we have to compute
model.maxpool, model.relu = model.relu, model.maxpool
```

Great, now we're ready to start!

First let's create a virtual setup with 2 workers names `data_owner`

and `model_owner`

.

```
import syft as sy
hook = sy.TorchHook(torch)
data_owner = sy.VirtualWorker(hook, id="data_owner")
model_owner = sy.VirtualWorker(hook, id="model_owner")
crypto_provider = sy.VirtualWorker(hook, id="crypto_provider")
```

```
# Remove compression to have faster communication, because compression time
# is non-negligible: we send to workers crypto material which is very heavy
# and pseudo-random, so compressing it takes a long time and isn't useful:
# randomness can't be compressed, otherwise it wouldn't be random!
from syft.serde.compression import NO_COMPRESSION
sy.serde.compression.default_compress_scheme = NO_COMPRESSION
```

Let's put some data on the `data_owner`

and the model on the `model_owner`

. *In a real setting, the data and the model would already be located respectively on the two workers and we would just ask for pointers to these objects.*

```
data, true_labels = next(iter(dataloader))
data_ptr = data.send(data_owner)
# We store the true output of the model for comparison purpose
true_prediction = model(data)
model_ptr = model.send(model_owner)
```

As usual, when calling `.send()`, we only have access to pointers to the data:

```
print(data_ptr)
```

```
(Wrapper)>[PointerTensor | me:85268602991 -> data_owner:95928743858]
```

We will now encrypt both the model and the data. To do this, we encrypt them remotely using the pointers and get back the encrypted objects.

```
encryption_kwargs = dict(
    workers=(data_owner, model_owner),  # the workers holding shares of the secret-shared encrypted data
    crypto_provider=crypto_provider,  # a third party providing some cryptography primitives
    protocol="fss",  # the name of the crypto protocol, fss stands for "Function Secret Sharing"
    precision_fractional=4,  # the encoding fixed precision (i.e. floats are truncated to the 4th decimal)
)
```

```
encrypted_data = data_ptr.encrypt(**encryption_kwargs).get()
encrypted_model = model_ptr.encrypt(**encryption_kwargs).get()
```

We are now able to run our secure inference, so let's do it and compare the results to the `true_labels`:

```
start_time = time.time()
encrypted_prediction = encrypted_model(encrypted_data)
encrypted_labels = encrypted_prediction.argmax(dim=1)
print(time.time() - start_time, "seconds")
labels = encrypted_labels.decrypt()
print("Predicted labels:", labels)
print(" True labels:", true_labels)
```

```
313.13965487480164 seconds
Predicted labels: tensor([0., 1.])
True labels: tensor([0, 1])
```

Hooray! This works!! Well, at least with a probability of 95%, which is the accuracy of the model.

But is the computation *exactly* the same as with the plaintext model? Not exactly, because we sometimes use approximations; let's open the model's output logits to verify how close we are to the plaintext execution.

```
print(encrypted_prediction.decrypt())
print(true_prediction)
```

```
tensor([[ 1.0316, -0.3674],
[-1.3748, 2.0235]])
tensor([[ 1.0112, -0.3442],
[-1.3962, 2.0563]], grad_fn=<AddmmBackward>)
```

As you can observe, this is quite close, and in practice the accuracy of the model is preserved, as you can verify by running inference over more images. The approximations mentioned are due to approximated layers such as BatchNorm and to the fixed-precision encoding.
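To see where the fixed-precision part of the error comes from, here is a minimal sketch of a base-10 fixed-point encoding with 4 fractional digits, in the spirit of `precision_fractional=4`. This is an illustration only, not PySyft's actual encoder, which additionally works modulo a large field:

```python
PRECISION_FRACTIONAL = 4
SCALE = 10 ** PRECISION_FRACTIONAL

def encode(x):
    # floats become integers; everything past the 4th decimal is rounded away
    return int(round(x * SCALE))

def decode(n):
    return n / SCALE

x = 0.123456789
assert decode(encode(x)) == 0.1235  # the trailing digits are lost to the encoding
```

Every activation in the network goes through such an encoding, so small rounding errors accumulate layer after layer.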

Regarding **runtime**, we manage to predict a batch of 2 images in ~400 seconds, which isn't super fast but is already reasonable for our use case!

Ok that's good, but in real life I won't use virtual workers!

That's right, actually you can run exactly the same experiment using PyGrid and workers which live in a PrivateGridNetwork. Those workers are independent processes which can live on your machine or on remote machines.

To do so, first clone PyGrid and then start new nodes in your terminal (one per tab) as such:

```
cd PyGrid/apps/node
./run.sh --id data_owner --port 7600 --host localhost --start_local_db
./run.sh --id model_owner --port 7601 --host localhost --start_local_db
./run.sh --id crypto_provider --port 7602 --host localhost --start_local_db
```

And you replace the `syft` imports in this notebook as follows:

```
import syft as sy
from syft.grid.clients.data_centric_fl_client import DataCentricFLClient
hook = sy.TorchHook(torch)
data_owner = DataCentricFLClient(hook, "ws://localhost:7600")
model_owner = DataCentricFLClient(hook, "ws://localhost:7601")
crypto_provider = DataCentricFLClient(hook, "ws://localhost:7602")
my_grid = sy.PrivateGridNetwork(data_owner, model_owner, crypto_provider)
# At the time of writing, the garbage-collection messages for remote values
# sent over websockets should be batched to avoid sending too many messages
# to the remote workers; this requests GC messages to be sent in batches
# every 1 second.
sy.local_worker.object_store.garbage_delay = 1
```

The computation will be exactly the same, and the runtime will roughly double. You can run the experiment to verify this, and it's a nice intro to PyGrid!

Next is improving this first proof of concept! How can this be done?

- First, we can optimize our implementation, for example by switching from Python to Rust.
- Second, we can try to adapt the model structure or model layers for faster execution given our constraints, without compromising accuracy. Think of the swap we made between maxpool and relu in the ResNet-18 architecture at the beginning.
- Last, we can investigate new Function Secret Sharing crypto protocols; this is a new and promising field, and we expect new breakthroughs to help us improve the inference time!

If you want to help, come and apply to join one of our cryptography teams!

You can also help our community by starring the repositories! This helps raise awareness of the cool tools we're building.

The best way to keep up to date on the latest advancements is to join our community!

In the previous article, CKKS explained, Part 2: full encoding and decoding, we saw how to implement CKKS's encoder and decoder, which enabled us to transform vectors into polynomials and vice versa. This step was necessary because, as we will see, using polynomials is much more efficient for building a homomorphic encryption scheme than using vectors directly.

We will see in this article how we can use hard problems, such as **LWE** or **RLWE**, to build an approximate homomorphic encryption scheme. CKKS uses approximate instead of exact arithmetic, in the sense that once we finish a computation we might get a slightly different result than if we had done the computation directly. This means that if you encrypt 2 and 3, add their ciphertexts, and decrypt, you might get something like 4.99 or 5.01, but not 5. Other schemes, such as BFV, are exact, which means they will yield exactly 5.

Why use CKKS then? CKKS is better suited for arithmetic on real numbers, where we can tolerate approximate but close results, while BFV is better suited for arithmetic on integers.

We will see in this article how to implement the encryption and decryption for an approximate arithmetic homomorphic encryption scheme using LWE and RLWE.

CKKS is a public key encryption scheme, where a secret key and a public key are generated. While the public key is used for encryption and can be shared, the private key is used for decryption and must be kept secret.

The foundation of CKKS, and of many other homomorphic encryption schemes, is the **Learning With Errors** (LWE) problem, which was initially introduced in the paper On lattices, learning with errors, random linear codes, and cryptography by Regev. The LWE problem is to distinguish noisy pairs of the form \((a_i, b_i) = (a_i, <a_i,s> + e_i)\) from truly random ones in \(\mathbb{Z}_q^n \times \mathbb{Z}_q\). Here \(a_i, s \in \mathbb{Z}_q^n\), \(a_i\) is uniformly sampled, \(s\) is our secret, and the \(e_i \in \mathbb{Z}_q\) are small noise terms, typically Gaussian, used to make the problem harder. If we did not introduce the \(e_i\), the problem would be much easier to solve, as we could recover \(s\) from the linear system using Gaussian elimination.
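Generating such noisy pairs is straightforward. The sketch below uses toy parameters chosen purely for illustration; real schemes use much larger dimensions and carefully chosen noise distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n = 2**13, 16   # toy modulus and dimension

s = rng.integers(0, q, size=n)        # the secret s in Z_q^n
A = rng.integers(0, q, size=(n, n))   # each row is a uniformly sampled a_i
e = np.rint(rng.normal(0, 2, size=n)).astype(np.int64) % q  # small Gaussian noise

# The published LWE samples (a_i, <a_i, s> + e_i), in matrix form (A, b):
b = (A @ s + e) % q
# Without the e_i, s could be recovered from (A, b) by Gaussian elimination
```

Distinguishing `(A, b)` from uniformly random pairs is exactly the LWE problem.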

The LWE problem is known to be as hard as worst case lattice problems, which are currently secure against attacks from quantum computers. Therefore, we can exploit the fact that finding a secret \(s\) from pairs of \((a_i, <a_i,s> + e_i)\) is hard, and build a cryptosystem upon this.

Suppose we have generated a secret key \(s \in \mathbb{Z}_q^n\) which we keep secret, and publish \(n\) pairs of the type \((a_i, <a_i,s> + e_i)\), which can be written in matrix form as \((A, A.s + e)\) with \(A \in \mathbb{Z}_q^{n \times n}, e \in \mathbb{Z}_q^n\). As the LWE problem states, it is hard to recover the secret key from this couple, thus we can use that to create a public key.

Indeed we will use \(p = (-A.s + e, A)\) as our public key, which can be used publicly as the secret key is hard to extract from it. We chose to store \(-A.s\) instead of \(A.s\) for convenience as we will see later, but it doesn't change the problem.

Then to encrypt and decrypt a message \(\mu \in \mathbb{Z}_q^n\) using the public and secret key, we can use the following scheme:

- Encryption of \(\mu\) using \(p\): output \(c = (\mu,0) + p = (\mu -A.s + e, A) = (c_0, c_1)\).
- Decryption of \(c\) using \(s\): output \(\tilde{\mu} = c_0 + c_1.s = \mu -A.s + e + A.s = \mu + e \approx \mu\)

So for the encryption phase, we take the public key and use it to mask our message \(\mu\). The message is thus hidden in the first coordinate of the ciphertext with the mask \(-A.s\). Remember that \(A\) is sampled uniformly, so it will indeed mask \(\mu\) effectively. To remove the mask, we use the second coordinate of \(c\), which only stores \(A\), and combine it with the secret key \(s\) to obtain the decryption \(\mu + e\). Notice that we do not get exactly the original message \(\mu\) but \(\mu\) plus some noise \(e\), which is why we say we have an approximate arithmetic scheme. If \(e\) is small enough, the decryption will be close to the original \(\mu\).
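The whole scheme can be checked numerically in a few lines (toy parameters again; note that decryption returns \(\mu + e\), not \(\mu\) exactly):

```python
import numpy as np

rng = np.random.default_rng(42)
q, n = 2**13, 16

s = rng.integers(0, q, size=n)                      # secret key
A = rng.integers(0, q, size=(n, n))
e = np.rint(rng.normal(0, 2, size=n)).astype(np.int64)
b = (-A @ s + e) % q                                # public key p = (-A.s + e, A)

def encrypt(mu):
    # c = (mu, 0) + p = (mu - A.s + e, A)
    return (mu + b) % q, A

def decrypt(c0, c1):
    # c0 + c1.s = mu - A.s + e + A.s = mu + e  (approximately mu)
    return (c0 + c1 @ s) % q

mu = rng.integers(0, q, size=n)
c0, c1 = encrypt(mu)
approx = decrypt(c0, c1)
# The decryption differs from mu by exactly the small noise e
assert np.array_equal((approx - mu) % q, e % q)
```

The mask \(-A.s\) cancels during decryption, leaving only the message and the noise.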

So we have seen how to use the LWE problem to build a public-key cryptosystem that is secure against quantum attacks. The problem with the construction above is that while the secret key has size \(\mathcal{O}(n)\), the public key has size \(\mathcal{O}(n^2)\) due to the matrix \(A\), and computations also require \(\mathcal{O}(n^2)\) operations. Because \(n\) dictates the security of our scheme, building it directly on LWE would be too inefficient in practice: the \(\mathcal{O}(n^2)\) key size and complexity make it impractical.

That is why we will consider the Ring Learning With Errors problem, introduced in On Ideal Lattices and Learning with Errors Over Rings, which is a variant of LWE over rings. Instead of working with vectors in \(\mathbb{Z}_q^n\), we will work with polynomials in \(\mathbb{Z}_q[X]/(X^N+1)\) (we suppose here that \(N\) is a power of 2). So now we draw \(a\), \(s\), and \(e\) from \(\mathbb{Z}_q[X]/(X^N+1)\), where \(a\) is still sampled uniformly, \(s\) is a small secret polynomial, and \(e\) is a small noisy polynomial. Switching to RLWE has two main advantages:

- The key size is no longer quadratic but linear, as we now output the public key \(p = (-a.s +e, a)\), where \(a.s\) denotes the polynomial product of \(a\) with \(s\). Because all operations are done between polynomials, both the private and public keys are of size \(\mathcal{O}(n)\).
- Multiplications are done on polynomials, so they can be performed with complexity \(\mathcal{O}(n \log(n))\) using the Discrete Fourier Transform for polynomial multiplication, instead of the \(\mathcal{O}(n^2)\) required by matrix-vector multiplication.
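The ring structure is easy to demonstrate: below is a naive (schoolbook, \(\mathcal{O}(N^2)\)) product in \(\mathbb{Z}_q[X]/(X^N+1)\), where reduction by \(X^N + 1\) simply wraps \(X^N\) around to \(-1\). A production implementation would use an NTT/FFT to reach \(\mathcal{O}(N \log N)\):

```python
import numpy as np

def polymul_mod(a, b, N, q):
    """Multiply two degree-<N polynomials in Z_q[X]/(X^N + 1)."""
    res = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                res[i + j] += a[i] * b[j]
            else:
                res[i + j - N] -= a[i] * b[j]   # X^N = -1 (negacyclic wraparound)
    return res % q

N, q = 4, 97
a = np.array([1, 2, 0, 0])   # 1 + 2X
b = np.array([0, 1, 0, 0])   # X
# (1 + 2X) * X = X + 2X^2
assert np.array_equal(polymul_mod(a, b, N, q), np.array([0, 1, 2, 0]))

c = np.array([0, 0, 0, 1])   # X^3
# X^3 * X = X^4 = -1 = q - 1 in Z_q
assert np.array_equal(polymul_mod(c, b, N, q), np.array([96, 0, 0, 0]))
```

The second assert shows the wraparound that distinguishes this ring from ordinary polynomial multiplication.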

Therefore, by using RLWE instead of LWE, we get much smaller keys and even faster operations, so the preceding scheme becomes much more practical. Moreover, RLWE is still a hard problem that provides strong security guarantees, so the scheme remains secure.

We now understand why it was important to work with polynomials: they provide the basis for an efficient and secure scheme. You can also see why we went through all the trouble of converting our vectors into polynomials of \(\mathbb{Z}_q[X]/(X^N+1)\) and vice versa: we can now leverage the algebraic structure of polynomial rings.

Now that we have seen why we work with polynomials of \(\mathbb{Z}_q[X]/(X^N+1)\), and how we can get an encryption scheme based on them, let's see how to define addition and multiplication on ciphertexts to obtain a homomorphic encryption scheme.

So we said that we have a secret \(s\) and a public key \(p = (b,a) = (-a.s +e, a)\). To encrypt a message \(\mu\) we simply output \(c = (\mu + b, a)\), and to decrypt it with \(s\) we evaluate \(c_0 + c_1.s\), which approximately gives the original message.

Now suppose we have two messages, \(\mu\) and \(\mu'\), and that we encrypt them into \(c = (c_0,c_1)\) and \(c' = (c_0',c_1')\). Then \(c_{add} = c + c' = (c_0 + c_0',c_1 + c_1')\) is a correct encryption of \(\mu + \mu'\), i.e. when we decrypt it using \(s\) we get (approximately) \(\mu + \mu'\).

Indeed, the decryption mechanism of \(c_{add}\) yields \(c_{add,0} + c_{add,1}.s = c_0 + c_0' + (c_1 + c_1').s = c_0 + c_1.s + c_0' + c_1'.s = \) \(\mu + \mu' + 2 e \approx \mu + \mu'\) because we said that \(e\) is negligible.

What this means is that if you add ciphertexts and decrypt the result, you get the addition of the plaintexts! With this simple scheme, you can let someone perform additions on encrypted data while the user can still decrypt it and get the correct result. This is our first step toward a homomorphic encryption scheme.
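Here is a toy end-to-end check of this homomorphic addition, using the simplified scheme \(p = (-a.s + e, a)\), \(c = (\mu + b, a)\) described above. The parameters are small and illustrative, and real CKKS additionally adds fresh randomness at encryption time:

```python
import numpy as np

N, q = 8, 2**13
rng = np.random.default_rng(1)

def polymul(u, v):
    # schoolbook product in Z_q[X]/(X^N + 1): X^N wraps around to -1
    res = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                res[i + j] += u[i] * v[j]
            else:
                res[i + j - N] -= u[i] * v[j]
    return res % q

s = rng.integers(-1, 2, size=N)                    # small secret polynomial
a = rng.integers(0, q, size=N)
e = np.rint(rng.normal(0, 2, size=N)).astype(np.int64)
b = (-polymul(a, s) + e) % q                       # public key p = (b, a)

def encrypt(mu):
    return (mu + b) % q, a                         # c = (mu + b, a)

def decrypt(c):
    return (c[0] + polymul(c[1], s)) % q           # = mu + e (approximately mu)

mu1, mu2 = np.full(N, 5), np.full(N, 7)
ct1, ct2 = encrypt(mu1), encrypt(mu2)
# Add the ciphertexts componentwise: c_add decrypts to mu1 + mu2 + 2e
c_add = ((ct1[0] + ct2[0]) % q, (ct1[1] + ct2[1]) % q)
dec = decrypt(c_add)
```

Decrypting `c_add` gives coefficients equal to \(\mu + \mu' + 2e\), i.e. 12 plus a small noise term in each slot.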

Nonetheless, we still need to define the multiplication on ciphertexts, which is more involved. Indeed, our goal will be to find a ciphertext \(c_{mult}\) such that when we decrypt it with the secret key \(s\), we get the product of the plaintexts.

Because multiplying two ciphertexts is more complex, we will first focus now on multiplying a ciphertext with a plaintext, and see how to do multiplication between ciphertexts in a later article.

Suppose we have a plaintext \(\mu\), encrypted into the ciphertext \(c = (c_0, c_1)\) and a plaintext \(\mu'\). Then to obtain the ciphertext of the multiplication, we simply need to output \(c_{mult} = (\mu' . c_0, \mu' . c_1)\).

Indeed, when decrypting \(c_{mult}\) we get \(\mu' . c_0 + \mu' . c_1 . s = \mu' . (c_0 + c_1 . s) = \mu' . (\mu + e) = \mu' . \mu + \mu'.e \approx \mu' . \mu \).
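A quick numeric check of this plaintext-ciphertext multiplication, again with the toy scheme. For brevity the public plaintext \(\mu'\) is taken to be a constant polynomial, i.e. a scalar, so the ring product reduces to a scalar multiplication of coefficients; a general \(\mu'\) would use the full polynomial product:

```python
import numpy as np

N, q = 8, 2**13
rng = np.random.default_rng(2)

def polymul(u, v):
    # schoolbook product in Z_q[X]/(X^N + 1): X^N wraps around to -1
    res = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                res[i + j] += u[i] * v[j]
            else:
                res[i + j - N] -= u[i] * v[j]
    return res % q

s = rng.integers(-1, 2, size=N)
a = rng.integers(0, q, size=N)
e = np.rint(rng.normal(0, 2, size=N)).astype(np.int64)
b = (-polymul(a, s) + e) % q             # public key p = (b, a)

mu = np.full(N, 3)                       # encrypted plaintext
c0, c1 = (mu + b) % q, a                 # ciphertext of mu

scalar = 5                               # public plaintext mu'
cm0, cm1 = (scalar * c0) % q, (scalar * c1) % q   # c_mult = (mu'.c0, mu'.c1)

# Decryption yields mu'.mu + mu'.e, i.e. approximately mu'.mu
dec = (cm0 + polymul(cm1, s)) % q
```

Note that the noise is also multiplied by \(\mu'\), which is why noise growth has to be managed (via rescaling) in the full scheme.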

Therefore, with these implementations of addition and multiplication, we have seen that it is possible to encrypt some data and perform operations on it, so that once we decrypt it, we have the same result as if we had performed the computation on plaintext data.

Because we have a few more concepts to address, like ciphertext-ciphertext multiplication, relinearization, and rescaling, we will not cover the code implementation yet. Once we have all the building blocks, we will put everything together into an end-to-end approximate homomorphic encryption scheme, similar to CKKS!

So I hope you understood how to build a homomorphic encryption scheme using RLWE. Next stop: ciphertext-ciphertext multiplication!

As humans, our ability to answer questions is limited to the data we have. But what about questions like ‘What causes cancer?’ or ‘What kind of diet should I follow to prevent disease X?’ Answering these kinds of questions using data we don’t really have is a key aspect of privacy-preserving AI.

Today, we live in a world where massive amounts of data are constantly being ‘spewed’ everywhere. Like it or not, almost every activity of ours these days, be it online or offline, is likely to become the cause for data generation/collection in some way.

In the human quest for lifelong discovery, the power that lies untapped in this vast, complex, heterogeneous ocean of data is undeniable. Ensuring that the right people can access this unharnessed trove, without the data getting ‘spewed’ all over and risking exposure to the wrong hands, such as someone with malicious intent, is therefore a significantly important task.

With privacy-preserving tools such as Differential Privacy and Secure Multi-Party Computation (discussed in Part 1 of this blog) as the building blocks, the only thing that stands between us and a better world for data privacy is the usual cycle of adoption, maturation of the technology, and good engineering, as with any new concept or invention.

Some broad examples of areas where the positive impacts could be substantial include:

Consider the example of IBM Watson acing games of ‘Jeopardy!’ against the best human players, or the *Deep Blue vs Garry Kasparov* chess match, where for the first time a machine beat a reigning world chess champion. The common thread here is the amazing momentum that large datasets, and the ability to process them, can give to machine learning tasks. Due to issues like legal risk and commercial viability, thousands of enterprises, think tanks, governments, and other entities across the world have such potentially useful data lying relatively unharnessed within their data warehouses. On the other end, researchers across the scientific spectrum, for instance in medical research, are looking for genuine, privacy-preserving access to big data that could help answer many meaningful questions about issues that truly matter, such as life-threatening diseases and conditions.

Making these largely unused datasets accessible through privacy-preserving gateways could be a very valuable idea for data science startups to invest in. Connecting the data and the ‘data-seekers’ in this way, through secure interfaces that both protect the data and let the world safely leverage its potential, is a repeatable business model.

Consider the example of baggage screening at airports. In order to spot the rare bag with malicious contents, the screening personnel are given access to information about **all** the contents of **every** bag that passes the checkpoint. From a privacy perspective, there is a lot of extra information leakage here: the only question that needs to be answered is ‘Does this bag contain any illicit content?’

Building a machine learning classifier to do this job would have helped here. It would have ensured that the personnel open up only those bags that seemed suspicious to the classifier, instead of someone manually going through the screening results of every bag. Similarly, a dog that can sniff a bag to identify malicious contents without having to manually go through the contents of every bag, is another privacy-preserving solution. These are analogous to the principle of single-use accountability systems, systems that are used to hold people accountable, through usage of data for a single purpose or to answer a single question.

Such systems have two main advantages:

1. Accountability systems become more privacy preserving, which helps mitigate any potential dual or multiple use.

2. Extremely privacy-sensitive tasks, like email surveillance for compliance or law-enforcement needs at organizations like investment banks, might now become possible.

Messaging services like WhatsApp, Telegram, etc. are end-to-end encrypted. This means that I can send a message to X, which will be encrypted so that only X’s phone can decrypt it, without the service provider having any access to its contents. Using a combination of Machine Learning, Differential Privacy, and Secure Multi-Party Computation, many more diverse and complex end-to-end services can be made available while remaining privacy-preserving.

For instance, consider a medical consultation scenario where you decide to consult a doctor to get a diagnosis. The determination of what kind of treatment you will need is based on the confluence of two datasets, namely the **doctor’s knowledge base** covering diseases, experience, and treatments, and your **personal information** about your symptoms, genetic predisposition, medical history, etc. If we consider the treatment-determining function *f(x,y)*, this function receives inputs from the doctor’s knowledge base X and the patient’s medically relevant details Y. Using both, *f(x,y)* computes its logic and generates the output *z*, which is the potential treatment for the condition the patient has. Here, the inputs coming from the doctor and patient can be protected using Secure Multi-Party Computation, so that the two can compute f(x,y) together without having to reveal to each other the information they each possess. At the output end, if the result *z* is being sent to the outside world, it can be protected with Differential Privacy; or the result can be returned to the patient, who alone can decrypt it. Here, the doctor’s knowledge base X and *f(x,y)* can both be machine learning models.
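To make the secret-sharing idea concrete, here is a toy sketch of additive secret sharing, one building block of Secure Multi-Party Computation. It only covers linear computations on shared values (the doctor's weights are left public for simplicity; a real protocol would also share them and use multiplication triples), and all names and numbers are illustrative:

```python
# Minimal sketch of additive secret sharing: the patient's values are split
# so that no single party sees them, yet a public linear score can still be
# computed on the shares. Illustrative only, not a full SMPC protocol.
import random

P = 2**61 - 1  # a large prime modulus for the shares

def share(secret, n_parties=2):
    """Split `secret` into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % P

# Patient's symptom vector y, shared between two parties.
y = [3, 7, 1]
y_shares = [share(v) for v in y]

# Doctor's weights (public here, for simplicity).
w = [2, 5, 1]

# Each party computes its share of the weighted sum locally.
party_sums = [sum(w_i * sh[p] for w_i, sh in zip(w, y_shares)) % P
              for p in range(2)]

score = reconstruct(party_sums)  # equals dot(w, y) = 42
```

No single party's shares reveal anything about `y`; only the recombined result does, which is exactly the property the consultation scenario needs.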

This means that a patient can consult a doctor and get a diagnosis without ever having to reveal his symptoms or personal information to anyone, even the doctor.

Note: In Secure MPC, the result of the computation, in this case *z*, will be encrypted among the same shareholders who supplied the initial input for the computation. They can then decide together who to decrypt this result for.

In the previous article CKKS explained: Part 1, Vanilla Encoding and Decoding, we learned that to implement the CKKS encryption scheme for computation on encrypted complex vectors, we must first build an encoder and a decoder to transform our complex vectors into polynomials.

This encoder-decoder step is necessary because the encryption, decryption, and other mechanisms work on polynomial rings. Therefore it is necessary to have a way to transform our vectors of real values into polynomials.

We also learned that by using the canonical embedding \(\sigma\), which simply decodes a polynomial by evaluating it on the roots of \(X^N +1\), we have an isomorphism between \(\mathbb{C}[X]/(X^N +1)\) and \(\mathbb{C}^N\). Nevertheless, because we want our encoder to output polynomials in \(\mathbb{Z}[X]/(X^N +1)\) so as to exploit the structure of integer polynomial rings, we need to modify this first vanilla encoder to output polynomials in the right ring.

Therefore, in this article we will explore how to implement the encoder and decoder used in the original paper Homomorphic Encryption for Arithmetic of Approximate Numbers, which will be our first step toward implementing CKKS from scratch.

The difference with the previous post is that the plaintext space of the encoded polynomial is now \(\mathcal{R} = \mathbb{Z}[X]/(X^N + 1)\) instead of \(\mathbb{C}[X]/(X^N + 1)\), so the encoded polynomial must have integer coefficients. Yet we have seen that when we encode a vector of \(\mathbb{C}^N\), its encoding does not necessarily have integer coefficients.

To solve this issue, let's take a look at the image of the canonical embedding \(\sigma\) on \(\mathcal{R}\).

Because polynomials in \(\mathcal{R}\) have integer, and thus real, coefficients, and we evaluate them on complex roots half of which are the conjugates of the other half (see the previous figure), we have that:

\(\sigma(\mathcal{R}) \subseteq \mathbb{H} = \{ z \in \mathbb{C}^{N} : z_j = \overline{z_{-j}} \} \).

Recall the picture earlier when \(M = 8\) :

We can see on this picture that \(\omega^1 = \overline{\omega^7}\) and \(\omega^3 = \overline{\omega^5}\). In general, because we evaluate a real polynomial on the roots of \(X^N +1\), we will also have that for any polynomial \(m(X) \in \mathcal{R}, m(\xi^j) = \overline{m(\xi^{-j} )} = m(\overline{\xi^{-j}})\).

Therefore, any element of \(\sigma(\mathcal{R})\) actually lives in a space of dimension \(N/2\), not \(N\). This is why CKKS encodes complex vectors of size \(N/2\): before encoding, we expand such a vector by copying the conjugates of its entries to fill the other half.

This operation, which takes an element of \(\mathbb{H}\) and projects it to \(\mathbb{C}^{N/2}\), is called \(\pi\) in the CKKS paper. Note that it defines an isomorphism as well.

Now we can start with \(z \in \mathbb{C}^{N/2}\), expand it using \(\pi^{-1}\) (note that \(\pi\) projects, \(\pi^{-1}\) expands), and we get \(\pi^{-1}(z) \in \mathbb{H}\).

A problem that we face is that we cannot directly use \(\sigma : \mathcal{R} = \mathbb{Z}[X]/(X^N +1) \rightarrow \sigma(\mathcal{R}) \subseteq \mathbb{H}\), because an element of \(\mathbb{H}\) is not necessarily in \(\sigma(\mathcal{R})\). \(\sigma\) does define an isomorphism, but only from \(\mathcal{R}\) to \(\sigma(\mathcal{R})\). To convince yourself that \(\sigma(\mathcal{R})\) is not equal to \(\mathbb{H}\), you can notice that \(\mathcal{R}\) is countable, therefore \(\sigma(\mathcal{R})\) is as well, but \(\mathbb{H}\) is not, as it is isomorphic to \(\mathbb{C}^{N/2}\).

This detail is important because it means that we must find a way to project \(\pi^{-1}(z)\) onto \(\sigma(\mathcal{R})\). To do so, we will use a technique called "coordinate-wise random rounding", defined in A Toolkit for Ring-LWE Cryptography. This technique rounds a real \(x\) to either \(\lfloor x \rfloor\) or \(\lfloor x \rfloor +1\), each with a probability that is higher the closer \(x\) is to that value. We will not go into the details of this algorithm, though we will implement it.

The idea is simple: \(\mathcal{R}\) has an orthogonal \(\mathbb{Z}\)-basis \(\{1,X,...,X^{N-1}\}\) and, given that \(\sigma\) is an isomorphism, \(\sigma(\mathcal{R})\) has an orthogonal \(\mathbb{Z}\)-basis \(\beta = (b_1,b_2,...,b_N) = (\sigma(1),\sigma(X),...,\sigma(X^{N-1}))\). Therefore, for any \(z \in \mathbb{H}\), we will simply project it on \(\beta\) :

\(z = \sum_{i=1}^{N} z_i b_i\), with \(z_i = \frac{<z, b_i>}{||b_i||^2} \in \mathbb{R}\).

Because the basis is orthogonal but not orthonormal, we have \(z_i = \frac{<z, b_i>}{||b_i||^2}\). Note that we are using the Hermitian product here: \(<x,y> = \sum_{i=1}^{N} x_i \overline{y_i}\). The Hermitian product gives real outputs because we apply it to elements of \(\mathbb{H}\); you can check this by direct computation, or notice that there is an isometric isomorphism between \(\mathbb{H}\) and \(\mathbb{R}^N\), so inner products in \(\mathbb{H}\) yield real outputs.

Finally, once we have the coordinates \(z_i\), we simply need to round them randomly, to the higher or the lower closest integer, using the "coordinate-wise random rounding". This way we will have a polynomial which will have integer coordinates in the basis \((\sigma(1),\sigma(X),...,\sigma(X^{N-1}) )\), therefore this polynomial will belong to \(\sigma(\mathcal{R})\) and the trick is done.

Once we have projected onto \(\sigma(\mathcal{R})\), we can apply \(\sigma^{-1}\), which outputs an element of \(\mathcal{R}\), which is exactly what we wanted!

One final detail: because the rounding might destroy some significant digits, we actually need to multiply by a scale \(\Delta > 0\) during encoding, and divide by \(\Delta\) during decoding, to keep a precision of \(\frac{1}{\Delta}\). To see how this works, imagine you want to round \(x = 1.4\), not to the closest integer, but to the closest multiple of \(0.25\) to keep some precision. Then you set the scale \(\Delta = 4\), which gives a precision of \(\frac{1}{\Delta} = 0.25\). Now we get \(\lfloor \Delta x \rceil = \lfloor 4 \cdot 1.4 \rceil = \lfloor 5.6 \rceil = 6\). Once we divide by the same scale \(\Delta\), we get \(1.5\), which is indeed the multiple of \(0.25\) closest to \(x = 1.4\).
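A quick numeric check of this scaling trick, with the values from the example above:

```python
# With Delta = 4 we keep a precision of 1/Delta = 0.25:
# round Delta * x to the nearest integer, then divide back by Delta.
delta = 4
x = 1.4
rounded = round(delta * x) / delta  # round(5.6) = 6, and 6 / 4 = 1.5
```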

So the final encoding procedure is:

- take an element of \(z \in \mathbb{C}^{N/2}\)
- expand it to \(\pi^{-1}(z) \in \mathbb{H}\)
- multiply it by \(\Delta\) for precision
- project it on \(\sigma(\mathcal{R})\) : \(\lfloor \Delta . \pi^{-1}(z) \rceil_{\sigma(\mathcal{R})} \in \sigma(\mathcal{R})\)
- map it back using \(\sigma^{-1}\) : \(m(X) = \sigma^{-1} (\lfloor \Delta . \pi^{-1}(z) \rceil_{\sigma(\mathcal{R})}) \in \mathcal{R}\).

The decoding procedure is much simpler, from a polynomial \(m(X)\) we simply get \(z = \pi \circ \sigma(\Delta^{-1} . m)\).

Now that we finally managed to see how the full CKKS encoding and decoding works, let's implement it! We will use the code we previously used for the Vanilla Encoder and Decoder. The code is available on this Colab notebook.

For the rest of the article, let's refactor and build on top of the `CKKSEncoder` class we created in the previous post. In a notebook environment, instead of redefining the class each time we want to add or change methods, we will simply use `patch_to` from the `fastcore` package from fastai. This allows us to monkey-patch objects that have already been defined. Using `patch_to` is purely for convenience; you could just redefine `CKKSEncoder` in each cell with the added methods.

```
# !pip3 install fastcore
from fastcore.foundation import patch_to
```

```
@patch_to(CKKSEncoder)
def pi(self, z: np.array) -> np.array:
    """Projects a vector of H into C^{N/2}."""
    N = self.M // 4
    return z[:N]

@patch_to(CKKSEncoder)
def pi_inverse(self, z: np.array) -> np.array:
    """Expands a vector of C^{N/2} by appending its complex conjugate in reversed order."""
    z_conjugate = z[::-1]
    z_conjugate = [np.conjugate(x) for x in z_conjugate]
    return np.concatenate([z, z_conjugate])

# We can now initialize our encoder with the added methods
encoder = CKKSEncoder(M)
```

```
z = np.array([0,1])
encoder.pi_inverse(z)
```

`array([0, 1, 1, 0])`

```
@patch_to(CKKSEncoder)
def create_sigma_R_basis(self):
    """Creates the basis (sigma(1), sigma(X), ..., sigma(X^{N-1}))."""
    self.sigma_R_basis = np.array(self.vandermonde(self.xi, self.M)).T

@patch_to(CKKSEncoder)
def __init__(self, M):
    """Initializes with the basis."""
    self.xi = np.exp(2 * np.pi * 1j / M)
    self.M = M
    self.create_sigma_R_basis()

encoder = CKKSEncoder(M)
```

We can now have a look at the basis \(\sigma(1), \sigma(X), \sigma(X^2), \sigma(X^3)\).

`encoder.sigma_R_basis`

`array([[ 1.00000000e+00+0.j, 1.00000000e+00+0.j,1.00000000e+00+0.j, 1.00000000e+00+0.j],[ 7.07106781e-01+0.70710678j, -7.07106781e-01+0.70710678j, -7.07106781e-01-0.70710678j, 7.07106781e-01-0.70710678j],[ 2.22044605e-16+1.j, -4.44089210e-16-1.j, 1.11022302e-15+1.j, -1.38777878e-15-1.j], [-7.07106781e-01+0.70710678j, 7.07106781e-01+0.70710678j,7.07106781e-01-0.70710678j, -7.07106781e-01-0.70710678j]])`

Here we will check that integer linear combinations of \(\{\sigma(1), \sigma(X), \sigma(X^2), \sigma(X^3)\}\) are encoded as integer polynomials.

```
# Here we simply take a vector whose coordinates are (1,1,1,1) in the lattice basis
coordinates = [1,1,1,1]
b = np.matmul(encoder.sigma_R_basis.T, coordinates)
b
```

`array([1.+2.41421356j, 1.+0.41421356j, 1.-0.41421356j, 1.-2.41421356j])`

We can check now that it does encode to an integer polynomial.

```
p = encoder.sigma_inverse(b)
p
```

`x↦(1+2.220446049250313e-16j)+((1+0j))x+((0.9999999999999998+2.7755575615628716e-17j))x^2+((1+2.220446049250313e-16j))x^3`

```
@patch_to(CKKSEncoder)
def compute_basis_coordinates(self, z):
    """Computes the coordinates of a vector with respect to the orthogonal lattice basis."""
    output = np.array([np.real(np.vdot(z, b) / np.vdot(b, b)) for b in self.sigma_R_basis])
    return output

def round_coordinates(coordinates):
    """Gives the fractional part of each coordinate."""
    coordinates = coordinates - np.floor(coordinates)
    return coordinates

def coordinate_wise_random_rounding(coordinates):
    """Rounds each coordinate randomly to its floor or ceiling,
    with probability depending on the fractional part."""
    r = round_coordinates(coordinates)
    f = np.array([np.random.choice([c, c - 1], 1, p=[1 - c, c]) for c in r]).reshape(-1)
    rounded_coordinates = coordinates - f
    rounded_coordinates = [int(coeff) for coeff in rounded_coordinates]
    return rounded_coordinates

@patch_to(CKKSEncoder)
def sigma_R_discretization(self, z):
    """Projects a vector on the lattice using coordinate-wise random rounding."""
    coordinates = self.compute_basis_coordinates(z)
    rounded_coordinates = coordinate_wise_random_rounding(coordinates)
    y = np.matmul(self.sigma_R_basis.T, rounded_coordinates)
    return y

encoder = CKKSEncoder(M)
```

Finally, because there might be a loss of precision during the rounding step, we use the scale parameter \(\Delta\) to achieve a fixed level of precision.

```
@patch_to(CKKSEncoder)
def __init__(self, M: int, scale: float):
    """Initializes with scale."""
    self.xi = np.exp(2 * np.pi * 1j / M)
    self.M = M
    self.create_sigma_R_basis()
    self.scale = scale

@patch_to(CKKSEncoder)
def encode(self, z: np.array) -> Polynomial:
    """Encodes a vector by expanding it first to H,
    scaling it, projecting it on the lattice of sigma(R),
    and applying sigma inverse.
    """
    pi_z = self.pi_inverse(z)
    scaled_pi_z = self.scale * pi_z
    rounded_scale_pi_zi = self.sigma_R_discretization(scaled_pi_z)
    p = self.sigma_inverse(rounded_scale_pi_zi)

    # We round it afterwards due to numerical imprecision
    coef = np.round(np.real(p.coef)).astype(int)
    p = Polynomial(coef)
    return p

@patch_to(CKKSEncoder)
def decode(self, p: Polynomial) -> np.array:
    """Decodes a polynomial by removing the scale,
    evaluating on the roots, and projecting on C^(N/2)."""
    rescaled_p = p / self.scale
    z = self.sigma(rescaled_p)
    pi_z = self.pi(z)
    return pi_z

scale = 64
encoder = CKKSEncoder(M, scale)
```

We can now see it in action, the full encoder used by CKKS:

```
z = np.array([3 +4j, 2 - 1j])
z
```

`array([3.+4.j, 2.-1.j])`

Now we have an integer polynomial as our encoding.

```
p = encoder.encode(z)
p
```

`x↦160.0+90.0x+160.0x^2+45.0x^3`

And it actually decodes well!

`encoder.decode(p)`

`array([2.99718446+3.99155337j, 2.00281554-1.00844663j])`

I hope you enjoyed this little introduction to encoding complex numbers into polynomials for homomorphic encryption. We will deep dive into this further in the following articles so stay tuned!

Private conversation is everywhere. If you talk to your wife, it’s obviously a private conversation. If you talk to an operator of a contact center, it’s still considered private. Almost all conversations are considered private. Public debate is one of the exceptions.

If you want to extract meaning out of private conversation, you need access to private data, which in most cases you simply **cannot** get. That’s why I think privacy-preserving deep learning techniques are the key to taking speech-to-text AI to the next phase.

In this tutorial, I will show how to train a speech command prediction model with federated learning.

To tell the truth, I wanted to train a speech-to-text model with federated learning, rather than speech command prediction. But I couldn’t make it work this time, so maybe in the future…

In this tutorial, you will learn:

- how to handle audio data
- how to train a speech command prediction model with federated learning

I use the PySyft library for federated learning. PySyft is one of the libraries for privacy-preserving AI, developed and maintained by the open-source community called OpenMined.

If the idea of privacy preserving AI is not familiar to you, please check out OpenMined’s websites. There are tons of good blogs and tutorials.

The objective is to train a speech command prediction model with federated learning. The input is a wav audio file and the output is a category id from the speech commands list. We have 12 classes: unknown, silence, yes, no, up, down, left, right, on, off, stop, and go.

I borrowed almost all of the code from this repository. Thanks a lot!

https://github.com/tugstugi/pytorch-speech-commands.git

The federated learning part is taken from a PySyft tutorial.

https://github.com/OpenMined/PySyft/blob/master/examples/tutorials/Part%2006%20-%20Federated%20Learning%20on%20MNIST%20using%20a%20CNN.ipynb

We download speech command datasets from here:

http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

After downloading the datasets, we do something similar to the steps below. The point is to make sure all audio clips have exactly the same duration, 1 second in this case, and to convert the raw audio into Mel spectrogram format. The Mel spectrogram is one of the audio formats used as model input. If you want to know more about audio formats for deep learning, please check out this blog.

- Split the datasets into training and validation sets
- Preprocess raw audio into STFT format
- Convert the STFT format into Mel spectrogram format
- Apply data augmentation
- Add background-noise data augmentation
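The first two steps above can be sketched with plain numpy as follows. The parameters (16 kHz sample rate, 512-point FFT, hop of 128) are illustrative, not necessarily the repository's exact settings, and the Mel step would simply multiply this spectrogram by a Mel filterbank matrix:

```python
# Rough sketch: fix every clip to 1 second, then compute a magnitude STFT.
import numpy as np

SR = 16000  # sample rate of the speech commands data

def fix_duration(audio, target_len=SR):
    """Pad with zeros or trim so every clip is exactly 1 second long."""
    if len(audio) < target_len:
        return np.pad(audio, (0, target_len - len(audio)))
    return audio[:target_len]

def stft_magnitude(audio, n_fft=512, hop=128):
    """Magnitude STFT: Hann-windowed frames -> real FFT -> absolute value."""
    window = np.hanning(n_fft)
    frames = [audio[i:i + n_fft] * window
              for i in range(0, len(audio) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

audio = np.random.randn(12000)  # a clip shorter than 1 second
spec = stft_magnitude(fix_duration(audio))
# spec has shape (n_fft // 2 + 1, number of frames)
```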

Since my objective is to use federated learning, I want remote machines to hold the data. So let’s send the datasets to remote machines, **virtually**.

Create 2 virtual remote machines called Alice and Bob.

```
# import PySyft
import syft as sy

# make PySyft and PyTorch work together
hook = sy.TorchHook(torch)

# create virtual remote machines called bob and alice
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")
```

Send datasets to Alice and Bob.

```
# sy.FederatedDataLoader does the magic
federated_train_loader = sy.FederatedDataLoader(
    train_dataset.federate((bob, alice)),
    batch_size=batch_size,
)
```

At training time, the model goes back and forth between me (the main training process) and Bob and Alice.

**inputs.location** gives you the location of the data. Then:

```
# send model to data.location
model.send(inputs.location)
```

That’s it.

Let’s see the loss and accuracy.

After 12 epochs, we got 94% accuracy on the training datasets.

That’s good!

But wait…

The validation accuracy was just 16%. This is close to random guessing…

This is the log of pure PyTorch training without federated learning.

I used AdamW, the same optimizer as in the federated learning version.

Training accuracy is 94% and validation accuracy is 91%, which is normal.

I ran a couple of experiments, such as…

- using SGD instead of Adam.
- using large datasets
- not using any data augmentations
- checking file by file

And finally I found the reason, or at least a workaround.

I noticed that evaluation mode always gave me bad accuracy, but training mode gave good accuracy.

It seems PySyft version 0.2.8 sometimes cannot handle the **self.training** flag correctly, and a few layers like BatchNorm rely on **self.training** internally.

After changing **model.eval()** to **model.train()**, the validation accuracy became reasonable.

To be honest, the speed is a bit slow. We might need to make it faster, because some real-world use cases require a lot more data on much bigger architectures. And there are some functions and options which we cannot use with the current version of PySyft.

But PySyft is being developed rapidly, and the pace is very fast, so it’s going to support such functions and options soon.

Everything else works great, and as an AI practitioner working in the voice/speech sector, I can feel the excitement around privacy-preserving AI and OpenMined.

Next time I’m going to write a demo with multiple instances that hold data, i.e. a demo without sending datasets to virtual machines/workers, because that’s closer to real-world situations. I also want to try training a speech-to-text model with federated learning, of course.

I put all the code in a single Jupyter notebook so you can follow each step to learn how to handle audio data and how to apply federated learning to speech command prediction model training.

Here is the source code. I hope it’s helpful.

The data protection market is estimated to reach $119 billion by 2022 — this projection highlights the pivotal role the field has in society. We constantly rely on a secure grid in which data can be safely stored, accessed, and shared across the broad services we utilise, from healthcare and research to finance, across private and public sectors.

Not surprisingly, a lot of people have asked us ‘what are all the companies in the privacy and security space?’ — so we curated a list! We found almost 400 companies in the privacy sector that can now be accessed here: *List of Companies in the Privacy Space*

Most of the companies specialise in security, identity and privacy solutions. Many focus on GDPR compliance, de-anonymization or secure data sharing. This list covers the broad data protection market, encompassing its many facets, thus the solutions and services provided vary widely and so do the techniques employed (cryptography, machine learning, federated learning, homomorphic encryption, differential privacy, de-identification, etc) depending on the end purpose.

To ensure we do not miss any company/startup/project in this field, and given we are an open and collaborative community, you can add them to this list by filling in this form **HERE**.

Keep an eye on this space, since like the data protection market’s forecast, this list will continue growing!

Fatemeh Mireshghallah is a 3rd-year Ph.D. student at UC San Diego and an intern at MSR AI. Fatemeh and her team are working on analyzing the trade-off between utility and privacy when applying differential privacy to data containing many subgroups. Earlier works mostly focused on data with a 1% imbalance between subgroups and epsilon values between 3 and 10, whereas their work focuses on imbalances up to 30% and epsilon values beyond that range. When working with differential privacy, the data can consist of a large number of subgroups, and subgroups with smaller populations lose more of their usability than those with larger populations. In DP, utility is the compromise made to get privacy, but this cost falls disproportionately on the smaller subgroups.

They built a smile-detection model using ResNet-18 on the celebrity faces dataset, since smiling is not correlated with gender: if there is a skew in the output, it must be due to an imbalance in the dataset. Furthermore, the data is perfectly balanced in terms of smiling and non-smiling images to nullify that influence. Looking at the test accuracies, the difference between the accuracy on male and female faces grows as the imbalance in the data increases (from 70 to 99.9). At stricter privacy levels (epsilon around 1), the model behaved almost randomly.

“On training, as the number of iterations increases, the difference in the training accuracies of male and female faces grows when DP is applied; it remains the same if DP is not applied. So it is possible to get less disparity by stopping early, while compromising some accuracy.”

Presentation of this project can be watched here.

Fatemeh Mireshghallah and her team are trying to determine the significant portions of highly sensitive medical images. Successfully determining these portions can help remove or black out unnecessary regions, making the images available for a wider range of research purposes. The architecture consists of two models: an interpretability model and a utility model. The interpretability model is a linear autoencoder that outputs a "scales map" of the same size as the input image; relatively important pixels can be read off the scales map. This map is then fed into the utility model.

The whole process works similarly to reinforcement learning, training on different portions of the images across numerous iterations until the most important features are found. With the pancreas dataset, even though the position of the pancreas varies across images, the model is able to capture the required portion. Once the interpretability model can capture the required portions, it is frozen; the scales map is then multiplied with new images, which are sent to the utility model. Increasing the random noise in the model tends to remove unnecessary portions as training progresses.

“We are looking to expand the same method for other datasets like lung images, COVID datasets, etc. We are also looking for some baseline utility methods like Grad-CAM, occlusion sensitivity to compare our results.”

Presentation of this project can be watched here.

Kritika Prakash is a Master’s student at IIIT Hyderabad, and she leads the Differential Privacy team at OpenMined. Kritika and her team are working on automating the addition of noise to a query using differential privacy. The difficulty of computing sensitivity restricts the wide adoption of differential privacy in machine learning and deep learning. A differentially private analysis consists of taking private input data, passing it through a query function, and obtaining a result; to prevent the extraction of private information from this result, it can be noised. Automating the whole process raises many challenges: defining the private data, automatically calculating the sensitivity of a query, performing arithmetic operations on the numerical data, and finally adding noise automatically while making sure that the minimum privacy budget is used.
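For intuition, here is a textbook sketch of the Laplace mechanism that this kind of automation builds on: for a query with L1 sensitivity s, adding Laplace(s/epsilon) noise yields epsilon-differential privacy. This is a generic illustration, not OpenMined's implementation, and all names and numbers are made up:

```python
# Classic Laplace mechanism: noise scale = sensitivity / epsilon.
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Returns a differentially private version of a numeric query answer."""
    if rng is None:
        rng = np.random.default_rng()
    return true_answer + rng.laplace(0.0, sensitivity / epsilon)

# Counting query: one person changes the count by at most 1, so sensitivity = 1.
ages = [23, 45, 31, 60, 28]
true_count = sum(1 for age in ages if age > 30)
private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
```

The hard part that the team is automating is exactly the `sensitivity` argument, which here had to be derived by hand for the counting query.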

“As of now, we have defined what a private scalar is and how to generalize it. Things to be done are adding privacy budget accounting support and determining what's the best way to add noise to it.”

Presentation of this project can be watched here.

Rakshit Naidu and his team are developing alternative methods to Score-CAM, as its output is noisy. They developed two variants called SS-CAM. The first method (SS-CAM(1)) adds noise to the activation maps, while the second (SS-CAM(2)) adds noise to the normalized input mask. They generate feature maps as explained in the first talk by Fatemeh Mireshghallah, upsample them, multiply them with the input image, and smooth the results. A linear combination is then applied between the average scores thus generated and the upsampled activation maps to generate the output.

“While comparing the results of these CAMs and other older CAM methods, SS-CAM(2) performs better than all the other CAMs. Outputs of insertion and deletion curves support the claim. Fairness evaluation scores, localization evaluation, and human trust evals are also calculated and compared.”

Presentation of this project can be watched here.

Tushar Semwal is a postdoc at the University of Edinburgh. Tushar and his team are working on using representation similarity to perform transfer learning in a federated way. By identifying the source model most similar to a target (client) model, we can transfer the trained parameters from the source model to the target model in a federated way, reaching the desired accuracy in fewer communication rounds. The challenges are the unavailability of the target data distribution and negative transfer. The most similar model is found using a similarity metric, and the federated selection is done using a voting algorithm.

The similarity between the source and target models is measured by calculating the difference between the activations of corresponding hidden layers of the two models, so we need not look at the target data distribution. Centered Kernel Alignment (CKA) performed better than all other methods for comparing representations between layers of different models, while requiring comparatively less data. To choose the best source model, a federated voting algorithm is used: the source models are transferred to the clients, and each client votes for the source model with the maximum CKA value. Finally, the parameters of the source model with the most votes are transferred to the client model to continue the federated training.
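For intuition, the standard linear variant of Centered Kernel Alignment can be computed from two layers' activation matrices as follows. This is a minimal sketch of the usual formulation; the function name and matrix shapes are assumptions, not the team's code.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two activation matrices
    (rows = examples, columns = neurons/features)."""
    # Center each feature (column) of the activations
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA(X, Y) = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(X.T @ Y, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(numerator / denominator)
```

The score lies in [0, 1] and is invariant to orthogonal transformations and isotropic scaling, which is why it is suited to comparing layers of independently trained models.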

Choosing the USPS handwritten dataset as the target model and MNIST, K-MNIST, CIFAR-10, E-MNIST, SVHN, F-MNIST, and STL-10 as the source models, the results obtained are tabulated. A naive target model trained on the USPS dataset without any transfer took 200 communication rounds to reach 96% accuracy.

*“Still, there are some questions to be answered. When should we start voting (after how many communication rounds)? Is CKA sufficient, or should we look for better representation comparison methods?”*

Presentation of this project can be watched here.

Georgios is a post-doc at the Technical University of Munich and Imperial College London, working in the biomedical image analysis group. His team at OpenMined is working on predicting paediatric pneumonia (whether a chest x-ray is normal, bacterial, or viral) from chest x-ray images using OpenMined stacks like PySyft and PyGrid. Their proof of principle indicates that it is possible to solve this problem through federated training and secure inference. In the process of implementing the whole pipeline, they are also finding bugs in the existing framework and observing the real-life difficulties of turning such a solution into reality. He also pointed out some of the challenges they have faced so far, including the lack of batchnorm in the pre-trained ResNet-18 model, GPU support, distribution of dataset augmentation to be performed locally on the nodes, and limitations of WebSockets and Grid Nodes. Their current result is quite comparable to the human baseline and could be improved. He concluded by inviting interested people to reach out to @GKaissis on Slack.

“Since radiologists are extremely expensive and rare especially in the developing world, it would make sense to offer a solution for example machine learning as a service or inference as a service which is end-to-end encrypted and privacy preserving in order to perform these diagnoses remotely.”

Presentation of this project can be watched here.

DKTK is a German cancer consortium dedicated to tackling the most demanding clinical challenges in cancer research. JIP (Joint Imaging Platform) is an architecture in which 11 German universities were provided with harmonized hardware (servers), all connected to central repositories and equipped with centrally developed machine learning tools such as PyTorch, NLTK, etc. Using JIP, people can upload their ML models to the central registry, and the local nodes can download and test them on their own use cases. Such an architecture aims to increasingly decentralize data processing in the domain of medical imaging. In the DKTK JIP deployment project, Georgios and his colleagues at OpenMined have successfully deployed the PyGrid and PySyft stack on one of the JIP servers and enabled it to be used for privacy-preserving machine learning within a single organization. Recently, JIP has been replaced by its open-source version, DECIPHER, which will open the door for anyone interested to leverage the project.

“I found it quite interesting that sort of the barrier towards a more widespread deployment specially a sort of federated learning scenario which would connect the individual sites among them were sort of more dependent on the willingness of the IT and IPSec teams to open up the firewalls so the servers could communicate with each other.“

Presentation of this project can be watched here.

Sahib Singh joined Ford Research as a machine learning research engineer after graduating from Carnegie Mellon University with a Master’s in Analytics and Data Science. His research team at OpenMined has benchmarked two privacy mechanisms, local DP and DP-SGD, on medical imagery and reported their findings. They used two datasets: a chest x-ray dataset for pneumonia detection and a diabetic retinopathy dataset for determining the severity of the condition in patients.

For benchmarking local DP, they generated three more datasets from the two original datasets by adding different scales of perturbation to the images, which were then used to train a ResNet-18 model. Similarly, for benchmarking DP-SGD, three levels of perturbation were added by clipping the gradients using the L2 norm, adding random noise, and finally multiplying them by the learning rate. As in the previous case, ResNet-18 was trained using the perturbed datasets.
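The DP-SGD recipe described above - clip per-example gradients with the L2 norm, add random noise, multiply by the learning rate - can be sketched as follows. This is a hypothetical NumPy illustration of the standard DP-SGD update, not the team's actual training code.

```python
import numpy as np

def dp_sgd_update(weights, per_example_grads, clip_norm, noise_multiplier, lr):
    """One DP-SGD-style step: clip each per-example gradient to an L2 norm
    bound, average, add Gaussian noise scaled to the clip bound, then apply
    the learning rate."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    grad = np.mean(clipped, axis=0)
    # Noise standard deviation is proportional to the clipping bound
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noise = np.random.normal(0.0, sigma, size=grad.shape)
    return weights - lr * (grad + noise)
```

Clipping bounds each example's influence on the update, which is what makes the added noise yield a formal differential privacy guarantee.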

Their experiments show that although the original model training accuracy for both use cases is around 99%, as the level of perturbation in local DP increases, the training accuracy gradually decreases. They also reported two important observations. First, the perturbed images visually look quite similar to their original versions, which raises a question about the effectiveness of this mechanism. Second, the test accuracy of the perturbed models is always higher than their respective training accuracy. A possible explanation for this phenomenon is that the test images are not perturbed and are passed in their original format: the model trains on more complex latent features but is tested on raw images with less complex latent features.

He concluded by mentioning some future research directions, including testing the robustness of such differential privacy mechanisms against membership inference attacks. Some of their upcoming projects include PrivateNLP and emotion analysis.

“We want to evaluate privacy both in terms of a theoretical guarantee as well as from a visual standpoint.”

Presentation of this project can be watched here.

Aditya leads the data science team at Mobile Premier League, an India-based gaming platform company. He and his colleagues at OpenMined have developed a privacy-preserving recommendation system by implementing matrix factorization, one of the core parts of such a system, using neural networks in a federated fashion. For their use case, they used the MovieLens 100K dataset containing movie ratings given by users. Their approach was able to replicate the result achieved by the baseline approach proposed by Ammad-ud-din et al. To add non-linearity to matrix factorization, they have also implemented neural collaborative filtering. Their future work includes extending the current model to implicit recommendation tasks, such as game or song recommendations, and making the architecture rich enough for non-static use cases like news article recommendations. Finally, he mentioned that they are actively looking for collaborators to join the team.
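For intuition, the core of matrix factorization is learning user and item embeddings whose dot product approximates the observed ratings. A minimal, non-federated SGD update might look like the following sketch; the function name and hyperparameters are illustrative only, not the team's code.

```python
import numpy as np

def mf_sgd_step(U, V, user, item, rating, lr=0.01, reg=0.1):
    """One SGD update for matrix factorization: learn embeddings so that
    rating ≈ U[user] · V[item], with L2 regularization."""
    # Work on copies so both updates use the pre-step values
    u, v = U[user].copy(), V[item].copy()
    err = rating - u @ v
    U[user] += lr * (err * v - reg * u)
    V[item] += lr * (err * u - reg * v)
    return err
```

In the federated variant, item embeddings live on the server while each user's embedding and updates stay on the client.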

“In every app we use, recommendation systems are powering the search and the recommendation we see on the app. Thus, converting this to a federated fashion or distributed fashion is essential for us to develop better systems.“

Presentation of this project can be watched here.

Ayush works as a full-time software engineer and part-time machine learning researcher after graduating from the University of Nebraska. His team at OpenMined is working on understanding the complexity of reproducing different federated learning algorithms proposed in existing research papers by implementing them in PyTorch. Their ultimate goal is to build an open-source standard repository of FL algorithms written in PyTorch and PySyft, and to publish a compilation of all their experiments as a code-based survey paper, which will be the first of its kind. They have already implemented Federated Averaging, surpassing the accuracy reported in the original paper while using far fewer parameters than the authors did. Currently they are comparing Federated Averaging with FedProx; initial results indicate that FedProx outperforms Federated Averaging. Ayush also presented a few other interesting observations found through their experiments.

For future work, they would like to implement the same algorithms in PySyft and develop a standardized metric for comparing different FL algorithms.
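The Federated Averaging aggregation rule itself is simple: the server forms a weighted average of the client models, weighted by local dataset size. A minimal sketch, assuming each client's weights are a list of NumPy arrays (a hypothetical helper, not the team's repository code):

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client model weights,
    with weights proportional to each client's local dataset size."""
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        layer_avg = sum(w[layer] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
        averaged.append(layer_avg)
    return averaged
```

FedProx differs mainly on the client side, adding a proximal term that keeps local updates close to the global model.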

“[In the Federated Averaging paper], authors said that they have a total of 1.6 million parameters in their CNN architecture but when I replicated their architecture I only got around six hundred thousand trainable parameters which was interesting.”

Presentation of this project can be watched here.

Anupam is a PhD student at the University of Edinburgh. He begins by stating that the advantages of federated learning can also be its vulnerability: decentralized and heterogeneous data can lead to backdoor attacks! He then gives a quick introduction to backdoor attacks, followed by some examples such as person identification errors, CIFAR backdoors, and word prediction. He explains that backdoor attacks usually exploit nodes with unique features, such as green cars, vertical stripes in the background, or particular language patterns, where an adversary can make the model label pink cars as birds, or anything of their choice.

He then discusses the defences proposed in various publications and their costs: Krum and Multi-Krum, which assume the data to be homogeneous but negatively affect fairness, followed by norm difference clipping and weak differential privacy. Anupam then covers the choices available to the adversary and their trade-offs, such as adding more nodes, poisoning more data, or picking optimal sets of triggers.

He then points out that the adversarial utility is not straightforward, as it is rather a game between the server and the adversary, so it needs to be calibrated for the given scenario.

A distributed attack is better, but not always; more poisoned data is better, but it can make the nodes visible; and higher attack frequencies lead to better chances of success. This leaves open problems: how to select a trigger, how to know whether a trigger has high or low probability, and, given the attack budget, how to select the triggers optimally.

Presentation of this project can be watched here.

Harsh completed his Master's in Neuroscience and Computer Science at Harvard and Georgia Tech. He works at Spectral Lab as a Research Scientist leading the Autonomous Adversarial Defence team, and also works on computer-vision-related projects. He is also a Research Scientist at OpenMined, leading the Privacy Preserving Neural Architecture team. He and his team are working on various projects at OpenMined, of which the primary focus of this presentation is **privacy-preserving technology for multi-modal learning and neural architecture search.**

As per Harsh, neural architecture search algorithms can be grouped into three families: reinforcement-learning-based, evolutionary, and differentiable methods. Treating the search space as constant and optimizing along it is computationally efficient in a federated learning setting, though it is tough for traditional setups. At a high level, he proposes DP-FNAS, a gradient-based architecture search that updates the client models in the federated setting and adds noise to the gradients within a differentially private framework, including a theoretical analysis. He then discusses the good performance of this setup, the research directions it opens, and a couple of choices made by the authors: it is computationally very efficient, and differential privacy fits their framework well. He closes with the directions his team is heading in: benchmarking the performance of these setups in the same way as they did with residual networks, comparing performance statistics, qualitatively comparing the generated topologies of different primitives, multi-modal modifications, and other novel contributions.

"Architecture seems effective in the Federated setting but alone with the Federated setting doesn't really provide privacy guarantees, because of the sort of sharing of these different sort of intermediate statistics, so applying gradients based on Differential Privacy seems like a useful extension to this general setup."

Presentation of this project can be watched here.

Franz Papst is a PhD student at Graz University of Technology. He and his team (Tushar Semwal and Santiago Gonzalez) aim to study federated online learning for time-series data. The motivation comes from the many edge devices receiving constant streams of sensor data, which they want to use as a setting for federated learning. They use the WESAD dataset, a multimodal dataset for stress and affect detection consisting of data from 15 subjects, recorded using chest-worn devices. He notes that no MNIST-like standard dataset exists for time series, but the team managed to find a suitable one and create non-federated ML and DL baselines. The team tried different methods, such as LDA and LSTMs, and it turns out the LSTM performs quite well! He concludes that the aim is to build federated LSTM models, after which the team may start focusing on improving affect detection and experimenting with online learning.

They are training in a federated learning setting and exploring strategies for how to distribute the data: one model that works for each subject in the test set, dealing with the imbalanced affective-state labels, and using a global test set with 30% of subjects chosen at random. However, the PySyft library is not yet mature enough for this application!

Presentation of this project can be watched here.

Stephen and his team at OpenMined believe in "*better machine learning than cryptography*" over the general view of "better cryptography than machine learning". The motivation comes from the idea of privacy via encryption, which may also open up ML to security-sensitive problems, and from the question: can encryption be used by default? He further mentions that, based on prior published work, which mostly focused on building better neural networks by improving the fundamentals, encrypted ML is rather SLOW! He also mentions some of the limitations of the prior work, such as a weak set of primitives, small datasets and models, and everyone's focus being on optimizing for inference runtime. They chose FALCON for a case study. He ends the presentation with the team's roadmap and status updates.

"In pursuit of improving crypto-oriented neural architecture search by building on new primitives"

Presentation of this project can be watched here.


Homomorphic encryption is a promising field which allows computation on encrypted data. This excellent post, What Is homomorphic Encryption, provides a broad explanation of what homomorphic encryption is and what the stakes are for this field of research.

In this series of articles, we will study in depth the Cheon-Kim-Kim-Song (CKKS) scheme, which is first discussed in the paper Homomorphic Encryption for Arithmetic of Approximate Numbers. CKKS allows us to perform computations on vectors of complex values (thus real values as well). The idea is that we will implement CKKS from scratch in Python and then, by using these crypto primitives, we can explore how to perform complex operations such as linear regression, neural networks, and so on.

The figure above provides a high-level view of CKKS. We can see that a message **m**, a vector of values on which we want to perform certain computation, is first encoded into a *plaintext* polynomial **p(X)** and then encrypted using a public key.

CKKS works with polynomials because they provide a good trade-off between security and efficiency as compared to standard computations on vectors.

Once the message **m** is encrypted into **c**, a pair of polynomials, CKKS provides several operations that can be performed on it, such as addition, multiplication and rotation.

If we denote by **f** a function which is a composition of homomorphic operations, then decrypting **c’ = f(c)** with the secret key will yield **p’ = f(p)**. Therefore, once we decode it, we will get **m’ = f(m)**.

The central idea to implement a homomorphic encryption scheme is to have homomorphic properties on the *encoder*, *decoder*, *encryptor* and *decryptor*. This way, operations on *ciphertext* will be decrypted and decoded correctly and provide outputs as if operations were done directly on *plaintext*.

So in this article we will see how to implement the encoder and the decoder, and in later articles we will continue to implement the encryptor and decryptor to have a homomorphic encryption scheme.

**Preliminaries**

Basic knowledge in linear algebra and ring theory is recommended to have a good understanding of how CKKS is implemented. You can acquaint yourself with these topics using the following links:

- Introduction to linear algebra provides a good basis in linear algebra.
- Ring theory (Math 113) is a good resource to learn ring theory.

Specifically for this article, we will rely on the following concepts:

- Cyclotomic polynomials, which are polynomials with nice properties, as they allow efficient computations when used as the polynomial modulus.
- The canonical embedding, which is used for encoding and decoding. It has the nice property of being an isomorphism, i.e. a one-to-one homomorphism, between vectors and polynomials.
- Vandermonde matrices, a special class of matrices which we will use to invert the canonical embedding.

If you want to run the notebook you can find it in this Colab notebook.

CKKS exploits the rich structure of integer polynomial rings for its plaintext and ciphertext spaces. Nonetheless, data comes more often in the form of vectors than in polynomials.

Therefore, it is necessary to encode our input \(z \in \mathbb{C}^{N/2}\) into a polynomial \(m(X) \in \mathbb{Z}[X]/(X^N + 1)\).

We will denote the degree of our polynomial modulus by \(N\), which will be a power of 2. We denote the \(M\)-th cyclotomic polynomial (note that \(M = 2N\)) by \(\Phi_M(X) = X^N + 1\). The plaintext space will be the polynomial ring \(\mathcal{R} = \mathbb{Z}[X]/(X^N + 1)\). Let us denote by \(\xi_M\) the \(M\)-th root of unity: \(\xi_M = e^{2 i \pi / M}\).

To understand how we can encode a vector into a polynomial and how the computation performed on the polynomial will be reflected in the underlying vector, we will first experiment with a vanilla example, where we simply encode a vector \(z \in \mathbb{C}^{N}\) into a polynomial \(m(X) \in \mathbb{C}[X]/(X^N + 1)\). Then we will cover the actual encoding of CKKS, which takes a vector \(z \in \mathbb{C}^{N/2}\), and encodes it in a polynomial \(m(X) \in \mathbb{Z}[X]/(X^N + 1)\).

Here we will cover the simple case of encoding \(z \in \mathbb{C}^{N}\), into a polynomial \(m(X) \in \mathbb{C}[X]/(X^N + 1)\).

To do so, we will use the canonical embedding \(\sigma : \mathbb{C}[X]/(X^N +1) \rightarrow \mathbb{C}^N\), which decodes and encodes our vectors.

The idea is simple: To decode a polynomial \(m(X)\) into a vector \(z\), we evaluate the polynomial on certain values, which will be the roots of the cyclotomic polynomial \(\Phi_M(X) = X^N + 1\). Those \(N\) roots are : \(\xi, \xi^3, ..., \xi^{2 N - 1}\).

So to decode a polynomial \(m(X)\), we define \(\sigma(m) = (m(\xi), m(\xi^3), ..., m(\xi^{2 N - 1})) \in \mathbb{C}^N\). Note that \(\sigma\) defines an isomorphism, which means it is a bijective homomorphism, so any vector will be uniquely encoded into its corresponding polynomial, and vice-versa.

The tricky part is the encoding of a vector \(z \in \mathbb{C}^N\) into the corresponding polynomial, which means computing the inverse \(\sigma^{-1}\). Hence, the problem is to find a polynomial \(m(X) = \sum_{i=0}^{N-1} \alpha_i X^i \in \mathbb{C}[X]/(X^N + 1)\), given a vector \(z \in \mathbb{C}^N\), such that \(\sigma(m) = (m(\xi), m(\xi^3), ..., m(\xi^{2 N - 1})) = (z_1,...,z_N)\).

Pursuing this thread further, we end up with the following system :

\(\sum_{j=0}^{N-1} \alpha_j (\xi^{2 i - 1})^j = z_i, i=1,...,N\).

This can be viewed as a linear equation :

\(A \alpha = z\), with \(A\) the Vandermonde matrix of the \((\xi^{2 i -1})_{i=1,...,N}\), \(\alpha\) the vector of the polynomial coefficients, and \(z\) the vector we want to encode.

Therefore we have that : \(\alpha = A^{-1} z\), and that \(\sigma^{-1}(z) = \sum_{i=0}^{N-1} \alpha_i X^i \in \mathbb{C}[X]/(X^N + 1)\).

Let's now examine an example to better understand what we have discussed so far.

Let \(M = 8\), \(N = \frac{M}{2}= 4\), \(\Phi_M(X) = X^4 + 1\), and \(\omega = e^{\frac{2 i \pi}{8}} = e^{\frac{i \pi}{4}}\).

Our goal here is to encode the following vectors: \([1, 2, 3, 4]\) and \([-1, -2, -3, -4]\), decode them, add and multiply their polynomials, and decode the results.

As we can see, in order to decode a polynomial, we simply need to evaluate it on powers of an \(M\)-th root of unity. Here we choose \(\xi_M = \omega = e^{\frac{i \pi}{4}}\).

Once we have \(\xi\) and \(M\), we can define both \(\sigma\) and its inverse \(\sigma^{-1}\), respectively the decoding and the encoding.

Now we can move on to implement the Vanilla Encoder and Decoder in Python.

```
import numpy as np

# First we set the parameters
M = 8
N = M // 2

# We set xi, which will be used in our computations
xi = np.exp(2 * np.pi * 1j / M)
xi
```

`(0.7071067811865476+0.7071067811865475j)`

```
from numpy.polynomial import Polynomial

class CKKSEncoder:
    """Basic CKKS encoder to encode complex vectors into polynomials."""

    def __init__(self, M: int):
        """Initialization of the encoder for M a power of 2.

        xi, which is an M-th root of unity, will be used as a basis
        for our computations.
        """
        self.xi = np.exp(2 * np.pi * 1j / M)
        self.M = M

    @staticmethod
    def vandermonde(xi: np.complex128, M: int) -> np.ndarray:
        """Computes the Vandermonde matrix from an M-th root of unity."""
        N = M // 2
        matrix = []
        # We will generate each row of the matrix
        for i in range(N):
            # For each row we select a different root
            root = xi ** (2 * i + 1)
            row = []
            # Then we store its powers
            for j in range(N):
                row.append(root ** j)
            matrix.append(row)
        return matrix

    def sigma_inverse(self, b: np.ndarray) -> Polynomial:
        """Encodes the vector b in a polynomial using an M-th root of unity."""
        # First we create the Vandermonde matrix
        A = CKKSEncoder.vandermonde(self.xi, self.M)
        # Then we solve the system
        coeffs = np.linalg.solve(A, b)
        # Finally we output the polynomial
        p = Polynomial(coeffs)
        return p

    def sigma(self, p: Polynomial) -> np.ndarray:
        """Decodes a polynomial by applying it to the M-th roots of unity."""
        outputs = []
        N = self.M // 2
        # We simply apply the polynomial on the roots
        for i in range(N):
            root = self.xi ** (2 * i + 1)
            output = p(root)
            outputs.append(output)
        return np.array(outputs)
```

Let's first encode a vector and see how it is encoded using real values.

```
# First we initialize our encoder
encoder = CKKSEncoder(M)
b = np.array([1, 2, 3, 4])
b
```

`array([1, 2, 3, 4])`

Let's encode the vector now.

```
p = encoder.sigma_inverse(b)
p
```

`x↦(2.5+4.440892098500626e-16j)+((-4.996003610813204e-16+0.7071067811865479j))x+((-3.4694469519536176e-16+0.5000000000000003j))x^2+((-8.326672684688674e-16+0.7071067811865472j))x^3`

Now let's see how we can extract the vector we had initially from the polynomial:

```
b_reconstructed = encoder.sigma(p)
b_reconstructed
```

`array([1.-1.11022302e-16j, 2.-4.71844785e-16j, 3.+2.77555756e-17j, 4.+2.22044605e-16j])`

We can see that the values of the reconstruction and the initial vector are very close.

`np.linalg.norm(b_reconstructed - b)`

`6.944442800358888e-16`

As stated before, \(\sigma\) is not chosen randomly to encode and decode, but it has a lot of nice properties. Among them, \(\sigma\) is an isomorphism, so addition and multiplication on polynomials will result in element-wise addition and multiplication on the encoded vectors.

The homomorphic property of \(\sigma\) comes from the fact that it evaluates polynomials at the roots \(\xi, \xi^3, \ldots, \xi^{2N-1}\) of \(X^N + 1\): reducing a sum or product modulo \(X^N + 1\) does not change its values at these roots.

We can now encode several vectors, perform homomorphic operations on them, and decode the results.

```
m1 = np.array([1, 2, 3, 4])
m2 = np.array([1, -2, 3, -4])
p1 = encoder.sigma_inverse(m1)
p2 = encoder.sigma_inverse(m2)
```

We can see that addition is pretty straightforward.

```
p_add = p1 + p2
p_add
```

`x↦(2.0000000000000004+1.1102230246251565e-16j)+((-0.7071067811865477+0.707106781186547j))x+((2.1094237467877966e-15-1.9999999999999996j))x^2+((0.7071067811865466+0.707106781186549j))x^3`

Here as expected, we see that p1 + p2 decodes correctly to \([2, 0, 6, 0]\).

`encoder.sigma(p_add)`

`array([2.0000000e+00+3.25176795e-17j, 4.4408921e-16-4.44089210e-16j, 6.0000000e+00+1.11022302e-16j, 4.4408921e-16+3.33066907e-16j])`

Because when doing multiplication we might have terms whose degree is higher than \(N\), we will need to do a modulo operation using \(X^N + 1\).

To perform multiplication, we first need to define the polynomial modulus which we will use.

```
poly_modulo = Polynomial([1,0,0,0,1])
poly_modulo
```

`x↦1.0+0.0x+0.0x^2+0.0x^3+1.0x^4`

Now we can perform multiplication.

`p_mult = p1 * p2 % poly_modulo`

Finally if we decode it, we can see that we have the expected result.

`encoder.sigma(p_mult)`

`array([ 1.-8.67361738e-16j, -4.+6.86950496e-16j, 9.+6.86950496e-16j, -16.-9.08301212e-15j])`

Therefore we can see that our simple encoder and decoder work as expected: they have homomorphic properties and form a one-to-one mapping between vectors and polynomials.

While this is a great step, we have actually lied because if you noticed before, when we used the encoder \(\sigma^{-1}\) the polynomials had complex coefficients. So while the encoder and decoder were indeed homomorphic and one-to-one, the domains they covered were \(\mathbb{C}^N \rightarrow \mathbb{C}[X]/(X^N +1)\). Because we actually want polynomials to belong in \(\mathbb{Z}[X]/(X^N +1)\) to use all the properties of integer polynomial rings, we thus need to make sure the encoder outputs polynomials with integer coefficients and not complex coefficients.

So I hope you enjoyed this little introduction to encoding complex numbers into polynomials for homomorphic encryption. We will see in the next article how to implement the actual encoder and decoder used in CKKS, so stay tuned!

Website: @KritikaPrakash | Github: @Kritikalcoder | Twitter: @kritipraks

**Where are you based?**

"I’m currently based in Bangalore, India."

**What do you do?**

"I lead the Differential Privacy Research Team at OpenMined. I am an MS by Research student in Computer Science at IIIT Hyderabad (India) where I work on building algorithms for power trading in smart electricity grids. I am also an artist."

**What are your specialties?**

"I come from a research background and work on interesting problems in Deep Learning, Differential Privacy, Reinforcement Learning and Game Theory. A lot of my work involves Python and Java programming, problem formulation, technical reading and writing. Currently, I am working on a research project at OpenMined to automate Differential Privacy analysis for Deep Learning."

**How and when did you originally come across OpenMined?**

"I first came across OpenMined 1.5 years ago. In 2019, I started working on a side project on Differential Privacy. I faced some technical roadblocks and started looking online for implementations of Differentially Private Machine Learning Algorithms. That’s when I came across OpenMined and explored PySyft as well as the blogs.

Soon, I filled out the survey for the mentorship program to get mentored and more involved with the community."

**What was the first thing you started working on within OpenMined?**

"I became a mentor as a part of the Mentorship Team in November 2019 and had a lot of fun helping other beginners figure out how to contribute to OpenMined and achieve their own goals."

**And what are you working on now?**

"I am working as a research scientist on the project “Automatic Sensitivity Analysis for Differentially Private Deep Learning”. As part of the Differential Privacy Research Team, I collaborate on fun research-based projects on different applications of Differential Privacy to the real world, and how we (OpenMined) can build tools to serve this purpose. I also give talks and write blog-posts on Differential Privacy."

**What would you say to someone who wants to start contributing?**

"Start small and ask for lots of help. It’s always confusing and overwhelming at the beginning. The OpenMined slack is your best friend here. This community is incredibly helpful, friendly and resourceful. Take initiative. Look for things to do/fix which align with your interests, and then, just give it your all. Once you’re familiar with the basics, the best way to contribute is to start working on a project, and learn all the skills in the process. Contributing to open source is a great way to work on your self-growth and build skills in various areas such as coding, teamwork, research, communication and more."

**Please recommend one interesting book, podcast or resource to the OpenMined community.**

"Lex Fridman’s series of podcasts on “Artificial Intelligence” have been super helpful in making me feel connected to the most thriving parts of the computer science global community. They’re easily accessible on YouTube. While textbooks help you get strong technically, such interviews give you a perspective on how different fields connect with sub-fields of computer science, and what role we play in shaping tomorrow’s world as engineers and researchers.

Another resource I’ll strongly recommend to beginners in Machine Learning is fast.ai’s courses on practical Machine Learning. The content is taught in a top-down fashion with a lot of jupyter-notebook style of experimental coding along the way. It really helps you get a practical understanding of what’s happening and solidifies your learning in the process."

Let us discuss some challenges of performing machine learning on private data. Here are some attacks that have been seen in the security literature.

1) Training data extraction attacks or model inversion attacks: Imagine you have a classifier trained on images of individual faces to recognize which individual is in an image: you feed the classifier a face image and it outputs the identity of the person in the image. Fredrikson et al. showed that with access to the classifier’s output probabilities they were able to reconstruct images which are approximations of the training data that the machine learning classifier saw. These approximations do not extract individual training points but rather the average representation that the classifier learned for each class, which here corresponds to one individual.

2) Membership inference attacks: The second kind of attack is the membership inference attack against ML models, introduced by Shokri et al. Here the adversary’s goal is slightly different: instead of reconstructing training points from the output of the classifier, the goal is to infer whether a specific input was used to train the model. Given the image of a person, did that person contribute to the training data of a specific machine learning model? Shokri et al. also showed that these attacks can be performed with access only to the classifier’s output probabilities.
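To build intuition for membership inference, here is a minimal sketch. This is not the shadow-model attack of Shokri et al., just its core observation: models tend to be over-confident on examples they were trained on, so an attacker can threshold the model’s maximum output probability. The `probs` array and the threshold below are hypothetical.

```python
import numpy as np

def membership_guess(confidences, threshold=0.9):
    # Guess "member" for inputs where the model's top probability is
    # suspiciously high -- a crude stand-in for a learned attack model.
    return confidences.max(axis=1) >= threshold

# Hypothetical output probabilities for two inputs: the first looks
# memorized (very confident), the second does not.
probs = np.array([[0.98, 0.01, 0.01],
                  [0.40, 0.35, 0.25]])
print(membership_guess(probs))  # [ True False]
```

Real attacks train an attack model on the outputs of shadow models rather than using a fixed threshold, but the signal exploited is the same.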

PATE is designed to defend data holders against such attackers. Its analysis considers two types of threat model: black-box and white-box adversaries.

1) Model querying (black-box adversary): In this kind of attack, the adversary can only query the model that you trained; it has no access to the model’s internals, architecture, or parameters. All it can do is submit inputs to your black-box model and observe the predictions the model makes. These are called model querying attacks or black-box attacks. The two attacks presented above (Shokri et al. and Fredrikson et al.) are instances of such attacks.

2) Model inspection (white-box adversary): This kind of attack is stronger because the adversary has access to the model and its parameters. The work by Zhang et al., "Understanding Deep Learning Requires Rethinking Generalization", hints that machine learning models may memorize some of their training data, or at least have the capacity to do so. So we need to be robust against an attacker that has access to the model parameters and can analyze them.

This is why the threat model considered in the PATE analysis assumes very powerful adversaries, which can make a potentially unbounded number of queries and also have access to the model internals.

The proof that PATE provides differential privacy is based on the "moments accountant", a technique introduced by Abadi et al. in their 2016 paper. The moments accountant formalizes the fact that when there is a strong quorum among the teachers - when all or most of the teachers agree - we should pay only a small privacy cost for the prediction given to the student. The differential privacy guarantees provided here are "data dependent": during training, all the votes cast by the teachers are recorded, and we compute numerical values for the resulting guarantees. Differential privacy is characterized by two values, epsilon and delta.
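To see why a strong quorum should cost little privacy, here is a quick sketch of noisy-max aggregation with hypothetical vote counts (100 teachers, 10 classes; the counts and epsilon below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_max(vote_counts, epsilon, rng):
    # Add Laplace noise of scale 1/epsilon to each count, then take argmax.
    noise = rng.laplace(0.0, 1.0 / epsilon, size=len(vote_counts))
    return int(np.argmax(vote_counts + noise))

# Near-unanimous teacher votes: 92 of 100 teachers pick class 0.
strong_quorum = np.array([92, 2, 1, 1, 1, 1, 1, 1, 0, 0])

# With such a wide margin, the noisy answer is (almost) always class 0,
# so changing any single teacher's vote would rarely change the output.
agree = sum(noisy_max(strong_quorum, epsilon=0.2, rng=rng) == 0
            for _ in range(1000))
print(f"noisy argmax == plurality in {agree} of 1000 trials")
```

When the votes are contested instead, the noisy argmax flips often, which is exactly the case where the moments accountant charges a higher privacy cost.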

Epsilon, or the privacy budget, quantifies how much we tolerate the output of the machine learning model on one database D to differ from the output of the same model on a second database D′ that differs in only one training point. The smaller epsilon is, the stronger the privacy.

Delta is the probability with which we tolerate the epsilon guarantee failing to hold.
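For reference, these two values come from the standard definition of (ε, δ)-differential privacy: a randomized mechanism M satisfies it if, for every pair of databases D and D′ differing in one record and every set of outputs S,

```
Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S] + δ
```

A smaller ε forces the two output distributions to be closer; δ is the small probability mass on which the bound is allowed to fail.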

**Steps**

**The first thing we need to do before getting started is install syft**

`!pip install syft`

**Load the SVHN data set**

SVHN is a real-world image data set for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.

```
import torch
from torchvision import datasets, transforms
from torch.utils.data import Subset

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

train_data = datasets.SVHN('datasets/SVHN/train/', split='train', transform=transform,
                           target_transform=None, download=True)
test_data = datasets.SVHN('datasets/SVHN/test/', split='test', transform=transform,
                          target_transform=None, download=True)

num_teachers = 100
batch_size = 50

def get_data_loaders(train_data, num_teachers):
    """ Create one data loader per teacher, over disjoint subsets of the data """
    teacher_loaders = []
    data_size = len(train_data) // num_teachers
    for i in range(num_teachers):
        indices = list(range(i * data_size, (i + 1) * data_size))
        subset_data = Subset(train_data, indices)
        loader = torch.utils.data.DataLoader(subset_data, batch_size=batch_size)
        teacher_loaders.append(loader)
    return teacher_loaders

teacher_loaders = get_data_loaders(train_data, num_teachers)
```

**Generating the student train and test data by splitting the SVHN test set**

```
student_train_data = Subset(test_data, list(range(9000)))
student_test_data = Subset(test_data, list(range(9000, 10000)))
student_train_loader = torch.utils.data.DataLoader(student_train_data, batch_size=batch_size)
student_test_loader = torch.utils.data.DataLoader(student_test_data, batch_size=batch_size)
```

**Define the teacher models and train them by defining a CNN**

```
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(5 * 10 * 10, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), 5 * 10 * 10)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```

**Defining the train and predict functions**

```
def train(model, trainloader, criterion, optimizer, epochs=10):
    running_loss = 0
    for e in range(epochs):
        model.train()
        for images, labels in trainloader:
            optimizer.zero_grad()
            output = model.forward(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

def predict(model, dataloader):
    outputs = torch.zeros(0, dtype=torch.long)
    model.eval()
    for images, labels in dataloader:
        output = model.forward(images)
        ps = torch.argmax(torch.exp(output), dim=1)
        outputs = torch.cat((outputs, ps))
    return outputs

def train_models(num_teachers):
    models = []
    for i in range(num_teachers):
        model = Classifier()
        criterion = nn.NLLLoss()
        optimizer = optim.Adam(model.parameters(), lr=0.003)
        train(model, teacher_loaders[i], criterion, optimizer)
        models.append(model)
    return models

models = train_models(num_teachers)
```

**Next, we will combine the predictions of the teacher models to generate the aggregated teacher and the student labels**

```
import numpy as np

epsilon = 0.2

def aggregated_teacher(models, dataloader, epsilon):
    """ Aggregate the teacher predictions with noisy-max voting """
    preds = torch.zeros((len(models), 9000), dtype=torch.long)
    for i, model in enumerate(models):
        results = predict(model, dataloader)
        preds[i] = results
    labels = np.array([]).astype(int)
    for image_preds in np.transpose(preds.numpy()):
        label_counts = np.bincount(image_preds, minlength=10).astype(float)
        beta = 1 / epsilon
        for i in range(len(label_counts)):
            label_counts[i] += np.random.laplace(0, beta)
        new_label = np.argmax(label_counts)
        labels = np.append(labels, new_label)
    return preds.numpy(), labels

teacher_models = models
preds, student_labels = aggregated_teacher(teacher_models, student_train_loader, epsilon)
```

**Now, using the labels generated previously, we will create the student model and train it**

Why do we need to train an additional student model? Because the aggregated teacher alone violates our threat model:

1. The total privacy loss increases with each prediction, so privacy budgets create a tension between accuracy and the number of predictions. (If we stick to the first part of the mechanism, where we have only the teachers, every time the teachers make a prediction we pay an additional cost in privacy. As users make more and more prediction queries, the overall privacy cost keeps increasing, so at some point we face a tension between utility and privacy.)

2. Private data may be revealed by inspection of the model internals, and privacy guarantees should hold in the face of white-box adversaries. (Recall that in our threat model adversaries can access the internals of models. If an adversary can inspect the internals of the teachers, which saw the training data, it may be able to leak some information about that data. If the adversary instead inspects the student model, then in the worst case it can only recover the public data together with the labels that were provided by the teachers with differential privacy.)

```
def student_loader(student_train_loader, labels):
    for i, (data, _) in enumerate(iter(student_train_loader)):
        yield data, torch.from_numpy(labels[i * len(data): (i + 1) * len(data)])

student_model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(student_model.parameters(), lr=0.003)
epochs = 10
steps = 0
running_loss = 0
for e in range(epochs):
    student_model.train()
    train_loader = student_loader(student_train_loader, student_labels)
    for images, labels in train_loader:
        steps += 1
        optimizer.zero_grad()
        output = student_model.forward(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if steps % 50 == 0:
            test_loss = 0
            accuracy = 0
            student_model.eval()
            with torch.no_grad():
                for images, labels in student_test_loader:
                    log_ps = student_model(images)
                    test_loss += criterion(log_ps, labels).item()
                    ps = torch.exp(log_ps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor))
            student_model.train()
            print("Epoch: {}/{}.. ".format(e + 1, epochs),
                  "Train Loss: {:.3f}.. ".format(running_loss / len(student_train_loader)),
                  "Test Loss: {:.3f}.. ".format(test_loss / len(student_test_loader)),
                  "Accuracy: {:.3f}".format(accuracy / len(student_test_loader)))
            running_loss = 0
```

```
Epoch: 1/10.. Train Loss: 0.321.. Test Loss: 3.638.. Accuracy: 0.217
Epoch: 1/10.. Train Loss: 0.254.. Test Loss: 5.541.. Accuracy: 0.217
Epoch: 1/10.. Train Loss: 0.244.. Test Loss: 5.642.. Accuracy: 0.217
Epoch: 2/10.. Train Loss: 0.223.. Test Loss: 5.961.. Accuracy: 0.217
Epoch: 2/10.. Train Loss: 0.205.. Test Loss: 6.295.. Accuracy: 0.254
Epoch: 2/10.. Train Loss: 0.186.. Test Loss: 6.514.. Accuracy: 0.303
...
```

**Now we will perform PATE Analysis on the student labels generated by the Aggregated Teacher**

```
from syft.frameworks.torch.dp import pate
data_dep_eps, data_ind_eps = pate.perform_analysis(teacher_preds=preds, indices=student_labels, noise_eps=epsilon, delta=1e-5)
print("Data Independent Epsilon:", data_ind_eps)
print("Data Dependent Epsilon:", data_dep_eps)
```

This gives the output:

```
Data Independent Epsilon: 1451.5129254649705
Data Dependent Epsilon: 59.47392676433782
```

**Conclusion:**

When applying differential privacy, it is important to choose the right technique for the given scenario. PATE is generally useful when a party wants to annotate a local data set using the private data sets of other actors, and the epsilon-delta formalism allows very granular control of just how much those other actors must trust us to protect their privacy in the process.

Credits:

Nicolas Papernot - Private Machine Learning with PATE - Cybersecurity With The Best 2017

SCALABLE PRIVATE LEARNING WITH PATE paper

SEMI-SUPERVISED KNOWLEDGE TRANSFER FOR DEEP LEARNING FROM PRIVATE TRAINING DATA paper

A 5-Step Guide on incorporating Differential Privacy into your Deep Learning models

Nicolas Papernot and Ian Goodfellow's Clever Hans blog

MAINTAINING PRIVACY IN MEDICAL DATA WITH DIFFERENTIAL PRIVACY

UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION

Membership Inference Attacks Against Machine Learning Models

If you want to join our mission of making the world more privacy-preserving:

We at OpenMined are collaborating with the PyTorch team in moving towards our shared mission of making privacy-preserving AI easier to build. We actively support the use of Opacus in our community as well as its integration with the tools we're building.

Differential Privacy is one of the most important tools for use within privacy-preserving AI because it allows us to train AI models which can learn high level patterns (such as how to identify cancerous tumors) without memorising sensitive personal data (unique attributes about specific patients).

The implication of this technology becoming mainstream is a rapid increase in the amount of training data available for scientific inquiry into humanity's most important and sensitive issues.

This is a step-by-step tutorial on how to train a simple PyTorch classification model on the MNIST dataset, using a differentially private stochastic gradient descent (DP-SGD) optimizer, in 20 lines of code with the PyTorch Opacus library.

Opacus is a library that enables training PyTorch models with differential privacy. It supports training with minimal code changes required on the client, has little impact on training performance, and allows the client to track, online, the privacy budget expended at any given moment.

Step 1: Importing PyTorch and Opacus

Step 2: Loading MNIST Data

Step 3: Creating a PyTorch Neural Network Classification Model and Optimizer

Step 4: Attaching a Differential Privacy Engine to the optimizer

Step 5: Training the private model over multiple epochs

1. Basic Deep Learning in PyTorch (Tutorial)

2. Installing PyTorch

3. Installing PyTorch Opacus

Step 1: Importing PyTorch and Opacus

For private image classification on MNIST, we will require the use of PyTorch, `numpy`, and Opacus. We will need `tqdm` to visualize the progress we make during the training process.

```
import torch
from torchvision import datasets, transforms
import numpy as np
from opacus import PrivacyEngine
from tqdm import tqdm
```

Step 2: Loading MNIST Data

We load the MNIST data using a `DataLoader` and split it into train and test datasets. The data is shuffled, and normalized using the mean (0.1307) and the standard deviation (0.3081) of the dataset. The training set is divided into batches of 64 images each, whereas the testing set is divided into batches of 1024 images each.

```
train_loader = torch.utils.data.DataLoader(datasets.MNIST('../mnist',
train=True, download=True,
transform=transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,)),]),),
batch_size=64, shuffle=True, num_workers=1,
pin_memory=True)
test_loader = torch.utils.data.DataLoader(datasets.MNIST('../mnist',
train=False,
transform=transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,)),]),),
batch_size=1024, shuffle=True, num_workers=1,
pin_memory=True)
```

Step 3: Creating a PyTorch Neural Network Classification Model and Optimizer

Now, let us create a `Sequential` PyTorch neural network model which predicts the label of images from our MNIST dataset. We create a simple network consisting of 2 convolutional layers, followed by 2 fully connected layers, interspersed with `ReLU` and max-pooling layers. The first layer takes images of size `(28 x 28)` as input. The final layer returns a vector of scores of size `(1 x 10)`, one per digit (0 - 9), so that we can predict which digit the image represents.

Finally, we use a Stochastic Gradient Descent (SGD) optimizer to improve our classification model, and we set the learning rate to 0.05.

```
model = torch.nn.Sequential(torch.nn.Conv2d(1, 16, 8, 2, padding=3),
torch.nn.ReLU(),
torch.nn.MaxPool2d(2, 1),
torch.nn.Conv2d(16, 32, 4, 2),
torch.nn.ReLU(),
torch.nn.MaxPool2d(2, 1),
torch.nn.Flatten(),
torch.nn.Linear(32 * 4 * 4, 32),
torch.nn.ReLU(),
torch.nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
```

Step 4: Attaching a Differential Privacy Engine to the Optimizer

So far, we have a regular PyTorch classification model. Now, we want to make our model differentially private. So, we create a `PrivacyEngine` from the Opacus library and attach it to the optimizer; it will make our model differentially private and help us track our privacy budget.

The `sample_size` here is the size of the training set. The list of `alphas` contains the different orders at which Rényi Differential Privacy is computed. The `noise_multiplier` is the ratio of the standard deviation of the Gaussian noise distribution to the `L2` sensitivity of the function to which the noise is added. `max_grad_norm` is the upper bound on the `L2` norm of the loss gradients.

```
privacy_engine = PrivacyEngine(model,
batch_size=64,
sample_size=60000,
alphas=range(2,32),
noise_multiplier=1.3,
max_grad_norm=1.0,)
privacy_engine.attach(optimizer)
```

Step 5: Training the private model over multiple epochs

Finally, we create a function to train the model using the differentially private SGD optimizer. We want to minimize the cross entropy loss.

Within a single training step, we iterate over all the training batches of data and make our model predict the labels. Based on this prediction, we obtain a loss, whose gradients we back-propagate using `loss.backward()`.

The key difference brought by the privacy engine is that the norms of the gradients propagated backwards are clipped to be at most `max_grad_norm`. This is done to ensure that the overall sensitivity of the data is bounded.

After that, the gradients for a batch are averaged and randomly sampled Gaussian noise, scaled by `noise_multiplier`, is added to them. This is done to ensure that the process satisfies the Rényi Differential Privacy guarantee at the specified alpha orders.

Finally, the optimizer takes a step in the direction opposite to that of the noisy gradient using `optimizer.step()`. We repeat this entire training iteration for 10 epochs to decrease the overall loss.
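What the engine does under the hood can be sketched in plain PyTorch. This is a simplified illustration, not the actual Opacus implementation: the toy model, batch, and hyperparameters below are hypothetical, and the Python loop over single examples stands in for Opacus's vectorized per-sample gradient computation.

```python
import torch

torch.manual_seed(0)
# Hypothetical toy model and mini-batch, only to illustrate the mechanism.
model = torch.nn.Linear(4, 2)
criterion = torch.nn.CrossEntropyLoss()
max_grad_norm, noise_multiplier, lr = 1.0, 1.3, 0.05

data = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

# 1) Compute per-sample gradients and clip each to L2 norm <= max_grad_norm.
clipped = [torch.zeros_like(p) for p in model.parameters()]
for x, y in zip(data, target):
    model.zero_grad()
    loss = criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    grads = [p.grad.detach() for p in model.parameters()]
    norm = torch.sqrt(sum(g.norm() ** 2 for g in grads))
    scale = min(1.0, (max_grad_norm / (norm + 1e-6)).item())
    for acc, g in zip(clipped, grads):
        acc += g * scale

# 2) Add Gaussian noise with std = noise_multiplier * max_grad_norm,
#    average over the batch, and take an SGD step.
with torch.no_grad():
    for p, acc in zip(model.parameters(), clipped):
        noise = torch.normal(0.0, noise_multiplier * max_grad_norm, size=acc.shape)
        p -= lr * (acc + noise) / len(data)
```

Because each example's contribution to the summed gradient is bounded by `max_grad_norm`, adding noise of a matching scale hides any single example's influence on the update.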

```
def train(model, train_loader, optimizer, epoch, device, delta):
    model.train()
    criterion = torch.nn.CrossEntropyLoss()
    losses = []
    for _batch_idx, (data, target) in enumerate(tqdm(train_loader)):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    epsilon, best_alpha = optimizer.privacy_engine.get_privacy_spent(delta)
    print(
        f"Train Epoch: {epoch} \t"
        f"Loss: {np.mean(losses):.6f} "
        f"(ε = {epsilon:.2f}, δ = {delta}) for α = {best_alpha}")

for epoch in range(1, 11):
    train(model, train_loader, optimizer, epoch, device="cpu", delta=1e-5)
```

The `noise_multiplier` determines the scale of the Gaussian noise distribution, while the epsilon, delta and alpha values together quantify the Rényi Differential Privacy guarantee and the privacy budget we have spent.

This is how Opacus neatly adds support for Differential Privacy to a PyTorch model: by attaching an unobtrusive `PrivacyEngine` to the optimizer, the overall model creation and training process stays just the same.

```
# Step 1: Importing PyTorch and Opacus
import torch
from torchvision import datasets, transforms
import numpy as np
from opacus import PrivacyEngine
from tqdm import tqdm

# Step 2: Loading MNIST Data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../mnist', train=True, download=True,
                   transform=transforms.Compose([transforms.ToTensor(),
                                                 transforms.Normalize((0.1307,), (0.3081,))])),
    batch_size=64, shuffle=True, num_workers=1, pin_memory=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../mnist', train=False,
                   transform=transforms.Compose([transforms.ToTensor(),
                                                 transforms.Normalize((0.1307,), (0.3081,))])),
    batch_size=1024, shuffle=True, num_workers=1, pin_memory=True)

# Step 3: Creating a PyTorch Neural Network Classification Model and Optimizer
model = torch.nn.Sequential(torch.nn.Conv2d(1, 16, 8, 2, padding=3),
                            torch.nn.ReLU(),
                            torch.nn.MaxPool2d(2, 1),
                            torch.nn.Conv2d(16, 32, 4, 2),
                            torch.nn.ReLU(),
                            torch.nn.MaxPool2d(2, 1),
                            torch.nn.Flatten(),
                            torch.nn.Linear(32 * 4 * 4, 32),
                            torch.nn.ReLU(),
                            torch.nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Step 4: Attaching a Differential Privacy Engine to the Optimizer
privacy_engine = PrivacyEngine(model, batch_size=64, sample_size=60000,
                               alphas=range(2, 32), noise_multiplier=1.3,
                               max_grad_norm=1.0)
privacy_engine.attach(optimizer)

# Step 5: Training the private model over multiple epochs
def train(model, train_loader, optimizer, epoch, device, delta):
    model.train()
    criterion = torch.nn.CrossEntropyLoss()
    losses = []
    for _batch_idx, (data, target) in enumerate(tqdm(train_loader)):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    epsilon, best_alpha = optimizer.privacy_engine.get_privacy_spent(delta)
    print(
        f"Train Epoch: {epoch} \t"
        f"Loss: {np.mean(losses):.6f} "
        f"(ε = {epsilon:.2f}, δ = {delta}) for α = {best_alpha}")

for epoch in range(1, 11):
    train(model, train_loader, optimizer, epoch, device="cpu", delta=1e-5)
```

We are always working towards our mission to make the world more privacy-preserving by lowering the barrier-to-entry to privacy-enhancing technology.

**What is Digital Public Good?**

The **UN Secretary-General’s** High-level Panel on Digital Cooperation, in its report published on June 10th 2019, recommends “*that a broad, multi-stakeholder alliance, involving the UN, create a platform for sharing digital public goods, engaging talent and pooling data sets, in a manner that respects privacy, in areas related to attaining the SDGs*”.

**Digital Public Goods** is an alliance and a platform to help accelerate the attainment of Sustainable Development Goals in low and middle-income countries. It is open-source software, open data, open AI models, open standards and open content that adhere to privacy and other applicable best practices, do no harm and are of high relevance for the attainment of the **UN’s 2030 Sustainable Development Goals (SDGs)**.

The **Alliance** prioritizes technical and financial support for deployment of Digital Public Goods in areas such as **Health**, **Job Skills**, **Financial Services & Inclusion**, and **Foundational Literacy & other learning resources and platforms**.

**Why is OpenMined a Digital Public Good?**

**OpenMined** is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies.

Our mission at OpenMined community is to create an accessible ecosystem of tools for private, secure, multi-owner governed AI. We do this by extending popular libraries like TensorFlow and PyTorch with advanced techniques in cryptography and private machine learning.

**OpenMined’s work is relevant to the Advocacy, Big Data, Data Management & Policy, Data Security, Democracy, Digital Development, Governance, Research, Resource Management and Sustainability sectors that are in alignment with Digital Public Goods!**

**Our mission at OpenMined is to make the world more privacy-preserving by lowering the barrier-to-entry to privacy-preserving technologies through free, open-source software and education.**

**Today, we’re very excited to announce our partnership with GENBU AI for open-source collaboration via the OpenMined libraries and through participation in the architectural decisions that make it easier to deploy edge federated learning at scale.**

GENBU is developing cloud infrastructure for federated learning on edge compute, making it possible to deploy federated learning across industries from automotive to agriculture. Their CEO Rima Al Shikh is embedded in our “strike team” where she supports OpenMined developers by contributing to the design of distributed computing. We’d like to specifically thank Tudor Cebere, George Muraru, Jason Paumier, Andrew Trask, Madhava Jay, Radu Stochitoiu, Ionesior Junior, and Vova Manannikov for their development work supporting this project!

Like OpenMined, GENBU believes that AI has the power to change the world. Our goal in this partnership is to help them in their commitment to responsible economic development via AI application deployment across industries and borders. Privacy technologies, such as the PySyft, PyGrid, and PyDP libraries from the OpenMined community, make it possible to harness AI’s potential in economic deployment by enabling model training on the necessary private data.

OpenMined is proud of the diversity of our community, and GENBU’s COO TJ Misra is working with our Laura Ayre from the OpenMined Partnerships Team to amplify the voices of and opportunities for historically underrepresented demographics in science, engineering, mathematics, and privacy technology.

We will meet with GENBU regularly, improve our codebase based on their requests, introduce them to potential employees, contractors, partners, and customers within our community, and of course, raise awareness of achievements we make together.

There has been an extraordinary amount of hype attributed to OpenAI’s recent GPT-3 API launch, and it is very well-deserved.

GPT-3 is a remarkable 175-billion-parameter natural language processing (NLP) model trained on the web corpus. It has led to a broad swath of intriguing text generation examples - see a few of them here. We may be entering a decade of incredible NLP impact.

The web is the perfect initial training ground for text generation — so much of our lives, esp in these Covid times, are being digitized there. This incredibly large corpus is showing the potential to generate realistic text summaries, establish basic Q&A systems, write legal contracts, pre-write articles/stories/content for editing, and so much more.

**But some of the most interesting and important use cases of natural language processing will require training on our sensitive data beyond the public web corpus.** Whether it be related to medical assistance, therapy, or simply how we interact in our private lives, powerful applications for GPT-3 will arise that will require training on sensitive data to achieve. In order to get that data, we will need robust privacy preserving data science.

Let’s take the example of building an interactive nurse AI, which ideally would help anyone on the planet get meaningful medical help with just a smartphone. Certainly, there are a lot of medical help sites, nurse FAQs, etc. on the web which can be leveraged as a starting corpus to build such a starter assistant. Such an ‘interactive nurse AI’ might be a user interface improvement, but quality-wise it wouldn’t be much better than our current search and webpage FAQ approach.

The true magic of our nurses naturally lies within the walls of hospitals, doctors’ offices, and medical clinics. They have ‘trained on’ years of specific, precious long-tail patient interactions, and that knowledge of critical medical issues lives in protected servers which meet compliance regulations. Perhaps that might involve noticing that slightly slurred speech in a diabetic patient with certain characteristics is an early warning sign for a specific type of stroke. These nuanced details aren’t always recorded online in public web corpora. **For an NLP ‘nurse assistant’, therapist, etc. to be superior to our current search and webpage FAQ approach, the training data likely also needs to come from these sensitive sources.**

Fortunately, there are ways to train on such sensitive data securely. This is the mission of our OpenMined community - answering questions with “data you cannot see”. We are providing open-source libraries for reliable, safe, and robust privacy preserving techniques. These include libraries for federated learning, secure multi-party computation, differential privacy, homomorphic encryption, private set intersection, and many more.

You can learn more about privacy preserving techniques in this blog series, and learn more about OpenMined at openmined.org. OpenMined is a community of 8,600+ engineers, researchers, marketers, and hackers dedicated to lowering the barrier-to-entry to private AI technologies through open-source code and free education.

To learn more about the state of privacy tech, join us at our OpenMined Privacy Conference PriCon - a free event organized by the OpenMined community covering all aspects of privacy-related technology research, deployments and issues. If you’d like to sponsor our conference - each donation directly sponsors a grant for an open-source developer - you can do so here, or email partnerships@openmined.org!