Update as of November 18, 2021: The version of PySyft mentioned in this post has been deprecated. Any implementations using this older version of PySyft are unlikely to work. Stay tuned for the release of PySyft 0.6.0, a data centric library for use in production targeted for release in early December.

Note: This post first appeared on the RStudio AI Blog, formerly RStudio TensorFlow Blog, dedicated to all things deep learning, probabilistic modeling and distributed computing from R. The original audience is assumed to be familiar with the R packages tensorflow and keras, designed to allow for designing and training TensorFlow/Keras models in an idiomatic, "R-like" way, as well as reticulate, that ingenious helper that lets us incorporate Python functionality directly into R.

In a nutshell

Deep learning need not be irreconcilable with privacy protection. Federated learning enables on-device, distributed model training; encryption keeps model and gradient updates private; differential privacy prevents the training data from leaking. As of today, private and secure deep learning is an emerging technology. In this post, we introduce Syft, an open-source framework that integrates with PyTorch as well as TensorFlow. In an example use case, we obtain private predictions from a Keras model.

The word privacy, in the context of deep learning (or machine learning, or “AI”), and especially when combined with things like security, sounds like it could be part of a catch phrase: privacy, safety, security – like liberté, fraternité, égalité. In fact, there should probably be a mantra like that. But that’s another topic, and like with the other catch phrase just cited, not everyone interprets these terms in the same way.

So let’s think about privacy, narrowed down to its role in training or using deep learning models, in a more technical way. Since privacy – or rather, its violations – may appear in various ways, different violations will demand different countermeasures. Of course, in the end, we’d like to see them all integrated – but re privacy-related technologies, the field is really just starting out on a journey. The most important thing we can do, then, is to learn about the concepts, investigate the landscape of implementations under development, and – perhaps – decide to join the effort.

This post tries to do a tiny little bit of all of those.

Aspects of privacy in deep learning

Say you work at a hospital, and would be interested in training a deep learning model to help diagnose some disease from brain scans. Where you work, you don’t have many patients with this disease; moreover, they tend to mostly be affected by the same subtypes: Your training set, were you to create one, would not reflect the overall distribution very well. It would, thus, make sense to cooperate with other hospitals; but that isn’t so easy, as the data collected is protected by privacy regulations. So, the first requirement is: The data has to stay where it is; e.g., it may not be sent to a central server.

Federated learning

This first sine qua non is addressed by federated learning [1]. Federated learning is not “just” desirable for privacy reasons. On the contrary, in many use cases, it may be the only viable way (like with smartphones or sensors, which collect gigantic amounts of data). In federated learning, each participant receives a copy of the model, trains on their own data, and sends back the gradients obtained to the central server, where gradients are averaged and applied to the model.

This is good insofar as the data never leaves the individual devices; however, a lot of information can still be extracted from plain-text gradients. Imagine a smartphone app that provides trainable auto-completion for text messages. Even if gradient updates from many iterations are averaged, their distributions will greatly vary between individuals. Some form of encryption is needed. But then how is the server going to make sense of the encrypted gradients?

One way to accomplish this relies on secure multi-party computation (SMPC).

Secure multi-party computation

In SMPC, we need a system of several agents who collaborate to provide a result no single agent could provide alone: “normal” computations (like addition, multiplication …) on “secret” (encrypted) data. The assumption is that these agents are “honest but curious” – honest, because they won’t tamper with their share of data; curious in the sense that if they were (curious, that is), they wouldn’t be able to inspect the data because it’s encrypted.

The principle behind this is secret sharing. A single piece of data – a salary, say – is “split up” into meaningless (hence, encrypted) parts which, when put together again, yield the original data. Here is an example.

Say the parties involved are Julia, Greg, and me. The below function encrypts a single value, assigning to each of us their “meaningless” share:

# a big prime number
# all computations are performed in a finite field, for example, the integers modulo that prime
Q <- 78090573363827
 
encrypt <- function(x) {
  # all but the very last share are random 
  julias <- runif(1, min = -Q, max = Q)
  gregs <- runif(1, min = -Q, max = Q)
  mine <- (x - julias - gregs) %% Q
  list (julias, gregs, mine)
}

# some top secret value no-one may get to see
value <- 77777

encrypted <- encrypt(value)
encrypted
[[1]]
[1] 7467283737857

[[2]]
[1] 36307804406429

[[3]]
[1] 34315485297318

Once the three of us put our shares together, getting back the plain value is straightforward:

decrypt <- function(shares) {
  Reduce(sum, shares) %% Q  
}

decrypt(encrypted)
77777

As an example of how to compute on encrypted data, here’s addition. (Other operations will be a lot less straightforward.) To add two numbers, just have everyone add their respective shares:

add <- function(x, y) {
  list(
    # julia
    (x[[1]] + y[[1]]) %% Q,
    # greg
    (x[[2]] + y[[2]]) %% Q,
    # me
    (x[[3]] + y[[3]]) %% Q
  )
}
  
x <- encrypt(11)
y <- encrypt(122)

decrypt(add(x, y))
133

Back to the setting of deep learning and the current task to be solved: Have the server apply gradient updates without ever seeing them. With secret sharing, it would work like this:

Julia, Greg and me each want to train on our own private data. Together, we will be responsible for gradient averaging, that is, we’ll form a cluster of workers united in that task. Now, the model owner secret shares the model, and we start training, each on their own data. After some number of iterations, we use secure averaging to combine our respective gradients. Then, all the server gets to see is the mean gradient, and there is no way to determine our respective contributions.

Beyond private gradients

Amazingly, it is even possible to train on encrypted data – amongst others, using that same technique of secret sharing. Of course, this has to negatively affect training speed. But it’s good to know that if one’s use case were to demand it, it would be feasible. (One possible use case is when training on one party’s data alone doesn’t make any sense, but data is sensitive, so others won’t let you access their data unless encrypted.)

So with encryption available on an all-you-need basis, are we completely safe, privacy-wise? The answer is no. The model can still leak information. For example, in some cases it is possible to perform model inversion [2], that is, with just black-box access to a model, train an attack model that allows reconstructing some of the original training data. Needless to say, this kind of leakage has to be avoided. Differential privacy [3] [4] demands that results obtained from querying a model be independent from the presence or absence, in the dataset employed for training, of a single individual. In general, this is ensured by adding noise to the answer to every query. In training deep learning models, we add noise to the gradients, as well as clip them according to some chosen norm.

At some point, then, we will want all of those in combination: federated learning, encryption, and differential privacy.

Syft is a very promising, very actively developed framework that aims for providing all of them. Instead of “aims for”, I should perhaps have written “provides” – it depends. We need some more context.

Introducing Syft

Syft – also known as PySyft, since as of today, its most mature implementation is written in and for Python – is maintained by OpenMined, an open source community dedicated to enabling privacy-preserving AI. It’s worth it reproducing their mission statement here:

Industry standard tools for artificial intelligence have been designed with several assumptions: data is centralized into a single compute cluster, the cluster exists in a secure cloud, and the resulting models will be owned by a central authority. We envision a world in which we are not restricted to this scenario - a world in which AI tools treat privacy, security, and multi-owner governance as first class citizens. […] The mission of the OpenMined community is to create an accessible ecosystem of tools for private, secure, multi-owner governed AI.

While far from being the only one, PySyft is their most maturely developed framework. Its role is to provide secure federated learning, including encryption and differential privacy. For deep learning, it relies on existing frameworks.

PyTorch integration seems the most mature, as of today; with PyTorch, encrypted and differentially private training are already available. Integration with TensorFlow is a bit more involved; it does not yet include TensorFlow Federated and TensorFlow Privacy. For encryption, it relies on TensorFlow Encrypted (TFE), which as of this writing is not an official TensorFlow subproject.

However, even now it is already possible to secret share Keras models and administer private predictions. Let’s see how.

Private predictions with Syft, TensorFlow Encrypted and Keras

Our introductory example will show how to use an externally-provided model to classify private data – without the model owner ever seeing that data, and without the user ever getting hold of (e.g., downloading) the model. (Think about the model owner wanting to keep the fruits of their labour hidden, as well.)

Put differently: The model is encrypted, and the data is, too. As you might imagine, this involves a cluster of agents, together performing secure multi-party computation.

This use case presupposing an already trained model, we start by quickly creating one. There is nothing special going on here.

Prelude: Train a simple model on MNIST

# create_model.R

library(tensorflow)
library(keras)

mnist <- dataset_mnist()
mnist$train$x <- mnist$train$x/255
mnist$test$x <- mnist$test$x/255

dim(mnist$train$x) <- c(dim(mnist$train$x), 1)
dim(mnist$test$x) <- c(dim(mnist$test$x), 1)

input_shape <- c(28, 28, 1)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), input_shape = input_shape) %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_activation("relu") %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3)) %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_activation("relu") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3)) %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_activation("relu") %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "linear")
  

model %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = "accuracy"
)

model %>% fit(
    x = mnist$train$x,
    y = mnist$train$y,
    epochs = 1,
    validation_split = 0.3,
    verbose = 2
)

model$save(filepath = "model.hdf5")

Set up cluster and serve model

The easiest way to get all required packages is to install the ensemble OpenMined put together for their Udacity Course that introduces federated learning and differential privacy with PySyft. This will install TensorFlow 1.15 and TensorFlow Encrypted, amongst others.

The following lines of code should all be put together in a single file. I found it practical to “source” this script from an R process running in a console tab.

To begin, we again define the model, two things being different now. First, for technical reasons, we need to pass in batch_input_shape instead of input_shape. Second, the final layer is “missing” the softmax activation. This is not an oversight – SMPC softmax has not been implemented yet. (Depending on when you read this, that statement may no longer be true.) Were we training this model in secret sharing mode, this would of course be a problem; for classification though, all we care about is the maximum score.

After model definition, we load the actual weights from the model we trained in the previous step. Then, the action begins. We create an ensemble of TFE workers that together run a distributed TensorFlow cluster. The model is secret shared with the workers, that is, model weights are split up into shares that, each inspected alone, are unusable. Finally, the model is served, i.e., made available to clients requesting predictions.

How can a Keras model be shared and served? These are not methods provided by Keras itself. The magic comes from Syft hooking into Keras, extending the model object: cf. hook <- sy$KerasHook(tf$keras) right after we import Syft.

# serve.R
# you could start R on the console and "source" this file

# do this just once
reticulate::py_install("syft[udacity]")

library(tensorflow)
library(keras)

sy <- reticulate::import(("syft"))
hook <- sy$KerasHook(tf$keras)

batch_input_shape <- c(1, 28, 28, 1)

model <- keras_model_sequential() %>%
 layer_conv_2d(filters = 16, kernel_size = c(3, 3), batch_input_shape = batch_input_shape) %>%
 layer_average_pooling_2d(pool_size = c(2, 2)) %>%
 layer_activation("relu") %>%
 layer_conv_2d(filters = 32, kernel_size = c(3, 3)) %>%
 layer_average_pooling_2d(pool_size = c(2, 2)) %>%
 layer_activation("relu") %>%
 layer_conv_2d(filters = 64, kernel_size = c(3, 3)) %>%
 layer_average_pooling_2d(pool_size = c(2, 2)) %>%
 layer_activation("relu") %>%
 layer_flatten() %>%
 layer_dense(units = 10) 
 
pre_trained_weights <- "model.hdf5"
model$load_weights(pre_trained_weights)

# create and start TFE cluster
AUTO <- TRUE
julia <- sy$TFEWorker(host = 'localhost:4000', auto_managed = AUTO)
greg <- sy$TFEWorker(host = 'localhost:4001', auto_managed = AUTO)
me <- sy$TFEWorker(host = 'localhost:4002', auto_managed = AUTO)
cluster <- sy$TFECluster(julia, greg, me)
cluster$start()

# split up model weights into shares 
model$share(cluster)

# serve model (limiting number of requests)
model$serve(num_requests = 3L)

Once the desired number of requests have been served, we can go to this R process, stop model sharing, and shut down the cluster:

# stop model sharing
model$stop()

# stop cluster
cluster$stop()

Now, on to the client(s).

Request predictions on private data

In our example, we have one client. The client is a TFE worker, just like the agents that make up the cluster.

We define the cluster here, client-side, as well; create the client; and connect the client to the model. This will set up a queueing server that takes care of secret sharing all input data before submitting them for prediction.

Finally, we have the client asking for classification of the first three MNIST images.

With the server running in some different R process, we can conveniently run this in RStudio:

# client.R

library(tensorflow)
library(keras)

sy <- reticulate::import(("syft"))
hook <- sy$KerasHook(tf$keras)

mnist <- dataset_mnist()
mnist$train$x <- mnist$train$x/255
mnist$test$x <- mnist$test$x/255

dim(mnist$train$x) <- c(dim(mnist$train$x), 1)
dim(mnist$test$x) <- c(dim(mnist$test$x), 1)

batch_input_shape <- c(1, 28, 28, 1)
batch_output_shape <- c(1, 10)

# define the same TFE cluster
AUTO <- TRUE
julia <- sy$TFEWorker(host = 'localhost:4000', auto_managed = AUTO)
greg <- sy$TFEWorker(host = 'localhost:4001', auto_managed = AUTO)
me <- sy$TFEWorker(host = 'localhost:4002', auto_managed = AUTO)
cluster <- sy$TFECluster(julia, greg, me)

# create the client
client <- sy$TFEWorker()

# create a queueing server on the client that secret shares the data 
# before submitting a prediction request
client$connect_to_model(batch_input_shape, batch_output_shape, cluster)

num_tests <- 3
images <- mnist$test$x[1: num_tests, , , , drop = FALSE]
expected_labels <- mnist$test$y[1: num_tests]

for (i in 1:num_tests) {
  res <- client$query_model(images[i, , , , drop = FALSE])
  predicted_label <- which.max(res) - 1
  cat("Actual: ", expected_labels[i], ", predicted: ", predicted_label)
}
Actual:  7 , predicted:  7 
Actual:  2 , predicted:  2 
Actual:  1 , predicted:  1 

There we go. Both model and data did remain secret, yet we were able to classify our data.

Let’s wrap up.

Conclusion

Our example use case has not been too ambitious – we started with a trained model, thus leaving aside federated learning. Keeping the setup simple, we were able to focus on underlying principles: Secret sharing as a means of encryption, and setting up a Syft/TFE cluster of workers that together, provide the infrastructure for encrypting model weights as well as client data.

In case you’ve read our previous post on TensorFlow Federated – that, too, a framework under development – you may have gotten an impression similar to the one I got: Setting up Syft was a lot more straightforward, concepts were easy to grasp, and surprisingly little code was required. As we may gather from a recent blog post, integration of Syft with TensorFlow Federated and TensorFlow Privacy are on the roadmap. I am looking forward a lot for this to happen.

Thanks for reading!