Update as of November 18, 2021: The version of PySyft mentioned in this post has been deprecated. Any implementations using this older version of PySyft are unlikely to work. Stay tuned for the release of PySyft 0.6.0, a data-centric library for use in production targeted for release in early December.
This is a summary of Duet Tutorial by Andrew Trask which was presented at OpenMined Privacy Conference 2020.
Brief intro to federated learning and its limitations
According to Wikipedia, federated learning (also known as collaborative learning) is a Machine Learning (“ML”) technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples without moving them. This approach stands in contrast to traditional centralized machine learning techniques where all the local datasets are uploaded to one server, as well as in contrast to the more classical decentralized approaches, which often assume that local data samples are identically distributed.
In simpler terms, federated learning is an approach to training ML models without compromising the privacy of data owners. Since the data never leaves the owner’s device, privacy is retained. However, in practice, privacy may still be compromised. The model weights that are transmitted during the training process can still potentially be used to learn something about the training data. In this approach, data owners only have a binary choice. They can either participate in the training process or opt-out. There’s no in-between that allows the data owners granular control over what data they share and how much they share if they decide to participate.
Duet and its novel approach to remote Data Science
Duet is a novel approach that moves away from the central server management ideology in federated learning. It focuses on providing a coordination mechanism between data owners and data scientists where the goal of data scientists is to perform data analysis and train models on data they cannot see. Unlike past approaches, Duet provides data owners granular control over their data and ensures effective collaboration between the two parties. In essence, data owners can not only decide whether they want to participate in the training process but also choose what operations the data scientists can perform on the data.
Code walkthrough
Let’s walk through some basic code examples to understand how Duet can be leveraged for Remote Data Science.
Note: All code snippets that are to be executed by data owners will be labeled as “Owner” and snippets that are to be executed by data scientists will be labeled as “Scientist”.
Part 1 - Setting up a Duet Connection
(Owner) - Launch the Duet server on the owner’s machine
import syft as sy
duet = sy.launch_duet()
You can see the shift in control right from the get-go. Unlike traditional federated learning approaches, in Duet, the owner initiates the connection. The scientist cannot access anything unless the owner invites them to a collaborative training session.
Initiating a Duet session will generate a session ID that needs to be shared with the scientist so they can join the session.
Let’s suppose the session ID is xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy
(Scientist) - Use the session ID to join the Duet session
import syft as sy
duet = sy.join_duet("xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy")
On execution, a client ID will be returned, which the owner needs to enter into the prompt on their end. This completes the connection between the two parties.
Part 2 - Setting up Model for remote training
Most of the work here is to be done by the scientist.
(Scientist) - Create a simple convolutional network. Notice that we do this just like in PyTorch, with two crucial differences:
- We inherit from sy.Module instead of nn.Module
- We need to pass in a variable called torch_ref, which we will use internally for any calls that would normally go to torch
class SyNet(sy.Module):
    def __init__(self, torch_ref):
        super(SyNet, self).__init__(torch_ref=torch_ref)
        self.conv1 = self.torch_ref.nn.Conv2d(1, 32, 3, 1)
        self.conv2 = self.torch_ref.nn.Conv2d(32, 64, 3, 1)
        self.dropout = self.torch_ref.nn.Dropout2d(0.25)
        self.fc1 = self.torch_ref.nn.Linear(9216, 128)
        self.fc2 = self.torch_ref.nn.Linear(128, 10)

    def forward(self, x):
        x = self.torch_ref.nn.functional.relu(self.conv1(x))
        x = self.torch_ref.nn.functional.relu(self.conv2(x))
        x = self.torch_ref.nn.functional.max_pool2d(x, 2)
        x = self.dropout(x)
        x = self.torch_ref.flatten(x, 1)
        x = self.torch_ref.nn.functional.relu(self.fc1(x))
        x = self.fc2(self.dropout(x))
        output = self.torch_ref.nn.functional.log_softmax(x, dim=1)
        return output
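Before involving Duet at all, it can help to sanity-check the architecture locally. The following is a minimal sketch (my addition, not part of the original tutorial) that instantiates SyNet with the local torch module and pushes a fake MNIST-sized batch through it:

import torch

model = SyNet(torch)  # here torch_ref is simply the local torch module
dummy_batch = torch.randn(2, 1, 28, 28)  # two fake 28x28 grayscale "digits"
with torch.no_grad():
    log_probs = model(dummy_batch)
print(log_probs.shape)  # expected: torch.Size([2, 10])

The flattened feature size of 9216 in fc1 only works out for 28x28 inputs (two 3x3 convolutions followed by 2x2 max pooling gives 64 x 12 x 12 = 9216), which is why the dummy batch uses MNIST dimensions.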
(Scientist) - Create a local model using local copies of torch and torchvision
import torch
import torchvision
local_model = SyNet(torch)
local_transform_1 = torchvision.transforms.ToTensor()
local_transform_2 = torchvision.transforms.Normalize(0.1307, 0.3081)
local_transforms = torchvision.transforms.Compose([local_transform_1, local_transform_2])
args = {
    "batch_size": 64,
    "test_batch_size": 1000,
    "epochs": 14,
    "lr": 1.0,
    "gamma": 0.7,
    "no_cuda": False,
    "seed": 42,  # the meaning of life
    "log_interval": 10,
    "save_model": True,
}

test_data = torchvision.datasets.MNIST('../data', train=False, download=True, transform=local_transforms)
test_loader = torch.utils.data.DataLoader(test_data, args["test_batch_size"])
test_data_length = len(test_loader.dataset)
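Since the test set lives on the scientist's machine, evaluation can be written in plain PyTorch. Here is a rough sketch (my addition, assuming the trained weights have been retrieved back into the local model, which the full notebooks cover, and that sy.Module mirrors nn.Module's train()/eval() behaviour, as the training code later suggests):

def test_local(model, test_loader):
    # standard PyTorch evaluation loop, nothing Duet-specific
    model.eval()  # assumed to behave like nn.Module.eval()
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            pred = output.argmax(dim=1)  # class with the highest log-probability
            correct += pred.eq(target).sum().item()
    accuracy = correct / len(test_loader.dataset)
    print("Test accuracy: {:.4f}".format(accuracy))
    return accuracy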
Part 3 - Sending the local model to the Duet partner
Once the scientist has initialized the model, they can send it to their Duet partner for training. They can also use the Duet API to request details from their partner, such as whether a GPU is available.
(Scientist) - Request details of GPU availability from the owner.
# send the model to the data owner's machine
remote_model = local_model.send(duet)

# remote_torch is a reference to the torch library on the data owner's machine,
# exposed through the Duet session (duet.torch in the sample notebooks)
remote_torch = duet.torch

# let's ask the data owner if their machine has CUDA
has_cuda = False
has_cuda_ptr = remote_torch.cuda.is_available()
has_cuda = bool(has_cuda_ptr.get(request_block=True,
                                 name="cuda_is_available",
                                 reason="To run test and inference locally",
                                 timeout_secs=5,))
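Every call on remote_torch produces a pointer to an object on the owner's machine. As a hedged sketch that mirrors the sample notebooks (not a guaranteed API), the owner's reply can then drive how the remote run is configured:

use_cuda = has_cuda and not args["no_cuda"]
remote_torch.manual_seed(args["seed"])  # seed the owner's copy of torch
device = remote_torch.device("cuda" if use_cuda else "cpu")  # pointer to a device on the owner's machine
if use_cuda:
    remote_model.cuda(device)  # assumption: sy.Module mirrors nn.Module's .cuda()/.cpu()
else:
    remote_model.cpu()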
(Owner) - At their end, the owner can see all pending requests using duet.requests.pandas. They can manually approve or deny requests using accept() or deny(). For example, the first request in the queue can be approved as follows: duet.requests[0].accept()
Alternatively, the owner can predefine handlers that'll approve or deny the requests automatically:

duet.requests.add_handler(name="cuda_is_available", action="accept")

duet.requests.add_handler(
    name="loss",
    action="deny",
    timeout_secs=-1,  # no timeout
    print_local=True  # print the result in your notebook
)

duet.requests.add_handler(name="train_size", action="accept")
Notice here that the owner has added a handler for loss that'll deny the request. Keep this in mind, as we'll come back to it later.
Note - The name argument for requests sent by the scientist and for handlers added by the owner need to match; otherwise, the handlers will not fire. So, both parties need to engage in open and honest communication, or the training session will fail.
Part 4 - Setting up the training process
(Scientist) - Define the train() function for the remote model on the owner's device
def train(remote_model, torch_ref, train_loader, optimizer, epoch, args, train_data_length):
    # adding 0.5 before rounding gives us a ceiling without importing math
    train_batches = round((train_data_length / args["batch_size"]) + 0.5)
    if remote_model.is_local:
        print("Training requires a remote model")
        return
    remote_model.train()
    for batch_idx, data in enumerate(train_loader):
        data_ptr, target_ptr = data[0], data[1]
        optimizer.zero_grad()
        output = remote_model(data_ptr)
        loss = torch_ref.nn.functional.nll_loss(output, target_ptr)
        loss.backward()
        optimizer.step()
        loss_item = loss.item()
        train_loss = duet.python.Float(0)  # create a remote Float we can use for summation
        train_loss += loss_item
        if batch_idx % args["log_interval"] == 0:
            local_loss = None
            local_loss = loss_item.get(name="loss",
                                       reason="To evaluate training progress",
                                       request_block=True,
                                       timeout_secs=5)
            if local_loss is not None:
                print("Train Epoch: {} {} {:.4}".format(epoch, batch_idx, local_loss))
            else:
                print("Train Epoch: {} {} ?".format(epoch, batch_idx))
Well, the code here certainly looks overwhelming. So let’s break it down piece-by-piece.
- First, the is_local attribute of the model is checked to ensure that the model passed to the function is a remote model, not a local one.
- Like a standard training loop, we fetch the data, pass it through the model, and calculate the loss. However, notice the distinction here: we use the remote torch library, as indicated by torch_ref in the snippet above.
- Since operations are executed on the remote machine, we do not have access to any training information unless we explicitly request permission for it. If, say, we want to monitor the training loss, we request the owner's permission using the Duet API.
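To see how the pieces fit together, here is a rough sketch of the loop that would drive train(). The optimizer and scheduler are created through remote_torch so they live on the owner's machine alongside the model; train_loader_ptr and train_data_length are illustrative placeholders for the remote training DataLoader and its length, which the full notebooks obtain through the Duet API:

# hedged sketch of the training driver; train_loader_ptr and train_data_length
# are placeholders for references obtained via the Duet API in the full notebooks
params = remote_model.parameters()
optimizer = remote_torch.optim.Adadelta(params, lr=args["lr"])
scheduler = remote_torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=args["gamma"])

for epoch in range(1, args["epochs"] + 1):
    train(remote_model, remote_torch, train_loader_ptr, optimizer, epoch, args, train_data_length)
    scheduler.step()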
We are almost done here. Now, if you go back to Part 3 of this article, you'll notice that the owner had already added a handler to reject the scientist's request to view the training loss.
Isn’t this wonderful? The owner has full control over what data is accessible to the other Duet party.
Note - I have skipped some of the code and parts of the model training process to keep the article concise. However, I've covered all the concepts you need to know to get started with Duet.
For full training and inference code, check out the sample Duet notebooks.
Disclaimer - Duet is still in development and the API may change in future releases. Please keep an eye on the official PySyft repo.