Naive Bayes:

Naive Bayes methods are probabilistic classification models based on Bayes' theorem. They are especially useful when the dimensionality of the dataset is high.

Bayes' Theorem:

\begin{align}P(A|B) = \frac{P(B|A) * P(A)}{P(B)}\end{align}

Using Bayes' theorem, we can find the probability of an event A occurring, given that an event B has already occurred. The “naive” in Naive Bayes refers to the assumption that the features are independent of one another given the class; we apply this assumption to every feature vector we consider.
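
For example, if P(B|A) = 0.9, P(A) = 0.01 and P(B) = 0.05, then P(A|B) = (0.9 * 0.01) / 0.05 = 0.18.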

So, to calculate the probability of a class variable y given a feature vector x1 to xn, Bayes' theorem can be applied as:

\begin{align}P(y|x_1, x_2,.., x_n) = \frac{P(x_1, x_2,.., x_n | y) * P(y)}{P(x_1, x_2,.., x_n)}\end{align}
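
Since the denominator P(x_1, x_2,.., x_n) is the same for every class, and the naive assumption lets the joint likelihood factor into per-feature terms, the posterior is proportional to:

\begin{align}P(y|x_1, x_2,.., x_n) \propto P(y) * \prod_{i=1}^{n} P(x_i|y)\end{align}

This product of a class prior and per-feature likelihoods is exactly what the implementation below computes for each class.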

Depending upon the type of data we have, we can apply one of the following variants (a quick scikit-learn sketch follows the list):

  1. Gaussian Naive Bayes (for continuous, normally distributed data)
  2. Multinomial Naive Bayes (for multinomial data)
  3. Complement Naive Bayes (for imbalanced multinomial data)
  4. Bernoulli Naive Bayes (for boolean-valued data)
  5. Categorical Naive Bayes (for categorically distributed data)
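
For reference, each of these variants has an off-the-shelf implementation in scikit-learn. A minimal sketch, assuming scikit-learn is installed and using made-up toy data:

# pick the variant that matches the data type
from sklearn.naive_bayes import GaussianNB

X = [[3.39, 2.33], [3.11, 1.78], [7.42, 4.69], [9.17, 2.51]]
y = [0, 0, 1, 1]

clf = GaussianNB()                 # swap in MultinomialNB, ComplementNB, etc.
clf.fit(X, y)
print(clf.predict([[3.0, 2.0]]))   # should print [0] for this toy data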

In the following example, we will look at Gaussian Naive Bayes.

Problem with Standard Naive Bayes

Standard Naive Bayes works only on categorical data. If we do not want to bucket our continuous features into categories, it is useful to implement Gaussian Naive Bayes instead, since it works on datasets with continuous values.

Gaussian Naive Bayes

Here, the likelihood of each feature is assumed to be Gaussian:
\begin{align}P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)\end{align}

Gaussian Naive Bayes helps us deal with continuous data. If the data is normally (or Gaussian) distributed, we segregate it according to its class values, then calculate the mean and variance of each attribute per class. These statistics later let us calculate the probability of a particular attribute value.
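
To make the arithmetic concrete before we add encryption, here is a minimal plaintext sketch for a single attribute (the numbers are made up for illustration):

# plaintext sketch: Gaussian probability of one attribute for one class
import torch
from math import pi

class_0 = torch.tensor([3.39, 3.11, 1.34])   # attribute values for class 0

mu = class_0.mean()                          # sample mean
var = class_0.var()                          # sample variance (n - 1 denominator)

x = torch.tensor(3.0)                        # attribute value to score
p = torch.exp(-(x - mu)**2 / (2 * var)) / torch.sqrt(2 * pi * var)
print(p)                                     # P(x | class 0)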

Code Implementation:

# import required packages
import torch
import syft as sy

Now we will create three virtual workers named bob, alice and bill.

# create a hook
hook = sy.TorchHook(torch)

# create the virtual workers
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")
bill = sy.VirtualWorker(hook, id="bill")

For this code walk-through, we will generate some data and send it to bob and alice, while bill will be the crypto provider.

# a random dataset 
data = torch.tensor([[3.393533211, 2.331273381],
					[3.110073483, 1.781539638],
					[1.343808831, 3.368360954],
					[3.582294042, 4.67917911],
					[2.280362439, 2.866990263],
					[7.423436942, 4.696522875],
					[5.745051997, 3.533989803],
					[9.172168622, 2.511101045],
					[7.792783481, 3.424088941],
					[7.939820817, 0.791637231]])
                    
# class values of the dataset
target = torch.tensor([[0],[0],[0],[0],[0],[1],[1],[1],[1],[1]])

# send the data and target labels to the workers
data = data.fix_precision().share(bob, alice, crypto_provider=bill)
target = target.fix_precision().share(bob, alice, crypto_provider=bill)

The following functions will help us calculate the statistics of our dataset, which we will later need to calculate probability values.

# calculate the mean of a list
def mean(numbers):
    return sum(numbers) / len(numbers)

# calculate the standard deviation of a list
def stddev(numbers):
    avg = mean(numbers)

    # square the differences in a new list so the input list is not mutated
    squared_diffs = [(n - avg) * (n - avg) for n in numbers]

    # sample variance with an n - 1 denominator
    variance = sum(squared_diffs) / (len(squared_diffs) - 1)

    # sqrt is not available on shared tensors, so decrypt the variance,
    # take the square root, then encrypt and share the result again
    std = torch.sqrt(variance.get().float_precision())
    std = std.fix_precision().share(bob, alice, crypto_provider=bill)

    return std

# calculate stats(mean, stddev, total) for each attribute of the dataset
def summarize_dataset(rows):
    numAttributes = len(rows[0])
    summaries = []

    for n in range(numAttributes):
        elements = []
        for r in range(len(rows)):
            elements.append(rows[r][n])
        m = mean(elements)
        s = stddev(elements)
        l = torch.tensor([len(elements)]).fix_precision().share(bob, alice, crypto_provider=bill)
        summaries.append([m, s, l])
    
    return summaries

As our labels are encrypted as well, we will define a function that collects the unique values from them.

# generate the list of unique labels for our dataset
def getLabels(target):
    labels = []

    for t in range(len(target)):
        same = 0
        for l in range(len(labels)):
            same = (target[t] == labels[l]).get()
            if same:
                break
        if not same:
            labels.append(target[t])
    
    return labels

Now we will segregate the rows according to their class values. For that, we will create a dictionary called separated and store the rows under their class value as the key.

# separate the dataset according to the class values
separated = dict()
labels = getLabels(target)

# initialize the labels as keys for 
# separated dictionary
for label in labels:
    separated[label] = list()

# loop over the rows of the dataset and append each row to the dictionary
# according to its class value
for i in range(len(target)):
    for l in range(len(labels)):
        same = target[i] == labels[l]
        if same.get():
            separated[labels[l]].append(data[i])

In the following code, we calculate the statistics of our dataset using the functions defined earlier and store them in a dictionary.

# initialize the stats dictionary and
# calculate the stats for the values in
# the separated dictionary
summaries = dict()

for class_value, rows in separated.items():
    for l in range(len(labels)):
        same = class_value == labels[l]
        if same.get():
            summaries[labels[l]] = summarize_dataset(rows)

The equation for the probability contains some constant values; we encrypt these constants and send them to bob and alice so that the division below happens between shared tensors and does not cause any issues.

# import pi value and squareroot function from math library
from math import pi, sqrt

# encrypt the constants and send their values to virtual workers
sq_pi = torch.tensor([sqrt(2 * pi)]).fix_precision().share(bob, alice, crypto_provider=bill)
one = torch.tensor([1]).fix_precision().share(bob, alice, crypto_provider=bill)

Next, we define a function to calculate the Gaussian probability density of a value, given a mean and standard deviation.

# calculate the Gaussian probability density for a given value
def calculate_probability(x, mean, stdev):
    numerator = (x-mean)**2
    denominator = 2 * (stdev**2) 
    exponent = torch.exp(-(numerator / denominator))
    p = ((one / (sq_pi * stdev)) * exponent)
    
    return p
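
Before wiring this into the main function, we can sanity-check it on encrypted values. This is a quick check we added for illustration: at the mean of a unit-variance Gaussian, the density should be about 1/sqrt(2*pi) ≈ 0.3989.

# hypothetical sanity check on encrypted inputs
x = torch.tensor([0.0]).fix_precision().share(bob, alice, crypto_provider=bill)
m = torch.tensor([0.0]).fix_precision().share(bob, alice, crypto_provider=bill)
s = torch.tensor([1.0]).fix_precision().share(bob, alice, crypto_provider=bill)

p = calculate_probability(x, m, s)
print(p.get().float_precision())   # should be close to 0.3989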

Now comes our main function, where we implement Gaussian Naive Bayes using all of the functions defined above.

# Calculate the probabilities of predicting each class for a given row
def calculate_class_probabilities(summaries, row):
    total_rows = len(target)

    probabilities = dict()

    for class_value, class_summaries in summaries.items():
        probabilities[class_value] = summaries[class_value][0][2] / total_rows
        for i in range(len(class_summaries)):
            mean, stdev, _ = class_summaries[i]
            probabilities[class_value] *= calculate_probability(row[i], mean, stdev)
    return probabilities

We can test our code with any values; for simplicity, we use a row from the dataset.

test = torch.tensor([3.393533211, 2.331273381]).fix_precision().share(bob, alice, crypto_provider=bill)

Finally, we can calculate the probabilities of each class value by passing our test tensor to the calculate_class_probabilities() function along with the summaries dictionary we computed for our dataset.

# calculate probabilities of every class for a given row
prob = calculate_class_probabilities(summaries, test)

for k, v in prob.items():
    print(k.get().float_precision())
    print(v.get().float_precision())
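
To turn these scores into an actual prediction, we can decrypt them and pick the class with the highest probability. A minimal sketch with a hypothetical predict() helper; note that .get() retrieves the shares from the workers, so run this instead of (not after) the decryption loop above:

# hypothetical helper: decrypt the class scores and return the most probable class
def predict(summaries, row):
    probabilities = calculate_class_probabilities(summaries, row)
    best_label, best_prob = None, None
    for class_value, prob in probabilities.items():
        p = prob.get().float_precision().item()
        if best_prob is None or p > best_prob:
            best_prob = p
            best_label = class_value.get().float_precision().item()
    return best_label

print(predict(summaries, test))   # should print 0.0 for our test row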