
Developing Safe AI Pt.1

In this blog article by AICoreSpot, we’re going to train a neural network that is fully encrypted during training (training on unencrypted data). The outcome will be a neural network with two useful properties. First, the neural network’s intelligence is protected from anyone who might wish to steal it, so valuable AIs can be trained in insecure environments with no danger of theft of their intelligence. Second, the network can only make encrypted predictions, which we presume have no influence on the external world, as the external world cannot understand the predictions without a secret key. This creates a valuable power imbalance between a user and a superintelligence. If the artificial intelligence is homomorphically encrypted, then from its viewpoint the entire external world is also homomorphically encrypted. A human holds the secret key and has the choice to either unlock the AI itself (unleashing it on the world) or only individual predictions the AI makes (which appears to be safer).


Many people are worried that highly capable artificial intelligences will one day turn against us. Recently, Stephen Hawking called for a new world government to oversee the capabilities we give to Artificial Intelligence so that it doesn’t go rogue against us. These are pretty strong comments, and they reflect a concern shared by both the research community and the world at large. In this article, we’d like to provide a tutorial on a possible technical solution to this issue, with some illustrative code to demonstrate the approach.

The objective is simple. We want to develop AI technologies that can become very intelligent, intelligent enough to cure cancer, end world hunger, and so on, but whose brain, for lack of a better word, is controlled by a human being holding a key, so that the application of its intelligence is restricted. Unlimited learning is great, but unlimited application of that knowledge is potentially dangerous.

To put forth this concept, we’ll quickly detail two very thrilling domains of research: Deep Learning and Homomorphic Encryption. 

Part 1 – Deep Learning Explained 

Deep learning is a set of tools for automating intelligence, mainly using neural networks. As a subfield of computer science, it is largely responsible for the recent boom in AI technologies, as it has broken previous quality records on many intelligence tasks. For some context, it played a major role in DeepMind’s AlphaGo system, which defeated the world champion Go player, Lee Sedol.

How does a neural network go about learning? 

A neural network makes predictions based on its input. It learns to do this essentially by trial and error. It starts by making a prediction, which is mostly random at first, and then receives an “error signal” indicating that it predicted too high or too low (the predictions are typically probabilities). After repeating this cycle many millions of times, the network begins to figure things out.

The most important thing here, the item we should stick a pin in, is the error signal. Without feedback on the accuracy of its predictions, the network cannot learn. This is critical to keep in mind.
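To make the role of the error signal concrete, here is a minimal sketch of the trial-and-error loop described above. The toy task, the single weight, and the learning rate are all assumptions chosen for illustration; they do not come from the article’s own network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy task: output 1 when the input is positive, 0 otherwise.
inputs = np.array([-2.0, -1.0, 1.0, 2.0])
targets = np.array([0.0, 0.0, 1.0, 1.0])

rng = np.random.default_rng(0)
weight = rng.normal()          # start from an arbitrary guess
learning_rate = 0.5

errors = []
for step in range(200):
    predictions = sigmoid(weight * inputs)
    # The error signal: positive means "too high", negative means "too low".
    error_signal = predictions - targets
    errors.append(np.mean(np.abs(error_signal)))
    # Nudge the weight against the error, weighted by each input's contribution.
    weight -= learning_rate * np.mean(error_signal * inputs)

print(f"mean error: first step {errors[0]:.3f}, last step {errors[-1]:.3f}")
```

Removing the `error_signal` line leaves the loop with nothing to update the weight with, which is the point the paragraph above makes.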

Part 2 – Homomorphic Encryption Explained 

As the name indicates, homomorphic encryption is a form of encryption. In the asymmetric case, it can take perfectly readable text and turn it into nonsense using a “public key”. More importantly, it can then take that nonsense and convert it back into the original text using a “secret key”. However, unless you have the “secret key”, you cannot decode the nonsense (in theory).

Homomorphic encryption is a special type of encryption. It allows someone to modify the encrypted data in specific ways without being able to read that data. For instance, homomorphic encryption can be performed on numbers so that multiplication and addition can be carried out on encrypted values without decrypting them. Here are a few toy instances:
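As one toy instance of the multiplication case, the sketch below uses textbook (unpadded) RSA with tiny, insecure parameters. This is an assumption made purely for illustration: it is not the scheme used later in this article, and it is homomorphic over multiplication only, not addition.

```python
# Textbook (unpadded) RSA with tiny, insecure parameters, for illustration only.
p, q = 61, 53
n = p * q                  # public modulus, 3233
e = 17                     # public exponent
d = 2753                   # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

def encrypt(message):
    return pow(message, e, n)

def decrypt(cipher):
    return pow(cipher, d, n)

a, b = 7, 3
cipher_a = encrypt(a)
cipher_b = encrypt(b)

# Multiply the two cyphertexts without ever decrypting them.
cipher_product = (cipher_a * cipher_b) % n

print(decrypt(cipher_product))   # 21, the product of the two plaintexts
```

Whoever multiplies the cyphertexts never sees 7, 3, or 21; only the holder of `d` can read the result.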

There is now a growing number of homomorphic encryption schemes, each with different properties. It’s a relatively young field and there are several significant problems still being worked on, but we’ll come back to that later.

For now, let’s start with the following: integer public key encryption schemes that are homomorphic over multiplication and addition can perform the operations in the image above. Further, as the public key allows for “one-way” encryption, you can even perform operations between unencrypted numbers and encrypted numbers (by encrypting them one-way), as illustrated above by 2 * Cypher A. A few encryption schemes don’t even need that, but we’ll come back to that later as well.

Part 3 – Can We Use Them Together?

Perhaps the most frequent intersection between Deep Learning and Homomorphic Encryption has been data privacy. As it happens, when you homomorphically encrypt data, you cannot read it, but you still preserve most of the interesting statistical structure. This has enabled people to train models on encrypted data (CryptoNets). Further, a startup hedge fund called Numer.ai encrypts costly, proprietary data and lets anybody attempt to train machine learning models to forecast the stock market. Normally they wouldn’t be able to do this, as it would mean giving away very costly data, and ordinary encryption would make model training impossible.

This blog post is concerned with the inverse: encrypting the neural network itself and training it on decrypted data.

A neural network, in all its wondrous sophistication, actually breaks down into a surprisingly small number of moving parts which are simply repeated over and over. As a matter of fact, many state-of-the-art neural networks can be built using only the following operations:

  • Division 
  • Multiplication
  • Addition 
  • Subtraction 
  • Sigmoid 
  • Exponential 
  • Tanh 

This leads to the obvious technical question: can we homomorphically encrypt the neural network itself? Would we want to? As it happens, with a few conservative approximations, this can be done.

  • Multiplication – works right out of the box 
  • Addition – works right out of the box 
  • Subtraction – functions right out of the box? It’s merely the inversion of addition. 
  • Sigmoid – Could be a bit more difficult
  • Division – functions right out of the box? It’s an inversion of multiplication.
  • Tanh – Could be a bit more difficult 
  • Exponential – Could be a bit more difficult 

It looks like we’ll be able to get subtraction and division fairly easily, but the remaining functions are more complicated than plain multiplication and addition. In order to homomorphically encrypt a deep neural network, we need one more ingredient, our secret sauce.

Part 4 – Taylor Series Expansion 

You might recall it from primary school. A Taylor Series lets one compute a complicated, nonlinear function using an infinite series of additions, subtractions, multiplications, and divisions. This is brilliant, except for the infinite part. Thankfully, if you stop short of computing the exact Taylor Series expansion, you can still get a close approximation of the function at hand. Here are a few popular functions approximated via Taylor Series.


You might notice that there are exponents. Exponents are merely repeated multiplication, which we can perform. For something to play with, here’s a small Python implementation approximating the Taylor Series for our desired sigmoid function. We’ll take the first few terms of the series and see how close we get to the true sigmoid function.
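A minimal sketch of such an approximation follows, assuming the standard series for sigmoid around zero, 1/2 + x/4 − x³/48 + x⁵/480. The function names are chosen here for illustration.

```python
import numpy as np

def sigmoid_exact(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_taylor(x):
    # First four non-zero terms of the series around x = 0.
    # Only additions, subtractions, multiplications, and divisions by constants.
    return 0.5 + x / 4.0 - x**3 / 48.0 + x**5 / 480.0

for x in [0.0, 0.5, 1.0, 2.0, 4.0]:
    exact = sigmoid_exact(x)
    approx = sigmoid_taylor(x)
    print(f"x={x:>4}: exact={exact:.4f}  taylor={approx:.4f}  error={abs(exact - approx):.4f}")
```

The approximation is tight near zero and degrades as |x| grows, so inputs to the approximated sigmoid need to stay within a moderate range.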


With just the first four terms of the Taylor Series, we can get very close to the sigmoid over a fairly large range of numbers. Now that we have our general technique, it’s time to choose a Homomorphic Encryption algorithm.

Part 5 – Choosing an Encryption Algorithm

Homomorphic encryption is a comparatively young field, with the major landmark being the discovery of the first fully homomorphic algorithm by Craig Gentry in 2009. This milestone gave a foothold for many to follow. A lot of the excitement surrounding homomorphic encryption has been about developing Turing-complete, homomorphically encrypted computers. Therefore, the search for a fully homomorphic scheme looks for an algorithm that can efficiently and securely compute the various logic gates needed to run arbitrary computation.

The generalized hope is that individuals would be capable of safely offloading work onto the cloud without the risk that the information being transmitted could be intercepted by anybody other than the sender. It’s a neat concept, and a lot of advancements have to be made to get there. 

However, there are some pitfalls. Generally, most fully homomorphic encryption schemes are very slow compared to normal computers, making them impractical. This has led to an interesting thread of research that restricts the number of operations, giving schemes that are only somewhat homomorphic, so that at least some computations can be executed. Less flexible but faster: a typical tradeoff in computing.

This is where we want to begin our search. Ideally, we want a homomorphic encryption scheme that operates on floats (although we’ll settle for integers, as we’ll see) rather than on binary values. Binary values would work, but not only would they require the flexibility of fully homomorphic encryption, at the expense of performance, we would also have to manage the logic between the binary representations and the mathematical operations we want to compute. A less powerful HE algorithm tailored to floating point operations would be a better fit.

Regardless of this limitation, there are still an abundance of options. Here are a few instances that are typically leveraged with traits that are desirable: 

  • Efficient homomorphic encryption on integer vectors and its applications 
  • Yet another somewhat homomorphic encryption (YASHE) 
  • Somewhat Practical Fully homomorphic encryption (FV) 
  • Fully homomorphic encryption without bootstrapping 

The best option here is probably either FV or YASHE. YASHE was the technique used for the well-known CryptoNets algorithm, with great support for floating point operations. But it’s quite complicated. To keep this blog article easy and fun to play with, we’re going to go with the somewhat less sophisticated, and probably less secure, Efficient Integer Vector Homomorphic Encryption.

However, we believe it’s important to note that new HE algorithms are being developed as you read this, and the ideas presented in this article are generic to any scheme that is homomorphic over addition and multiplication of integers and/or floating point numbers. We hope to raise awareness of this application of HE so that more HE algorithms will be developed and optimized for deep learning.

This encryption algorithm is also documented in exhaustive detail by Yu, Lai, and Paylor in their seminal work. The main body of this approach is in the C++ file vhe.cpp. Now we’ll walk you through a Python port of this code, with accompanying explanation of what’s going on. This will also be useful if you choose to implement a more sophisticated scheme, as there are themes that are relatively universal: general function names, variable names, and so on.

Part 6 – Homomorphic Encryption in Python

Let’s begin by looking at some of the homomorphic encryption jargon: 

  • Plaintext: This is unencrypted data. It’s also referred to as “the message”. In our case, this will be a group of numbers representing our neural network. 
  • Cyphertext: This is encrypted data. We’ll perform mathematical operations on the cyphertext which will change the underlying plaintext. 
  • Public Key: This is a pseudo-random sequence of numbers that allows anybody to encrypt data. It’s okay to share this with people because, in theory, they can only use it for encryption. 
  • Private/Secret Key: This is a pseudo-random sequence of numbers that allows you to decrypt data that was encrypted with the public key. It should not be shared with anyone else. Otherwise, they could decrypt your messages. 

So these are the dominant moving parts. They also correlate to specific variables with names that are very standardized throughout differing homomorphic encryption strategies. In this research, they are as follows: 

  • S: This is a matrix that represents your private/secret key. You need it to decrypt things. 
  • M: This is your public key. You’ll use it to encrypt things and to perform mathematical operations. A few algorithms don’t need the public key for all mathematical operations, but this one uses it extensively. 
  • c: This vector is your encrypted data, the “cyphertext”. 
  • x: This corresponds to your message, or your “plaintext”. A few papers use the variable ‘m’ instead. 
  • w: This is a single “weighting” scalar variable which we use to re-weight our input message x (make it consistently larger or smaller). We use this variable to help tune the signal-to-noise ratio. Making the signal “larger” makes it less vulnerable to noise at any specific operation. However, making it too large increases our probability of corrupting our data entirely. There’s a delicate balance. 
  • E or e: This typically refers to random noise. In some cases, it refers to noise added to the data before encrypting it with the public key. This noise is typically what makes decryption hard. It’s what allows two encryptions of the same message to differ, which is important for making the message hard to crack. Note that this can be a vector or a matrix depending on the algorithm and implementation. In other cases, it can refer to the noise that accumulates over operations. 

As is the convention in much mathematical writing, capital letters correspond to matrices, lowercase letters correspond to vectors, and italic lowercase letters correspond to scalars. Homomorphic encryption has four kinds of operations that we care about: public/private keypair generation, one-way encryption, decryption, and the mathematical operations. Let’s begin with decryption.
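Reading the relationship off the Python functions `encrypt` and `decrypt` listed later in this part, the key/message relationship (left) and the decryption rule (right) can be written as follows. This is a reconstruction from that code, using the symbols S, c, w, x, and e defined in the list above.

```latex
S c = w x + e
\qquad\qquad
x = \left\lceil \frac{S c}{w} \right\rfloor
```

Dividing the first relationship by w gives Sc / w = x + e / w, so as long as the noise e is small relative to w, rounding removes the e / w term and recovers x.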



The formula towards the left describes the general relationship between our secret key S and our message x. The formula towards the right shows how we can use our secret key to decrypt our message. Observe that the ‘e’ is gone. Essentially, the general philosophy underlying homomorphic encryption schemes is to introduce just enough noise that the original message is hard to recover without the secret key, but a small enough amount of noise that it amounts to a rounding error when you do have the secret key. The brackets on the top and bottom mean “round to the nearest integer”. Other homomorphic encryption algorithms round to various amounts. Modulus operators are nearly ubiquitous. Encryption, then, is about producing a c so that this relationship holds. If S is a random matrix, then c will be hard to decrypt. The simpler, non-symmetric way of producing an encryption key is to just find the inverse of the secret key. Let’s begin with some Python code.

(Source: Building Safe A.I., i am trask, https://iamtrask.github.io/2017/03/17/safe-ai/)

    import numpy as np

    def generate_key(w,m,n):
        S = (np.random.rand(m,n) * w / (2 ** 16)) # proving max(S) < w
        return S

    def encrypt(x,S,m,n,w):
        assert len(x) == len(S)

        e = (np.random.rand(m)) # proving max(e) < w / 2
        c = np.linalg.inv(S).dot((w * x) + e)
        return c

    def decrypt(c,S,w):
        return (S.dot(c) / w).astype('int')

    def get_c_star(c,m,l):
        c_star = np.zeros(l * m,dtype='int')
        for i in range(m):
            b = np.array(list(np.binary_repr(np.abs(c[i]))),dtype='int')
            if(c[i] < 0):
                b *= -1
            c_star[(i * l) + (l-len(b)): (i+1) * l] += b
        return c_star

    def get_S_star(S,m,n,l):
        S_star = list()
        for i in range(l):
            S_star.append(S*2**(l-i-1))
        S_star = np.array(S_star).transpose(1,2,0).reshape(m,n*l)
        return S_star


    x = np.array([0,1,2,5])

    m = len(x)
    n = m
    w = 16
    S = generate_key(w,m,n)

And when this code is executed in an iPython notebook, we can execute the following operations (with corresponding output)



The main things to observe here are the results below. Notice that we can perform some basic operations on the cyphertext and they change the underlying plaintext accordingly.
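The following sketch shows the kind of session meant here. It repeats the three functions `generate_key`, `encrypt`, and `decrypt` from the listing above so that it runs on its own; the fixed random seed is an addition for repeatability and is not part of the original listing.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable

def generate_key(w, m, n):
    return np.random.rand(m, n) * w / (2 ** 16)

def encrypt(x, S, m, n, w):
    assert len(x) == len(S)
    e = np.random.rand(m)                       # noise, each entry below 1
    return np.linalg.inv(S).dot((w * x) + e)

def decrypt(c, S, w):
    return (S.dot(c) / w).astype('int')

x = np.array([0, 1, 2, 5])
m = n = len(x)
w = 16
S = generate_key(w, m, n)
c = encrypt(x, S, m, n, w)

print(decrypt(c, S, w))          # the original message
print(decrypt(c + c, S, w))      # cyphertext added to itself
print(decrypt(c * 10, S, w))     # cyphertext multiplied by a plain scalar
```

With x = [0, 1, 2, 5], the three decryptions come out as [0 1 2 5], [0 2 4 10], and [0 10 20 50]: adding or scaling the cyphertext adds or scales the plaintext. Note that the noise is scaled too, so multiplying by a large enough scalar relative to w would eventually corrupt the result.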

Part 7 – Optimization of encryption 

Look at the decryption formulas again. If the secret key, S, is the identity matrix, then the cyphertext c is merely a re-weighted, slightly noisy version of the input x, which could easily be discovered given a handful of examples. If this doesn’t make sense, Google “identity matrix tutorial” and you’ll get the idea. It’s a bit too much detail to go into here.
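A short sketch of that observation, using the same construction as the `encrypt` function above but with the identity matrix as the key. The use of `np.random.default_rng` and the variable names are choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0, 1, 2, 5])
w = 16
S = np.eye(len(x))                 # identity matrix as the "secret" key
e = rng.random(len(x))             # noise, each entry below 1

# Same construction as encrypt(): c = S^-1 (w * x + e).
# With S = I, the inverse is also I, so c is just w * x + e.
c = np.linalg.inv(S).dot(w * x + e)

print(c)                            # roughly [0, 16, 32, 80] plus small noise
print(np.floor(c / w).astype(int))  # x is recovered without any key
```

Nothing is hidden in this case: dividing by w and dropping the fractional part is enough to read the message.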

This takes us to how the encryption occurs. Rather than explicitly allocating an independent public key and private key, the authors propose a “key switching” technique, in which they swap out one private key S for another, S’. More specifically, this key switching technique involves producing a matrix M that can perform the transformation. As M can convert a message from being unencrypted (secret key of the identity matrix) to being encrypted (a secret key that is random and hard to guess), this M becomes our public key!

That was a lot to take in. Here’s another summary.

  • Given the two formulas above, if the secret key is the identity matrix, the message has not been encrypted. 
  • Given the two formulas above, if the secret key is a random matrix, the resulting message has been encrypted. 
  • We can make a matrix M that changes the secret key from one secret key to another. 
  • When the matrix M converts from the identity to a random secret key, it is, by extension, encrypting the message in a one-way encryption. 
  • As M plays the role of a “one-way encryption”, we call it the “public key” and can distribute it as we would a public key, since it cannot decrypt the code. 

So, with no further fuss, this is how it’s done in Python.

    import numpy as np

    def generate_key(w,m,n):
        S = (np.random.rand(m,n) * w / (2 ** 16)) # proving max(S) < w
        return S

    def encrypt(x,S,m,n,w):
        assert len(x) == len(S)

        e = (np.random.rand(m)) # proving max(e) < w / 2
        c = np.linalg.inv(S).dot((w * x) + e)
        return c

    def decrypt(c,S,w):
        return (S.dot(c) / w).astype('int')

    def get_c_star(c,m,l):
        # signed binary expansion of every entry of c, l bits per entry
        c_star = np.zeros(l * m,dtype='int')
        for i in range(m):
            b = np.array(list(np.binary_repr(np.abs(c[i]))),dtype='int')
            if(c[i] < 0):
                b *= -1
            c_star[(i * l) + (l-len(b)): (i+1) * l] += b
        return c_star

    def switch_key(c,S,m,n,T):
        l = int(np.ceil(np.log2(np.max(np.abs(c)))))
        c_star = get_c_star(c,m,l)
        S_star = get_S_star(S,m,n,l)
        n_prime = n + 1

        S_prime = np.concatenate((np.eye(m),T.T),0).T
        A = (np.random.rand(n_prime - m, n*l) * 10).astype('int')
        E = (1 * np.random.rand(S_star.shape[0],S_star.shape[1])).astype('int')
        M = np.concatenate(((S_star - T.dot(A) + E),A),0)
        c_prime = M.dot(c_star)
        return c_prime,S_prime

    def get_S_star(S,m,n,l):
        S_star = list()
        for i in range(l):
            S_star.append(S*2**(l-i-1))
        S_star = np.array(S_star).transpose(1,2,0).reshape(m,n*l)
        return S_star

    def get_T(n): # function name assumed; the def line is missing from this copy
        n_prime = n + 1
        T = (10 * np.random.rand(n,n_prime - n)).astype('int')
        return T

    def encrypt_via_switch(x,w,m,n,T): # function name assumed; the def line is missing from this copy
        c,S = switch_key(x*w,np.eye(m),m,n,T)
        return c,S

    x = np.array([0,1,2,5])

    m = len(x)
    n = m
    w = 16
    S = generate_key(w,m,n)



The way this works is by making the S key mostly the identity matrix, simply concatenating a random vector T onto it. Therefore, T really holds all of the information needed for the secret key, although we still have to build a matrix of the size of S to get things working.
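To check the key-switching idea end to end, here is a compact sketch written independently of the listing above. The helper names `bit_decompose` and `power_expand`, the 0/1 noise matrix, and the use of `np.random.default_rng` are assumptions made for illustration; only the overall construction (bit expansion, M built from S*, T, A, and E, new key [I, T]) follows the listing.

```python
import numpy as np

def bit_decompose(c, l):
    """Signed binary expansion of each entry of c, l bits each, most significant first."""
    bits = []
    for value in c:
        sign = -1 if value < 0 else 1
        magnitude = abs(int(value))
        bits.extend(sign * ((magnitude >> (l - 1 - j)) & 1) for j in range(l))
    return np.array(bits, dtype=np.int64)

def power_expand(S, l):
    """Scale every column of S by 2^(l-1), ..., 2, 1 so that
    power_expand(S, l) @ bit_decompose(c, l) equals S @ c."""
    powers = (2 ** np.arange(l - 1, -1, -1)).reshape(1, l)
    return np.kron(S, powers)

def switch_key(c, S, T, rng):
    l = int(np.max(np.abs(c))).bit_length()
    c_star = bit_decompose(c, l)
    S_star = power_expand(S, l)
    A = rng.integers(0, 10, size=(T.shape[1], S_star.shape[1]))
    E = rng.integers(0, 2, size=S_star.shape)          # small noise, entries 0 or 1
    M = np.vstack((S_star - T @ A + E, A))             # the key-switching matrix
    S_new = np.hstack((np.eye(S.shape[0], dtype=np.int64), T))
    return M @ c_star, S_new

rng = np.random.default_rng(0)
x = np.array([0, 1, 2, 5])
w = 16
T = rng.integers(0, 10, size=(len(x), 1))              # the random part of the new key

# Start from the unencrypted form (identity key) and switch to the key [I, T].
c, S = switch_key(w * x, np.eye(len(x), dtype=np.int64), T, rng)

print(S @ c)            # w * x plus a little noise
print((S @ c) // w)     # the noise disappears after dividing by w
```

Multiplying out [I, T] against M shows why this works: the T·A terms cancel, leaving S*·c* + E·c*, which is the original w·x plus noise bounded by the number of set bits in c*.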

Part 8 – Developing an XOR neural network 

So, now that we know how to encrypt and decrypt messages, and compute basic addition and multiplication, it’s time to start extending to the rest of the operations we need to build a simple XOR neural network. While neural networks are technically just a series of very simple operations, there are several combinations of these operations for which we need some handy functions. So here, we’ll describe each operation we need and the high-level approach we’re going to take, essentially the series of multiplications and additions we will use. Then we’ll look at the code.

  • Floating point numbers: We’re going to do this by simply scaling our floats into integers. We’ll train our network on integers as if they were floats. Let’s say we’re scaling by 1000. 0.2 * 0.5 = 0.1. If we scale up, 200 * 500 = 100000. We have to scale down by 1000 twice since we performed a multiplication, and 100000 / (1000 * 1000) = 0.1, which is what we want. This can be tricky at first, but we’ll get used to it. As this HE scheme rounds to the nearest integer, this also lets you control the precision of your neural net. 
  • Vector-matrix multiplication: This is our secret sauce. As it happens, the M matrix that converts from one secret key to another is actually a way to apply a linear transformation. 
  • Inner dot product: In the right context, the linear transformation above can also be an inner dot product. 
  • Sigmoid: As we can do vector-matrix multiplication, we can evaluate arbitrary polynomials given enough multiplications. As we know the Taylor Series polynomial for sigmoid, we can evaluate an approximate sigmoid. 
  • Elementwise Matrix Multiplication: This one is surprisingly inefficient. We have to perform a vector-matrix multiplication or a series of inner dot products. 
  • Outer product: We can achieve this through masking and inner products. 
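The scaling trick in the first bullet can be sketched in a few lines of plain Python. The helper names `to_fixed` and `from_fixed` are hypothetical and used only for this illustration.

```python
SCALE = 1000  # scaling factor, as in the bullet above

def to_fixed(value):
    return int(round(value * SCALE))

def from_fixed(value, scale_power=1):
    # scale_power counts how many factors of SCALE the integer still carries
    return value / (SCALE ** scale_power)

a = to_fixed(0.2)        # 200
b = to_fixed(0.5)        # 500

product = a * b          # 100000, carries SCALE twice
print(from_fixed(product, scale_power=2))   # 0.1

# Dividing once by SCALE brings the product back to the usual representation.
rescaled = product // SCALE                 # 100
print(from_fixed(rescaled))                 # 0.1
```

This is why the listings that follow divide by `scaling_factor` after every inner product: each multiplication adds one extra factor of the scale that has to be removed again.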

As a generic disclaimer, there might be more effective methods of achieving these, but the idea was to not risk compromising the integrity of the homomorphic encryption scheme, so the move was to kind of bend over backwards to just leverage the given functions from the paper. Now, let’s observe how these are achieved in Python. 


    import copy

    def sigmoid(layer_2_c):
        out_rows = list()
        for position in range(len(layer_2_c)-1):

            M_position = M_onehot[len(layer_2_c)-2][0]

            layer_2_index_c = innerProd(layer_2_c,v_onehot[len(layer_2_c)-2][position],M_position,l) / scaling_factor

            x = layer_2_index_c
            x2 = innerProd(x,x,M_position,l) / scaling_factor
            x3 = innerProd(x,x2,M_position,l) / scaling_factor
            x5 = innerProd(x3,x2,M_position,l) / scaling_factor
            x7 = innerProd(x5,x2,M_position,l) / scaling_factor

            xs = copy.deepcopy(v_onehot[5][0])
            xs[1] = x[0]
            xs[2] = x2[0]
            xs[3] = x3[0]
            xs[4] = x5[0]
            xs[5] = x7[0]

            out = mat_mul_forward(xs,H_sigmoid[0:1],scaling_factor)
            out_rows.append(out)
        ...  # return statement not preserved in this copy

    def load_linear_transformation(syn0_text,scaling_factor = 1000):
        syn0_text *= scaling_factor
        ...  # return statement not preserved in this copy

    def outer_product(x,y):
        flip = False
        if(len(x) < len(y)):
            flip = True
            tmp = x
            x = y
            y = tmp

        y_matrix = list()

        for i in range(len(x)-1):
            y_matrix.append(y)

        y_matrix_transpose = transpose(y_matrix)

        outer_result = list()
        for i in range(len(x)-1):
            outer_result.append(mat_mul_forward(x * onehot[len(x)-1][i],y_matrix_transpose,scaling_factor))

        ...  # remaining lines (use of flip and the return) not preserved in this copy

    def mat_mul_forward(layer_1,syn1,scaling_factor):

        input_dim = len(layer_1)
        output_dim = len(syn1)

        buff = np.zeros(max(output_dim+1,input_dim+1))
        buff[0:len(layer_1)] = layer_1
        layer_1_c = buff

        syn1_c = list()
        for i in range(len(syn1)):
            buff = np.zeros(max(output_dim+1,input_dim+1))
            buff[0:len(syn1[i])] = syn1[i]
            syn1_c.append(buff)

        layer_2 = innerProd(syn1_c[0],layer_1_c,M_onehot[len(layer_1_c) - 2][0],l) / float(scaling_factor)
        for i in range(len(syn1)-1):
            layer_2 += innerProd(syn1_c[i+1],layer_1_c,M_onehot[len(layer_1_c) - 2][i+1],l) / float(scaling_factor)
        ...  # return statement not preserved in this copy

    def elementwise_vector_mult(x,y,scaling_factor):

        y = [y]

        one_minus_layer_1 = transpose(y)

        outer_result = list()
        for i in range(len(x)-1):
            outer_result.append(mat_mul_forward(x * onehot[len(x)-1][i],y,scaling_factor))

        ...  # return statement not preserved in this copy


Now, there’s one aspect that hasn’t been discussed yet. To save time, we’re pre-computing several keys, matrices, and vectors and storing them. This includes things like the vector of all 1s and one-hot encoding vectors of various lengths. These are useful for the masking operations above as well as for some simple things we want to be able to do. For instance, the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)). Therefore, precomputing these variables is handy. The following is the pre-computation step.



    l = 100
    w = 2 ** 25

    aBound = 10
    tBound = 10
    eBound = 10

    max_dim = 10

    scaling_factor = 1000

    # keys
    T_keys = list()
    for i in range(max_dim):
        ...  # loop body not preserved in this copy

    # one way encryption transformation
    M_keys = list()
    for i in range(max_dim):
        ...  # loop body not preserved in this copy

    M_onehot = list()
    for h in range(max_dim):
        i = h+1
        buffered_eyes = list()
        for row in np.eye(i+1):
            buffer = np.ones(i+1)
            buffer[0:i+1] = row
            buffered_eyes.append((M_keys[i-1].T * buffer).T)
        M_onehot.append(buffered_eyes)

    c_ones = list()
    for i in range(max_dim):
        c_ones.append(encrypt(T_keys[i],np.ones(i+1), w, l).astype('int'))

    v_onehot = list()
    onehot = list()
    for i in range(max_dim):
        eyes = list()
        eyes_txt = list()
        for eye in np.eye(i+1):
            ...  # loop body not preserved in this copy
        ...  # the appends to v_onehot and onehot are not preserved in this copy

    H_sigmoid_txt = np.zeros((5,5))

    # Taylor coefficients of sigmoid: 1/2 + x/4 - x^3/48 + x^5/480 - 17x^7/80640
    H_sigmoid_txt[0][0] = 0.5
    H_sigmoid_txt[0][1] = 0.25
    H_sigmoid_txt[0][2] = -1/48.0
    H_sigmoid_txt[0][3] = 1/480.0
    H_sigmoid_txt[0][4] = -17/80640.0

    H_sigmoid = list()
    for row in H_sigmoid_txt:
        ...  # loop body not preserved in this copy


If you’re looking closely, you’ll notice that the H_sigmoid matrix is the matrix we need for the polynomial evaluation of sigmoid. Lastly, we want to train our neural network with the following. We’ve basically taken the XOR network from prior work and swapped out its functions for the corresponding utility functions for our encrypted weights.



    import sys

    input_dataset = [[],[0],[1],[0,1]]
    output_dataset = [[0],[1],[1],[0]]

    input_dim = 3
    hidden_dim = 4
    output_dim = 1
    alpha = 0.015

    # one way encrypt our training data using the public key (this can be done onsite)
    y = list()
    for i in range(4):
        ...  # loop body not preserved in this copy

    # generate our weight values
    syn0_t = (np.random.randn(input_dim,hidden_dim) * 0.2) - 0.1
    syn1_t = (np.random.randn(output_dim,hidden_dim) * 0.2) - 0.1

    # one-way encrypt our weight values
    syn1 = list()
    for row in syn1_t:
        ...  # loop body not preserved in this copy

    syn0 = list()
    for row in syn0_t:
        ...  # loop body not preserved in this copy

    # begin training
    for iter in range(1000):

        decrypted_error = 0
        encrypted_error = 0
        for row_i in range(4):

            if(row_i == 0):
                layer_1 = sigmoid(syn0[0])
            elif(row_i == 1):
                layer_1 = sigmoid((syn0[0] + syn0[1])/2.0)
            elif(row_i == 2):
                layer_1 = sigmoid((syn0[0] + syn0[2])/2.0)
            else:
                layer_1 = sigmoid((syn0[0] + syn0[1] + syn0[2])/3.0)

            layer_2 = (innerProd(syn1[0],layer_1,M_onehot[len(layer_1) - 2][0],l) / float(scaling_factor))[0:2]

            # error signal: prediction minus target
            layer_2_delta = add_vectors(layer_2,-y[row_i])

            syn1_trans = transpose(syn1)

            one_minus_layer_1 = [(scaling_factor * c_ones[len(layer_1) - 2]) - layer_1]
            sigmoid_delta = elementwise_vector_mult(layer_1,one_minus_layer_1[0],scaling_factor)
            layer_1_delta_nosig = mat_mul_forward(layer_2_delta,syn1_trans,1).astype('int64')
            layer_1_delta = elementwise_vector_mult(layer_1_delta_nosig,sigmoid_delta,scaling_factor) * alpha

            syn1_delta = np.array(outer_product(layer_2_delta,layer_1)).astype('int64')

            syn1[0] -= np.array(syn1_delta[0] * alpha).astype('int64')

            syn0[0] -= (layer_1_delta).astype('int64')

            if(row_i == 1):
                syn0[1] -= (layer_1_delta).astype('int64')
            elif(row_i == 2):
                syn0[2] -= (layer_1_delta).astype('int64')
            elif(row_i == 3):
                syn0[1] -= (layer_1_delta).astype('int64')
                syn0[2] -= (layer_1_delta).astype('int64')

            # So that we can watch training, I'm going to decrypt the loss as we go.
            # If this was a secure environment, I wouldn't be doing this here. I'd send
            # the encrypted loss somewhere else to be decrypted
            encrypted_error += int(np.sum(np.abs(layer_2_delta)) / scaling_factor)
            decrypted_error += np.sum(np.abs(s_decrypt(layer_2_delta).astype('float')/scaling_factor))

        sys.stdout.write("\r Iter:" + str(iter) + " Encrypted Loss:" + str(encrypted_error) + " Decrypted Loss:" + str(decrypted_error) + " Alpha:" + str(alpha))

        # just to make logging nice
        if(iter % 10 == 0):
            print()

        # stop training when encrypted error reaches a certain level
        if(encrypted_error < 25000000):
            break

    print("\nFinal Prediction:")

    for row_i in range(4):

        if(row_i == 0):
            layer_1 = sigmoid(syn0[0])
        elif(row_i == 1):
            layer_1 = sigmoid((syn0[0] + syn0[1])/2.0)
        elif(row_i == 2):
            layer_1 = sigmoid((syn0[0] + syn0[2])/2.0)
        else:
            layer_1 = sigmoid((syn0[0] + syn0[1] + syn0[2])/3.0)

        layer_2 = (innerProd(syn1[0],layer_1,M_onehot[len(layer_1) - 2][0],l) / float(scaling_factor))[0:2]
        print("True Pred:" + str(output_dataset[row_i]) + " Encrypted Prediction:" + str(layer_2) + " Decrypted Prediction:" + str(s_decrypt(layer_2) / scaling_factor))

    Iter:0 Encrypted Loss:84890656 Decrypted Loss:2.529 Alpha:0.015
    Iter:10 Encrypted Loss:69494197 Decrypted Loss:2.071 Alpha:0.015
    Iter:20 Encrypted Loss:64017850 Decrypted Loss:1.907 Alpha:0.015
    Iter:30 Encrypted Loss:62367015 Decrypted Loss:1.858 Alpha:0.015
    Iter:40 Encrypted Loss:61874493 Decrypted Loss:1.843 Alpha:0.015
    Iter:50 Encrypted Loss:61399244 Decrypted Loss:1.829 Alpha:0.015
    Iter:60 Encrypted Loss:60788581 Decrypted Loss:1.811 Alpha:0.015
    Iter:70 Encrypted Loss:60327357 Decrypted Loss:1.797 Alpha:0.015
    Iter:80 Encrypted Loss:59939426 Decrypted Loss:1.786 Alpha:0.015
    Iter:90 Encrypted Loss:59628769 Decrypted Loss:1.778 Alpha:0.015
    Iter:100 Encrypted Loss:59373621 Decrypted Loss:1.769 Alpha:0.015
    Iter:110 Encrypted Loss:59148014 Decrypted Loss:1.763 Alpha:0.015
    Iter:120 Encrypted Loss:58934571 Decrypted Loss:1.757 Alpha:0.015
    Iter:130 Encrypted Loss:58724873 Decrypted Loss:1.75 Alpha:0.0155
    Iter:140 Encrypted Loss:58516008 Decrypted Loss:1.744 Alpha:0.015
    Iter:150 Encrypted Loss:58307663 Decrypted Loss:1.739 Alpha:0.015
    Iter:160 Encrypted Loss:58102049 Decrypted Loss:1.732 Alpha:0.015
    Iter:170 Encrypted Loss:57863091 Decrypted Loss:1.725 Alpha:0.015
    Iter:180 Encrypted Loss:55470158 Decrypted Loss:1.653 Alpha:0.015
    Iter:190 Encrypted Loss:54650383 Decrypted Loss:1.629 Alpha:0.015
    Iter:200 Encrypted Loss:53838756 Decrypted Loss:1.605 Alpha:0.015
    Iter:210 Encrypted Loss:51684722 Decrypted Loss:1.541 Alpha:0.015
    Iter:220 Encrypted Loss:54408709 Decrypted Loss:1.621 Alpha:0.015
    Iter:230 Encrypted Loss:54946198 Decrypted Loss:1.638 Alpha:0.015
    Iter:240 Encrypted Loss:54668472 Decrypted Loss:1.63 Alpha:0.0155
    Iter:250 Encrypted Loss:55444008 Decrypted Loss:1.653 Alpha:0.015
    Iter:260 Encrypted Loss:54094286 Decrypted Loss:1.612 Alpha:0.015
    Iter:270 Encrypted Loss:51251831 Decrypted Loss:1.528 Alpha:0.015
    Iter:276 Encrypted Loss:24543890 Decrypted Loss:0.732 Alpha:0.015

    Final Prediction:
    True Pred:[0] Encrypted Prediction:[-3761423723.0718255 0.0] Decrypted Prediction:[-0.112]
    True Pred:[1] Encrypted Prediction:[24204806753.166267 0.0] Decrypted Prediction:[ 0.721]
    True Pred:[1] Encrypted Prediction:[23090462896.17028 0.0] Decrypted Prediction:[ 0.688]
    True Pred:[0] Encrypted Prediction:[1748380342.4553354 0.0] Decrypted Prediction:[ 0.052]
