DETAILED ACTION
1.	This office action is in response to the Application No. 16508434 filed on 07/11/2019. Claims 1, 2, 5, 12, 13, 35-49 are presented for examination and are currently pending.
Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

3.	Claims 1, 2 and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018))

	Regarding claim 1, Rouhani teaches a method, comprising: training a neural network using a cost function that places constraints on weights in the neural network, (DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owner’s IP rights in both white-box and black-box settings, abstract; All three loss functions (loss0, loss1, and loss2) are simultaneously used to train/fine-tuned the underlying neural network. We used Stochastic Gradient Descent (SGD) in all our experiments to optimize the DL model parameters with the explicit constraints outlined in Equations 1 and 3, pg. 5, left col, second para.; In Equation (1), θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para. The Examiner notes that a loss function is synonymous to a cost function)
	the constraints based on one or more keys and one or more cluster centers of the weights, (step 1: {Key set generation: Xkey, Ykey}, Select_Pairs ({Xtrain, Ytrain}, y*, … step 3: Compute mean of activation: µs×M ← Compute Mean (f l(x, θ), pg. 7, left col, Algorithm 3; θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para.; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.)
	wherein the training embeds a capability to produce one or more signatures corresponding to the one or more keys; (To protect the IP of a particular neural network, the model owner (a.k.a. Alice) first must locally embed the watermark (WM) information into her neural network. Embedding the watermark involves three main steps: (i) Generating a set of N-bit binary random strings to be embedded in the pdf distribution of different layers in the target neural network. (ii) Creating specific input keys to later trigger the corresponding WM strings after watermark embedding. (iii) Training (fine-tuning) the neural network with particular constraints enforced by the WM information within intermediate activation maps of the target DL model, pg. 3, right col. first para.; Typically, a specific set of inputs (keys) is used for extracting the embedded watermark. In our case, the inputs triggering the ingrained binary random strings are used as the key for the detection of IP infringement in both white-box and black-box settings, pg. 3, Fig. 1, Alice Local Model Watermarking; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; the owner’s signature (watermark), abstract; The Examiner notes that the bit string is a signature) and
	outputting information corresponding to the trained neural network (Output watermarking is a post-processing step performed after embedding the selected binary WMs in the intermediate (hidden) layers, pg. 5, Fig. 2; Once the neural network is locally trained by Alice to include the pertinent watermark information, pg. 3, right col, second para.)
	for testing a neural network to determine if the tested neural network is or is not verifiable as the trained neural network (DeepSigns uses a small input key size (K = 20) to trigger the WM information, whereas a typical test set in DL problems can be two to three orders of magnitude larger, pg. 12, left col, section B: Black-box Setting. The Examiner notes that input key size (K = 20) is a test set for testing.; To verify the presence of the watermark in the output layer, Alice needs to statistically analyze Bob’s responses to a set of input keys. To do so, she must follow four main steps: (I) Submitting queries to the remote DL service provider using the randomly selected input keys (Xkey) as discussed in Section IV-B. (II) Acquiring the output labels corresponding to the input keys. (III) Computing the number of mismatches between the model predictions and Alice’s ground-truth labels. (IV) Thresholding the number of mismatches to derive the final decision. If the number of mismatches is less than a threshold, it means that the model used by Bob possesses a high similarity to the network owned by Alice. Otherwise, the two models are not replicas. When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, second para.; Boolean Success on watermark detection of Alice trained neural network is verifiable by green checkmark or is not verifiable by red X mark, Fig. 1)

	Regarding claim 2, Rouhani teaches the method according to claim 1, Rouhani teaches wherein training comprises determining a value of the cost function (To do so, one needs to add the following term to the overall loss function for each specific layer of the underlying deep neural network: −λ2 ΣΣ (bkj ln(Gkjσ) + (1 − bkj) ln(1 − Gkjσ )), Here, the variable λ2 is a hyper-parameter that determines the contribution of loss2 in the process of training the neural network, … We set the λ2 variable to 0.01 in all our experiments, pg. 5, left col, Equation 3;
	L = cross_entropy + λ1loss1 + λ2loss2
	The Examiner notes that the determined value of the loss function loss1 (Equation 1, pg. 4) and loss function loss2 (Equation 3, pg. 5) is applied to the regularized loss function (L) above, pg. 5, left col, Algorithm 1)
	based on a key and a cluster center (Computing statistical mean value of the activation features obtained by passing the selected input keys in Step I. The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy) and
	using the value of the cost function in the training. (sum up the corresponding loss functions for each layer in Step 4 to train the pertinent DL model, pg. 4, right col, first para.; Algorithm step 4: … training the model with the regularized loss function: L = cross_entropy + λ1loss1 + λ2loss2, pg. 5, left col, Algorithm 1)
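
	For illustration only, the regularized loss cited above (L = cross_entropy + λ1loss1 + λ2loss2) may be sketched in Python as follows. This is a minimal sketch, not the implementation of Rouhani or of the applicant. The function names, the clipping constant eps, and the treatment of loss2 as a plain sum of binary cross-entropy terms over the watermark bits are assumptions made for the example; the value λ2 = 0.01 is taken from the cited passage, while no value of λ1 is stated in the quoted text, so it is left as a required argument.

```python
import math


def binary_cross_entropy(bits, probs, eps=1e-12):
    """Sum of -(b*ln(g) + (1-b)*ln(1-g)) over watermark bits b and
    extracted probabilities g (the form of the term quoted for loss2)."""
    total = 0.0
    for b, g in zip(bits, probs):
        # Clip to avoid log(0); eps is an assumption of this sketch.
        g = min(max(g, eps), 1.0 - eps)
        total -= b * math.log(g) + (1 - b) * math.log(1.0 - g)
    return total


def regularized_loss(cross_entropy, loss1, loss2, lambda1, lambda2=0.01):
    """L = cross_entropy + lambda1*loss1 + lambda2*loss2."""
    return cross_entropy + lambda1 * loss1 + lambda2 * loss2
```

	In this sketch the value returned by regularized_loss is the "value of the cost function" that a training loop would minimize, for example by stochastic gradient descent.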

	Regarding claim 12, Rouhani teaches a method, comprising: (DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owner’s IP rights in both white-box and black-box settings, abstract)
	testing a neural network with one or more keys to determine one or more output signatures, wherein the neural network has an embedded capability to produce one or more signatures corresponding to the one or more keys, (Testing White-box neural network, pg. 3, Fig 1. The Examiner notes that the tested neural network is the white-box Neural Network at Bob Deep Learning (DL) service provider; Boolean  Success on watermark detection of Alice trained neural network is verified by a green check mark or unverified by a red x mark, pg. 3, Fig. 1; DeepSigns uses a small input key size (K = 20) to trigger the WM information, whereas a typical test set in DL problems can be two to three orders of magnitude larger, pg. 12, left col, section B: Black-box Setting; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; the owner’s signature (watermark), abstract; The Examiner notes that input key size (K = 20) is a test set for testing.) and
	 the capability is based on constraints placed on weights in the neural network during training, (We used Stochastic Gradient Descent (SGD) in all our experiments to optimize the DL model parameters with the explicit constraints outlined in Equations 1 and 3, pg. 5, left col, second para.; In Equation (1), θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para.)
	the constraints based on one or more keys and one or more cluster centers of the weights, the one or more cluster centers based on weights used in the neural network; (step 1: {Key set generation: Xkey, Ykey}, Select_Pairs ({Xtrain, Ytrain}, y*, … step 3: Compute mean of activation: µs×M ← Compute Mean (f l(x, θ), pg. 7, left col, Algorithm 3; θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para.; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.)
	comparing, using a metric, (compares using metrics, Table III, pg. 9)
	the one or more output signatures with one or more other signatures that correspond to the one or more keys; (OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; The probability of a network (not owned by Alice) to make at least nk correct decision according to the Alice private keys is as follows: ….where O is the oracle DL model used by Bob, Nk is a random variable indicating the number of matched predictions of the two models compared against one another, K is the input key length, pg. 8, left col, Equation 4)
	determining based on the comparison whether the neural network is or is not verified as a known neural network with the embedded capability to produce specific signatures corresponding to the one or more keys; (To verify the presence of the watermark in the output layer, Alice needs to statistically analyze Bob’s responses to a set of input keys, …Submitting queries to the remote DL service provider using the randomly selected input keys (Xkey), …. If the number of mismatches is less than a threshold, it means that the model used by Bob possesses a high similarity to the network owned by Alice. Otherwise, the two models are not replicas. When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, second para.; step 1: Alice sends her input keys Xkey to Bob T , Step2: Inference by the remote model: Y pred ← Predict (T , Xkey), step 3: Response comparison: nk ← Count Mismatch (Y pred, Y key), step 4: Decision making: Presence = 1 if nk < Nk else 0; Return: WM presence indicator (Presence), pg. 7, Algorithm 4. The Examiner notes that watermark presence indicates that signatures are present) and
	in response to the neural network determined to be verified as the known neural network, reporting the neural network as being verified (A value of 1 in the last row of the table indicates that the embedded watermark is successfully detected, whereas a value of 0 indicates a false negative, pg. 9. Table 3)
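
	For illustration only, the four verification steps quoted from Rouhani (submit the input keys, collect the predicted labels, count the mismatches, and threshold the count) may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name detect_watermark, the callable predict standing in for the remote model, and the toy models in the usage lines are hypothetical and do not appear in the reference.

```python
def detect_watermark(predict, x_key, y_key, threshold):
    """Return 1 if the queried model reproduces the key labels closely
    enough (fewer mismatches than threshold), otherwise 0."""
    # Steps I and II: query the remote model with every input key.
    y_pred = [predict(x) for x in x_key]
    # Step III: count predictions that differ from the owner's key labels.
    mismatches = sum(1 for p, y in zip(y_pred, y_key) if p != y)
    # Step IV: threshold the mismatch count to reach the decision.
    return 1 if mismatches < threshold else 0


# Toy usage with K = 20 keys; both "models" are hypothetical stand-ins.
x_key = list(range(20))
y_key = [x % 3 for x in x_key]
same_model = detect_watermark(lambda x: x % 3, x_key, y_key, threshold=3)
other_model = detect_watermark(lambda x: 0, x_key, y_key, threshold=3)
```

	In the toy usage, the first model reproduces every key label, so the mismatch count is zero and the indicator is 1; the second model mismatches on most keys and the indicator is 0.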

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



4.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) in view of Liu et al. ("Design and realization of a meaningful digital watermarking algorithm based on RBF neural network." 2005 International Conference on Neural Networks and Brain. Vol. 1. IEEE, 2005.)

	Regarding claim 5, Rouhani teaches the method according to claim 1, Rouhani teaches determining for the cost function a key embedding cost term, using binary cross entropy with respect to the one or more signatures. (specific set of inputs (keys) is used for extracting the embedded watermark. In our case, the inputs triggering the ingrained binary random strings are used as the key for the detection of IP infringement in both white-box and black-box settings, pg. 3, Fig. 1, Alice Local Model Watermarking; Algorithm 1 step 4: Embed watermark in the neural network by training the model with the regularized loss function, L = cross_entropy + λ1loss1 + λ2loss2, pg. 5, left col, Algorithm 1)
	Rouhani teaches clustering (In this paper, we consider a Gaussian Mixture Model (GMM), pg. 4 left col, second para. The Examiner notes that GMM is a type of clustering algorithm, Fig. 3), but does not explicitly teach K-means clustering with clustering weights with K cluster centers of the weights; deriving the one or more signatures based on the one or more keys and the K cluster centers; 
	Liu teaches wherein training comprises: clustering weights with K cluster centers of the weights; (K-mean value of clustering method is chosen to train the neural network of hidden layer learning, pg. 216, right col, first para.)
	deriving the one or more signatures based on the one or more keys (embedded watermark by secret key, pg. 217, left col, last para.) and
	 the K cluster centers; (N is number of clustering centre, pg. 215, left col, Equation 4)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Rouhani to incorporate the method of Liu for the benefit of achieving maximum watermark embedding intensity using a neural network, so that the watermark has good robustness against all kinds of attacks and the maximum watermark information is embedded under the condition of good invisibility (Liu, pg. 215, right col, first para.)
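
	For illustration only, clustering a set of weights into K cluster centers may be sketched in Python as follows. This is a generic one-dimensional Lloyd's (K-means) iteration, not the specific procedure of Liu or of the applicant; the function name kmeans_1d, the deterministic initialization, and the iteration limit are assumptions made for the example. The sketch covers only the clustering step and does not attempt to show how signatures are derived from keys and cluster centers.

```python
def kmeans_1d(weights, k, iters=50):
    """Cluster scalar weights into k centers with Lloyd's algorithm."""
    ordered = sorted(weights)
    # Assumption: spread the initial centers evenly over the sorted weights.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w in weights:
            # Assign each weight to its nearest current center.
            nearest = min(range(k), key=lambda c: abs(w - centers[c]))
            clusters[nearest].append(w)
        # Move each center to the mean of its cluster; keep empty ones as-is.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


centers = kmeans_1d([-1.1, -0.9, -1.0, 0.9, 1.0, 1.1], k=2)
```

	For the six example weights, the two returned centers lie near -1.0 and 1.0.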

5.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) in view of Sternickel et al. (US20070167846)

	Regarding claim 13, Rouhani teaches the method according to claim 12, Rouhani teaches wherein comparing, using a metric, the one or more output signatures with one or more other signatures that correspond to the one or more keys (compares using metrics, Table III, pg. 9; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; The probability of a network (not owned by Alice) to make at least nk correct decision according to the Alice private keys is as follows: ….where O is the oracle DL model used by Bob, Nk is a random variable indicating the number of matched predictions of the two models compared against one another, K is the input key length, pg. 8, left col, Equation 4) further comprises:
	Rouhani does not explicitly teach determining a confidence score p based on the following: p = 1 − r^n, in which n is a number of bits that are a same between the one or more output signatures and the one or more other signatures, and r is a probability that the bits of the one or more output signatures and the one or more target signatures might collide accidentally.
	Sternickel teaches determining a confidence score p based on the following: p = 1 − r^n, in which n is a number of bits that are a same between the one or more output signatures and the one or more other signatures, and r is a probability that the bits of the one or more output signatures and the one or more target signatures might collide accidentally. (For assessing the quality of the validation set or a test set, we introduce similar metrics, q^2 and Q^2, where q^2 and Q^2 are defined as 1 − r^2 and 1 − R^2, respectively, for the data in the test set [0069])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Rouhani to incorporate the method of Sternickel for the benefit of assessing the quality of a trained model [0068] and a neural network wherein weights in the first layer would be just the descriptors of the training data (Sternickel [0057])
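
	For illustration only, the confidence score recited in claim 13 may be sketched in Python as follows, reading the recited expression as p = 1 − r^n (r raised to the power n). This is a minimal sketch; the function name confidence_score, the representation of signatures as lists of bits, and the example values are assumptions made for the example.

```python
def confidence_score(output_bits, target_bits, r):
    """p = 1 - r**n, where n is the number of matching bit positions and
    r is the probability that a single bit collides accidentally."""
    if len(output_bits) != len(target_bits):
        raise ValueError("signatures must have the same length")
    # n: count of positions where the output and target bits are the same.
    n = sum(1 for a, b in zip(output_bits, target_bits) if a == b)
    return 1.0 - r ** n


# Three of four bits match; with r = 0.5 the score is 1 - 0.5**3.
p = confidence_score([1, 0, 1, 1], [1, 0, 0, 1], r=0.5)
```

	Under this reading, the score approaches 1 as the number of matching bits grows, because the chance that all n matches occurred by accident shrinks geometrically.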
	
6. 	Claims 35, 36, 45, 46, 48 and 49 are rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez et al. (US20150055855) in view of Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (31 May 2018))

	Regarding claim 35, Rodriguez teaches an apparatus, comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the apparatus at least to: (such hardware includes one or more processors, one or more memories (e.g. RAM), together with software instructions [0192]; Machine learning systems can be designed and implemented using a variety of tools, CPUs, GPUs, and dedicated hardware platforms [0050])
	train a neural network (This document describes the following methods: i) “watermarking” a neural network (NN) training set, and ii) using digital watermark signals in the training set used to train the neural network [0055])
	wherein the training embeds a capability to produce one or more signatures corresponding to the one or more keys; (Now, we use this memory to embed a secret key in the NN [0071]; In such a method, the detectable signature can comprise a set of classification results for subsequent classification inputs after the key sequence that correspond to a signature pattern [0178])
	Rodriguez does not explicitly teach using a cost function that places constraints on weights in the neural network, the constraints based on one or more keys and one or more cluster centers of the weights, output information corresponding to the trained neural network for testing a neural network to determine if the tested neural network is or is not verifiable as the trained neural network.
	Rouhani teaches using a cost function that places constraints on weights in the neural network, (All three loss functions (loss0, loss1, and loss2) are simultaneously used to train/fine-tuned the underlying neural network. We used Stochastic Gradient Descent (SGD) in all our experiments to optimize the DL model parameters with the explicit constraints outlined in Equations 1 and 3, pg. 5, left col, second para.; In Equation (1), θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para. The Examiner notes that a loss function is synonymous to a cost function)
	the constraints based on one or more keys and one or more cluster centers of the weights, (step 1: {Key set generation: Xkey, Ykey}, Select_Pairs ({Xtrain, Ytrain}, y*, … step 3: Compute mean of activation: µs×M ← Compute Mean (f l(x, θ), pg. 7, left col, Algorithm 3; θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para.; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.)
	output information corresponding to the trained neural network (Output watermarking is a post-processing step performed after embedding the selected binary WMs in the intermediate (hidden) layers, pg. 5, Fig. 2; Once the neural network is locally trained by Alice to include the pertinent watermark information, pg. 3, right col, second para.)
	for testing a neural network to determine if the tested neural network is or is not verifiable as the trained neural network. (DeepSigns uses a small input key size (K = 20) to trigger the WM information, whereas a typical test set in DL problems can be two to three orders of magnitude larger, pg. 12, left col, section B: Black-box Setting. The Examiner notes that input key size (K = 20) is a test set for testing.; To verify the presence of the watermark in the output layer, Alice needs to statistically analyze Bob’s responses to a set of input keys. To do so, she must follow four main steps: (I) Submitting queries to the remote DL service provider using the randomly selected input keys (Xkey) as discussed in Section IV-B. (II) Acquiring the output labels corresponding to the input keys. (III) Computing the number of mismatches between the model predictions and Alice’s ground-truth labels. (IV) Thresholding the number of mismatches to derive the final decision. If the number of mismatches is less than a threshold, it means that the model used by Bob possesses a high similarity to the network owned by Alice. Otherwise, the two models are not replicas. When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, second para.; Boolean Success on watermark detection of Alice trained neural network is verifiable by green checkmark or is not verifiable by red X mark Fig. 1)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Rodriguez to incorporate the method of Rouhani for the benefit of protecting the IP of an arbitrary DL model and establishing the ownership of the model builder (Rouhani, pg. 2, right col, last bullet point).

	Regarding claim 36, Modified Rodriguez teaches the apparatus according to claim 35, Rouhani teaches wherein the training comprises determining a value of the cost function (To do so, one needs to add the following term to the overall loss function for each specific layer of the underlying deep neural network: −λ2 ΣΣ (bkj ln(Gkjσ) + (1 − bkj) ln(1 − Gkjσ )), Here, the variable λ2 is a hyper-parameter that determines the contribution of loss2 in the process of training the neural network, … We set the λ2 variable to 0.01 in all our experiments, pg. 5, left col, Equation 3;
	L = cross_entropy + λ1loss1 + λ2loss2
	The Examiner notes that the determined value of the loss function loss1 (Equation 1, pg. 4) and loss function loss2 (Equation 3, pg. 5) is applied to the regularized loss function (L) above, pg. 5, left col, Algorithm 1)
	based on a key and a cluster center (Computing statistical mean value of the activation features obtained by passing the selected input keys in Step I. The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy) and
	using the value of the cost function in the training (sum up the corresponding loss functions for each layer in Step 4 to train the pertinent DL model, pg. 4, right col, first para.; Algorithm step 4: … training the model with the regularized loss function: L = cross_entropy + λ1loss1 + λ2loss2, pg. 5, left col, Algorithm 1)
	The same motivation to combine independent claim 35 applies here.

	Regarding claim 45, Modified Rodriguez teaches the apparatus according to claim 35, Rouhani teaches wherein the training is performed based on multiple control parameters, (Generating a set of K unique random input samples to be used as the watermarking keys in step 3: ... If the number of training data within a ε-ball of the random sample is fewer than a threshold, we accept that sample as one of the watermark keys, pg. 5, right col, step 2; the variable λ2 is a hyper-parameter that determines the contribution of loss2 in the process of training the neural network, pg. 5, left col, second para. The Examiner notes that the threshold and λ2 are control parameters) and
	wherein the training is performed using one of a complete set of the multiple control parameters or a partial set of the multiple control parameters (We set the λ2 variable to 0.01 in all our experiments, pg. 5, left col, second para.; The Hard Thresholding function denoted in Equation (2) maps the values in Gσ that are greater than 0.5 to ones and the values less than 0.5 to zeros. This threshold value can be easily changed in our API if the user decides to change it for their application. A value greater than 0.5 means that the binary string has a higher probability to include more zeros than ones, pg. 4, right col, last para.)
	The same motivation to combine dependent claim 40 applies here.
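
	For illustration only, the hard thresholding quoted above for claim 45 may be sketched in Python as follows. This is a minimal sketch; the function name hard_threshold is hypothetical, and because the quoted passage does not say how a value exactly equal to the threshold is treated, mapping such a value to zero is an assumption of the example.

```python
def hard_threshold(values, threshold=0.5):
    """Map values greater than threshold to 1 and all other values to 0.
    Assumption: a value exactly equal to the threshold maps to 0."""
    return [1 if v > threshold else 0 for v in values]


bits = hard_threshold([0.2, 0.7, 0.5, 0.9])
```

	Raising the threshold above 0.5 in this sketch sends more values to zero, which is consistent with the quoted remark that the threshold is a user-adjustable control parameter.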

	Regarding claim 46, Rodriguez teaches an apparatus, comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the apparatus at least to: (such hardware includes one or more processors, one or more memories (e.g. RAM), together with software instructions [0192]; Machine learning systems can be designed and implemented using a variety of tools, CPUs, GPUs, and dedicated hardware platforms [0050])
	test a neural network (to test a system, the inserted markings (or signals including them) are submitted to the classifier, and that classifier will respond to the marking in a detectable way [0048]) 
	with one or more keys to determine one or more output signatures, (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177])
	 wherein the neural network has an embedded capability to produce one or more signatures corresponding to the one or more keys, (Now, we use this memory to embed a secret key in the NN [0071]; In such a method, the detectable signature can comprise a set of classification results for subsequent classification inputs after the key sequence that correspond to a signature pattern [0178]) and
	Modified Rodriguez does not explicitly teach the capability is based on constraints placed on weights in the neural network during training, the constraints based on one or more keys and one or more cluster centers of the weights, the one or more cluster centers based on weights used in the neural network; compare, using a metric, the one or more output signatures with one or more other signatures that correspond to the one or more keys; determine based on the comparison whether the neural network is or is not verified as a known neural network with the embedded capability to produce specific signatures corresponding to the one or more keys; in response to the neural network determined to be verified as the known neural network, report the neural network as being verified.
	Rouhani teaches the capability is based on constraints placed on weights in the neural network during training, (All three loss functions (loss0, loss1, and loss2) are simultaneously used to train/fine-tuned the underlying neural network. We used Stochastic Gradient Descent (SGD) in all our experiments to optimize the DL model parameters with the explicit constraints outlined in Equations 1 and 3, pg. 5, left col, second para.; In Equation (1), θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para. The Examiner notes that a loss function is synonymous to a cost function)
	the constraints based on one or more keys and one or more cluster centers of the weights, the one or more cluster centers based on weights used in the neural network; (step 1: {Key set generation: Xkey, Ykey}, Select_Pairs ({Xtrain, Ytrain}, y*, … step 3: Compute mean of activation: µs×M ← Compute Mean (f l(x, θ), pg. 7, left col, Algorithm 3; θ is the set of model parameters (i.e., weights and biases), pg. 4, left col, last para.; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.)
	compare, using a metric, (compares using metrics, Table III, pg. 9)
	the one or more output signatures with one or more other signatures that correspond to the one or more keys; (OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; The probability of a network (not owned by Alice) to make at least nk correct decision according to the Alice private keys is as follows: ….where O is the oracle DL model used by Bob, Nk is a random variable indicating the number of matched predictions of the two models compared against one another, K is the input key length, pg. 8, left col, Equation 4)
	determine based on the comparison whether the neural network is or is not verified as a known neural network with the embedded capability to produce specific signatures corresponding to the one or more keys; (To verify the presence of the watermark in the output layer, Alice needs to statistically analyze Bob’s responses to a set of input keys, …Submitting queries to the remote DL service provider using the randomly selected input keys (Xkey), …. If the number of mismatches is less than a threshold, it means that the model used by Bob possesses a high similarity to the network owned by Alice. Otherwise, the two models are not replicas. When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, second para.) and
	in response to the neural network determined to be verified as the known neural network, report the neural network as being verified. (A value of 1 in the last row of the table indicates that the embedded watermark is successfully detected, whereas a value of 0 indicates a false negative, pg. 9. Table 3)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Rodriguez to incorporate the method of Rouhani for the benefit of protecting the IP of an arbitrary DL model and establishing the ownership of the model builder (Rouhani, pg. 2, right col, last bullet point).
	Regarding claim 48, Modified Rodriguez teaches the apparatus according to claim 46, Rodriguez teaches wherein the at least one memory and the computer program code are configured, with the at least one processor, to cause the apparatus at least to: (such hardware includes one or more processors, one or more memories (e.g. RAM), together with software instructions [0192]; Machine learning systems can be designed and implemented using a variety of tools, CPUs, GPUs, and dedicated hardware platforms [0050])
	 determine, prior to the testing, the one or more keys and the one or more target signatures (Now, we use this memory to embed a secret key in the NN [0071]; In such a method, the detectable signature can comprise a set of classification results for subsequent classification inputs after the key sequence that correspond to a signature pattern [0178]) 
	Rouhani teaches by applying one or more reveal neural networks to testing data embedded with the one or more keys and the one or more target signatures. (step 1: Alice sends her input keys Xkey to Bob T , Step2: Inference by the remote model: Y pred ← Predict (T , Xkey), step 3: Response comparison: nk ← Count Mismatch (Y pred, Y key), step 4: Decision making: Presence = 1 if nk < Nk else 0; Return: WM presence indicator (Presence), pg. 7, Algorithm 4. The Examiner notes that the remote model is the reveal neural network and watermark presence indicates that signatures are present)
	The same motivation to combine independent claim 46 applies here.
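	For illustration only, the four steps of Algorithm 4 quoted above (query with the input keys, predict, count mismatches, decide) can be sketched as follows. This is a minimal sketch and not the reference's code: the function names, the lookup-table stand-in for the remote model, and the example key values are hypothetical.

```python
def count_mismatches(predicted, expected):
    """Count positions where the queried model's labels differ from the key labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    return sum(1 for p, e in zip(predicted, expected) if p != e)


def watermark_present(predict, x_key, y_key, max_mismatches):
    """Return 1 if the mismatch count is below the threshold, else 0."""
    y_pred = [predict(x) for x in x_key]          # Step 2: inference by the remote model
    n_k = count_mismatches(y_pred, y_key)         # Step 3: response comparison
    return 1 if n_k < max_mismatches else 0       # Step 4: decision making


# Hypothetical stand-in for the remote model: a fixed lookup table.
toy_model = {"k1": 3, "k2": 7, "k3": 1, "k4": 4}.get
x_key = ["k1", "k2", "k3", "k4"]
y_key = [3, 7, 1, 9]                              # exactly one label disagrees
presence = watermark_present(toy_model, x_key, y_key, max_mismatches=2)
print(presence)
```

	With one mismatch and a threshold of two, the sketch reports the watermark as present; lowering the threshold to one reports it as absent.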

	Regarding claim 49, Modified Rodriguez teaches the apparatus according to claim 46, Rodriguez teaches wherein the at least one memory and the computer program code are configured, with the at least one processor, to cause the apparatus at least to: (such hardware includes one or more processors, one or more memories (e.g. RAM), together with software instructions [0192]; Machine learning systems can be designed and implemented using a variety of tools, CPUs, GPUs, and dedicated hardware platforms [0050])
	 receive the one or more keys and the one or more target signatures prior to the testing (In such a method, the detectable signature can comprise a set of classification results for subsequent classification inputs after the key sequence that correspond to a signature pattern [0178])

7.	Claims 37 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez et al. (US20150055855) in view of Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) and further in view of Lagendijk et al. ("Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation." IEEE Signal Processing Magazine 30.1 (2012): 82-105.)

	Regarding claim 37, Modified Rodriguez teaches the apparatus according to claim 35, Rouhani teaches wherein the training comprises determining a value of the cost function (To do so, one needs to add the following term to the overall loss function for each specific layer of the underlying deep neural network: −λ2 ΣΣ (bkj ln(Gkjσ) + (1 − bkj) ln(1 − Gkjσ)), Here, the variable λ2 is a hyper-parameter that determines the contribution of loss2 in the process of training the neural network, … We set the λ2 to 0.01 in all our experiments, pg. 5, left col, Equation 3; 
	L = cross_entropy + λ1loss1 + λ2loss2
 The Examiner notes that the determined value of the loss function loss1 (Equation 1, pg. 4) and loss function loss2 (Equation 3, pg. 5) is applied to the regularized loss function (L) above, pg. 5, left col, Algorithm 1) 
	a cluster center (The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy) and 
	using the value of the cost function in the training (This improvement is mainly due to the fact that the additive loss functions (Equations 1 and 3) and exploiting rarely observed regions act as a form of a regularizer during the training phase of the target DL model. Regularization, in turn, helps the model to avoid over-fitting by inducing a small amount of noise to the DL model, pg. 8, right col, third para.)
	Modified Rodriguez does not explicitly teach based on an inner product of a key
	Lagendijk teaches based on an inner product of a key (This is because watermark detection is carried out by correlating x(i) and wS(i), and comparing the correlation p to a detection threshold T: ∑x(i)wS(i) ≥ T -> watermark detected, Equation 26, … At the same time we realize that (26) is just an inner product as in (5), which offers us the possibility to encrypt one of the terms. A proposed solution is to homomorphically encrypt wS(i) with a private-public key pair (SK,PK), pg. 101, left col, first para.; Figure 4 illustrates the K-means algorithm consists of the following steps: ..., We need to find C cluster centers or centroids that best represent the Nu user data vectors, pg. 95, left col)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Rodriguez to incorporate the method of Lagendijk for the benefit of robust watermarks that have been designed to survive common media processing operations and watermark removal attacks (Lagendijk, pg. 100, right col, third para.)
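	For illustration only, the correlation test quoted from Lagendijk (Equation 26: the sum of x(i)wS(i) compared to a threshold T) can be sketched in plaintext as follows. This is a minimal sketch under stated assumptions: the function names and sample values are hypothetical, and the homomorphic encryption of wS(i) discussed in the reference is deliberately omitted.

```python
def inner_product(x, w):
    """Correlation of a signal x with a watermark pattern w, i.e. sum of x(i) * w(i)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))


def watermark_detected(x, w, threshold):
    """Detection rule: the watermark is detected when the correlation reaches the threshold."""
    return inner_product(x, w) >= threshold


# Hypothetical sample values chosen so the arithmetic is exact.
signal = [0.5, -1.0, 2.0, 0.25]
pattern = [1, -1, 1, -1]
rho = inner_product(signal, pattern)   # 0.5 + 1.0 + 2.0 - 0.25
print(rho, watermark_detected(signal, pattern, threshold=3.0))
```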

	Regarding claim 38, Modified Rodriguez teaches the apparatus according to claim 35, Rodriguez teaches wherein the training comprises determining a signature (providing a key sequence of the classifier, and programming a classifier to have memory, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177]) 
	Rouhani teaches a cluster center (The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy) and 
	determining a value of the cost function based on a binary cross entropy of the determined signature (Algorithm 1 step 4: Embed watermark in the neural network by training the model with the regularized loss function, L = cross_entropy + λ1loss1 + λ2loss2, pg. 5, left col, Algorithm 1; Embedding the watermark involves three main steps: ... (ii) Creating specific input keys to later trigger the corresponding WM strings after watermark embedding. (iii) Training (fine-tuning) the neural network with particular constraints enforced by the WM information within intermediate activation maps of the target DL model, pg. 3, right col. first para. The Examiner notes that the watermark bit strings are the signatures)
	Modified Rodriguez does not explicitly teach based on an inner product of a key
	Lagendijk teaches based on an inner product of a key (This is because watermark detection is carried out by correlating x(i) and wS(i), and comparing the correlation p to a detection threshold T: ∑x(i)wS(i) ≥ T -> watermark detected, Equation 26, … At the same time we realize that (26) is just an inner product as in (5), which offers us the possibility to encrypt one of the terms. A proposed solution is to homomorphically encrypt wS(i) with a private-public key pair (SK,PK), pg. 101, left col, first para.; Figure 4 illustrates the K-means algorithm consists of the following steps: ..., We need to find C cluster centers or centroids that best represent the Nu user data vectors, pg. 95, left col)
	The same motivation to combine dependent claim 37 applies here.
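	For illustration only, the binary cross entropy term quoted from Rouhani (Equation 3) and the regularized loss L = cross_entropy + λ1loss1 + λ2loss2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, loss2 is computed without its λ2 factor so that the factor is applied once in the combined loss, each projected value g must lie strictly between 0 and 1, and the value used for λ1 is a placeholder because the quoted passages give a value only for λ2.

```python
import math


def loss2(b, g):
    """Binary cross entropy between watermark bits b[k][j] and projected values g[k][j]."""
    total = 0.0
    for b_row, g_row in zip(b, g):
        for b_kj, g_kj in zip(b_row, g_row):
            total += b_kj * math.log(g_kj) + (1 - b_kj) * math.log(1.0 - g_kj)
    return -total


def regularized_loss(cross_entropy, loss1_value, loss2_value, lam1, lam2=0.01):
    """L = cross_entropy + lam1 * loss1 + lam2 * loss2 (lam1 is a placeholder here)."""
    return cross_entropy + lam1 * loss1_value + lam2 * loss2_value


bits = [[1, 0]]
projected = [[0.5, 0.5]]               # maximally uncertain projections
bce = loss2(bits, projected)           # equals 2 * ln(2)
total = regularized_loss(1.0, 0.5, bce, lam1=0.1)
print(bce, total)
```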

7. 	Claims 39, 40 and 42-44 are rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez et al. (US20150055855) in view of Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) and further in view of Liu et al. ("Design and realization of a meaningful digital watermarking algorithm based on RBF neural network." 2005 International Conference on Neural Networks and Brain. Vol. 1. IEEE, 2005.)

	Regarding claim 39, Modified Rodriguez teaches the apparatus according to claim 35, Rodriguez teaches deriving the one or more signatures based on the one or more keys (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177]) and
	Rouhani teaches determining for the cost function a key embedding cost term, using binary cross entropy with respect to the one or more signatures. (specific set of inputs (keys) is used for extracting the embedded watermark. In our case, the inputs triggering the ingrained binary random strings are used as the key for the detection of IP infringement in both white-box and black-box settings, pg. 3, Fig. 1, Alice Local Model Watermarking; Algorithm 1 step 4: Embed watermark in the neural network by training the model with the regularized loss function, L = cross_entropy + λ1loss1 + λ2loss2, pg. 5, left col, Algorithm 1)
 	Rouhani teaches clustering (In this paper, we consider a Gaussian Mixture Model (GMM), pg. 4 left col, second para. The Examiner notes that GMM is a type of clustering algorithm, Fig. 3), but does not explicitly teach K-means clustering with clustering weights with K cluster centers of the weights; deriving the one or more signatures based on the one or more keys and the K cluster centers;
	Liu teaches wherein training comprises: clustering weights with K cluster centers of the weights; (K-mean value of clustering method is chosen to train the neural network of hidden layer learning, pg. 216, right col, first para.)
	 the K cluster centers; (N is number of clustering centre, pg. 215, left col, Equation 4)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Rodriguez to incorporate the method of Liu for the benefit of a maximum watermark embedding intensity, determined using a neural network, that gives the watermark good robustness against all kinds of attacks and embeds the maximum watermark information under the condition of good invisibility (Liu, pg. 215, right col, first para.)
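	For illustration only, clustering a set of scalar weights around K cluster centers can be sketched with a plain K-means (Lloyd) iteration as follows. This is a minimal sketch and not the procedure of Liu or of the claims: the function name, the evenly spread deterministic initialization, the fixed iteration count, and the sample weights are all assumptions.

```python
def kmeans_1d(weights, k, iterations=20):
    """Cluster scalar weights into k groups and return the k cluster centers."""
    if not 1 <= k <= len(weights):
        raise ValueError("k must be between 1 and the number of weights")
    ordered = sorted(weights)
    # Assumed initialization: spread the starting centers evenly over the sorted weights.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for w in weights:
            # Assign each weight to its nearest center.
            nearest = min(range(k), key=lambda c: abs(w - centers[c]))
            groups[nearest].append(w)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


sample_weights = [-1.1, -0.9, -1.0, 0.1, -0.1, 0.0, 0.9, 1.1, 1.0]
centers = kmeans_1d(sample_weights, k=3)
print(centers)
```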

	Regarding claim 40, Modified Rodriguez teaches the apparatus according to claim 39, Rouhani teaches wherein the deriving the one or more signatures further comprises enumerating all cluster centers to generate a bit string, (The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information. (IV) Using the mean values obtained in Step III and her private projection matrix A to extract the pertinent binary string following the protocol outlined in Equation 2, pg. 7, left col, section A. Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, step 1; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4)
	 wherein the bit string is used as a set of a plurality of multiple signatures that identifies the trained neural network. (OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; the owner’s signature (watermark), abstract; To verify the presence of the watermark in the output layer, Alice needs to statistically analyze Bob’s responses to a set of input keys. ... When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, section B. Decision Policy. The Examiner notes that the watermark comprises the multiple signatures)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Rodriguez to incorporate the method of Rouhani for the benefit of protecting the IP of an arbitrary DL model and establishing the ownership of the model builder (Rouhani, pg. 2, right col, last bullet point).

	Regarding claim 42, Modified Rodriguez teaches the apparatus according to claim 39, Rodriguez teaches wherein the deriving the one or more signatures further comprises (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177].) and 
	Rouhani teaches using multiple input keys to generate a bit string (INPUT: Remote DL model T; Owner’s input key set {Xkey, Ykey}; Maximum tolerated number of mismatches NK, OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; Creating specific input keys to later trigger the corresponding WM strings after watermark embedding, pg. 3, right col, first para.)
	the bit string is a set of a plurality of signatures that identifies the trained neural network. (OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; the owner’s signature (watermark), abstract; When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, section B. Decision Policy. The Examiner notes that the watermark consists of signatures that identify the remote deep learning (DL) model)
	The same motivation to combine dependent claim 40 applies here.

	Regarding claim 43, Modified Rodriguez teaches the apparatus according to claim 42, Rodriguez teaches wherein the deriving the one or more signatures (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177]) further comprises 	
	Rouhani teaches generating the bit string as a combination of any two of the following: enumerating all cluster centers to generate the bit string; using a subset of cluster centers to generate the bit string; (Compute mean of activation: µs×M ← Compute Mean (f l(x, θ)), pg. 7, left col, Algorithm 3; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4. The Examiner notes that the subset of clusters is the blue and green dots in Fig. 3) or 
	using multiple input keys to generate the bit string. (INPUT: Remote DL model T; Owner’s input key set {Xkey, Ykey}; Maximum tolerated number of mismatches NK, OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4)
	The same motivation to combine dependent claim 40 applies here.

	Regarding claim 44, Modified Rodriguez teaches the apparatus according to claim 39, Rodriguez teaches wherein deriving the one or more signatures (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177]) further comprises 
	Rouhani teaches applying a confidential transformation to the K cluster centers to obtain a set of transformed vectors and using the transformed vectors when deriving the one or more signatures. (Henceforth, we refer to this binary string as the vector b ∈ {0, 1}^{s×N} where s is the number of selected distributions (Step 1) to carry the watermarking information, and N is an owner-defined parameter indicating the desired length of the digital watermark embedded at the mean value of each selected Gaussian distribution, … The projection matrix is used to map the selected centers in Step 1 into the binary vector chosen in Step 2. The transformation is denoted as the following:
			G^σ_{s×N} = Sigmoid (µ_{s×M} · A_{M×N}),
			b_{s×N} = Hard_Thresholding (G^σ_{s×N}, 0.5).      (2)
Here, M is the size of the feature space in the pertinent layer, and µ_{s×M} denotes the concatenated mean values of the selected distributions. In our experiments, we use a standard normal distribution N(0, 1) to generate the WM projection matrix (A). Using i.i.d. samples drawn from a normal distribution ensures that each bit of the binary string is embedded into all the features associated with the selected centers (mean values), pg. 4, right col, step 2 and 3; (IV) Using the mean values obtained in Step III and her private projection matrix A to extract the pertinent binary string following the protocol outlined in Equation 2, pg. 7, left col, section A: Decision Policy)
	Modified Rodriguez does not explicitly teach K cluster centers 
	Liu teaches K cluster centers (N is number of clustering centre, pg. 215, left col, Equation 4)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Rouhani to incorporate the method of Liu for the benefit of a maximum watermark embedding intensity, determined using a neural network, that gives the watermark good robustness against all kinds of attacks and embeds the maximum watermark information under the condition of good invisibility (Liu, pg. 215, right col, first para.)
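	For illustration only, the transformation quoted from Rouhani as Equation 2 (a sigmoid of the selected mean values multiplied by the private projection matrix, followed by hard thresholding at 0.5) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the projection matrix is fixed rather than drawn from N(0, 1) so that the result is reproducible, and a value strictly above the cut is mapped to bit 1 because the quoted passage does not say how a tie at exactly 0.5 is resolved.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(mu, a):
    """G = Sigmoid(mu . A), where mu is s x M (selected centers) and a is M x N."""
    m_rows, n_cols = len(a), len(a[0])
    for row in mu:
        if len(row) != m_rows:
            raise ValueError("each center must have as many features as A has rows")
    return [[sigmoid(sum(row[m] * a[m][n] for m in range(m_rows)))
             for n in range(n_cols)] for row in mu]


def hard_threshold(g, cut=0.5):
    """b = 1 where the projected value exceeds the cut, else 0 (tie rule is assumed)."""
    return [[1 if value > cut else 0 for value in row] for row in g]


mu = [[1.0, -2.0]]                          # one selected center, M = 2 features
A = [[2.0, -1.0, 0.5], [0.5, 1.0, -1.0]]    # hypothetical 2 x 3 projection matrix
G = project(mu, A)                          # pre-activations are 1.0, -3.0, 2.5
bits = hard_threshold(G)
print(bits)
```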

8.	Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez et al. (US20150055855) in view of Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) in view of Liu et al. ("Design and realization of a meaningful digital watermarking algorithm based on RBF neural network." 2005 International Conference on Neural Networks and Brain. Vol. 1. IEEE, 2005.) and further in view of Garvey et al. (US20190339965 filed 05/07/2018)

	Regarding claim 41, Modified Rodriguez teaches the apparatus according to claim 39, Rodriguez teaches wherein the deriving the one or more signatures (A further method concerns training a classifier, and includes providing a key sequence of the classifier, …, wherein submission of a sequence of input samples corresponding to the key sequence causes the classifier to produce subsequent output corresponding to a detectable signature [0177]) further comprises 
	Rouhani teaches using a subset of cluster centers to generate a bit string (Compute mean of activation: µs×M ← Compute Mean (f l(x, θ)), pg. 7, left col, Algorithm 3; The acquired mean values are used as an approximation of the Gaussian centers that are supposed to carry the watermark information, pg. 7, left col, section A: Decision Policy; Figure 3 illustrates a simple example of two clustered Gaussian distribution spreading in a two-dimensional subspace, pg. 5, right col, second to the last para.; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4. The Examiner notes that the subset of clusters is the blue and green dots in Fig. 3)
	the bit string is a set of a plurality of signatures that identifies the trained neural network. (OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; the owner’s signature (watermark), abstract; When the two models are the exact duplicate of one another, the number of mismatches will be zero and Alice can safely claim the ownership of the neural network used by the third-party, pg. 7, right col, section B. Decision Policy. The Examiner notes that the watermark consists of signatures that identify the remote deep learning (DL) model)
	The same motivation to combine dependent claim 40 applies here.
	Modified Rodriguez does not explicitly teach denotes a cardinality of the subset, 
	Garvey teaches denotes a cardinality of the subset (Feature set selector 130 clusters the filtered set of parameters based on their cardinalities (Operation 320) [0050]; FIG. 3 is a flow diagram that illustrates selecting a subset of parameters based on having moderate cardinality, in accordance with one or more embodiments [0009]; There may be one cluster of parameters having cardinality of approximately 17 including the parameters between {result_cache-Max_size (21) and processes (10)}, one cluster of parameters having cardinality of approximately 6 including parameters having a cardinality of 5-9, and a cluster of parameters with cardinality around 4 including parameters with cardinality 4 or less.[0051])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Rodriguez to incorporate the method of Garvey for the benefit of selecting maybe a small subset of the parameters [0021] using K-means clustering (Garvey [0063])

9.	Claim 47 is rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez et al. (US20150055855) in view of Rouhani et al. ("Deepsigns: A generic watermarking framework for ip protection of deep learning models." arXiv preprint arXiv:1804.00750 (2018)) and further in view of Sternickel et al. (US20070167846)

	Regarding claim 47, Modified Rodriguez teaches the apparatus of claim 46, wherein comparing, using a metric, the one or more output signatures with one or more other signatures that correspond to the one or more keys (compares using metrics, Table III, pg. 9; OUTPUT: One bit indicating the presence of the owner’s WM in the remote DL model, pg. 7, Algorithm 4; The probability of a network (not owned by Alice) to make at least nk correct decision according to the Alice private keys is as follows: ….where O is the oracle DL model used by Bob, Nk is a random variable indicating the number of matched predictions of the two models compared against one another, K is the input key length, pg. 8, left col, Equation 4) further comprises: 
	Modified Rodriguez does not explicitly teach determining a confidence score p based on the following: p = 1 − r^n, in which n is a number of bits that are a same between the one or more output signatures and the one or more other signatures, and r is a probability that the bits of the one or more output signatures and the one or more target signatures might collide accidentally.
	Sternickel teaches determining a confidence score p based on the following: p = 1 − r^n, in which n is a number of bits that are a same between the one or more output signatures and the one or more other signatures, and r is a probability that the bits of the one or more output signatures and the one or more target signatures might collide accidentally. (For assessing the quality of the validation set or a test set, we introduce similar metrics, q^2 and Q^2, where q^2 and Q^2 are defined as 1 − r^2 and 1 − R^2, respectively, for the data in the test set [0069])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Rodriguez to incorporate the method of Sternickel for the benefit of assessing the quality of a trained model [0068] and a neural network wherein weights in the first layer would be just the descriptors of the training data (Sternickel [0057])
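	For illustration only, and reading the recited expression as p = 1 − r^n (r raised to the power n), the confidence score of claim 47 can be computed as follows. This is a minimal sketch under stated assumptions: the function names and the sample bit strings are hypothetical, and r = 0.5 is an assumed per-bit collision probability that is not taken from the claims or the cited references.

```python
def matching_bits(output_signature, target_signature):
    """Count n, the number of bit positions where the two signatures agree."""
    if len(output_signature) != len(target_signature):
        raise ValueError("signatures must have the same length")
    return sum(1 for o, t in zip(output_signature, target_signature) if o == t)


def confidence_score(n, r):
    """p = 1 - r**n, with r the probability of an accidental per-bit collision."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    return 1.0 - r ** n


output_bits = [1, 0, 1, 1, 0, 0, 1, 0]
target_bits = [1, 0, 1, 1, 0, 1, 1, 0]     # differs in one position
n = matching_bits(output_bits, target_bits)
p = confidence_score(n, r=0.5)             # r = 0.5 is an assumption
print(n, p)
```

	Under these assumptions, each additional matching bit halves the remaining doubt, so the score approaches 1 as n grows and is 0 when no bits match.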

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121                                    

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121