DETAILED ACTION
This action is in response to the Applicant Response filed 21 June 2022 for application 16/424,840 filed 29 May 2019.
Claims 5-8 are currently amended.
Claims 1-8 are pending.
Claims 1, 4-5, 7-8 are rejected.
Claims 2-3, 6 are objected to.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments regarding the objection to the title have been fully considered but are not persuasive. The objection is maintained. The Examiner suggests amending the title to recite “INFORMATION PROCESSING APPARATUS, METHOD, AND COMPUTER READABLE STORAGE MEDIUM FOR EMBEDDING WATERMARK INFORMATION.”

Applicant’s arguments regarding the objections to claims 5-6 have been fully considered and, in light of the amendments to the claims, are persuasive.

Applicant’s arguments regarding the 35 U.S.C. 112(b) rejection of claim 7 have been fully considered and, in light of the amendments to the claims, are persuasive. The 35 U.S.C. 112(b) rejection of claim 7 has been withdrawn.
Applicant’s arguments regarding the 35 U.S.C. 112(a) rejection of claim 7 have been fully considered and, in light of the amendments to the claims, are persuasive. The 35 U.S.C. 112(a) rejection of claim 7 has been withdrawn.

Applicant’s arguments regarding the 35 U.S.C. 101 rejection of claim 8 have been fully considered and, in light of the amendments to the claims, are persuasive. The 35 U.S.C. 101 rejection of claim 8 has been withdrawn.

Applicant’s arguments regarding the 35 U.S.C. 102 rejections of claims 1, 4, 7-8 and the 35 U.S.C. 103 rejection of claim 5 have been fully considered but are not persuasive.
It is noted that, while the Examiner may appreciate differences between the applied art and features described in the originally filed specification, any such features must be explicitly recited in the claims themselves and/or definitively and comprehensively defined in the specification in order to be considered and to impact the broadest reasonable interpretation (BRI) of the metes and bounds of the claim terms. Applicant is respectfully reminded that, during examination, the claim terms are given their BRI consistent with the specification (MPEP 2173.01). Applicant is therefore encouraged to amend the claims, or to point to portion(s) of the originally filed specification, that preclude the BRI under which the claim terms correspond to the applied art.
Applicant argues that the cited references do not teach the amended limitations of claim 1 (and similarly claims 7-8), particularly the following:
obtaining second gradients of the respective plurality of input values based on an error between the output of the second neural network and the watermark bits; and 
updating the weights based on values obtained by adding first gradients of the weights of the first neural network that have been obtained based on backpropagation and the respective second gradients (emphasis by applicant).
Specifically, applicant first argues that the gradients generated by the Bob network are gradients associated with layers of the Bob portion of the algorithm and are not gradients of the plurality of input values for a second neural network. Examiner respectfully disagrees. As noted by applicant, Alice represents the first n layers of a combined neural network, and Bob represents the next m layers of the combined neural network; however, Alice and Bob each represent an individual neural network, and the two are combined to create the combined neural network. Nothing in the claim language precludes combining Alice [first] and Bob [second] to form a single network. In fact, claim 1 recites a similar structure: “obtaining an output of a second neural network by inputting a plurality of input values obtained from a plurality of weights of the first neural network to the second neural network.” Under this language, the second neural network [interpreted as Bob] receives, as input, the output from the first neural network [interpreted as Alice]. Therefore, while Gupta teaches a combined neural network, it does so by combining a first neural network, Alice, whose outputs are used as inputs to a second neural network, Bob (Gupta, Algorithm 1). Although Alice and Bob together form a combined neural network and each represents layers in that combined network, each individually represents a neural network, Alice being the first and Bob being the second, and each generates its own gradients: first gradients for Alice and second gradients for Bob. Therefore, Bob does, in fact, generate gradients of the plurality of its input values, i.e., the input values for the second neural network.
Next, applicant argues that Gupta does not teach adding the first and second gradients. Examiner respectfully disagrees. Examiner notes that the language of claim 1 does not recite adding the first and second gradients and is not interpreted to recite such a step. As noted above, the structure of the first and second neural networks is a sequential structure similar to that of Gupta. Further, because backpropagation is an iterative process, the language of claim 1 is interpreted as updating the weights of the first neural network by adding the first gradients, wherein the first gradients are obtained based on backpropagation and the second gradients. As taught by Gupta, for each iteration, the weights of Alice [first neural network] are updated by adding the current gradients of Alice [first gradients] to the gradients of Alice from previous iterations (Gupta, Algorithm 1). Moreover, the gradients for Alice are obtained from the gradients of Bob [second gradients] using backpropagation. Therefore, Gupta does, in fact, teach updating the weights based on values obtained by adding first gradients of the weights of the first neural network that have been obtained based on backpropagation and the respective second gradients.
Therefore, claim 1 is rejected under 35 U.S.C. 102 as anticipated by Gupta. Additionally, claims 7-8 are also rejected as anticipated by Gupta. Moreover, the rejections of claims 1, 7-8 apply to all claims that depend from claims 1, 7-8, including claim 4, which is also anticipated by Gupta, and claim 5, which is rejected under 35 U.S.C. 103 as unpatentable over Gupta in view of Ha.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: INFORMATION PROCESSING APPARATUS, METHOD, AND COMPUTER READABLE STORAGE MEDIUM FOR EMBEDDING WATERMARK INFORMATION.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 4, 7-8 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Gupta et al. (US 2017/0372201 A1 – Secure Training of Multi-Party Deep Neural Network, hereinafter referred to as “Gupta”).

Regarding claim 1, Gupta teaches an information processing method for embedding watermark bits into weights of a first neural network, the method comprising: 
obtaining an output of a second neural network (Gupta, Algorithm 1, step 7 – teaches Bob network [second] generating an output by forward propagating the input received from Alice network [first]) by inputting a plurality of input values obtained from a plurality of weights of the first neural network to the second neural network (Gupta, Algorithm 1, steps 5-6 - teaches Alice network [first] sending its output [input values], which is calculated using forward propagation of data [obtained from a plurality of weights of the Alice (first) network], and a label [watermark bits] to Bob network [second]; see also Gupta, ¶0010 – teaches Alice network [first neural network] having at least three layers and Bob network [second neural network] having at least two layers; Gupta, Figures 1-2); 
obtaining second gradients of the respective plurality of input values based on an error between the output of the second neural network and the watermark bits (Gupta, Algorithm 1, steps 7-9 - teaches Bob network [second] generating and backpropagating gradients for the given input from Alice network [first] based on an error between the generated output of the Bob network [second] and the label [watermark bits]); and 
updating the weights based on values obtained by adding first gradients of the weights of the first neural network that have been obtained based on backpropagation and the respective second gradients (Gupta, Algorithm 1, steps 8-11 - teach Bob network [second] sending gradients to Alice network [first] and Alice network backpropagating the gradients, which updates the weights by adding the gradients to the existing weights, where the gradients are based on backpropagation and the gradients from Bob network [second]).
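For illustration only, the Alice/Bob sequence attributed above to Gupta, Algorithm 1 (Alice forward-propagates and sends her output and the label to Bob; Bob forward-propagates, computes the error, and backpropagates gradients down to his own inputs; Alice receives those gradients and continues backpropagation to update her weights) can be sketched as follows. This is a minimal sketch under stated assumptions, not Gupta's disclosed implementation: the single linear layer for Alice, the logistic output layer for Bob, the binary cross-entropy loss, the learning rate, and the names `Alice`, `Bob`, and `train` are all hypothetical. Note that a gradient-descent update adds the scaled negative gradient to the existing weights.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p: float, y: int) -> float:
    # Binary cross-entropy between Bob's output p and the label y.
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


class Alice:
    """Hypothetical first network: one linear layer mapping x (d values) to h (k values)."""

    def __init__(self, d: int, k: int, rng: random.Random) -> None:
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
        self.x = []

    def forward(self, x):
        self.x = list(x)  # cached for the backward pass
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def backward(self, grad_h, lr: float) -> None:
        # Chain rule: dL/dW[i][j] = dL/dh[i] * x[j]; gradient-descent step.
        for i, g in enumerate(grad_h):
            for j, xj in enumerate(self.x):
                self.W[i][j] -= lr * g * xj


class Bob:
    """Hypothetical second network: a logistic output layer on Alice's output."""

    def __init__(self, k: int, rng: random.Random) -> None:
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(k)]
        self.b = 0.0
        self.h = []

    def forward(self, h) -> float:
        self.h = list(h)
        return sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)

    def backward(self, p: float, y: int, lr: float):
        d = p - y  # dL/dlogit for sigmoid + binary cross-entropy
        # Gradients with respect to Bob's *inputs*, computed before w changes.
        grad_h = [d * wi for wi in self.w]
        self.w = [wi - lr * d * hi for wi, hi in zip(self.w, self.h)]
        self.b -= lr * d
        return grad_h


def train(data, epochs: int = 500, lr: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    alice, bob = Alice(len(data[0][0]), 4, rng), Bob(4, rng)
    history = []  # mean loss per epoch
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            h = alice.forward(x)             # Alice sends h (and label y) to Bob
            p = bob.forward(h)               # Bob forward-propagates
            total += bce(p, y)
            grad_h = bob.backward(p, y, lr)  # Bob sends gradients of his inputs
            alice.backward(grad_h, lr)       # Alice continues backpropagation
        history.append(total / len(data))
    return alice, bob, history
```

On a toy dataset whose label equals the first feature, for example `[([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]`, the per-epoch loss returned by `train` should decrease as Alice and Bob are trained jointly through the exchanged gradients.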

Regarding claim 4, Gupta teaches all of the limitations of the information processing method of claim 1 as noted above. Gupta further teaches wherein the second neural network outputs a result obtained by performing a predetermined computation on the plurality of input values (Gupta, Algorithm 1, step 7 – teaches Bob network [second] forward propagating the input values from Alice network [first] by performing a predetermined function on the data to generate an output; see also Gupta, ¶¶0046-0051).

Regarding claim 7, Gupta teaches an information processing apparatus for embedding watermark bits into weights of a first neural network using the first neural network and a second neural network, the information processing apparatus comprising: 
one or more processors (Gupta, ¶0162 – teaches computer with processor and memory executing a program);
one or more memory devices configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more programs to perform (Gupta, ¶0162 – teaches computer with processor and memory executing a program):
obtaining a plurality of input values from a plurality of weights of the first neural network (Gupta, Algorithm 1, steps 5-6 - teaches Alice network [first] sending its output [input values], which is calculated using forward propagation of data [obtained from a plurality of weights of the Alice (first) network], and a label [watermark bits] to Bob network [second]; see also Gupta, ¶0010 – teaches Alice network [first neural network] having at least three layers and Bob network [second neural network] having at least two layers; Gupta, Figures 1-2); 
obtaining an output of the second neural network by inputting the plurality of input values to the second neural network (Gupta, Algorithm 1, step 7 – teaches Bob network [second] generating an output by forward propagating the input received from Alice network [first]); 
obtaining second gradients of the respective plurality of input values based on an error between an output of the second neural network and the watermark bits (Gupta, Algorithm 1, steps 7-9 - teaches Bob network [second] generating and backpropagating gradients for the given input from Alice network [first] based on an error between the generated output of the Bob network [second] and the label [watermark bits]); and 
train the first neural network (Gupta, Algorithm 1, steps 8-11 - teach Bob network [second] sending gradients to Alice network [first] and Alice network backpropagating the gradients, which updates the weights by adding the gradients to the existing weights, where the gradients are based on backpropagation and the gradients from Bob network [second]); 
updating the weights based on values obtained by adding first gradients of the weights of the first neural network that have been obtained based on backpropagation and the respective second gradients (Gupta, Algorithm 1, steps 8-11 - teach Bob network [second] sending gradients to Alice network [first] and Alice network backpropagating the gradients, which updates the weights by adding the gradients to the existing weights, where the gradients are based on backpropagation and the gradients from Bob network [second]).

Regarding claim 8, it is the computer readable storage medium embodiment of claim 1 with similar limitations to claim 1 and is rejected using the same reasoning found in claim 1. Gupta further teaches the following additional limitations:
a non-transitory computer readable storage medium storing a program, the program, upon being executed by one or more processors in a computer, causing the computer to execute (Gupta, ¶0162 – teaches computer with processor and memory executing a program) ...

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta in view of Ha et al. (HyperNetworks, hereinafter referred to as "Ha").

Regarding claim 5, Gupta teaches all of the limitations of the information processing method of claim 4 as noted above. However, Gupta does not explicitly teach wherein the second neural network selects a same number of input values as that of the watermark bits from the plurality of input values, and calculates an output by inputting each of the selected input values to an activation function.
Ha teaches wherein the second neural network selects a same number of input values as that of the watermark bits from the plurality of input values (Ha, section 3.1 – teaches linearly projecting the input vector into $N_{in}$ inputs [selecting $N_{in}$ inputs] and outputting from the second network a concatenation of $N_{in}$ outputs [watermark bits]) and calculates an output by inputting each of the selected input values to an activation function (Ha, section 3.1 – teaches linear activation functions applied to the inputs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gupta with the teachings of Ha in order to use a small network with fewer parameters, and therefore lower computational cost, to generate weight parameters for a larger network, in the field of using a second network to generate and/or update the weights of a first network (Ha, Abstract – “This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype – the hypernetwork – and a phenotype – the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.”).
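For illustration only, the claim 5 limitation as characterized above (the second neural network selects as many input values as there are watermark bits and passes each selected value through an activation function) can be sketched as follows. This is a minimal, hypothetical sketch: the function names, the default selection rule (the first `num_bits` values), the default sigmoid activation, and the 0.5 threshold for reading bits back are assumptions, not taken from the application, Gupta, or Ha. An identity activation can be passed in to mirror the linear activation attributed to Ha, section 3.1.

```python
import math
from typing import Callable, List, Optional, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_and_activate(
    input_values: Sequence[float],
    num_bits: int,
    indices: Optional[Sequence[int]] = None,
    activation: Callable[[float], float] = sigmoid,
) -> List[float]:
    """Select one input value per watermark bit and apply `activation` to each.

    Hypothetical illustration: the default selection (first `num_bits` values)
    and the default sigmoid activation are assumptions.
    """
    chosen = list(range(num_bits)) if indices is None else list(indices)
    if len(chosen) != num_bits:
        raise ValueError("select exactly one input value per watermark bit")
    return [activation(input_values[i]) for i in chosen]


def extract_bits(outputs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Read a bit string back from the per-bit outputs by thresholding."""
    return [1 if o >= threshold else 0 for o in outputs]
```

For example, `select_and_activate([0.0, 2.0, -2.0, 5.0], 3)` yields three activated values, one per watermark bit, which `extract_bits` maps to `[1, 1, 0]`.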

Allowable Subject Matter
Claims 2-3, 6 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and all objections noted above were cured.

Regarding claim 2, the prior art teaches convolutional networks which include weight filters and a plurality of weights. However, the prior art does not explicitly teach the plurality of input values [to a second neural network] are each an average value of weights of the N weight filters at the same position [from a first neural network].
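For illustration only, the claim 2 feature identified above (each input value to the second neural network is the average of the weights of the N weight filters at the same position) can be sketched as an element-wise mean over the filters. This is a minimal, hypothetical sketch: the nested-list layout (channel, row, column), the flattening order, and the function name `average_filters` are assumptions and are not drawn from the application or the prior art.

```python
from typing import List

# Hypothetical layout of one weight filter: [channel][row][column].
Filter = List[List[List[float]]]


def average_filters(filters: List[Filter]) -> List[float]:
    """Average N same-shaped weight filters position by position, then flatten."""
    n = len(filters)
    if n == 0:
        raise ValueError("at least one weight filter is required")
    c, h, w = len(filters[0]), len(filters[0][0]), len(filters[0][0][0])
    for f in filters:
        if len(f) != c or any(len(ch) != h for ch in f) or any(
            len(r) != w for ch in f for r in ch
        ):
            raise ValueError("all weight filters must have the same shape")
    values = []
    for ci in range(c):
        for hi in range(h):
            for wi in range(w):
                # One input value per position: mean over the N filters.
                values.append(sum(f[ci][hi][wi] for f in filters) / n)
    return values
```

With two single-channel 1x2 filters `[[[1.0, 2.0]]]` and `[[[3.0, 6.0]]]`, the sketch produces one averaged value per position, `[2.0, 4.0]`.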

Regarding claim 3, due to its dependence on claim 2 and for the reasons listed in the above paragraphs, claim 3 is also not taught or suggested by the prior art.

Regarding claim 6, the prior art teaches a second network selecting a number of inputs equal to the number of watermark bits and applying an activation function to the inputs. However, the prior art does not explicitly teach selecting a same number of pairs of input values as that of watermark bits.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Lain, Antonio (US 2017/0206449 A1 – Neural Network Verification) teaches verifying neural networks using various methods, including watermarks embedded in the weights of a network, to identify unauthorized use of the network. 
Rodriguez et al. (US 2015/0055855 A1 – Learning Systems and Methods) teaches protecting weights of a model from unauthorized copying using watermarking.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communication from the examiner should be directed to MARSHALL WERNER whose telephone number is (469) 295-9143. The examiner can normally be reached on Monday – Thursday 7:30 AM – 4:30 PM ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/MARSHALL L WERNER/ Examiner, Art Unit 2125

/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125