Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification 
The specification filed on February 19, 2021 is accepted. 
Drawings
The drawings filed on February 19, 2021 are accepted.

Examiner notes: The storage device in claim 10 is a non-transitory computer-readable storage medium that expressly excludes media such as energy, carrier signals, electromagnetic waves, and signals per se. See spec [0107].

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/19/2021, 05/10/2021 and 04/06/2022 were filed after the mailing date of the application no. 16/180475 filed on 02/19/2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vu et al. (hereinafter Vu) (US 2020037260) in view of Gupta et al. (hereinafter Gupta) (US 20170372201) and further in view of Bai et al. (hereinafter Bai) (US 20190332944).


Regarding claim 1, Vu teaches a method comprising (Vu on [0003] teaches a method for training and using a neural network);
splitting a neural network into a first client-side network, a second client-side network and a server-side network (Vu on [0078] teaches splitting a neural network into three or more portions. For example, the first k layers of a neural network may be run at the client (i.e. first client network), the next m layers run at a server (i.e. server layer network), and the remaining layers run at the client (i.e. second client network));
sending the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data having a first type and wherein the first client-side network comprises at least one first client-side layer (Vu on Fig 6A and text on [0056 and 0067] teaches that the server splits the neural network model based on a split configuration and returns a client model having layers 202a to 202k (CM 216) to client device 650 (i.e. equivalent to sending a first client network having a plurality of layers to a first client), and the client engine performs a do propagation function on CM 216 based on training sample 620 (i.e. the first data type may be an image of a cat or dog)).
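For illustration only, the three-way split attributed to Vu [0078] above (first k layers at the client, next m layers at the server, remaining layers at the client) can be sketched as follows. The function name split_layers, the layer names and the values of k and m are hypothetical and are not taken from Vu; the sketch shows only one way an ordered list of layers might be partitioned.

```python
from typing import List, Tuple


def split_layers(layers: List[str], k: int, m: int) -> Tuple[List[str], List[str], List[str]]:
    """Partition an ordered layer list into three contiguous portions.

    Returns (first client portion, server portion, second client portion).
    The requirement that each portion be non-empty is an assumption of this sketch.
    """
    if k < 1 or m < 1 or k + m >= len(layers):
        raise ValueError("each portion must contain at least one layer")
    first_client = layers[:k]        # first k layers
    server = layers[k:k + m]         # next m layers
    second_client = layers[k + m:]   # remaining layers
    return first_client, server, second_client


# Hypothetical five-layer network.
layers = ["conv1", "conv2", "fc1", "fc2", "softmax"]
first_client, server, second_client = split_layers(layers, k=2, m=2)
```

In this sketch both client portions are plain lists; whether they are run on one client device or on separate devices is the point of difference addressed in the remainder of the rejection.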
	Although Vu teaches splitting the neural network and training the first and second neural networks on the same client device, Vu fails to explicitly teach sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data having a second type and wherein the second client-side network comprises at least one second client-side layer, wherein the first type and the second type have a common association, training the first client-side network on first data from the first client and generating first activations, transmitting the first activations from the first client-side network to the server-side network, training the second client-side network on second data from the second client and generating second activations, transmitting the second activations from the second client-side network to the server-side network, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network and the second client-side network. However, Gupta, from analogous art, teaches sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client (Gupta Fig 1-2 and text on [0044] teaches that Alice 1 to Alice M are computers operated by each Alice (i.e. second client) and Bob 105 is a computer operated by Bob (i.e. first client). See [0037], which teaches splitting a DNN into two parts: (a) a portion of the DNN that is performed by each Alice, respectively (an “Alice part”), and (b) a portion of the DNN that is performed by Bob (i.e. the first part is performed at Bob’s computer and the second part of the neural network is performed at Alice’s computer).
See [0197], which teaches training a neural network, partially on a first set of one or more computers, partially on a second set of one or more computers, and partially on S other sets of one or more computers each, wherein the network comprises a first part, a second part, and S other parts, the first part being denoted as the “Bob part”, the second part being denoted as the “Alice part”, and each of the S other parts, respectively, being denoted as an “Eve part”. Gupta further teaches that a first dataset (i.e. second data) is inputted into the Alice part of the network; (f) forward propagation is performed through the Alice part of the network);
the second data having a second type and wherein the second client-side network comprises at least one second client-side layer (Gupta on [0074-0077 and table 1-2] teaches datasets of different types. See [0129-0130], which teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. text and image type data). See [0197], which teaches that the Alice part of the network comprises three or more neural layers (i.e. at least one second client-side layer) and that (c) each Eve part, respectively, of the network comprises one or more neural layers);
wherein the first type and the second type have a common association (Gupta on [0129-0130] teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. both the text type and the image type of data correspond to Alice, which is equivalent to a common association)).

Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by sending the first and second client-side networks to the first and second clients, respectively, for performing the operations. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between them, thus improving the overall security of the data being processed by the independent client devices (Gupta on [0004-0005 and 0013]).

	Although the combination teaches generating gradients based on backward and forward propagation between the server and the client, the combination fails to explicitly teach training the first client-side network on first data from the first client and generating first activations, transmitting the first activations from the first client-side network to the server-side network, training the second client-side network on second data from the second client and generating second activations, transmitting the second activations from the second client-side network to the server-side network, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network and the second client-side network. However, Bai, from analogous art, teaches training the first client-side network on first data from the first client and generating first activations (Bai on [0046 and 0065] teaches that a plurality of worker modules (i.e. first and second client in this case) each train each layer of the neural network based on input data of that worker module, and an obtained result is referred to as a local gradient (i.e. activation, which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server);
transmitting the first activations from the first client-side network to the server-side network (Bai Fig 6 and text on [0046 and 0065] teaches that, after computing the local gradient (i.e. activation result), each worker module transmits the local gradient to the server module. See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server);
training the second client-side network on second data from the second client and generating second activations (Bai on [0046 and 0065] teaches that a plurality of worker modules (i.e. first and second client in this case) each train each layer of the neural network based on input data of that worker module, and an obtained result is referred to as a local gradient (i.e. activation, which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server);
transmitting the second activations from the second client-side network to the server-side network (Bai Fig 6 and text on [0046 and 0065] teaches that, after computing the local gradient (i.e. activation result), each worker module transmits the local gradient to the server module. See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client network) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server);
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients (Bai Fig 6 and text on [0046 and 0065] teaches that, after the server receives a local gradient from each worker module, the server calculates a global gradient based on the plurality of received local gradients. See also [0102-0104], which teaches that the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls the global gradient of the model parameter of the second layer from the server module 501);
and transmitting the gradients from the server-side network to the first client-side network and the second client-side network (Bai Fig 6 and text on [0046 and 0065] teaches that, after the server calculates the global gradient based on the plurality of received local gradients, each worker module pulls (i.e. receives) the global gradient from the server. See also [0102-0104], which teaches that the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls (i.e. the global gradient is received by each worker module) the global gradient of the model parameter of the second layer from the server module 501).

Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Bai into the combined teaching of Vu and Gupta by training the first and second CNN portions and sending the results to the server, which calculates a gradient based on the results received from each client device. One would be motivated to do so in order to reduce the communication volume between a server module and each worker module in a neural network model training process, and to increase the speed of training a neural network model (Bai on [0007]).
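For illustration only, the worker and server interaction attributed to Bai above (each worker module computes a local gradient from its own input data, the server module calculates a global gradient from the received local gradients, and each worker module pulls the global gradient) can be sketched as follows. The rejection characterizes Bai only as calculating the global gradient "based on" the local gradients; the averaging rule, the single scalar parameter, the squared-error loss, the function names and the sample data below are all assumptions of this sketch and are not drawn from Bai.

```python
from typing import List, Tuple


def local_gradient(weight: float, samples: List[Tuple[float, float]]) -> float:
    """Worker side: gradient of 0.5 * (weight * x - t)^2 with respect to weight,
    averaged over the worker's own (x, t) samples."""
    return sum((weight * x - t) * x for x, t in samples) / len(samples)


def global_gradient(local_grads: List[float]) -> float:
    """Server side: combine local gradients. Plain averaging is an assumption."""
    return sum(local_grads) / len(local_grads)


weight = 1.0
# Three hypothetical workers, each holding its own private sample.
worker_data = [
    [(1.0, 2.0)],
    [(2.0, 4.0)],
    [(3.0, 6.0)],
]

# Each worker reports a local gradient; the server aggregates them.
local_grads = [local_gradient(weight, data) for data in worker_data]
g = global_gradient(local_grads)

# Each worker pulls the global gradient and applies the same update.
learning_rate = 0.1
weight_after = weight - learning_rate * g
```

Only gradients cross the worker and server boundary in this sketch; the per-worker samples stay local.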
Regarding claim 10, Vu teaches a system comprising (Vu on [0005] teaches a system for training a neural network):
a processor (Vu on [0052 and 0081] teaches a system comprising a processor);
and a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising (Vu on [0052] teaches a processor for executing instructions stored on a storage medium);
splitting a neural network into a first client-side network, a second client-side network and a server-side network (Vu on [0078] teaches splitting a neural network into three or more portions. For example, the first k layers of a neural network may be run at the client (i.e. first client network), the next m layers run at a server (i.e. server layer network), and the remaining layers run at the client (i.e. second client network));
sending the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data having a first type and wherein the first client-side network comprises at least one first client-side layer (Vu on Fig 6A and text on [0056 and 0067] teaches that the server splits the neural network model based on a split configuration and returns a client model having layers 202a to 202k (CM 216) to client device 650 (i.e. equivalent to sending a first client network having a plurality of layers to a first client), and the client engine performs a do propagation function on CM 216 based on training sample 620 (i.e. the first data type may be an image of a cat or dog)).
	Although Vu teaches splitting the neural network and training the first and second neural networks on the same client device, Vu fails to explicitly teach sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data having a second type and wherein the second client-side network comprises at least one second client-side layer, wherein the first type and the second type have a common association, receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client; receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network and the second client-side network. However, Gupta, from analogous art, teaches sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client (Gupta Fig 1-2 and text on [0044] teaches that Alice 1 to Alice M are computers operated by each Alice (i.e. second client) and Bob 105 is a computer operated by Bob (i.e. first client). See [0037], which teaches splitting a DNN into two parts: (a) a portion of the DNN that is performed by each Alice, respectively (an “Alice part”), and (b) a portion of the DNN that is performed by Bob (i.e. the first part is performed at Bob’s computer and the second part of the neural network is performed at Alice’s computer).
See [0197], which teaches training a neural network, partially on a first set of one or more computers, partially on a second set of one or more computers, and partially on S other sets of one or more computers each, wherein the network comprises a first part, a second part, and S other parts, the first part being denoted as the “Bob part”, the second part being denoted as the “Alice part”, and each of the S other parts, respectively, being denoted as an “Eve part”. Gupta further teaches that a first dataset (i.e. second data) is inputted into the Alice part of the network; (f) forward propagation is performed through the Alice part of the network);
the second data having a second type and wherein the second client-side network comprises at least one second client-side layer (Gupta on [0074-0077 and table 1-2] teaches datasets of different types. See [0129-0130], which teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. text and image type data). See [0197], which teaches that the Alice part of the network comprises three or more neural layers (i.e. at least one second client-side layer) and that (c) each Eve part, respectively, of the network comprises one or more neural layers);
wherein the first type and the second type have a common association (Gupta on [0129-0130] teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. both the text type and the image type of data correspond to Alice, which is equivalent to a common association)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by sending the first and second client-side networks to the first and second clients, respectively, for performing the operations. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between them, thus improving the overall security of the data being processed by the independent client devices (Gupta on [0004-0005 and 0013]).

	Although the combination teaches generating gradients based on backward and forward propagation between the server and the client, the combination fails to explicitly teach receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client; receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network and the second client-side network. However, Bai, from analogous art, teaches receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client (Bai on [0046 and 0065] teaches that a plurality of worker modules (i.e. first and second client in this case) each train each layer of the neural network based on input data of that worker module, and an obtained result is referred to as a local gradient (i.e. activation, which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server. Bai further teaches that each worker module transmits the local gradient to the server module);
receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client (Bai on [0046 and 0065] teaches that a plurality of worker modules (i.e. first and second client in this case) each train each layer of the neural network based on input data of that worker module, and an obtained result is referred to as a local gradient (i.e. activation, which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104], which teach worker modules 502-504 (i.e. first and second client) training each layer of the neural network based on input data to obtain a local gradient and transmitting the local gradient to the server. Bai further teaches that, after computing the local gradient (i.e. activation result), each worker module transmits the local gradient to the server module);
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients (Bai Fig 6 and text on [0046 and 0065] teaches that, after the server receives a local gradient from each worker module, the server calculates a global gradient based on the plurality of received local gradients. See also [0102-0104], which teaches that the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls the global gradient of the model parameter of the second layer from the server module 501);
and transmitting the gradients from the server-side network to the first client-side network and the second client-side network (Bai Fig 6 and text on [0046 and 0065] teaches that, after the server calculates the global gradient based on the plurality of received local gradients, each worker module pulls (i.e. receives) the global gradient from the server. See also [0102-0104], which teaches that the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls (i.e. the global gradient is received by each worker module) the global gradient of the model parameter of the second layer from the server module 501).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Bai into the combined teaching of Vu and Gupta by training the first and second CNN portions and sending the results to the server, which calculates a gradient based on the results received from each client device. One would be motivated to do so in order to reduce the communication volume between a server module and each worker module in a neural network model training process, and to increase the speed of training a neural network model (Bai on [0007]).
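For illustration only, the server-side operations recited in claim 10 (receiving first and second activations, training a server-side layer based on them, and transmitting gradients back to both client-side networks) can be sketched as follows. The sketch assumes a single scalar server-side layer, a squared-error loss, and training targets that are available at the server; the class name ServerSideNetwork, the client identifiers and all numeric values are hypothetical and are not drawn from the application or from the cited references.

```python
from typing import Dict


class ServerSideNetwork:
    """A single scalar layer y = weight * a standing in for the server-side layers."""

    def __init__(self, weight: float, lr: float) -> None:
        self.weight = weight
        self.lr = lr

    def train_step(self, activations: Dict[str, float],
                   targets: Dict[str, float]) -> Dict[str, float]:
        """Train on activations received from each client and return, per client,
        the gradient of the loss with respect to that client's activation."""
        grads_to_clients: Dict[str, float] = {}
        weight_grad = 0.0
        for client_id, a in activations.items():
            error = self.weight * a - targets[client_id]        # dLoss/dy for 0.5 * error^2
            grads_to_clients[client_id] = error * self.weight   # dLoss/da, sent to the client
            weight_grad += error * a                            # dLoss/dweight, summed over clients
        # Update the server-side layer with the mean gradient over clients.
        self.weight -= self.lr * weight_grad / len(activations)
        return grads_to_clients


server = ServerSideNetwork(weight=0.5, lr=0.1)
# Activations received from the first and second client-side networks.
activations = {"client_1": 2.0, "client_2": 4.0}
targets = {"client_1": 2.0, "client_2": 2.0}
grads = server.train_step(activations, targets)
```

Each client would then continue backpropagation through its own client-side layers using the gradient returned for its activation, so raw client data never reaches the server in this sketch.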

Regarding claim 18, Vu teaches a method comprising (Vu on [0003] teaches a method for training and using a neural network);
splitting a neural network into a first client-side network, a second client-side network and a server-side network (Vu on [0078] teaches splitting a neural network into three or more portions. For example, the first k layers of a neural network may be run at the client (i.e. first client network), the next m layers run at a server (i.e. server layer network), and the remaining layers run at the client (i.e. second client network));
sending the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data having a first type and wherein the first client-side network comprises at least one first client-side layer (Vu on Fig 6A and text on [0056 and 0067] teaches that the server splits the neural network model based on a split configuration and returns a client model having layers 202a to 202k (CM 216) to client device 650 (i.e. equivalent to sending a first client network having a plurality of layers to a first client), and the client engine performs a do propagation function on CM 216 based on training sample 620 (i.e. the first data type may be an image of a cat or dog)).
	Although Vu teaches splitting the neural network and training the first and second neural networks on the same client device, Vu fails to explicitly teach sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data having a second type and wherein the second client-side network comprises at least one second client-side layer, wherein the first type and the second type have a common association, receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client; receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network and the second client-side network. However, Gupta, from analogous art, teaches sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client (Gupta Fig 1-2 and text on [0044] teaches that Alice 1 to Alice M are computers operated by each Alice (i.e. second client) and Bob 105 is a computer operated by Bob (i.e. first client). See [0037], which teaches splitting a DNN into two parts: (a) a portion of the DNN that is performed by each Alice, respectively (an “Alice part”), and (b) a portion of the DNN that is performed by Bob (i.e. the first part is performed at Bob’s computer and the second part of the neural network is performed at Alice’s computer).
See [0197], which teaches training a neural network, partially on a first set of one or more computers, partially on a second set of one or more computers, and partially on S other sets of one or more computers each, wherein the network comprises a first part, a second part, and S other parts, the first part being denoted as the “Bob part”, the second part being denoted as the “Alice part”, and each of the S other parts, respectively, being denoted as an “Eve part”. Gupta further teaches that a first dataset (i.e. second data) is inputted into the Alice part of the network; (f) forward propagation is performed through the Alice part of the network);
the second data having a second type and wherein the second client-side network comprises at least one second client-side layer (Gupta on [0074-0077 and table 1-2] teaches datasets of different types. See [0129-0130], which teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. text and image type data). See [0197], which teaches that the Alice part of the network comprises three or more neural layers (i.e. at least one second client-side layer) and that (c) each Eve part, respectively, of the network comprises one or more neural layers);
wherein the first type and the second type have a common association (Gupta on [0129-0130] teaches that eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. both the text type and the image type of data correspond to Alice, which is equivalent to a common association)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by sending the first and second client-side networks to the first and second clients, respectively, for performing the operations. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between them, thus improving the overall security of the data being processed by the independent client devices (Gupta on [0004-0005 and 0013]).

	Although the combination teaches generating gradients based on backward and forward propagation between server and client, but fails to explicitly teach receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client; receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client, training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients and transmitting the gradients from the server-side network to the first client-side network 25and the second client-side network, however Bai from analogous art teaches receiving, at the server-side network, first activations from a training of the first client-side network on first data from the first client (Bai on [0046 and 0065] teaches plurality of worker modules (i.e. first and second client in this case) trains each layer of the neural network based on input data of that worked module and an obtained result is referred to as a local gradient (i.e. activation which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104] teaches worker modules 502-504 (i.e. first and second client) training each layer of neural network based on input data to obtain a local gradient and transmit the local gradient to the server. Further teaches each worker module transmits the local gradient to the server module. See Fig 6 block 501-504 and text on [0102-0104] teaches worker modules 502-504 (i.e. first and second client) training each layer of neural network based on input data to obtain a local gradient and transmit the local gradient to the server);
receiving, at the server-side network, second activations from a training of the second client-side network on second data from the second client (Bai on [0046 and 0065] teaches plurality of worker modules (i.e. first and second client in this case) trains each layer of the neural network based on input data of that worked module and an obtained result is referred to as a local gradient (i.e. activation which is an output in view of [0040]). See Fig 6 block 501-504 and text on [0102-0104] teaches worker modules 502-504 (i.e. first and second client) training each layer of neural network based on input data to obtain a local gradient and transmit the local gradient to the server. Further teaches after computing the local gradient (i.e. activation result), each worker module transmits the local gradient to the server module. See Fig 6 block 501-504 and text on [0102-0104] teaches worker modules 502-504 (i.e. first and second client network) training each layer of neural network based on input data to obtain a local gradient and transmit the local gradient to the server);
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients (Bai Fig 6 and text on [0046 and 0065] teaches that after the server receives a local gradient from each worker module, the server calculates a global gradient based on the plurality of received local gradients. See also [0102-0104], which teaches the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls the global gradient of the model parameter of the second layer from the server module 501);
and transmitting the gradients from the server-side network to the first client-side network and the second client-side network (Bai Fig 6 and text on [0046 and 0065] teaches that after the server calculates a global gradient based on the plurality of received local gradients, each worker module pulls (i.e. receives) the global gradient from the server. See also [0102-0104], which teaches the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules. Each worker module pulls the global gradient of the model parameter of the second layer from the server module 501 (i.e. the global gradient is received by each worker module)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Bai into the combined teaching of Vu and Gupta by training the first and second CNN portions and sending the results to the server, which calculates a gradient based on the result received from each client device. One would be motivated to do so in order to reduce a communication volume between a server module and each worker module in a neural network model training process, and increase a speed of training a neural network model (Bai on [0007]).
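
For illustration only, the worker/server exchange characterized above (each worker module computes a local gradient on its own input data, the server module calculates a global gradient from the received local gradients, and each worker module pulls the global gradient) can be sketched as follows. This is a minimal sketch and not Bai's implementation: the linear model, the squared-error loss, the use of a simple average as the global gradient, and the names `local_gradient`, `global_gradient` and `apply_gradient` are all assumptions introduced here.

```python
from typing import List


def local_gradient(weights: List[float],
                   inputs: List[List[float]],
                   targets: List[float]) -> List[float]:
    """Worker side (hypothetical): mean squared-error gradient of a linear
    model, computed only on this worker's own data."""
    n = len(inputs)
    grad = [0.0] * len(weights)
    for x, y in zip(inputs, targets):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / n
    return grad


def global_gradient(local_grads: List[List[float]]) -> List[float]:
    """Server side (hypothetical): combine the reported local gradients.
    Averaging is an assumption; the cited text only says the global gradient
    is calculated based on the received local gradients."""
    k = len(local_grads)
    dim = len(local_grads[0])
    return [sum(g[j] for g in local_grads) / k for j in range(dim)]


def apply_gradient(weights: List[float], grad: List[float],
                   lr: float = 0.1) -> List[float]:
    """Worker side: update local parameters with the pulled global gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    weights = [0.0, 0.0]
    # Three workers, each holding its own private batch (made-up numbers).
    worker_data = [
        ([[1.0, 0.0]], [1.0]),
        ([[0.0, 1.0]], [2.0]),
        ([[1.0, 1.0]], [3.0]),
    ]
    reported = [local_gradient(weights, x, y) for x, y in worker_data]
    pulled = global_gradient(reported)
    weights = apply_gradient(weights, pulled)
    print(weights)
```

Only gradients cross the worker/server boundary in this sketch; the raw input data of each worker never leaves that worker.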

Regarding claims 2 and 11, the combination of Vu, Gupta and Bai teaches all the limitations of claims 1 and 10 respectively. Gupta further teaches wherein the common association comprises at least one of a device, a person, a consumer, a patient, a business, a concept, a medical condition, a group of people, a process, a product and/or a service (Gupta on [0129-0130] teaches eight files were written for use with different models (four for use with a mnist model, and four for use with an alexnet model). MNIST is a handwritten digit recognition dataset. The mnist model is used for handwritten digit recognition. Alexnet is a model used for large scale image recognition (i.e. both types of data, text and image, correspond to Alice)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by sending the first and second client networks to the first and second clients for performing operations. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claims 3 and 12, the combination of Vu, Gupta and Bai teaches all the limitations of claims 1 and 10 respectively. Bai further teaches wherein the server-side network comprises a global machine learning model (Bai on [0102-0104] teaches the server module 501 calculates a global gradient of the model parameter of the second layer based on the received local gradients separately reported by the three worker modules (i.e. global machine learning model for calculating the global gradient by the server)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Bai into the combined teaching of Vu and Gupta by calculating the global gradient at the server. One would be motivated to do so in order to reduce a communication volume between a server module and each worker module in a neural network model training process, and increase a speed of training a neural network model (Bai on [0007]).

Regarding claims 4 and 13, the combination of Vu, Gupta and Bai teaches all the limitations of claims 1 and 10 respectively. Gupta further teaches wherein the neural network comprises weights, bias and hyperparameters (Gupta on [0100-0101, 0111] teaches the neural network comprises weights, bias and hyperparameters).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by having a neural network model comprising weights, bias and hyperparameters. One would be motivated to do so in order to securely and efficiently train a multi-part neural network comprising weights, bias and hyperparameters on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claims 5 and 14, the combination of Vu, Gupta and Bai teaches all the limitations of claims 1 and 10 respectively. Vu further teaches wherein the at least one first client-side layer and the at least one second client-side layer comprise a same number of layers or a different number of layers (Vu on [0078] teaches splitting the neural network into three or more portions. For example, the first k layers of a neural network may be run at the client (i.e. first client network), the next m layers run at a server (i.e. server layer network), and the remaining layers run at the client (i.e. second client network)).
Regarding claims 6 and 15, the combination of Vu, Gupta and Bai teaches all the limitations of claims 1 and 10 respectively. Vu further teaches wherein a cut layer exists between the server-side network and the first client-side network and the second client-side network (Vu Fig 2A and text on [0039] teaches a split separates layers 202k and 202k+1, where k denotes the layer after which the split of ANN 220 occurs (i.e. cut layer). Layers from 202a to 202k comprise client model (CM) 216 and layers 202k+1 to 202n and 202o comprise server model (SM) 218).
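
For illustration only, a split after layer k of the kind characterized above can be sketched as below. This is a hedged sketch, not Vu's implementation: the layers are stand-in scalar functions, and the names `split_at` and `forward` are hypothetical. It shows that the output of the client portion at the cut layer is the only value the server portion needs in order to finish the forward pass.

```python
from typing import Callable, List, Sequence, Tuple

# A "layer" here is just a function on a single number (an assumption made
# to keep the sketch self-contained).
Layer = Callable[[float], float]


def split_at(layers: Sequence[Layer], k: int) -> Tuple[List[Layer], List[Layer]]:
    """Split after layer k: layers 1..k form the client model,
    layers k+1..n form the server model."""
    if not 0 < k < len(layers):
        raise ValueError("k must leave at least one layer on each side")
    return list(layers[:k]), list(layers[k:])


def forward(layers: Sequence[Layer], x: float) -> float:
    """Run the input through the given layers in order."""
    for layer in layers:
        x = layer(x)
    return x


if __name__ == "__main__":
    network: List[Layer] = [
        lambda x: 2.0 * x,      # layer 1
        lambda x: x + 1.0,      # layer 2 (cut layer when k = 2)
        lambda x: max(0.0, x),  # layer 3
        lambda x: 3.0 * x,      # layer 4
    ]
    client_model, server_model = split_at(network, k=2)
    cut_output = forward(client_model, 1.5)        # computed at the client
    prediction = forward(server_model, cut_output)  # finished at the server
    print(cut_output, prediction)
```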
Regarding claim 7, the combination of Vu, Gupta and Bai teaches all the limitations of claim 1 above. Gupta further teaches wherein the first type comprises text data and the second type comprises image data (Gupta on [0074-0077 and table 1-2] teaches datasets of different types. See [0129-0130], which teaches a dataset comprising a text file and an image file (i.e. data set of text and image type)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by having different sets of data processed by different client devices each having part of the neural network. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claim 8, the combination of Vu, Gupta and Bai teaches all the limitations of claim 7 above. Gupta further teaches wherein the first client-side network and the second client-side network are independent and operate independently (Gupta Fig 1-2 and text on [0044] teaches Alice 1 to Alice M are computers operated by each Alice (i.e. second client) and Bob 105 is a computer operated by Bob (i.e. first client). See [0037], which teaches splitting the DNN into two parts: (a) a portion of the DNN that is performed by each Alice, respectively (an “Alice part”), and (b) a portion of the DNN that is performed by Bob (i.e. the first part being performed at Bob’s computer and the second part of the neural network being performed at Alice’s computer is equivalent to separate and independent processing). See [0197], which teaches training a neural network partially on a first set of one or more computers, partially on a second set of one or more computers, and partially on S other sets of one or more computers each, where the network comprises a first part, a second part, and S other parts, the first part being denoted as the “Bob part”, the second part being denoted as the “Alice part”, and each of the S other parts, respectively, being denoted as an “Eve part”. Further teaches a first dataset (i.e. second data) is inputted into the Alice part of the network; (f) forward propagation is performed through the Alice part of the network).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by sending the first and second client networks to the first and second clients for performing operations. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claim 9, the combination of Vu, Gupta and Bai teaches all the limitations of claim 1 above. Gupta further teaches wherein the first type comprises tabular data and the second type comprises image data (Gupta on [0074-0077 and table 1-2] teaches datasets of different types in table format. See [0129-0130], which teaches a dataset comprising a text file and an image file (i.e. data set of text and image type)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by having different sets of data processed by different client devices each having part of the neural network. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claims 16 and 19, the combination of Vu, Gupta and Bai teaches all the limitations of claims 10 and 18 respectively. Gupta further teaches wherein the first type and the second type are different types of data (Gupta on [0074-0077 and table 1-2] teaches datasets of different types in table format. See [0129-0130], which teaches a dataset comprising a text file and an image file (i.e. data set of text and image type)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by having different sets of data processed by different client devices each having part of the neural network. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).

Regarding claims 17 and 20, the combination of Vu, Gupta and Bai teaches all the limitations of claims 10 and 18 respectively. Gupta further teaches wherein the first type comprises tabular data or time-series data and the second type comprises image data (Gupta on [0074-0077 and table 1-2] teaches datasets of different types in table format. See [0129-0130], which teaches a dataset comprising a text file and an image file (i.e. data set of text and image type)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement the teaching of Gupta into the teaching of Vu by having different sets of data processed by different client devices each having part of the neural network. One would be motivated to do so in order to securely and efficiently train a multi-part neural network on separate and independent client devices without sharing data between each other, thus improving the overall security of the data being processed by independent client devices (Gupta on [0004-0005 and 0013]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Xi et al (US 20200342288) is directed towards a distributed training system including a parameter server configured to compress the weight matrices according to a clustering algorithm, after which the compressed representation of the weight matrix may be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value.
NOGUCHI et al (US 20190005399) is directed towards a learning device that acquires a plurality of pieces of input information of different classifications. The learning device includes a learning unit that learns a model that, when the pieces of input information are inputted, outputs a plurality of pieces of output information corresponding to the respective pieces of input information. The model includes a plurality of encoding parts that generate pieces of characteristic information indicating characteristics of the pieces of input information from the pieces of input information.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOEEN KHAN whose telephone number is (571)272-3522. The examiner can normally be reached 7AM-5PM EST M-TH Alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Shewaye Gelagay, can be reached on (571)272-4219. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOEEN KHAN/               Examiner, Art Unit 2436