Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial Office action issued in response to patent application 16/797,422, filed on 02/21/2020. Claims 1-20, as originally filed, are currently pending and have been considered below. Claims 1, 19, and 20 are independent claims.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 9-11 and 13-18 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

The term “lightweight” in claim 9 is a relative term which renders the claim indefinite. The term “lightweight” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The recitation of “lightweight” lacks clarity because it is unclear how light a neural network must be to qualify as lightweight. For examination purposes, any degree of lightness is considered “lightweight”.
The term “large” in claims 13-17 is a relative term which renders the claims indefinite. The term “large” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The recitation of “large” lacks clarity because it is unclear how large a part must be to qualify as large. For examination purposes, any size is considered “large”.
The term “small” in claims 13-18 is a relative term which renders the claims indefinite. The term “small” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The recitation of “small” lacks clarity because it is unclear how small a part must be to qualify as small. For examination purposes, any size is considered “small”.
The term “higher” in claim 17 is a relative term which renders the claim indefinite. The term “higher” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The recitation of “higher” lacks clarity because it is unclear how high a level must be to qualify as higher. For examination purposes, any level is considered “higher”.
The term “lower” in claim 17 is a relative term which renders the claim indefinite. The term “lower” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The recitation of “lower” lacks clarity because it is unclear how low a level must be to qualify as lower. For examination purposes, any level is considered “lower”.

Dependent claims 10-11 are rejected for being directly or indirectly dependent on the rejected claims 9 and 13-18.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-3 and 6-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kopp et al. (US 11373115 B2).
Regarding Claim 1,
Kopp et al. teaches a method of updating a neural network on an edge device that has low-bandwidth uplink capability, comprising (Kopp et al., Col. 4 Lines 34-39, “During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches updating a model (corresponds to neural network). Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches an edge device. Col. 8 Lines 53-56, “Compressing the set of parameters into a parameter vector may be more efficient for bandwidth and timing than transmitting and recalculating each parameter of the set of parameters” teaches a method that is more efficient for bandwidth (corresponds to low-bandwidth uplink capability)).
training, by a processor in a centralized site/device, the neural network (Kopp et al., FIG. 1 and Col. 6 Lines 37-40, “FIG. 1 depicts a decentralized system for training a machine learned model. The system includes a plurality of devices 122, a network 127, parameter servers 125, and a mapping platform 121” teaches training a machine learning model with a system that includes a parameter server (corresponds to processor in a centralized site/device)).  
sending, by the processor, the trained neural network to the edge device (Kopp et al., Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches sending the model (corresponds to trained neural network) to the edge device). 
receiving, by the processor, neural network information from the edge device, the received neural network information including at least a portion of at least one or more of a dataset, an activation, or an overall inference result collected or generated in the edge device (Kopp et al., Col. 7 Lines 65-67, “The parameter servers 125 are configured to receive locally trained model parameters from a device 122” teaches the parameter server (corresponds to the processor) receiving locally trained model parameters (corresponds to neural network information) from a device (corresponds to the edge device). Col. 1 Lines 24-30, “In such a distributed machine learning scenario, the dataset is transmitted to or stored among multiple edge devices. The devices solve a distributed optimization problem to collectively learn the underlying model. For distributed computing, similar (or identical) datasets may be allocated to multiple devices that are then able to solve a problem in parallel” teaches datasets being transmitted or stored (corresponds to collected) among edge devices).
using, by the processor, the received neural network information to update all or a part of the trained neural network (Kopp et al., Col. 2 Lines 50-56, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device. This process repeats multiple times, each device training the local model to determine a parameter vector, transmitting the parameter vector to the parameter server, receiving the updated master parameter vector, and retraining the local model” teaches the parameter server utilizing the received parameter vector (corresponds to the received neural network information) to update master parameter vectors to retrain the local model (corresponds to update all or a part of the trained neural network)).
generating, by the processor, updated neural network information based on the updated neural network (Kopp et al., Col. 13 Lines 9-13, “the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter may be parameter vector that is generated as a result of training the model using the training data” teaches generating a second parameter (corresponds to updated neural network information) based on the trained model (corresponds to the updated neural network)).
sending, by the processor, the updated neural network information to the edge device (Kopp et al., Col. 2 Lines 50-52, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device” teaches the parameter server (corresponds to the processor) transmitting the updated master parameter vector (corresponds to the updated neural network information) to the respective device (corresponds to edge device)).
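For illustration only, the exchange described in the passages of Kopp et al. cited above for claim 1 (the device trains locally and transmits a parameter vector; the parameter server updates the master parameter vector and transmits it back) can be sketched as follows. This is a minimal sketch and is not code from the reference or the application. The names `ParameterServer` and `train_locally`, the blend weight `alpha`, and the linear update rule are assumptions made for the example.

```python
from typing import List


class ParameterServer:
    """Hypothetical stand-in for a parameter server; not taken from the reference."""

    def __init__(self, master: List[float], alpha: float = 0.1) -> None:
        self.master = list(master)
        self.alpha = alpha  # assumed blend weight

    def update(self, local: List[float]) -> List[float]:
        # Assumed rule: blend the received local vector into the master vector.
        self.master = [
            (1.0 - self.alpha) * m + self.alpha * p
            for m, p in zip(self.master, local)
        ]
        # Return the updated master vector to the device that sent `local`.
        return list(self.master)


def train_locally(params: List[float], target: List[float], lr: float = 0.25) -> List[float]:
    # Toy "training": one gradient step on squared error toward local data.
    return [p - lr * 2.0 * (p - t) for p, t in zip(params, target)]


server = ParameterServer(master=[0.0, 0.0], alpha=0.1)
local = train_locally(server.master, target=[1.0, 2.0])  # device trains on local data
returned = server.update(local)                          # server updates and replies
local = train_locally(returned, target=[1.0, 2.0])       # device retrains from the reply
```

Only parameter vectors cross the network in this sketch; the local data (`target`) never leaves the device.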
Regarding Claim 2,
Kopp et al. teaches the method of claim 1, 
Kopp et al. further teaches wherein sending the trained neural network to the edge device comprises sending the trained neural network to an edge device that has been deployed (Kopp et al., Col. 1 Lines 24-30, “In such a distributed machine learning scenario, the dataset is transmitted to or stored among multiple edge devices. The devices solve a distributed optimization problem to collectively learn the underlying model. For distributed computing, similar (or identical) datasets may be allocated to multiple devices that are then able to solve a problem in parallel” teaches a deployed edge device).
Regarding Claim 3,
Kopp et al. teaches the method of claim 1, wherein using the received neural network information to update all or a part of the trained neural network and generating the updated neural network information based on the updated neural network comprises:
Kopp et al. further teaches generating a neural network difference model by comparing the updated neural network to the trained neural network (Col. 4 Lines 33-38, “During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches the models being trained on the new data or combination of (corresponds to generating a neural network difference model), based on the updated set of the central set of parameters (corresponds to the updated neural network) to the local parameters (corresponds to the trained neural network)).
Regarding Claim 6,
Kopp et al. teaches the method of claim 1, further comprising: 
Kopp et al. further teaches receiving, by the edge device, the trained neural network (Kopp et al., Col. 7-8 Lines 65-67 and Lines 1-2, “The parameter servers 125 are configured to receive locally trained model parameters from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device” teaches the parameter server receiving the locally trained parameters from the device, updating the model parameters, and then transmitting the model parameters back to the device (corresponds to the edge device receiving the trained neural network again)).
collecting, by the edge device, the dataset from sensors of the edge device (Kopp et al., Col. 1 Lines 43-46, “The device includes at least one sensor, a communications interface, and a device processor. The at least one sensor is configured to acquire training data” teaches the sensor of the device (corresponds to edge device) acquiring training data (corresponds to collecting the dataset)).
applying, by the edge device, the collected dataset as inputs to the received neural network to generate activations and the overall inference result (Kopp et al., FIG. 3 and Col. 5 Lines 19-22, “At act A170, the model is used on new data to generate, for example, a prediction or classification” teaches the device applying new data (corresponds to the collected dataset as inputs) to the model (corresponds to the received neural network) to generate prediction or classification (corresponds to overall inference result). Col. 13 Lines 27-30, “The parameter server 125 using a weighting function and a weight (Alpha) so that newly received local parameter vectors do not overwhelm the central parameter vector” teaches generating a weighting function (corresponds to activations)).
storing, by the edge device, at least a portion of at least one or more of the collected dataset, the generated activations or the overall inference result in a memory of the edge device (Kopp et al., Col. 6 Lines 53-54, “Each device 122 may collect and/or store data relating to the model” teaches the edge device storing the collected data relating to the model. FIG. 3 and Col. 5 Lines 19-22, “At act A170, the model is used on new data to generate, for example, a prediction or classification” teaches the device applying new data (corresponds to the collected dataset as inputs) to the model (corresponds to the received neural network) to generate prediction or classification (corresponds to overall inference result). Col. 13 Lines 27-30, “The parameter server 125 using a weighting function and a weight (Alpha) so that newly received local parameter vectors do not overwhelm the central parameter vector” teaches generating a weighting function (corresponds to activations)).
sending, by the edge device, the neural network information that includes at least a portion of at least one or more of the collected dataset, the generated activations or the overall inference result to the centralized site/device (Kopp et al., Col. 15 Lines 22-26, “A worker device first trains the locally stored model through gradient descent in a pre-arranged fashion (fixed or flexible number of epochs) and then sends the trained parameters to the device housing the process representing the parameter server 125” teaches the worker device (corresponds to edge device) sending trained parameters (corresponds to the neural network information, the overall inference result) to the centralized server. Col. 10 Lines 52-55, “By using a plurality of distributed worker devices 122, the model is trained on a much larger volume of data on the edge than can be transferred to a centralized server for bandwidth, privacy, business, and timing reasons” teaches large volume of data transferred to a centralized server (corresponds to the centralized site/device)).
Regarding Claim 7,
Kopp et al. teaches the method of claim 6, further comprising: 
Kopp et al. further teaches receiving, by the edge device, the updated neural network information (Kopp et al., Col. 4 Lines 26-28, “The parameter server updates a central set of parameters and transmits the updated central set of parameters back to the worker device” teaches the worker device (corresponds to edge device) receiving the updated central set of parameters of the machine learning model (corresponds to the updated neural network information)).
generating, by the edge device, an updated neural network based on the received trained neural network and the received updated neural network information (Kopp et al., Col. 1 Lines 47-53, “The device processor is configured to train the machine learned model using the training data, transmit a parameter vector of the trained model to the parameter server, and receive in response, an updated central parameter vector from the parameter server. The device processor is further configured to retrain the model using the updated central parameter vector” teaches training the machine learning model (corresponds to an updated neural network) based on the updated central parameter (corresponds to the received updated neural network information)).
applying a second dataset as input to the updated neural network to generate second inference results (Kopp et al., Col. 13 Lines 9-22, “the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter may be parameter vector that is generated as a result of training the model using the training data… the second parameter set encoded using a sparsely encoding scheme” teaches a second parameter (corresponds to a second dataset as input) of training the model (corresponds to updated neural network). Col. 5 Lines 19-22, “Unsupervised learning identifies hidden patterns or intrinsic structures in the data. Unsupervised learning is used to draw inferences from the datasets that include input data without labeled responses” teaches generating inference from the datasets that include input data).
Regarding Claim 8, 
Kopp et al. teaches the method of claim 7, 
Kopp et al. further teaches wherein receiving the updated neural network information comprises receiving a neural network difference model (Kopp et al., Col. 18-19 Lines 66-67 and Lines 1-5, “The model may be trained using local data on multiple devices that included both LIDAR and camera systems. The model may be deployed on cars that only include camera systems. The training data would include both the LIDAR data and optical images. The model minimization is calculated as the average difference in prediction of depth from camera and LIDAR” teaches a model that calculates the average difference in prediction (corresponds to a neural network difference model)).
Regarding Claim 9, 
Kopp et al. teaches the method of claim 1, wherein: training the neural network comprises: 
Kopp et al. further teaches collecting training data from one or more of a plurality of edge devices (Kopp et al., Col. 26 Lines 15-17, “the navigation device acquires different training data from other devices that are training the model” teaches the navigation device acquiring training data from other devices (corresponds to one or more of a plurality of edge devices)).
labelling the collected training data (Kopp et al., Col. 11 Line 22, “In an embodiment, the training data is labeled” teaches labeled training data).
selecting two or more lightweight neural networks (Kopp et al., Col. 9 Lines 33-40, “The devices 122 may use trained models (using received parameters) to provide data to assist in identifying a location of the device 122, objects in the vicinity of the device 122, or environmental conditions around the device for example” teaches utilizing (corresponds to selecting) two or more trained models (corresponds to two or more neural networks) to identify a location and objects. Col. 14 Lines 7-11, “the Alpha value is set between 0.01 and 0.2 indicating that new incoming parameters are discounted between 80% and 99% when generating the new central parameter vector. Alternative values of Alpha may be used for different processes or models” teaches the Alpha value being set between 0.01 and 0.2 (corresponds to lightweight neural networks)).
generating an ensemble based on the selected neural networks (Kopp et al., Col. 4 Lines 36-38, “As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches the collection of different models at each of the worker devices (corresponds to generating an ensemble based on the selected neural networks) to improve predictive performance of the trained models).
using the labelled training data to train the ensemble (Kopp et al., Col. 11 Line 22, “In an embodiment, the training data is labeled” teaches labeled training data. Col. 4 Lines 36-38, “As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches a combination of new and old training data (corresponds to training an ensemble)).
sending the trained neural network to the edge device comprises (Kopp et al., Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches sending the data or a model (corresponds to trained neural network) to the edge device).
sending the trained ensemble and an ensemble aggregation function to the edge device (Kopp et al., Col. 7-8 Lines 61-67 and Lines 1-6, “One or more devices 122 or the mapping platform 127 may be configured as a parameter server 125. The parameter server 125 may also be configured distinct from the devices 122 or mapping platform 127. The system may include one or more parameter servers 125. The parameter servers 125 are configured to receive locally trained model parameters from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device. The parameter server 125 communicates with each device 122 of the plurality of devices 122 that are assigned to the parameter server 125. The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122” teaches sending the trained parameters and models (corresponds to trained ensemble) and aggregated parameters (corresponds to an ensemble aggregation function) to the device).
Regarding Claim 10,  
Kopp et al. teaches the method of claim 9, wherein using the received neural network information to update all or a part of the trained neural network comprises: 
Kopp et al. further teaches adding a neural network to the trained ensemble (Kopp et al., Col. 4 Lines 36-38, “As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches training the local model with old and new data (corresponds to adding new data of a neural network to the trained ensemble)).
Regarding Claim 11, 
Kopp et al. teaches the method of claim 9, wherein using the received neural network information to update all or a part of the trained neural network comprises: 
Kopp et al. further teaches updating the ensemble aggregation function based on a result of analyzing the received neural network information (Kopp et al., Col. 10 Lines 25-29, “The parameter server 125 aggregates the parameter vectors from each of the three devices and generates a central parameter vector. In an embodiment, the aggregation is done using equation 1 described above” teaches applying an aggregation equation (corresponds to updating the ensemble aggregation function) to the received parameter vectors (corresponds to received neural network information)).
updating all or a part of the trained neural network based on the updated ensemble aggregation function (Kopp et al., Col. 8 Lines 4-6, “The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122” teaches aggregating the parameters from the models that are trained (corresponds to updating the trained neural network based on the updated ensemble aggregation function)).
Regarding Claim 12, 
Kopp et al. teaches the method of claim 6, wherein: 
Kopp et al. further teaches receiving the trained neural network comprises receiving a trained ensemble (Kopp et al., Col. 7-8 Lines 61-67 and Lines 1-6, “One or more devices 122 or the mapping platform 127 may be configured as a parameter server 125. The parameter server 125 may also be configured distinct from the devices 122 or mapping platform 127. The system may include one or more parameter servers 125. The parameter servers 125 are configured to receive locally trained model parameters from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device. The parameter server 125 communicates with each device 122 of the plurality of devices 122 that are assigned to the parameter server 125. The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122” teaches sending the trained parameters and models (corresponds to trained ensemble) and aggregated parameters to the device (corresponds to the device receiving)).
applying the collected dataset as inputs to the received neural network to generate the activations and the overall inference result comprises applying the collected dataset as inputs to the received ensemble to generate the activations and the overall inference result (Kopp et al., Col. 5 Lines 19-22, “Unsupervised learning identifies hidden patterns or intrinsic structures in the data. Unsupervised learning is used to draw inferences from the datasets that include input data without labeled responses” teaches generating inferences from collected datasets as input data to the machine learning model. Col. 13 Lines 27-30, “The parameter server 125 using a weighting function and a weight (Alpha) so that newly received local parameter vectors do not overwhelm the central parameter vector” teaches generating a weighting function (corresponds to activations)).
Regarding Claim 13, 
Kopp et al. teaches the method of claim 1, wherein training the neural network comprises: 
Kopp et al. further teaches generating a stratified neural network that includes large data volume parts and small data parts (Kopp et al., Col. 12 Lines 13-17, “In an example, a two-stage convolutional neural network is used that includes max pooling layers. The two-stage convolutional neural network (CNN) uses rectified linear units for the non-linearity and a fully-connected layer at the end for image classification” teaches a neural network (corresponds to stratified neural network) that consists of two stages (corresponds to large data volume parts and small data parts)).
Regarding Claim 14, 
Kopp et al. teaches the method of claim 13, wherein sending the updated neural network information to the edge device comprises 
Kopp et al. further teaches sending the small data parts of the stratified neural network to the edge device (Kopp et al., Col. 6 Lines 27-36, “The training occurs in a decentralized manner on multiple devices with only the local data available to each device. The multiple devices do not share data. The aggregation of model parameters occurs asynchronously on a centralized parameter server. The aggregation of the model parameters includes a small linear weighting of the locally-trained model parameters to the centrally-stored model parameters that is independent of the number of data points, the staleness of the parameter updates, and the data distribution” teaches a small linear weighting (corresponds to small data parts) of the locally-trained model (corresponds to stratified neural network) to the devices).
Regarding Claim 15, 
Kopp et al. teaches the method of claim 13, wherein generating the stratified neural network that includes the large data volume parts and the small data parts comprises: generating the stratified neural network to include: 
Kopp et al. further teaches a large data volume part that include a feature identification layer (Kopp et al., Col. 12 Lines 13-14, “In an example, a two-stage convolutional neural network is used that includes max pooling layers” teaches max pooling layers (corresponds to feature identification layer)).
a small data part that includes a fully connected layer (Kopp et al., Col. 12 Lines 14-17, “The two-stage convolutional neural network (CNN) uses rectified linear units for the non-linearity and a fully-connected layer at the end for image classification” teaches fully-connected layer).
Regarding Claim 16, 
Kopp et al. teaches the method of claim 13, wherein generating the stratified neural network that includes the large data volume parts and the small data parts comprises: generating the stratified neural network to include: 
Kopp et al. further teaches large data volume parts that include multiple partial layers that are not cross-connected (Kopp et al., FIG. 5 and Col. 15-16 Lines 65-67 and Lines 1-3, “FIG. 5 depicts an embodiment for aggregation by a hierarchy of parameters servers. There is not just a single parameter server 125, but the worker devices 122 and parameter servers 125 have been further partitioned into groups. Each parameter server 125 further transmits parameters to a master parameter server 525 to be aggregated” teaches that the parameter server (corresponds to large data volume parts) includes layers that are not cross-connected).
small data parts that include cross-connected weights between the multiple partial layers in the large data volume parts (Kopp et al., FIG. 5 and Col. 15-16 Lines 65-67 and Lines 1-3, “FIG. 5 depicts an embodiment for aggregation by a hierarchy of parameters servers. There is not just a single parameter server 125, but the worker devices 122 and parameter servers 125 have been further partitioned into groups. Each parameter server 125 further transmits parameters to a master parameter server 525 to be aggregated” teaches the worker devices (corresponds to small data parts) that include cross-connected parameters between the worker devices).
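For illustration only, the hierarchy of parameter servers described in the cited FIG. 5 passage (worker devices partitioned into groups, each group's parameter server forwarding parameters to a master parameter server to be aggregated) can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and plain averaging is used as the aggregation rule only because the cited passage does not state one.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    # Element-wise mean of equally sized parameter vectors (assumed rule).
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hierarchical_aggregate(groups: List[List[Vector]]) -> Vector:
    # Stage 1: each group's parameter server aggregates its own workers.
    group_vectors = [average(workers) for workers in groups]
    # Stage 2: the master parameter server aggregates the group results.
    return average(group_vectors)


groups = [
    [[1.0, 2.0], [3.0, 4.0]],  # workers assigned to the first parameter server
    [[5.0, 6.0]],              # workers assigned to the second parameter server
]
master_vector = hierarchical_aggregate(groups)
```

Under this assumed rule each group counts equally at the master level, so the result can differ from a flat average over all workers.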
Regarding Claim 17, 
Kopp et al. teaches the method of claim 13, wherein generating the stratified neural network that includes the large data volume parts and the small data parts comprises: generating the stratified neural network to include: 
Kopp et al. further teaches large data volume parts that include layers with a higher numerical precision (Kopp et al., Col. 9 Lines 14-16, “A lower alpha value discounts the newer incoming parameter, leading to less change in the central parameter vector” teaches a lower alpha value (corresponds to higher numerical precision)). 
small data parts that include layers with a lower numerical precision (Kopp et al., Col. 9 Lines 16-18, “A higher alpha value allows for the incoming parameters vectors to quickly change the central parameter vector” teaches a higher alpha value (corresponds to lower numerical precision)).
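For illustration only, the effect of the alpha value discussed in the cited passages (a lower alpha leads to less change in the central parameter vector, and values between 0.01 and 0.2 discount incoming parameters by between 99% and 80%) can be sketched as below. Equation 1 of Kopp et al. is not reproduced in this action, so the linear weighting used here is an assumption, as are the function names `blend` and `discount_percent`.

```python
from typing import List


def blend(central: List[float], incoming: List[float], alpha: float) -> List[float]:
    # Assumed linear weighting: keep (1 - alpha) of the central vector and
    # take alpha of the incoming local vector.
    return [(1.0 - alpha) * c + alpha * x for c, x in zip(central, incoming)]


def discount_percent(alpha: float) -> float:
    # Share of the incoming vector that is discounted under the assumed rule.
    return (1.0 - alpha) * 100.0


central = [0.0, 0.0]
incoming = [1.0, 1.0]
low = blend(central, incoming, alpha=0.01)   # lower alpha: smaller change
high = blend(central, incoming, alpha=0.2)   # higher alpha: larger change
```

In this sketch alpha controls only how strongly an incoming parameter vector moves the central vector.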
Regarding Claim 18, 
Kopp et al. teaches the method of claim 13, wherein using the received neural network information to update all or a part of the trained neural network and generating the updated neural network information based on the updated neural network comprises: 
Kopp et al. further teaches retraining only the small data parts of the stratified neural network (Kopp et al., Col. 14 Lines 12-13, “the worker device 122 retrains the model using the local training data and the third parameter” teaches only the worker device (corresponds to the small data parts) retraining the model (corresponds to stratified neural network) with the updated parameter received from the parameter server).
Regarding Claim 19, 
Kopp et al. teaches a centralized site/device, comprising (Kopp et al., Col. 6 Lines 1-4, “the devices may finish processing the data at or about the same time allowing a centralized server to capture the results at the same time. A centralized server may balance data between devices” teaches a centralized server (corresponds to a centralized site/device)).
a processor is configured with processor-executable instructions to perform operations comprising (Kopp et al., Col. 22 Lines 22-25, “carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein” teaches a processor with executable instructions to perform operations disclosed).
training a neural network (Kopp et al., FIG. 1 and Col. 6 Lines 37-40, “FIG. 1 depicts a decentralized system for training a machine learned model. The system includes a plurality of devices 122, a network 127, parameter servers 125, and a mapping platform 121” teaches training a machine learning model with a system that includes a parameter server).  
sending the trained neural network to an edge device that has low-bandwidth uplink capability (Kopp et al., Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches sending the model (corresponds to trained neural network) to the edge device. Col. 8 Lines 53-56, “Compressing the set of parameters into a parameter vector may be more efficient for bandwidth and timing than transmitting and recalculating each parameter of the set of parameters” teaches a method that is more efficient for bandwidth (corresponds to low-bandwidth uplink capability)).
receiving neural network information from the edge device, the received neural network information including at least a portion of at least one or more of a dataset, an activation, or an overall inference result collected or generated in the edge device  (Kopp et al., Col. 7 Lines 65-67, “The parameter servers 125 are configured to receive locally trained model parameters from a device 122” teaches the parameter server (corresponds to the processor) receiving locally trained model parameters (corresponds to neural network information) from a device (corresponds to the edge device). Col. 1 Lines 24-30, “In such a distributed machine learning scenario, the dataset is transmitted to or stored among multiple edge devices. The devices solve a distributed optimization problem to collectively learn the underlying model. For distributed computing, similar (or identical) datasets may be allocated to multiple devices that are then able to solve a problem in parallel” teaches datasets being transmitted or stored (corresponds to collected) among edge devices).
using the received neural network information to update all or a part of the trained neural network (Kopp et al., Col. 2 Lines 50-56, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device. This process repeats multiple times, each device training the local model to determine a parameter vector, transmitting the parameter vector to the parameter server, receiving the updated master parameter vector, and retraining the local model” teaches the parameter server utilizing the received parameter vector (corresponds to the received neural network information) to update master parameter vectors to retrain the local model (corresponds to update all or a part of the trained neural network)).
generating updated neural network information based on the updated neural network (Kopp et al., Col. 13 Lines 9-13, “the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter may be parameter vector that is generated as a result of training the model using the training data” teaches generating a second parameter (corresponds to updated neural network information) based on the trained model (corresponds to the updated neural network)).
sending the updated neural network information to the edge device (Kopp et al., Col. 2 Lines 50-52, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device” teaches the parameter server (corresponds to the processor) transmitting the updated master parameter vector (corresponds to the updated neural network information) to the respective device (corresponds to edge device)).
Regarding Claim 20, 
Kopp et al. teaches a non-transitory computer readable storage medium having stored thereon processor-executable software instructions configured to cause a processor in a centralized site/device to perform operations for updating a neural network on an edge device that has low-bandwidth uplink capability, the operations comprising (Kopp et al., Col. 22 Lines 21-25, “include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein” teaches a computer readable medium (corresponds to a non-transitory computer readable storage medium) capable of storing a set of instructions for execution by a processor. Col. 4 Lines 34-39, “During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data” teaches updating a model (corresponds to neural network). Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches an edge device. Col. 8 Lines 53-56, “Compressing the set of parameters into a parameter vector may be more efficient for bandwidth and timing than transmitting and recalculating each parameter of the set of parameters” teaches a method that is more efficient for bandwidth (corresponds to low-bandwidth uplink capability)).
training a neural network (Kopp et al., FIG. 1 and Col. 6 Lines 37-40, “FIG. 1 depicts a decentralized system for training a machine learned model. The system includes a plurality of devices 122, a network 127, parameter servers 125, and a mapping platform 121” teaches training a machine learning model with a system that includes a parameter server).  
sending the trained neural network to an edge device that has low-bandwidth uplink capability (Kopp et al., Col. 5 Lines 63-64, “The data or a model may be transmitted back to the edge devices” teaches sending the model (corresponds to trained neural network) to the edge device. Col. 8 Lines 53-56, “Compressing the set of parameters into a parameter vector may be more efficient for bandwidth and timing than transmitting and recalculating each parameter of the set of parameters” teaches a method that is more efficient for bandwidth (corresponds to low-bandwidth uplink capability)).
receiving neural network information from the edge device, the received neural network information including at least a portion of at least one or more of a dataset, an activation, or an overall inference result collected or generated in the edge device (Kopp et al., Col. 7 Lines 65-67, “The parameter servers 125 are configured to receive locally trained model parameters from a device 122” teaches the parameter server (corresponds to the processor) receiving locally trained model parameters (corresponds to neural network information) from a device (corresponds to the edge device). Col. 1 Lines 24-30, “In such a distributed machine learning scenario, the dataset is transmitted to or stored among multiple edge devices. The devices solve a distributed optimization problem to collectively learn the underlying model. For distributed computing, similar (or identical) datasets may be allocated to multiple devices that are then able to solve a problem in parallel” teaches datasets being transmitted or stored (corresponds to collected) among edge devices).
using the received neural network information to update all or a part of the trained neural network (Kopp et al., Col. 2 Lines 50-56, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device. This process repeats multiple times, each device training the local model to determine a parameter vector, transmitting the parameter vector to the parameter server, receiving the updated master parameter vector, and retraining the local model” teaches the parameter server utilizing the received parameter vector (corresponds to the received neural network information) to update master parameter vectors to retrain the local model (corresponds to update all or a part of the trained neural network)).
generating updated neural network information based on the updated neural network (Kopp et al., Col. 13 Lines 9-13, “the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter may be parameter vector that is generated as a result of training the model using the training data” teaches generating a second parameter (corresponds to updated neural network information) based on the trained model (corresponds to the updated neural network)).
sending the updated neural network information to the edge device (Kopp et al., Col. 2 Lines 50-52, “The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device” teaches the parameter server (corresponds to the processor) transmitting the updated master parameter vector (corresponds to the updated neural network information) to the respective device (corresponds to edge device)).
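For illustration only, the repeated exchange described in the cited passage (Kopp et al., Col. 2 Lines 50-56), in which a device trains a local model, transmits a parameter vector, and receives an updated master parameter vector, can be sketched as follows. Everything in this sketch is hypothetical: the class `ParameterServer`, the functions `train_locally` and `run_rounds`, the toy training rule, and the blending update are assumptions and do not reproduce the method of Kopp et al.

```python
def train_locally(params, local_data, lr=0.1):
    """Toy local training step (assumption): nudge each parameter
    toward the mean of the device's local data."""
    target = sum(local_data) / len(local_data)
    return [p + lr * (target - p) for p in params]


class ParameterServer:
    """Holds a master parameter vector and folds in incoming vectors."""

    def __init__(self, master, alpha=0.5):
        self.master = list(master)
        self.alpha = alpha  # assumed blending weight for incoming vectors

    def update(self, incoming):
        # Update the master vector, then return a copy for the device.
        self.master = [
            (1.0 - self.alpha) * m + self.alpha * x
            for m, x in zip(self.master, incoming)
        ]
        return list(self.master)


def run_rounds(server, local_params, local_data, rounds):
    """Repeat: train locally, send to server, receive the updated master."""
    for _ in range(rounds):
        local_params = train_locally(local_params, local_data)
        local_params = server.update(local_params)
    return local_params


server = ParameterServer([0.0])
result = run_rounds(server, [0.0], [1.0, 3.0], rounds=3)
```

After each round the device holds the same vector as the server, which reflects the receive-and-retrain cycle described in the quoted passage.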

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Kopp et al. in view of Wahid et al. (“Classification of Microscopic Images of Bacteria Using Deep Convolutional Neural Network”).
Regarding Claim 4,
Kopp et al. teaches the method of claim 3, wherein generating the neural network difference model by comparing the updated neural network to the trained neural network comprises: 
Kopp et al. does not appear to explicitly teach generating a patch that identifies the differences between the updated neural network and the trained neural network via one of: layer freezing using a minimum size technique; layer freezing using a minimum delta technique; weights freezing using the minimum size technique; or weights freezing using the minimum delta technique.
However, Wahid et al. teaches generating a patch that identifies the differences between the updated neural network and the trained neural network via one of (Wahid et al., Section III Pg. 219, “The results of the predictions are compared to the actual labels of those images and the comparison-result is used to update the weights of active layers. This process is called ‘Back-propagation’ that increases the batch-wise training-accuracy and decreases the error-rate gradually after certain number of iterations” teaches backpropagation of a neural network (corresponds to the updated neural network and the trained neural network) that compares actual labels (corresponds to identifies the differences)).
layer freezing using a minimum size technique; layer freezing using a minimum delta technique; weights freezing using the minimum size technique; or weights freezing using the minimum delta technique (Wahid et al., Section IV Pg. 219, “If the training-accuracy increases over validation-accuracy, the network is treated to be ‘over-fitted’. This can be controlled by changing the numbers of frozen layers, or the value of learning-rate factor (LR) and batch-size” teaches layer freezing to control over-fitting (corresponds to minimum size technique)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the identification of the differences between the updated neural network and the trained neural network with layer freezing utilizing a minimum size technique, as taught by Wahid et al., to the method of updating a neural network on an edge device that has low-bandwidth uplink capability of Kopp et al. The motivation would have been to control overfitting and increase the learning rate (Wahid et al., Section IV Pg. 219, “If the training accuracy increases over validation-accuracy, the network is treated to be ‘over-fitted’. This can be controlled by changing the numbers of frozen layers, or the value of learning-rate factor (LR) and batch-size. We conducted the retraining operation with different batch-sizes keeping the LR of frozen and active-layers fixed at 0 and 0.0001 respectively. LR of our fully-connected neural-network was 10. Thus we maintained a gradually increasing learning-rate all over the network”).
Regarding Claim 5,
Kopp et al. teaches the method of claim 3, wherein generating the neural network difference model by comparing the updated neural network to the trained neural network comprises: 
Kopp et al. does not appear to explicitly teach determining one or more neural network layers or one or more neural network weights of the one or more neural network layers to freeze based on a mean of activations of layers in the neural network.
However, Wahid et al. teaches determining one or more neural network layers or one or more neural network weights of the one or more neural network layers to freeze based on a mean of activations of layers in the neural network (Wahid et al., Fig. 3 and Section III Pg. 218, “Feature-extraction from input-images is done by using the filters of initial frozen layers with pre-trained weights. Those features are then used to retrain the rest of the network that contains only active layers. Output from the last active layer is fed to our fully-connected network, where final classification result is represented” teaches that frozen layers (corresponds to the one or more neural network layers to freeze) are based on active layers of the fully connected network (corresponds to the mean of activations of layers in the neural network)).
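For illustration only, the layer freezing described in the cited passages of Wahid et al. (learning-rate factor fixed at 0 for frozen layers and 0.0001 for active layers) can be sketched with per-layer learning rates. The plain gradient-descent update, the function name `sgd_step`, and the toy weights and gradients are assumptions made for this sketch and are not taken from Wahid et al.

```python
def sgd_step(layers, grads, learning_rates):
    """Apply one gradient-descent step with a separate learning rate per layer.

    A layer whose learning rate is 0 is effectively frozen: its weights
    are returned unchanged regardless of the gradient.
    """
    return [
        [w - lr * g for w, g in zip(weights, layer_grads)]
        for weights, layer_grads, lr in zip(layers, grads, learning_rates)
    ]


# Toy two-layer model (hypothetical values).
layers = [[1.0, 2.0], [3.0, 4.0]]
grads = [[10.0, 10.0], [10.0, 10.0]]

# First layer frozen (LR 0), second layer active (LR 0.0001).
learning_rates = [0.0, 0.0001]

updated = sgd_step(layers, grads, learning_rates)
```

In this sketch only the active layer's weights differ between the original and updated models, so a comparison of the two would only need to record the active layer.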

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/Examiner, Art Unit 2125

/BRIAN M SMITH/Primary Examiner, Art Unit 2122