Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on July 27, 2022, in which claims 1-8, 11, 15-18, and 20 are currently amended. Claims 9 and 19 are canceled.  Claims 1-8, 10-18, and 20 are currently pending.

Response to Arguments
The claim interpretation of claims 15-20 under 35 U.S.C. § 112(f) is hereby withdrawn, as necessitated by applicant's amendments and remarks.
The rejection of claims 3-4 and 12-13 under 35 U.S.C. § 112(b) is hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 101, based on the amendment, have been considered but are not persuasive.
With regard to Applicant’s arguments that communicating a portion of the data expresses a technical improvement, Examiner respectfully disagrees.  The claim limitation still amounts to insignificant extra-solution activity of outputting data.  Examiner asserts that the amount of data being output does not distinguish the limitation from insignificant extra-solution activity.  In addition, the language is very broad, and under the broadest reasonable interpretation one of ordinary skill in the art would recognize that a portion of any size, including the entire data set, would satisfy this criterion.  The additional limitation “determining that an accuracy associated with a specific one of the one or more features is outside a range of acceptable values; and communicating, by the processing circuitry of the first device” amounts to mathematical calculations and relationships, and the addition of “processing circuitry of a first/second device” still amounts to generic computer components recited at a high level of generality.  For these reasons, Examiner asserts that it is appropriate to maintain the rejection.
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 102/103, based on the amendment, have been considered but are not persuasive.
With respect to Applicant's arguments that Leroux does not teach "communicating, by the processing circuitry of the first device, a portion of the data set associated with the specific one of the features to the second device for processing via the second one or more layers of the neural network implemented on the processing circuitry of the second device", Examiner respectfully disagrees.  Leroux explicitly teaches ([p. 795 §3] "Instead of off-loading the entire network to a cloud back-end, we off-load only a part of the network. The first layers are evaluated locally and the remote part is only required when these layers are unable to classify a sample with sufficient confidence").  Being unable to classify a sample with sufficient confidence is interpreted as synonymous with an accuracy associated with a specific one of the one or more features being outside a range of acceptable values.  See also ([p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities [26], meaning the outputs can be interpreted as confidence measures (i.e., how confident is the network that a certain sample belongs to a certain class?).").  Confidence is interpreted as synonymous with accuracy.
With respect to Applicant's argument that Leroux does not teach "an accuracy associated with a specific one of the one or more features", Examiner respectfully disagrees.  Leroux explicitly teaches a confidence measure, used as a stopping threshold in training, that is based specifically on the classification accuracy of a particular output feature ([p. 792 §1] "Each layer extracts more complex features from its input. The last layer uses the high-level features to classify the input...We cease the evaluation of deeper layers once a certain required confidence threshold is reached" [p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities [26], meaning the outputs can be interpreted as confidence measures (i.e., how confident is the network that a certain sample belongs to a certain class?).").
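For illustration only, the stopping behavior described in the passages quoted above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not code from Leroux or from the application: the names `softmax` and `cascade_classify`, the representation of each stage as a pair of callables, and any threshold value are hypothetical. The sketch assumes that the largest softmax output of each intermediate output layer serves as the confidence measure and that propagation stops once that value reaches the threshold.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical stage type: (hidden layer function, output-layer function).
Stage = Tuple[Callable[[Vector], Vector], Callable[[Vector], Vector]]


def softmax(logits: Sequence[float]) -> Vector:
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_classify(sample: Vector, stages: Sequence[Stage],
                     threshold: float) -> Tuple[int, float, int]:
    """Return (class index, confidence, index of the stage that produced it).

    Evaluation stops at the first stage whose confidence reaches the
    threshold; otherwise the result of the last stage is returned.
    """
    activation = sample
    result = (-1, 0.0, -1)  # sentinel if no stages are supplied
    for index, (hidden, output_layer) in enumerate(stages):
        activation = hidden(activation)
        probs = softmax(output_layer(activation))
        confidence = max(probs)
        result = (probs.index(confidence), confidence, index)
        if confidence >= threshold:
            break  # sufficiently confident: skip the deeper layers
    return result
```

Under these assumptions, a sample whose early output layer is already confident never reaches the deeper layers, while a sample below the threshold continues to the next stage.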

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-8, 10-18, and 20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations and relationships.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network (observation, evaluation, and judgement),
determining that an accuracy associated with a specific one of the one or more features is outside a range of acceptable values (mathematical calculations and relationships)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “processing circuitry of a first device”, “processing circuitry of a second device”, and “the neural network”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 1 also recites additional elements “receiving, by processing circuitry of a first device configured with a first one or more layers of a neural network, input data for processing via the neural network implemented across the processing circuitry of the first device and processing circuitry of a second device”, “outputting, by the first one or more layers of the neural network implemented on processing circuitry of the first device, a data set that is reduced in size relative to the input data”, and “communicating, by the processing circuitry of the first device, a portion of the data set associated with the specific one of the features to the second device for processing via the second one or more layers of the neural network implemented on the processing circuitry of the second device” which amounts to gathering and outputting data  (gathering data and outputting data are considered as pre-solution and post-solution activities per MPEP 2106.05(g)) which amounts to insignificant extra-solution activity (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)).  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component as well as insignificant extra-solution activity of gathering and outputting data.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 15, which recites a system and a computer program product, respectively, as well as to dependent claims 2-7, 16-18, and 20.  The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 2 and 16 recite additional mathematical calculations “reducing, by the first one or more layers, the data set by compressing the data set for transmission via a network to the processing circuitry of the second device.”
Dependent claims 3 and 17 recite additional observation, evaluation, and judgement “wherein the second one or more layers determine if a particular feature is one of one or more features within the input data.”
Dependent claims 4 and 18 recite additional insignificant extra-solution activity “receiving, by the processing circuitry of the first device, an indication from the processing circuitry of the second device that the particular feature was detected by the second one or more layers”
Dependent claim 5 recites additional observation, evaluation, and judgement “detecting, by processing circuitry of the first device, that a particular one of the one or more features meets a threshold of accuracy to take an action by the processing circuitry of the first device”.
Dependent claim 6 recites additional insignificant extra-solution activity “performing, by the processing circuitry of the first device responsive to the detection, the action with respect to the particular one of the one or more features” which amounts to post-solution activity.  The action performed is not well-described in the specification.  ¶0131 describes as a non-limiting example outputting to the display as a possible action which is not meaningful towards implementing the judicial exception into a practical application.  
Dependent claims 7 and 20 recite additional insignificant extra-solution activity “performing the action without communicating the data set to the processing circuitry of the second device” which amounts to post-solution activity.  The action performed is not well-described in the specification.  ¶0131 describes as a non-limiting example outputting to the display as a possible action which is not meaningful towards implementing the judicial exception into a practical application.  

Regarding Claim 8:  Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 8 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations and relationships.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network (observation, evaluation, and judgement),
detecting, by the processor, a feature of the one or more features in the data set (observation, evaluation, and judgement)
determining if an accuracy associated with a specific one of the one or more features is outside a range of acceptable values; in response to the accuracy associated with the specific one of the one or more features being outside a range of acceptable values (mathematical calculations and relationships)
performing, by the processor an action with respect to the feature instead of communicating the portion of the data set to the second device (observation, evaluation, and judgement)
Therefore, claim 8 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 8 recites additional elements “A processor”, “a wearable head display”, and “the neural network”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 8 also recites additional elements “receiving, by a processor of a wearable head display, input data captured by the wearable head display”, “generating, by a first one or more layers of a neural network implemented on the processor, a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network”, and “communicating, by the processor of the wearable head display, a portion of the data set associated with the specific one of the features to a second device for processing via the second one or more layers of the neural network implemented on processing circuitry of the second device, in response to the accuracy associated with the specific one of the one or more features being within the range of acceptable values” which amounts to gathering and outputting data  (gathering data and outputting data are considered as pre-solution and post-solution activities per MPEP 2106.05(g)) which amounts to insignificant extra-solution activity (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)).  Therefore, claim 8 is directed to a judicial exception.
Step 2B Analysis:  Claim 8 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 8 amount to no more than mere instructions to apply the judicial exception using a generic computer component as well as insignificant extra-solution activity of gathering and outputting data.
For the reasons above, claim 8 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to dependent claims 10-14.  The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 10 recites additional insignificant extra-solution activity “performing the action comprising modifying an image being displayed via the wearable head display.” which amounts to post-solution activity.  The action performed is not well-described in the specification.  ¶0131 describes as a non-limiting example outputting to the display as a possible action which is not meaningful towards implementing the judicial exception into a practical application.  
Dependent claim 11 recites additional insignificant extra-solution activity “generating, by the first one or more layers implemented on the processor, a second data set that is reduced in size relative to a second input data while identifying a second one or more of features in the second input data.” which amounts to gathering and outputting data.
Dependent claim 12 recites additional observation, evaluation, and judgement “determining if an accuracy associated with a specific one of the second one or more features is outside a range of acceptable values.”
Dependent claim 13 recites additional insignificant extra-solution activity “communicating, by the processor responsive to the determination, the second data set to the second device implementing the second one or more layers of the neural network.” which amounts to gathering and outputting data.
Dependent claim 14 recites additional insignificant extra-solution activity “receiving, by the processor, from the second device an indication of a result of processing of the second data set by the second one or more layers.” which amounts to gathering and outputting data.

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-8, 10-18, and 20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5-7, 15-17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Leroux (“The cascading neural network: building the Internet of Smart Things”, 2015).

	Regarding claim 1, Leroux teaches A method comprising: receiving ([p. 795 §3] "we train multiple output layers: one directly on the raw input data".   See FIG. 11, the receiving is made at the input of the first convolution layer.), by processing circuitry of a first device ([p. 807 §5.3] "Jetson TK1 board and GPU server (GTX980 GPU)".)
	configured with a first one or more layers of a neural network, ([p. 802 §5.2] "The network consists of three convolutional layers with 64 5 by 5 filters each and one fully connected layer with 1024 neurons at the end" See also FIG. 8)
	input data for processing via the neural network ([p. 802 §5.2] "The input image data was rescaled to have zero mean and unit variance, but no other preprocessing or data augmentation techniques were used" See also FIG. 11, the 221x221 RGB image data)
	implemented across the processing circuitry of the first device and processing circuitry of a second device; ([p. 807 §5.3] "Jetson TK1 board and GPU server (GTX980 GPU)...The alternative approach is to off-load all the computations to the GPU server in the cloud." GPU Server interpreted as synonymous with second device.)
	outputting, by the first one or more layers of the neural network implemented on processing circuitry of the first device, ([p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities")
	a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network; and ([p. 805] FIG. 11 the input data has a size of 221*221*3 (RGB) = 146523 elements compared to the elements output from the second convolution layer comprising 57600 elements.)
	determining that an accuracy associated with a specific one of the one or more features is outside a range of acceptable values; and ([p. 792 §1] "Each layer extracts more complex features from its input. The last layer uses the high-level features to classify the input...We cease the evaluation of deeper layers once a certain required confidence threshold is reached" [p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities [26], meaning the outputs can be interpreted as confidence measures (i.e., how confident is the network that a certain sample belongs to a certain class?)."   Confidence is interpreted as synonymous with accuracy.  Leroux explicitly teaches that the confidence measure is associated with the output features to classify the input.)
	communicating, by the processing circuitry of the first device, a portion of the data set associated with the specific one of the features to the second device for processing via the second one or more layers of the neural network implemented on the processing circuitry of the second device. ([p. 795 §3] "Instead of off-loading the entire network to a cloud back-end, we off-load only a part of the network" [p. 805 §5.3] "We transformed the pretrained overfeat network into a cascade by training two additional output layers after the second and the fourth convolutional layer" [p. 808 §5.3] "Table 11 shows that a complete off-load to the cloud takes less time than the local computation except in the case of very limited bandwidth...Table 12 shows the required runtime of the first cascade with varying network bandwidth and latency" Cloud (GPU Server) interpreted as second device.  While Leroux teaches that a smaller portion of the data set is off-loaded, the entire portion of the data set is still interpreted as a portion of the data set.). 

	Regarding claim 2, Leroux teaches The method of claim 1, further comprising reducing, by the first one or more layers, the data set by compressing the data set for transmission via a network to the processing circuitry of the second device. ([p. 805] FIG. 11 the input data has a size of 221*221*3 (RGB) = 146523 elements compared to the elements output from the second convolution layer comprising 57600 elements. See also FIG. 12 showing compressed data being transmitted to GPU server (second device).). 
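The element counts relied upon above can be checked with a few lines of arithmetic. The following is a minimal Python sketch offered for illustration only; the helper name `element_count` is hypothetical, and the figure of 57600 elements is taken as stated in the rejection (citing FIG. 11 of Leroux) rather than derived from layer parameters.

```python
def element_count(height: int, width: int, channels: int) -> int:
    """Number of scalar elements in a height x width x channels tensor."""
    return height * width * channels


# 221 x 221 RGB input image, as cited from FIG. 11 of Leroux.
input_elements = element_count(221, 221, 3)

# Output of the second convolution layer, taken as stated in the rejection
# (assumed here, not recomputed from the layer's filter sizes or strides).
intermediate_elements = 57600

# Fraction of the original element count that remains after the local layers.
ratio = intermediate_elements / input_elements
print(input_elements, intermediate_elements, round(ratio, 3))
```

On these figures the intermediate data set holds roughly 39% as many elements as the input image, which is the size reduction the rejection relies upon.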

	Regarding claim 3, Leroux teaches The method of claim 1, wherein the second one or more layers determine if a particular feature is one of the one or more features within the input data. ([p. 792 §1] "Each layer extracts more complex features from its input. The last layer uses the high-level features to classify the input" See also FIG. 8 which shows multiple convolutional layers. Extracting features interpreted as synonymous with detecting features.). 

	Regarding claim 5, Leroux teaches The method of claim 1, further comprising detecting, by the processing circuitry of the first device, that a particular one of the one or more features meets a threshold of accuracy to take an action by the processing circuitry of the first device. ([p. 796] See Algorithm 1 "Propagating a sample through the cascade network: Keep evaluating the hidden layers until a confident result is obtained" [p. 797] "The cascade network divides the neural network into different parts. One part is always evaluated locally so the system will still be able to operate when the Internet connection drops...The cascade network decides whether to accept or to reject a classification based on the threshold value" To accept or reject interpreted as actions to take based on the threshold accuracy.). 

	Regarding claim 6, Leroux teaches The method of claim 5, further comprising performing, by the processing circuitry of the first device responsive to the detection, the action with respect to the particular one of the one or more features. ([p. 808] "The cascading architecture avoids sending data over the network when a confident classification can be made by the local part of the network" Avoiding sending data interpreted as response made by first (local) device in response to detection based on threshold.). 

	Regarding claim 7, Leroux teaches The method of claim 6, further comprising performing the action without communicating the data set to the processing circuitry of the second device. ([p. 808] "The cascading architecture avoids sending data over the network when a confident classification can be made by the local part of the network"). 
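The accept-or-off-load behavior relied upon for claims 5-7 can be illustrated with the following minimal Python sketch. It is offered under stated assumptions and is not code from Leroux or from the application: the names `Decision`, `classify_or_offload`, and `send_to_remote` are hypothetical, the remote device is modeled as a plain callable, and the local confidence is assumed to be the largest class probability produced by the local layers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    handled_locally: bool  # True when no data was sent to the remote device
    label: int
    confidence: float


def classify_or_offload(local_probs: List[float],
                        features: List[float],
                        threshold: float,
                        send_to_remote: Callable[[List[float]], List[float]]
                        ) -> Decision:
    """Accept the local result if confident; otherwise off-load the features.

    send_to_remote is a hypothetical stand-in for the network call to the
    remote layers; it returns class probabilities for the given features.
    """
    confidence = max(local_probs)
    label = local_probs.index(confidence)
    if confidence >= threshold:
        # Confident local classification: nothing is sent over the network.
        return Decision(True, label, confidence)
    # Not confident enough: send the intermediate features to the remote part.
    remote_probs = send_to_remote(features)
    remote_confidence = max(remote_probs)
    return Decision(False, remote_probs.index(remote_confidence),
                    remote_confidence)
```

In this sketch the remote callable is invoked only on the low-confidence path, which mirrors the cited statement that sending data over the network is avoided when the local part can make a confident classification.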

Claims 15-17 and 20 are substantially similar to claims 1-3, 5, and 7.  Therefore, the rejection applied to claims 1-3, 5, and 7 also applies to claims 15-17 and 20, respectively.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Leroux in view of Rajkumar (US 20200311616 A1).

	Regarding claim 4, Leroux teaches The method of claim 3.
	However, Leroux does not explicitly teach receiving, by the processing circuitry of the first device, an indication from the processing circuitry of the second device that the particular feature was detected by the second one or more layers.  

Rajkumar, in the same field of endeavor, teaches The method of claim 3, further comprising receiving, by the processing circuitry of the first device, an indication from the processing circuitry of the second device that the particular feature was detected by the second one or more layers. ([¶0111] "the tag may include an indication that the embedding is “new,” e.g., represents a newly learned classification that the model is not able to recognize, and that the information should be shared with other robots. The server system 112 determines that the new embedding and classification can to be distributed to the other robots" [¶0131] "In addition to using the machine learning model 214 to predict a classification for the object 124, the robot 104D also uses the processing of the machine learning model 214 to generate an embedding 314 for the object 124. For example, the embedding may be derived from activations at one or more hidden layers of the model 214 and/or from data at an output layer of the model 214." With respect to disclosure of Rajkumar server is interpreted as second device, robot interpreted as first device.). 

	Leroux and Rajkumar are both directed towards distributed training of a neural network.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Leroux with the teachings of Rajkumar by implementing an indication between the first and second device that a feature classification had met a certain threshold. Rajkumar teaches as a motivation for combination that the disclosed cloud computing method is advantageous due to its performance ([¶0138] “it is important for the process to be computationally efficient and power efficient. The architecture discussed here provides these benefits, since processing of the machine learning model can be very fast and efficient, and the lookup and comparison process for using datasets in the local cache can similarly be fast and efficient.”).  

Claims 8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Leroux in view of Spizhevoy (US 20180018451 A1).

	Regarding claim 8, Leroux teaches generating, by a first one or more layers of a neural network implemented on the processor, ([p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities")
	a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network; ([p. 805] FIG. 11 the input data has a size of 221*221*3 (RGB) = 146523 elements compared to the elements output from the second convolution layer comprising 57600 elements.)
	determining if an accuracy associated with a specific one of the one or more features is outside a range of acceptable values; ([p. 792 §1] "Each layer extracts more complex features from its input. The last layer uses the high-level features to classify the input...We cease the evaluation of deeper layers once a certain required confidence threshold is reached" [p. 795 §3] "we train multiple output layers: one directly on the raw input data and one after every hidden layer in the network. This allows to stop propagating a sample through the network once a sufficiently confident result is obtained. We use an interesting property of neural network classifiers stating that they provide outputs which estimate Bayesian a posteriori probabilities [26], meaning the outputs can be interpreted as confidence measures (i.e., how confident is the network that a certain sample belongs to a certain class?)."   Confidence is interpreted as synonymous with accuracy.  Leroux explicitly teaches that the confidence measure is associated with the output features to classify the input.)
	in response to the accuracy associated with the specific one of the one or more features being outside a range of acceptable values: communicating, by the processor of the wearable head display, a portion of the data set associated with the specific one of the features to a second device for processing via the second one or more layers of the neural network implemented on processing circuitry of the second device, ([p. 795 §3] "Instead of off-loading the entire network to a cloud back-end, we off-load only a part of the network" [p. 805 §5.3] "We transformed the pretrained overfeat network into a cascade by training two additional output layers after the second and the fourth convolutional layer" [p. 808 §5.3] "Table 11 shows that a complete off-load to the cloud takes less time than the local computation except in the case of very limited bandwidth...Table 12 shows the required runtime of the first cascade with varying network bandwidth and latency" Cloud (GPU Server) interpreted as second device.  While Leroux teaches that a smaller portion of the data set is off-loaded, the entire portion of the data set is still interpreted as a portion of the data set.)
	in response to the accuracy associated with the specific one of the one or more features being within the range of acceptable values: performing, by the processor, an action with respect to the feature instead of communicating the portion of the data set to a second device implementing the second one or more layers of the neural network. ([p. 808] "The cascading architecture avoids sending data over the network when a confident classification can be made by the local part of the network")
	However, Leroux does not explicitly teach receiving, by a processor of a wearable head display, input data captured by the wearable head display;  

Spizhevoy teaches receiving, by a processor of a wearable head display, input data captured by the wearable head display; ([¶0093] "At block 708, a user's eye image is received. For example, an image sensor (e.g., a digital camera) of the user device can capture the user's eye image" [¶0097] "see, e.g., the head mounted display 900 described with reference to FIG. 9").

	Leroux and Spizhevoy are both directed towards a distributed neural network system.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Leroux with the teachings of Spizhevoy by implementing the distributed neural network system using a virtual reality headset as one of the nodes in the distributed network. The disclosure of Spizhevoy includes, in an embodiment, a method of user authentication that uses the virtual reality headset to capture images of a user's eye and to process them for authentication through a convolutional neural network.  As both Leroux and Spizhevoy are directed towards inputting image data into a convolutional neural network for classification, the disclosure of Spizhevoy is seen as providing a practical application of this technology.  Spizhevoy further teaches, as a motivation for combination, using a triplet method to further improve the image processing capabilities ([¶0079] “The resulting embedding 108 may advantageously have better quality (e.g., higher true positive rate, higher true negative rate, lower equal error rate, or a combination thereof) compared to if only random triplets were used during the learning of the embedding 108.”).

	Regarding claim 11, the combination of Leroux and Spizhevoy teaches The method of claim 8, further comprising generating, by the first one or more layers implemented on the processor, the portion of the data set as a second data set that is reduced in size relative to a second input data while identifying a second one or more of features in the second input data. (Leroux [p. 805] FIG. 11: the input data has a size of 221*221*3 (RGB) = 146523 elements, compared to the 57600 elements output from the second convolution layer. [p. 798 §5.1] "It consists of a 60,000 sample training set and a 10,000 sample test set" Leroux explicitly teaches 70,000 data sets that are reduced in size by the layers.)
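
The element counts cited from Leroux FIG. 11 in the rejection of claim 11 can be checked with a short calculation. The following is only an illustrative sketch: the variable names are hypothetical, and the 57600 figure is taken as cited in the rejection rather than derived from the network's layer dimensions.

```python
# Element counts as cited in the rejection of claim 11 (Leroux, FIG. 11).
# Variable names are illustrative assumptions, not taken from the reference.
input_elements = 221 * 221 * 3   # 221 x 221 RGB input image
layer2_elements = 57600          # second convolution layer output, as cited

# Fraction by which the intermediate data set is smaller than the input.
reduction = 1 - layer2_elements / input_elements

print(input_elements)            # 146523
print(f"{reduction:.1%}")        # 60.7%
```

Under these cited figures, the data passed on after the second convolution layer is roughly 40% of the size of the raw input, which is the size reduction the rejection relies on.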

	Regarding claim 12, the combination of Leroux and Spizhevoy teaches The method of claim 11, further comprises determining if an accuracy associated with a specific one of the second one or more features is outside a range of acceptable values, by the processor, that a second feature of the second one or more features is not detectable within a threshold of accuracy. (Leroux [p. 796] See Algorithm 1 "Propagating a sample through the cascade network: Keep evaluating the hidden layers until a confident result is obtained" [p. 797] "The cascade network divides the neural network into different parts. One part is always evaluated locally so the system will still be able to operate when the Internet connection drops...The cascade network decides whether to accept or to reject a classification based on the threshold value" Accepting or rejecting is interpreted as the action to take based on the threshold accuracy. Rejecting is explicitly taught as the action taken after determining that the feature is not detectable within the threshold of accuracy.)

	Regarding claim 13, the combination of Leroux and Spizhevoy teaches The method of claim 12, further comprising communicating, by the processor responsive to the determination, the second data set to the second device implementing the second one or more layers of the neural network. (Leroux [p. 805 §5.3] "We transformed the pretrained overfeat network into a cascade by training two additional output layers after the second and the fourth convolutional layer" [p. 808 §5.3] "Table 11 shows that a complete off-load to the cloud takes less time than the local computation except in the case of very limited bandwidth..." [p. 808] "The cascading architecture avoids sending data over the network when a confident classification can be made by the local part of the network" The cloud (GPU server) is interpreted as the second device.  Leroux explicitly teaches making a determination of whether or not to send data to the server based on the classification accuracy threshold.)
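
For illustration only, the confidence-gated cascade behavior cited from Leroux in the rejections above (evaluate local layers, accept a classification once a confidence threshold is met, otherwise off-load the intermediate data to the cloud) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Leroux's Algorithm 1 as published: the function name `run_cascade`, the `offload` callback, and the toy layers are all hypothetical, and treating "confidence at or above a threshold" as "accuracy within the range of acceptable values" follows the interpretation stated in this action.

```python
def run_cascade(sample, local_stages, threshold, offload):
    """Evaluate local stages in order; stop early when confident.

    local_stages: list of (layer, output_head) pairs (hypothetical structure).
      layer maps features to features; output_head maps features to
      class probabilities, read here as confidence measures.
    offload: callback standing in for the second device (e.g., a cloud server).
    """
    features = list(sample)
    for layer, output_head in local_stages:
        features = layer(features)
        probs = output_head(features)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            # Confident local result: act on it, send nothing over the network.
            return "local", best
    # No local output layer was confident: send the intermediate features on.
    return "remote", offload(features)


# Toy stand-ins (assumptions for illustration only).
stages = [(lambda x: [2 * v for v in x], lambda f: [0.6, 0.4])]
remote_server = lambda feats: len(feats)

accepted = run_cascade([1, 2], stages, threshold=0.5, offload=remote_server)
rejected = run_cascade([1, 2], stages, threshold=0.9, offload=remote_server)
```

With the toy stages, the first call is resolved locally because the 0.6 confidence meets the 0.5 threshold, while the second call falls through to the off-load callback because no local output reaches 0.9.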

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Leroux and Spizhevoy, and in further view of Xie (“Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks”, 2016).

	Regarding claim 10, the combination of Leroux and Spizhevoy teaches The method of claim 8.
	However, the combination of Leroux and Spizhevoy does not explicitly teach performing the action comprising modifying an image being displayed via the wearable head display.  

Xie, in the same field of endeavor, teaches performing the action comprising modifying an image being displayed via the wearable head display. ([p. 1] "Fig. 1: We propose Deep3D, a fully automatic 2D-to-3D conversion algorithm that takes 2D images or video frames as input and outputs stereo 3D image pairs. The stereo images can be viewed with 3D glasses or head-mounted VR displays" [p. 2] "To that end, we design a deep neural network that takes as input the left eye’s view, internally estimates a soft (probabilistic) disparity map, and then renders a novel image for the right eye" Estimating and rendering the right-eye image based on the left-eye view is interpreted as synonymous with modifying an image being displayed via the wearable head display.)

	Leroux, Spizhevoy, and Xie are all directed towards a method of using a convolutional neural network on a virtual reality headset.  Therefore, Leroux, Spizhevoy, and Xie are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of the combination of Leroux and Spizhevoy with the teachings of Xie by using the convolutional neural network to modify images displayed via the virtual reality headset display. The combination of Leroux and Spizhevoy already discloses processing images using the convolutional neural network, and Xie teaches that a 3D image pair can be produced by modifying a single image provided to the virtual reality headset.  Xie teaches as a motivation for combination ([Abstract] “This novel training scheme makes it possible to exploit orders of magnitude more data and significantly increases performance. Indeed, Deep3D outperforms baselines in both quantitative and human subject evaluations.”).

Claims 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Leroux and Spizhevoy and in further view of Rajkumar. 

	Regarding claim 14, the combination of Leroux and Spizhevoy teaches The method of claim 13.
	However, the combination of Leroux and Spizhevoy does not explicitly teach receiving, by the processor, from the second device an indication of a result of processing of the second data set by the second one or more layers.  

Rajkumar, in the same field of endeavor, teaches receiving, by the processor, from the second device an indication of a result of processing of the second data set by the second one or more layers. ([¶0111] "the tag may include an indication that the embedding is “new,” e.g., represents a newly learned classification that the model is not able to recognize, and that the information should be shared with other robots. The server system 112 determines that the new embedding and classification can to be distributed to the other robots" [¶0131] "In addition to using the machine learning model 214 to predict a classification for the object 124, the robot 104D also uses the processing of the machine learning model 214 to generate an embedding 314 for the object 124. For example, the embedding may be derived from activations at one or more hidden layers of the model 214 and/or from data at an output layer of the model 214."). 

	Both the combination of Leroux and Spizhevoy as well as Rajkumar are directed towards distributed training of a neural network.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of the combination of Leroux and Spizhevoy with the teachings of Rajkumar by implementing an indication between the first and second device that a feature classification had met a certain threshold. Rajkumar teaches as a motivation for combination that the disclosed cloud computing method is advantageous due to its performance ([¶0138] “it is important for the process to be computationally efficient and power efficient. The architecture discussed here provides these benefits, since processing of the machine learning model can be very fast and efficient, and the lookup and comparison process for using datasets in the local cache can similarly be fast and efficient.”).  

Claim 18 is substantially similar to claim 14.  Therefore, the rejection applied to claim 14 also applies to claim 18.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126