Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 2, 4, 6, and 19 are objected to because of the following informalities:
Claim 2 depends on itself and should depend on claim 1.
Claim 4 recites “descrete,” which appears to be a misspelling of “discrete.”
Claim 6 appears to be run together with claim 5 and needs to be set out as a separate claim.
Claim 19 recites “funtion,” which should read “function.”
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 16-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation (BRI) of the “machine readable medium” encompasses signals per se. A claim whose BRI covers both statutory and non-statutory embodiments embraces subject matter that is not eligible for patent protection and therefore is directed to non-statutory subject matter. See MPEP 2106.03(II). It is suggested that claim 16 be amended to recite a “non-transitory” machine readable medium to overcome this rejection. Although the disclosure states “In at least one embodiment, a computer-readable storage medium is a non-transitory computer-readable storage medium that excludes transitory signals,” it is not clear that the claimed medium is non-transitory in all embodiments.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mental process, as set forth in the analysis below.
Step 1 analysis:
In the instant case, the claims are directed to a method/machine. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A analysis:
Because the claims have been determined to fall within one of the four categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea, specifically “Mental Processes: Concepts performed in the human mind (including an observation, evaluation, judgment, opinion)”.
2A Prong 1: The limitations, as drafted, are a process that, under its broadest reasonable interpretation in light of the disclosure, encompasses a mental process of
1. A processor comprising: one or more arithmetic logic units (ALUs) (see below) to determine an architecture of a neural network (mental process with assistance of pen and paper) based, at least in part, on comparing one or more performance characteristics of the neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network (mental process of e.g., modeling or comparing data with assistance of pen and paper) but for the recitation of generic computer components.
That is, other than reciting “neural network and ALU,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “ALU” language, “determine” in the context of this claim encompasses the user manually determining a model/NN.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of NN and ALU. The NN and processor are recited at a high level of generality (i.e., as a generic processor performing mathematical calculations) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of NN and ALU amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). Accordingly, a conclusion that the recited additional elements are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
2. The processor of claim 2 wherein: the one or more performance characteristics of the neural network resulting from training the neural network indicate how well the neural network performs using training data (mental process of modeling with assistance of pen and paper); and the one or more performance characteristics resulting from testing the neural network indicate how the neural network performs using validation data (mental process of modeling or comparing data with assistance of pen and paper).  
3. The processor of claim 1, wherein: the architecture of the neural network is evaluated using an objective function based on a generalization error; and the generalization error is based at least in part on a difference between a validation loss and a training loss (mental process of modeling or comparing data with assistance of pen and paper).
4. The processor of claim 1, wherein determining an architecture of the neural network is performed using a gradient estimate of a differentiable loss function based at least in part on a relaxation of descrete variables (mental process of modeling or comparing data with assistance of pen and paper).
5. The processor of claim 1, wherein determining an architecture of the neural network is performed using a surrogate function that represents a non-differentiable loss function (mental process of modeling or comparing data with assistance of pen and paper).
6. The processor of claim 4, wherein the differentiable loss function is a measure of a cross-entropy loss of the neural network (mental process of modeling or comparing data with assistance of pen and paper).  
7. The processor of claim 5, wherein the non-differentiable loss function is a measure of latency of the neural network (mathematical concepts).

2A Prong 1: The limitations, as drafted, are a process that, under its broadest reasonable interpretation in light of the disclosure, encompasses a mental process of
8. A system comprising: one or more processors to be configured to determine a network architecture by evaluating candidate neural network architectures, at least in part, by comparing one or more performance characteristics of a neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network (mental process of modeling or comparing data with assistance of pen and paper).
That is, other than reciting “processor,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processor” language, “determine” in the context of this claim encompasses the user manually determining a model/NN.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of neural network and processors. The NN and processors are recited at a high level of generality (i.e., as a generic processor performing mathematical calculations) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of NN and processors amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). Accordingly, a conclusion that the recited additional elements are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
  
9. The system of claim 8, wherein: the one or more performance characteristics of the neural network resulting from training the neural network indicate how well the neural network performs using training data; and the one or more performance characteristics resulting from testing the neural network indicate how the neural network performs using validation data (mental process of modeling or comparing data with assistance of pen and paper).
10. The system of claim 8, wherein: the network architecture is determined using a gradient of a differentiable loss; and the gradient is estimated using a Gumbel-Softmax distribution (mental process of modeling or comparing data with assistance of pen and paper).
11. The system of claim 8, wherein: the network architecture is determined using a gradient of a non-differentiable loss function; and the non-differentiable loss is approximated with a surrogate function (mental process of modeling or comparing data with assistance of pen and paper).
12. The system of claim 10, wherein the differentiable loss function is based at least in part on cross-entropy loss (mental process of modeling or comparing data with assistance of pen and paper).
13. The system of claim 11, wherein the non-differentiable loss is based at least in part on network latency or network accuracy (mental process of modeling or comparing data with assistance of pen and paper).
14. The system of claim 8, wherein the network architecture is determined using differentiable architecture search (mental process of modeling or comparing data with assistance of pen and paper).
15. The system of claim 8, wherein sparsity of the network architecture is enforced by using a paired-input cell structure (mental process of modeling or comparing data with assistance of pen and paper).

2A Prong 1: The limitations, as drafted, are a process that, under its broadest reasonable interpretation in light of the disclosure, encompasses a mental process of
16. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least determine an architecture for a neural network by searching candidate neural network architectures using a gradient estimate of a loss, and evaluating candidate neural network architectures based on one or more performance characteristics of the neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network (mental process of modeling or comparing data with assistance of pen and paper).
That is, other than reciting “neural network and processor,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “NN and processor” language, “determine” in the context of this claim encompasses the user manually determining a model/NN.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of NN and processors. The NN and processors are recited at a high level of generality (i.e., as a generic processor performing mathematical calculations) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of NN and processors amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). Accordingly, a conclusion that the recited additional elements are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.

17. The machine-readable medium of claim 16, wherein: the loss is non-differentiable; and the gradient is estimated using a neural network trained to model a loss function. (mental process of modeling or comparing data with assistance of pen and paper) 
18. The machine-readable medium of claim 16, wherein: the loss is differentiable; and the gradient is estimated using samples from a conditional Gumbel-Softmax distribution. (mental process of modeling or comparing data with assistance of pen and paper)  
19. The machine-readable medium of claim 16, wherein: the architecture is selected based on an objective function; and the objective function is a function of latency and accuracy. (mental process of modeling or comparing data with assistance of pen and paper) 
20. The machine-readable medium of claim 16, wherein: an input selector selects two nodes of previous nodes; and an operation selector selects two operations for each input (mental process of selecting).  
21. The machine-readable medium of claim 16, wherein the process of estimating the gradient does not introduce bias to the gradient (mathematical concepts).  

2A Prong 1: The limitations, as drafted, are a process that, under its broadest reasonable interpretation in light of the disclosure, encompasses a mental process of
22. An autonomous vehicle comprising a neural network with a network architecture selected from a plurality of candidate network architectures based on one or more performance characteristics of the neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network (mental process of modeling or comparing data with assistance of pen and paper).
That is, other than reciting “neural network and autonomous vehicle,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “NN and autonomous vehicle” language, “selected” in the context of this claim encompasses the user manually selecting a model/NN.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of NN and autonomous vehicle. The NN and autonomous vehicle are recited at a high level of generality (i.e., as a generic processor performing mathematical calculations) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of NN and autonomous vehicle amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). Accordingly, a conclusion that the recited additional elements are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
 
23. The autonomous vehicle (additional element considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)) of claim 22, wherein: an input selector selects two nodes of previous nodes; and an operation selector selects two operations for each input (mental process of selecting an operation).
24. The autonomous vehicle (additional element considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)) of claim 22, wherein: the autonomous vehicle includes a camera that captures an image; and the image is processed by the neural network to identify an object in the image (vehicle and camera are additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). These elements simply append well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception – see MPEP 2106.05(d) and Berkheimer memo.

25. The autonomous vehicle (additional element considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)) of claim 22, wherein: the autonomous vehicle includes a camera that captures an image; and the image is processed by the neural network to produce a vehicle control signal (vehicle and camera are additional elements considered to be generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h)). These elements simply append well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception – see MPEP 2106.05(d) and Berkheimer memo.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. (Differentiable Architecture Search with Ensemble Gumbel-Softmax, 2018) in view of Talathi (US 2017/0061326) and Merrill (US 2019/0378210).
Chang discloses:
1. A processor (see below) comprising: 
one or more arithmetic logic units (ALUs) (see below) to determine an architecture of a neural network (reads on architecture search, NAS, abstract, “efficient NAS method, Differentiable ARchiTecture Search with Ensemble Gumbel-Softmax (DARTS-EGS), which is capable of discovering more diversified network architectures”, intro) based, at least in part, on comparing one or more performance characteristics (training/discovering, Fig. 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost)”, “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to-end differentiable manner. Denote the training loss as $L_w(A)$. The goal in architecture search is to find a high performance architecture, i.e., $\min_{w,\alpha} \mathbb{E}_{A \sim p_\alpha(A)}[L_w(A)]$ (9). The main process of optimizing this objective is to minimize the expected performance of architectures sampled with $p_\alpha(A)$. That is, the network $A$ is first sampled with $p_\alpha(A)$. Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter $\alpha$ and the network parameter $w$ are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro) resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network (Fig. 2, section 3; Tables 1-2).
	Chang does not expressly disclose the details of the Gumbel-Softmax, how neural networks are inherently trained, or the internal hardware components; Chang therefore fails to disclose a processor and an ALU.
	Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).
Merrill teaches that using a processor and an ALU is well-known (“In some embodiments, the differentiable model at S240 is a neural network model. In some embodiments, the non-differentiable model at S230 is included in an ensemble that includes the non-differentiable model of S230.”, 0236) in a differentiable system (“In some embodiments, the differentiable model at S240 is a neural network model. In some embodiments, the non-differentiable model at S230 is included in an ensemble that includes the non-differentiable model of S230.”, 0091; “In some embodiments, the differentiable model is a perceptron, a feed-forward neural network, an autoencoder, a probabilistic network, a convolutional neural network, a radial basis function network, a multilayer perceptron, a deep neural network, or a recurrent neural network, including: Boltzman machines, echo state networks, long short-term memory (LSTM), hierarchical neural networks, stochastic neural networks, and other types of differentiable neural networks, without limitation.”, 0113, 0128).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and adding an inherent processor or an ALU to a system can increase the efficiency and/or decrease latency. Neural network models are trained and optimized for the purpose of improving their ability and tuning them based on specific test/training data.

2. The processor of claim 2 wherein: 
the one or more performance characteristics of the neural network resulting from training the neural network (Fig. 2) indicate how well the neural network performs using training data; and the one or more performance characteristics (Table 2 and section 3.4) resulting from testing the neural network indicate how the neural network performs using validation data (testing/training parameters, Fig. 2, Table 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost)”, “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to-end differentiable manner. Denote the training loss as $L_w(A)$. The goal in architecture search is to find a high performance architecture, i.e., $\min_{w,\alpha} \mathbb{E}_{A \sim p_\alpha(A)}[L_w(A)]$ (9). The main process of optimizing this objective is to minimize the expected performance of architectures sampled with $p_\alpha(A)$. That is, the network $A$ is first sampled with $p_\alpha(A)$. Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter $\alpha$ and the network parameter $w$ are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro).
Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

3. The processor of claim 1, wherein: the architecture of the neural network is evaluated using an objective function based on a generalization error (not further defined, reads on any error experienced during training or optimizing, Fig. 2); and the generalization error is based at least in part on a difference between a validation loss and a training loss (using Gumbel Softmax, “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost)”, “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to-end differentiable manner. Denote the training loss as $L_w(A)$. The goal in architecture search is to find a high performance architecture, i.e., $\min_{w,\alpha} \mathbb{E}_{A \sim p_\alpha(A)}[L_w(A)]$ (9). The main process of optimizing this objective is to minimize the expected performance of architectures sampled with $p_\alpha(A)$. That is, the network $A$ is first sampled with $p_\alpha(A)$. Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter $\alpha$ and the network parameter $w$ are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro).
Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).
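For illustration only, the claim 3 limitation (a generalization error based on a difference between a validation loss and a training loss) can be sketched as follows. This is a minimal sketch, not the applicant's or Chang's implementation: the function names, the `gap_weight` parameter, the way the gap is combined with the validation loss, and the loss values are all hypothetical assumptions.

```python
def generalization_gap(validation_loss: float, training_loss: float) -> float:
    """Generalization error taken as validation loss minus training loss."""
    return validation_loss - training_loss


def objective(validation_loss: float, training_loss: float,
              gap_weight: float = 1.0) -> float:
    """Hypothetical objective: validation loss plus a weighted gap penalty."""
    gap = generalization_gap(validation_loss, training_loss)
    return validation_loss + gap_weight * gap


# Hypothetical (validation_loss, training_loss) pairs for two candidates.
candidates = {
    "arch_a": (0.40, 0.35),  # small gap: generalizes well
    "arch_b": (0.38, 0.10),  # large gap: overfits the training data
}

# Select the candidate architecture with the lowest objective value.
best = min(candidates, key=lambda name: objective(*candidates[name]))
```

Under these assumed numbers the candidate with the smaller gap is preferred even though its validation loss is slightly higher.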

4. The processor of claim 1, wherein determining an architecture of the neural network is performed using a gradient estimate (“pass gradients in a continuous manner”, section 3.2) of a “differentiable loss function” (not further defined, reads on training, Fig. 2, Table 2) based at least in part on a relaxation of descrete variables (Table 2 and Fig. 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost)”, “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to-end differentiable manner. Denote the training loss as $L_w(A)$. The goal in architecture search is to find a high performance architecture, i.e., $\min_{w,\alpha} \mathbb{E}_{A \sim p_\alpha(A)}[L_w(A)]$ (9). The main process of optimizing this objective is to minimize the expected performance of architectures sampled with $p_\alpha(A)$. That is, the network $A$ is first sampled with $p_\alpha(A)$. Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter $\alpha$ and the network parameter $w$ are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro).
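For illustration only, a relaxation of discrete variables of the kind discussed for claim 4 can be sketched with the standard Gumbel-Softmax sample: Gumbel noise is added to each logit and a temperature-scaled softmax replaces the discrete argmax. This is a minimal sketch under stated assumptions, not Chang's ensemble variant; the function name, logits, temperature, and seed are hypothetical, and only the forward sample is shown (the gradient itself would come from differentiating this expression with respect to the logits).

```python
import math
import random


def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed (continuous) sample over a set of discrete choices."""
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / tau)
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(perturbed)
    exps = [math.exp(value - peak) for value in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate operations on one edge.
rng = random.Random(0)
weights = gumbel_softmax_sample([1.0, 2.0, 0.5], tau=0.5, rng=rng)
```

The returned weights are non-negative and sum to one; lowering `tau` pushes the sample toward a one-hot choice, while raising it makes the sample smoother.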

5. The processor of claim 1, wherein determining an architecture of the neural network is performed using a “surrogate function” (not further defined, reads on objective functions in Gumbel Softmax or Table 2 or Fig. 2) that represents a non-differentiable loss function (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost)”, “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to-end differentiable manner. Denote the training loss as $L_w(A)$. The goal in architecture search is to find a high performance architecture, i.e., $\min_{w,\alpha} \mathbb{E}_{A \sim p_\alpha(A)}[L_w(A)]$ (9). The main process of optimizing this objective is to minimize the expected performance of architectures sampled with $p_\alpha(A)$. That is, the network $A$ is first sampled with $p_\alpha(A)$. Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter $\alpha$ and the network parameter $w$ are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro).
Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

6. The processor of claim 4, wherein the differentiable loss function is a measure of a cross-entropy loss of the neural network. Talathi teaches cross entropy is well-known (“Given an input x, the machine learning model M_λ produces an estimate for the output probability, which may be expressed as: p̂ = M_λ(x, W) (1) so as to minimize the multi-class cross entropy function”, 0034).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and a measure of quality can be the negative of an error between a training network output and a known output for the training network input, e.g., a cross-entropy error or a mean-square error.
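For illustration only, the cross-entropy measure referred to above can be sketched as follows. This is a generic textbook formulation, not code taken from Talathi, Chang, or the application; the function names `softmax` and `cross_entropy` and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Multi-class cross-entropy for one example with a known class label."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical three-class outputs: an uninformative one and a confident one.
uniform_loss = cross_entropy([0.0, 0.0, 0.0], 0)
confident_loss = cross_entropy([5.0, 0.0, 0.0], 0)
```

In this sketch a lower loss corresponds to an output closer to the known label, which is the sense in which the negative of the error can serve as a measure of quality.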

7. The processor of claim 5, wherein the non-differentiable loss function is a measure of latency of the neural network (not further defined, reads on any delay/latency detected or improved upon while training or discovering neural networks, Chang:“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).

Claim Rejections - 35 USC § 103
Claims 8-21 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. (Differentiable Architecture Search with Ensemble Gumbel-Softmax, 2018) in view of Talathi (US 2017/0061326).
Chang discloses:

8. A system comprising: one or more processors to be configured to determine a network architecture by evaluating candidate neural network architectures, at least in part, by comparing one or more performance characteristics (reads on architecture search, NAS, abstract, “efficient NAS method, Differentiable ARchiTecture Search with Ensemble Gumbel-Softmax (DARTS-EGS), which is capable of discovering more diversified network architectures”, intro) based, at least in part, on comparing one or more performance characteristics (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4) of a neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro) resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network. 
Chang fails to particularly call for the details of the Gumbel-Softmax and how neural networks are inherently trained.
	Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and neural network models are trained and optimized for the purpose of improving their ability and tuning them based on specific test/training data (Chang: Fig. 2).

9. The system of claim 8, wherein: the one or more performance characteristics (Table 2) of the neural network resulting from training the neural network (Fig. 2) indicate how well the neural network performs using training data; and the one or more performance characteristics resulting from testing the neural network indicate how the neural network performs using validation data. (testing data is shown in Table 2 and performed in Fig. 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
	Talathi also teaches the inherent details of how neural networks are tested (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

10. The system of claim 8, wherein: the network architecture is determined using a gradient of a differentiable loss; (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4)  and the gradient is estimated using a Gumbel-Softmax distribution (“pass gradients in a continuous manner”, section 3.2; “Towards achieving this goal, we develop a differentiable NAS solution, where the search space includes arbitrary feed-forward network consisting of the predefined number of connections. Benefiting from a proposed ensemble Gumbel-Softmax estimator, our method optimizes both the architecture of a deep network and its parameters in the same round of backward propagation, yielding an end-to-end mechanism of searching network architectures”, abstract, section 3.3.2).
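For illustration only, a Gumbel-Softmax relaxation of a discrete architecture choice can be sketched as follows. This is the standard single-sample Gumbel-Softmax, not Chang's ensemble estimator and not the applicant's claimed method; the names `sample_gumbel`, `gumbel_softmax`, `alpha`, and the temperature values are assumptions made for the example.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse-CDF transform."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Return a continuous relaxation of a one-hot sample over the logits."""
    perturbed = [(l + sample_gumbel(rng)) / temperature for l in logits]
    m = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
alpha = [1.0, 0.5, -0.5]  # hypothetical architecture parameters for 3 candidate operations
soft = gumbel_softmax(alpha, temperature=1.0, rng=rng)
sharp = gumbel_softmax(alpha, temperature=0.01, rng=rng)
```

Because the output is a smooth function of `alpha`, a loss computed from it can in principle be differentiated with respect to the architecture parameters; lowering the temperature pushes the sample toward a one-hot selection.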

11. The system of claim 8, wherein: the network architecture is determined using a gradient of a non-differentiable loss function; and the non-differentiable loss is approximated with a surrogate function. (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4) of the neural network (e.g., Fig. 2 or “neural architecture search (NAS)”, intro).
Talathi also teaches non-differentiable objective function (Figs. 6; “FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

12. The system of claim 10, wherein the differentiable loss function is based at least in part on cross-entropy loss. Talathi teaches cross entropy is well-known (“Given an input x, the machine learning model M_λ produces an estimate for the output probability, which may be expressed as: p̂ = M_λ(x, W) (1) so as to minimize the multi-class cross entropy function”, 0034).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and a measure of quality can be the negative of an error between a training network output and a known output for the training network input, e.g., a cross-entropy error or a mean-square error.

13. The system of claim 11, wherein the non-differentiable loss is based at least in part on network latency or network accuracy. (not further defined, reads on any delay/latency detected or improved on while training or discovering neural networks, Chang: Fig. 2, Table 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4). Talathi also teaches non-differentiable objective function (Figs. 6; “FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

14. The system of claim 8, wherein the network architecture is determined using differentiable architecture search. (title of article;  abstract, “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).

15. The system of claim 8, wherein sparsity of the network architecture is enforced by using a paired-input cell structure (not further defined, reads on using a plurality of inputs as set forth in the training/discovery of NN architectures, Fig. 2, Table 2; section 3.2; or Talathi: Figs. 6).

16. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least determine an architecture (reads on architecture search, NAS, abstract, “efficient NAS method, Differentiable ARchiTecture Search with Ensemble Gumbel-Softmax (DARTS-EGS), which is capable of discovering more diversified network architectures”, intro) for a neural network by searching candidate neural network architectures using a gradient estimate of a loss, and evaluating candidate neural network architectures (NAS, abstract) based on one or more performance characteristics of the neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network. (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Chang fails to particularly call for the details of the Gumbel-Softmax, how it uses processors, and how neural networks are inherently trained.
	Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085) and processors (“The processing system may be configured as a general-purpose processing system with one or more microprocessors providing the processor functionality and external memory providing at least a portion of the machine-readable media, all linked together with other supporting circuitry through an external bus architecture. Alternatively, the processing system may comprise one or more neuromorphic processors for implementing the neuron models and models of neural systems described herein. ”, 0112).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and neural network models are trained and optimized for the purpose of improving their ability and tuning them based on specific test/training data (Chang: Fig. 2) while using well-known processors to execute the Gumbel Softmax algorithm.

17. The machine-readable medium of claim 16, wherein: the loss is non-differentiable; and the gradient is estimated using a neural network trained to model a loss function. (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Talathi teaches non-differentiable (Figs. 6, “FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

18. The machine-readable medium of claim 16, wherein: the loss is differentiable; and the gradient is estimated using samples from a conditional Gumbel-Softmax distribution. (“efficient NAS method, Differentiable ARchiTecture Search with Ensemble Gumbel-Softmax (DARTS-EGS), sections 3.3-3.4 which is capable of discovering more diversified network architectures”, intro; section 3.3.2; “pass gradients in a continuous manner”, section 3.2).

19. The machine-readable medium of claim 16, wherein: the architecture is selected based on an objective function; and the objective function is a function of latency and accuracy (“In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Talathi also teaches objective functions (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

20. The machine-readable medium of claim 16, wherein: an input selector selects two nodes of previous nodes; and an operation selector selects two operations for each input. (not further defined, reads on using a plurality of inputs selected over time in the training/discovery of NN architectures, Fig. 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Talathi also teaches objective functions (Figs. 6; “FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085).

21. The machine-readable medium of claim 16, wherein the process of estimating the gradient does not introduce bias to the gradient (reads on using Gumbel Softmax and pass gradients in a continuous manner, section 3.2).

Claim Rejections - 35 USC § 103
Claims 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chang et al. (Differentiable Architecture Search with Ensemble Gumbel-Softmax, 2018) and Talathi (US 2017/0061326), in view of Fidler (US 2019/0294970) and examiner’s official notice.
Chang discloses:
22. An autonomous vehicle comprising a neural network with a network architecture selected from a plurality of candidate network architectures based on one or more performance characteristics (reads on architecture search, NAS, abstract, “efficient NAS method, Differentiable ARchiTecture Search with Ensemble Gumbel-Softmax (DARTS-EGS), which is capable of discovering more diversified network architectures”, intro) of the neural network resulting from training the neural network to the one or more performance characteristics resulting from testing the neural network.
(Chang: Fig. 2, Table 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Chang fails to particularly call for the details of the Gumbel-Softmax, well-known autonomous vehicles, and how neural networks are inherently trained.
	Talathi teaches the inherent details of how neural networks are trained (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085) and vehicles (“For instance, a network 300 designed to recognize visual features from a car-mounted camera may develop high layer neurons with different properties depending on their association with the lower versus the upper portion of the image. Neurons associated with the lower portion of the image may learn to recognize lane markings, for example, while neurons associated with the upper portion of the image may learn to recognize traffic lights, traffic signs, and the like”, 0060; “ the classification of motorized vehicles may benefit from first learning to recognize wheels, windshields, and other features. These features may be combined at higher layers in different ways to recognize cars, trucks, and airplanes.”, 0057).
Fidler teaches autonomous vehicles (“FIG. 1, a polygon object recognition model may be applied to a variety of dataset domains, including autonomous driving imagery, medical imagery, and aerial imagery, and may also be applied to other general scenes”, 0051; “it is often critical in the domain of autonomous driving to localize and outline all cars, pedestrians, and miscellaneous static and dynamic objects”, 0002).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and neural network models are trained and optimized for the purpose of improving their ability and tuning them based on specific test/training data (Chang: Fig. 2) while using well-known autonomous vehicles for the purpose of having an optimized neural network in a vehicle.  The examiner takes official notice that it is known to feed image signals back to an autonomous vehicle for the purpose of controlling/braking the vehicle as in collision avoidance.

23. The autonomous vehicle of claim 22, wherein: an input selector selects two nodes of previous nodes; and an operation selector (reads on an algorithm run by a Gumbel softmax system)  selects two operations for each input. (not further defined, reads on using a plurality of inputs selected over time in the training/discovery of NN architectures, Fig. 2; “In NAS, desired methods should not only show effectiveness (i.e., test-set accuracy) but also possess superior efficiency (i.e., search cost), “the validation-set accuracy and search time are employed to represent the effectiveness and efficiency of our model”, 3.2.2; “the validation set performance is considered as the reward in our model, but using an end-to end differentiable manner. Denote the training loss as Lw(A). The goal in architecture search is to find a high performance architecture, i.e., min w,α EA∼pα(A) [Lw(A)] . (9) The main process of optimizing this objective is to minimize the expected performance of architectures sampled with pα(A). That is, the network A is first sampled with pα(A). Afterward, the loss on the training dataset can be calculated by forward propagation. Relying on this loss, the gradients of the network architecture parameter α and the network parameter w are yielded to modify these parameters better”, 3.4).
Talathi also teaches objective functions (“FIG. 7 presents a schematic diagram of an exemplary classifier 700 to improve the performance of a trained machine learning model (e.g., a neural network) in accordance with aspects of the present disclosure. Referring to FIG. 7, a non-differentiable objective function, O, is added at the output of the classifier (regression) layer of the neural network. The objective function may be specified such that the maximum non-zero value for the objective function for a given training (or testing) dataset will only occur when the number of training (testing) errors are below those obtained for the original trained neural network.”, 0085; “ the classification of motorized vehicles may benefit from first learning to recognize wheels, windshields, and other features. These features may be combined at higher layers in different ways to recognize cars, trucks, and airplanes.”, 0057).

24. The autonomous vehicle of claim 22, wherein: the autonomous vehicle includes a camera that captures an image; and the image is processed by the neural network to identify an object in the image.
Chang fails to call for installing the system in an autonomous vehicle.
Talathi teaches cameras in or on vehicles (“For instance, a network 300 designed to recognize visual features from a car-mounted camera may develop high layer neurons with different properties depending on their association with the lower versus the upper portion of the image. Neurons associated with the lower portion of the image may learn to recognize lane markings, for example, while neurons associated with the upper portion of the image may learn to recognize traffic lights, traffic signs, and the like”, 0060; “the classification of motorized vehicles may benefit from first learning to recognize wheels, windshields, and other features. These features may be combined at higher layers in different ways to recognize cars, trucks, and airplanes.”, 0057).
Fidler teaches autonomous vehicles (“FIG. 1, a polygon object recognition model may be applied to a variety of dataset domains, including autonomous driving imagery, medical imagery, and aerial imagery, and may also be applied to other general scenes”, 0051; “it is often critical in the domain of autonomous driving to localize and outline all cars, pedestrians, and miscellaneous static and dynamic objects”, 0002).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor: neural network models are trained and optimized to improve their performance and to tune them on specific test/training data (Chang: Fig. 2), and placing such an optimized neural network in a well-known autonomous vehicle provides a vehicle-mounted network that can be used for imaging. The examiner takes official notice that it is known to feed image signals back to an autonomous vehicle for the purpose of controlling/braking the vehicle, as in collision avoidance.


25. The autonomous vehicle of claim 22, wherein: the autonomous vehicle includes a camera that captures an image; and the image is processed by the neural network to produce a vehicle control signal.
Talathi teaches cameras in or on vehicles (“For instance, a network 300 designed to recognize visual features from a car-mounted camera may develop high layer neurons with different properties depending on their association with the lower versus the upper portion of the image. Neurons associated with the lower portion of the image may learn to recognize lane markings, for example, while neurons associated with the upper portion of the image may learn to recognize traffic lights, traffic signs, and the like”, 0060; “the classification of motorized vehicles may benefit from first learning to recognize wheels, windshields, and other features. These features may be combined at higher layers in different ways to recognize cars, trucks, and airplanes.”, 0057).
Fidler teaches autonomous vehicles (“FIG. 1, a polygon object recognition model may be applied to a variety of dataset domains, including autonomous driving imagery, medical imagery, and aerial imagery, and may also be applied to other general scenes”, 0051; “it is often critical in the domain of autonomous driving to localize and outline all cars, pedestrians, and miscellaneous static and dynamic objects”, 0002).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor: neural network models are trained and optimized to improve their performance and to tune them on specific test/training data (Chang: Fig. 2), and placing such an optimized neural network in a well-known autonomous vehicle provides a vehicle-mounted network that can be used for imaging or for braking when objects are detected. The examiner takes official notice that it is known to feed image signals back to an autonomous vehicle for the purpose of controlling/braking the vehicle, as in collision avoidance.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Odena (US 2019/0236438) teaches that neural networks have latency requirements (“system memory, processing power, and processing time, used by a task neural network to account for the amount of resources available for use by the systems, quality requirements for the outputs generated by the task neural network, latency requirements for generating an output, or other factors that impact how many resources should be consumed by processing a given network input”, 0006) and teaches cross entropy (“if the set of usage factors includes the quality usage factor, the value can be a measure of quality of the training network output relative to a known output for the training network output. For example, the measure of quality can be the negative of an error between the training network output and a known output for the training network input, e.g., a cross-entropy error or a mean-square error. As another example, the measure of quality can be the likelihood or the log likelihood assigned to the known output by the training network output.”, 0083).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080. The examiner can normally be reached ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID R VINCENT/Primary Examiner, Art Unit 2123