DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2019-08-13 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Status
Claims 1-20 are pending in the application.  Claims 21-49 are cancelled.
Specification
The disclosure is objected to because of the following informalities: Fig. 2A Element 218 is referred to in three different ways:
“Ensemble Model Executor” in the Drawing Fig. 2A, and in [0045], [0088], and [0089]
“Example Ensemble Model Executor” in [0018], [0023], and [0029]
“Example Model Executor” in [0020] and [0045]
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The claim limitations being interpreted under 35 U.S.C. 112(f) are the following limitations of Claim 18:
“means for acquiring a model”; 
“means for identifying a number of exit points to place in the model”; 
“means for selecting exit points to be enabled in the model”; 
“means for generating an additional model structure to calculate an output at each respective exit point”.  
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 18 limitations:
“means for acquiring a model”; 
“means for identifying a number of exit points to place in the model”; 
“means for selecting exit points to be enabled in the model”; 
“means for generating an additional model structure to calculate an output at each respective exit point”; 
invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.  While instant specification [0057] states:  “Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor”, merely reciting a “processor” is not sufficient to impart structure to a computer-implemented means-plus-function limitation, as an algorithm must be recited.  See MPEP 2181(II)(B):  “For a computer-implemented 35 U.S.C. 112(f)  claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b)”.   The following statements in Instant Specification [0025-0028] also do not disclose a sufficient algorithm:  “The model acquirer 230 may implement means for acquiring”; “The example exit point quantity identifier 235 may implement means for identifying”; “The example exit point selector 240 may implement means for selecting”; “The example exit output generator 245 may implement means for generating”. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mental process, without significantly more. 
Step 1:
Claims 1-7 are directed to an apparatus, Claims 8-17 are directed to a non-transitory computer readable medium, Claim 18 is directed to an apparatus, and Claims 19-20 are directed to a method.  As for the apparatus claims 1-7 and 18, Instant Specification [0029] recites:  “When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example, ensemble model generator 215, the example ensemble model executor 218, the example adversarial attack identifier 220, the example adversarial attack indicator 225, the example model acquirer 230, the example exit point quantity identifier 235, the example exit point selector 240, and the example exit output generator 245 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware.”  Thus, all claims are directed to one of the four statutory categories of patentable subject matter.
Step 2A Prong 1:
Claims 1, 8, 18, and 19 recite:
Identifying a number of exit points to place in the model; identifying can be done in the human mind or with pen and paper, and is thus a mental process
selecting exit points to be enabled in the model; selecting can be done in the human mind or with pen and paper, and is thus a mental process
generating an additional model structure to calculate an output at each respective exit point; generating a model and performing a calculation can be done by a human with pen and paper, and is thus a mental process
Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional element of acquiring the model amounts to insignificant extra-solution activity (mere data gathering, see MPEP 2106.05(g)(3)).
Step 2B:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above, the additional element of acquiring the model amounts to insignificant extra-solution activity (mere data gathering, see MPEP 2106.05(g)(3)).
Dependent Claims 2-7, 9-17, and 20 are also directed to an abstract idea, for the following reasons:
Claims 2, 9, and 20 recite acquiring multiple trained models.  As discussed above, acquiring models amounts to insignificant extra solution activity (mere data gathering, see MPEP 2106.05(g)(3)).
Claims 3 and 10 recite determining the number of exit points to be placed using a count of convolutional layers in the model; this can be performed in the human mind or with pen and paper, and is thus a mental process.
Claims 4 and 11 recite determining the number of exit points to be placed by mapping a type of layer to the number of exit points; this can be performed in the human mind or with pen and paper, and is thus a mental process.
Claims 5 and 12 recite identifying the exit points using cross entropy loss; this can be performed in the human mind or with pen and paper, and is thus a mental process.
Claims 6 and 13 recite creating the additional model structure using an insertion of a fully connected layer and a softmax layer; this can be performed by a human with pen and paper, and is thus a mental process.
Claims 7 and 14 recite creating the additional model structure at each exit point that incorporates a calculated importance weight; this can be performed by a human with pen and paper, and is thus a mental process.
Claim 15 recites “in response to a generation of the additional model structures, generate a structure to aggregate output data for every exit point”; this can be performed by a human with pen and paper, and is thus a mental process.
Claim 16 recites “aggregate the data into an array of the output and confidence score associated with each exit location”; this can be performed by a human with pen and paper, and is thus a mental process.
Claim 17 recites “indicate whether an adversarial attack has been detected”; this can be performed in the human mind or with pen and paper, and is thus a mental process.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, 13-16, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Teerapittayanon et al. (“BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks”; hereinafter “Teerapittayanon”) in view of Demir (“Early-Exit Convolutional Neural Networks”).
As per Claim 1, Teerapittayanon teaches an apparatus to generate an ensemble model from a trained machine learning model, the apparatus comprising: (Teerapittayanon, Page 4 Section IV Para 1, discloses an apparatus:  “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Teerapittayanon, Page 4 Section V A, discloses:  “Two important hyperparameters of BranchyNet are the weights wn in joint optimization (Section III-B) and the exit thresholds T for the fast inference algorithm described in Figure 2. When selecting the weight of each branch, we observed that giving more weight to early branches improves the accuracy of the later branches due to the added regularization.”  Here, Teerapittayanon discloses an ensemble model, as an ensemble combines weighted results (“weight of each branch”) to come to a final result.  This is generated from a trained machine learning model, as disclosed on Teerapittayanon Page 4 Section IV Para 3: “We initialize B-LeNet, B-AlexNet and B-ResNet with weights trained from LeNet, AlexNet and ResNet respectively.”)
a model acquirer to acquire the model (Teerapittayanon Page 4 Section IV Para 3, discloses starting by acquiring a baseline network:  “We found the initializing each BranchyNet network with the weights trained from the baseline network improved the classification accuracy of the network by several percent over random initialization.”)
and an exit output generator to generate an additional model structure to calculate an output at each respective exit point. (Teerapittayanon Page 2 Top, discloses the details of the structure of each exit point (“branch”), which comprise more convolutional and fully connected layers:  “For LeNet-5 [15] which consists of 3 convolutional layers and 2 fully-connected layers, we add a branch consisting of 1 convolutional layer and 1 fully-connected layer after the first convolutional layer of the main network. For AlexNet [13] which consists of 5 convolutional layers and 3 fully-connected layers, we add 2 branches. One branch consisting of 2 convolutional layers and 1 fully-connected layer is added after the 1st convolutional layer of the main network, and another branch consisting of 1 convolutional layer and 1 fully-connected layer is added after the 2nd convolutional layer of the main network. For ResNet-110 [7] which consists of 109 convolutional layers and 1 fully-connected layer, we add 2 branches. One branch consisting of 3 convolutional layers and 1 fully-connected layer is added after the 2nd convolutional layer of the main network, and the second branch consisting of 2 convolutional layers and 1 fully-connected layer is added after the 37th convolutional layer of the main network.”)
However, Teerapittayanon does not explicitly teach an exit point quantity identifier to determine a number of exit points to place in the model; an exit point selector to select exit points to be enabled in the model.  Rather, Teerapittayanon teaches fixed, predetermined exit points manually selected by the implementers, as shown on Page 2 Section III, which merely recites “certain locations”:  “BranchyNet modifies the standard deep network structure by adding exit branches (also called side branches or simply branches for brevity), at certain locations throughout the network.”  This is reinforced by Page 5 Section A Last Paragraph:  “Future work will be to derive an algorithm to find the optimal placement locations of the branches automatically”.
Demir teaches an exit point quantity identifier to determine a number of exit points to place in the model; an exit point selector to select exit points to be enabled in the model (Demir, Page 23 Section 3.5, discloses:  “The number of early-exit (EE) blocks and their distribution technique is another important factor in the architecture of the model. The number of EE-blocks depends on the depth of the network…The EE-blocks can be distributed based on the dataset and the capacity of the network. EENets are suitable for many distribution methods such as; Pareto, Golden Ratio, Fine, Linear, Quadratic, etc”.  Here, Demir discloses to determine a number of exit points (“number of early-exit blocks”) and to select exit points (“and their distribution”)).
Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of early-exit neural networks.
It would have been obvious before the effective filing date of the claimed invention to combine the branchy network with fixed branch locations of Teerapittayanon with the calculated branch locations of Demir.  One of ordinary skill in the art would be motivated to do so in order to gain efficiency by optimizing the computational cost of model operations (Demir, Page 24:  “The EE-blocks can be distributed based on the dataset and the capacity of the network. EENets are suitable for many distribution methods such as; Pareto, Golden Ratio, Fine, Linear, Quadratic, etc. According to the Pareto principle, 80% of the results have been done by 20% of works. The Pareto distribution is inspired by that principle (i.e. 80% of examples may be classified just by spending 20% of the total computational cost of the model)…The Linear and Quadratic distributions split the network where the computational cost of the layers between two consecutive EE-blocks increases in linear or quadratic form, respectively.”)
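For illustration only, the “Linear” distribution quoted from Demir may be read as follows: exit blocks are placed so that the computational cost of each successive segment between exits grows in the ratio 1:2:3 and so on. The sketch below is one possible reading under that assumption and is not taken from Demir; the function name `linear_exit_positions`, the per-layer cost list, and the rule of placing an exit after the first layer that reaches each cumulative cost target are all hypothetical.

```python
def linear_exit_positions(layer_costs, num_exits):
    """Return layer indices after which an exit block would be placed.

    Assumption (not from the cited reference): the network is split into
    num_exits + 1 segments whose costs are proportional to 1, 2, 3, ...
    """
    total = sum(layer_costs)
    shares = list(range(1, num_exits + 2))  # 1, 2, ..., num_exits + 1
    denom = sum(shares)

    # Cumulative cost targets at which each exit should appear.
    targets = []
    acc = 0
    for share in shares[:-1]:
        acc += share
        targets.append(total * acc / denom)

    positions = []
    running = 0.0
    t = 0
    for i, cost in enumerate(layer_costs):
        running += cost
        # Place an exit after layer i once the running cost reaches a target.
        while t < len(targets) and running >= targets[t]:
            if not positions or positions[-1] != i:
                positions.append(i)
            t += 1
    return positions


# Ten layers of equal cost, three exits: segments of 1, 2, 3, and 4 layers.
positions = linear_exit_positions([1] * 10, 3)
```

With equal per-layer costs, the exits in this example fall after layers 0, 2, and 5 (zero-indexed), leaving the longest segment before the final output.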

As per Claim 6, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1.  Teerapittayanon teaches wherein the exit output generator creates the additional model structure using an insertion of a fully connected layer and a softmax layer. (Teerapittayanon, Page 4 Top Left, discloses a softmax layer in the exit branch:  “For each exit point, the input sample is fed through the corresponding branch. The procedure then calculates the softmax and entropy of the output and checks if the entropy is below the exit point threshold Tn.”  Teerapittayanon, Page 4 Section IV Para 3, discloses fully connected layers in the exit branch:  “For LeNet-5…we add a branch consisting of 1 convolutional layer and 1 fully-connected layer after the first convolutional layer of the main network. For AlexNet [13]…One branch consisting of 2 convolutional layers and 1 fully-connected layer is added after the 1st convolutional layer of the main network, and another branch consisting of 1 convolutional layer and 1 fully-connected layer is added after the 2nd convolutional layer of the main network. For ResNet-110…we add 2 branches. One branch consisting of 3 convolutional layers and 1 fully-connected layer is added after the 2nd convolutional layer of the main network, and the second branch consisting of 2 convolutional layers and 1 fully-connected layer is added after the 37th convolutional layer of the main network.”)

As per Claim 7, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1.  Teerapittayanon teaches wherein the exit output generator creates the additional model structure at each exit point that incorporates a calculated importance weight (Teerapittayanon, Page 4 Section V A, discloses an importance weight for each branch:  “Two important hyperparameters of BranchyNet are the weights wn in joint optimization (Section III-B) and the exit thresholds T for the fast inference algorithm described in Figure 2. When selecting the weight of each branch, we observed that giving more weight to early branches improves the accuracy of the later branches due to the added regularization”).

As per Claims 8, 13, and 14, these claims are non-transitory computer readable medium claims corresponding to apparatus claims 1, 6, and 7, respectively.  The difference is that they recite a non-transitory computer readable medium and a processor.  Teerapittayanon, Page 4 Section IV Para 1, discloses a non-transitory computer readable medium and a processor:  “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claims 8, 13, and 14 are rejected for the same reasons as Claims 1, 6, and 7, respectively.

As per Claim 15, the combination of Teerapittayanon and Demir teaches the non-transitory computer readable medium of Claim 8.  Teerapittayanon teaches wherein the instructions, when executed, further cause the at least one processor to, in response to a generation of the additional model structures, generate a structure to aggregate output data for every exit point.  (Recall from Claim 1 that Teerapittayanon teaches additional model structures in each exit branch.  Teerapittayanon, Page 2, under “Regularization via Joint Optimization”, discloses a structure to aggregate output data for every exit point:  “BranchyNet jointly optimizes the weighted loss of all exit points. Each exit point provides regularization on the others, thus preventing overfitting and improving test accuracy.”)
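For illustration only, the joint optimization of “the weighted loss of all exit points” quoted above can be sketched as a weighted sum of per-exit cross-entropy losses. This is a minimal sketch, not Teerapittayanon's implementation; the function names and the use of plain Python lists for logits are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def joint_loss(exit_logits, label, weights):
    """Weighted sum of the losses at every exit point (hypothetical helper)."""
    if len(exit_logits) != len(weights):
        raise ValueError("one weight is required per exit point")
    return sum(w * cross_entropy(l, label) for w, l in zip(weights, exit_logits))


# Two exits, each undecided between two classes; the early exit is weighted higher.
loss = joint_loss([[0.0, 0.0], [0.0, 0.0]], label=0, weights=[1.0, 0.5])
```

Each exit contributes its own loss term, so the combined value aggregates the outputs of all exit points into a single training objective.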

As per Claim 16, the combination of Teerapittayanon and Demir teaches the non-transitory computer readable medium of Claim 15.  Teerapittayanon teaches wherein the instructions, when executed, further cause the at least one processor to aggregate the data into an array of the output and confidence score associated with each exit location (Teerapittayanon, Page 2 Top Left, discloses a confidence score for each exit:  “At each exit point, BranchyNet uses the entropy of a classification result (e.g., by softmax) as a measure of confidence in the prediction. If the entropy of a test sample is below a learned threshold value, meaning that the classifier is confident in the prediction, the sample exits the network with the prediction result at this exit point, and is not processed by the higher network layers. If the entropy value is above the threshold, then the classifier at this exit point is deemed not confident, and the sample continues to the next exit point in the network. If the sample reaches the last exit point, which is the last layer of the baseline neural network, it always performs classification.”)
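For illustration only, the entropy-based exit procedure described in the quoted passage can be sketched as follows. This is a minimal sketch under simplifying assumptions and is not Teerapittayanon's code: the logits of every exit are supplied up front (an actual network would compute later branches only when needed), and the function names and threshold values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower values indicate higher confidence."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(exit_logits, thresholds):
    """Return (exit index, predicted class) for the first confident exit.

    thresholds holds one entropy threshold per non-final exit; the final
    exit always classifies, as stated in the quoted passage.
    """
    last = len(exit_logits) - 1
    for n, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if n == last or entropy(probs) < thresholds[n]:
            return n, probs.index(max(probs))


# The first exit is confident here, so the sample leaves at exit 0.
confident = early_exit_predict([[10.0, 0.0], [0.0, 5.0]], thresholds=[0.3])
# The first exit is unsure here, so the sample continues to the final exit.
unsure = early_exit_predict([[0.1, 0.0], [5.0, 0.0]], thresholds=[0.3])
```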

As per Claim 18, this claim is a means-plus-function apparatus claim corresponding to apparatus claim 1.  The difference is that it recites means, for which the Instant Specification [0057] states:  “Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor”.  Teerapittayanon, Page 4 Section IV Para 1, discloses a processor:  “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claim 18 is rejected for the same reasons as Claim 1.

As per Claim 19, this claim is a method claim corresponding to apparatus claim 1.  The difference is that it recites a processor.  Teerapittayanon, Page 4 Section IV Para 1, discloses a processor:   “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claim 19 is rejected for the same reasons as Claim 1.

Claims 2-4, 9-11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Teerapittayanon and Demir, further in view of Bolukbasi et al. (“Adaptive Neural Networks for Efficient Inference”; hereinafter “Bolukbasi”).
As per Claim 2, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1.  However, the combination of Teerapittayanon and Demir does not explicitly teach wherein the model obtained by the model acquirer includes multiple trained models.
Bolukbasi teaches wherein the model obtained by the model acquirer includes multiple trained models. (Bolukbasi, Page 4 Section 4 Para 2, discloses:  “As an example, assume we have three pre-trained networks, N1, N2, and N3. For an example x, denote the predictions for the networks as N1(x), N2(x), and N3(x). Additionally, denote the evaluation times for each of the networks as τ (N1), τ (N2), and τ (N3).”)
Bolukbasi and the combination of Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of neural network architecture.
It would have been obvious before the effective filing date of the claimed invention to combine the early exit branching system of Teerapittayanon and Demir with the concatenated cascaded networks of Bolukbasi.  Combining these would allow one to cascade diverse types of networks with different levels of complexity, from simple to complex.  One of ordinary skill in the art would be motivated to do so because exiting early can save substantial time and resources (Bolukbasi, Page 1, Abstract:  “We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples”).

As per Claim 3, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1, including an exit point quantity identifier to determine the number of exit points to be placed (see the rejection of Claim 1).  However, the combination of Teerapittayanon and Demir does not explicitly teach wherein the exit point quantity identifier is to determine the number of exit points to be placed using a count of convolutional layers in the model.
Bolukbasi teaches wherein the exit point quantity identifier is to determine the number of exit points to be placed using a count of convolutional layers in the model. (Recall in Claim 1 that Demir disclosed determining a number of exit points.  Bolukbasi Page 3 Section 3 Para 2, discloses:  “As a running DNN example, we consider the AlexNet architecture (Krizhevsky et al., 2012), which is composed of 5 convolutional layers followed 3 fully connected layers. During evaluation of the network, computing each convolutional layer takes more than 3 times longer than computing a fully connected layer, so we consider a system that allows an example to exit the network after each of the first 4 convolutional layers.”  Here, Bolukbasi discloses placing an exit point after each convolutional layer, except for the last one.)
Bolukbasi and the combination of Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of neural network architecture.
It would have been obvious before the effective filing date of the claimed invention to combine the early exit branching system of Teerapittayanon and Demir with the exit branch after each convolutional layer of Bolukbasi.  One of ordinary skill in the art would be motivated to do so because convolutional layers are expensive, and skipping some can save time and resources (Bolukbasi, Page 3 Section 3 Para 2:  “computing each convolutional layer takes more than 3 times longer than computing a fully connected layer”).
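For illustration only, the arrangement quoted from Bolukbasi (an exit after each convolutional layer except the last) can be sketched by counting the convolutional layers of a model. This is a minimal sketch, not Bolukbasi's code; the representation of a model as a list of layer-type strings and the function name are assumptions made for the example.

```python
def exit_points_after_conv_layers(layer_types):
    """Return indices of layers after which an exit would be placed.

    An exit follows every convolutional layer except the last one, so the
    number of exits is the count of convolutional layers minus one.
    """
    conv_indices = [i for i, kind in enumerate(layer_types) if kind == "conv"]
    return conv_indices[:-1]


# AlexNet-like layout from the quoted passage: 5 convolutional layers
# followed by 3 fully connected layers.
alexnet_like = ["conv"] * 5 + ["fc"] * 3
exits = exit_points_after_conv_layers(alexnet_like)
num_exits = len(exits)
```

For the five-convolutional-layer layout, this yields four exits, matching the “first 4 convolutional layers” in the quoted passage.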

As per Claim 4, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1, including an exit point quantity identifier to determine the number of exit points to be placed (see the rejection of Claim 1).  However, the combination of Teerapittayanon and Demir does not explicitly teach wherein the exit point quantity identifier is to determine the number of exit points to be placed by mapping a type of layer to the number of exit points.
Bolukbasi teaches wherein the exit point quantity identifier is to determine the number of exit points to be placed by mapping a type of layer to the number of exit points. (Recall in Claim 1 that Demir disclosed determining a number of exit points.  Bolukbasi Page 3 Section 3 Para 2, discloses:  “As a running DNN example, we consider the AlexNet architecture (Krizhevsky et al., 2012), which is composed of 5 convolutional layers followed 3 fully connected layers. During evaluation of the network, computing each convolutional layer takes more than 3 times longer than computing a fully connected layer, so we consider a system that allows an example to exit the network after each of the first 4 convolutional layers.”  Here, Bolukbasi discloses placing an exit point after each type of layer (“convolutional layer”), except for the last one.)
Bolukbasi and the combination of Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of neural network architecture.
It would have been obvious before the effective filing date of the claimed invention to combine the early exit branching system of Teerapittayanon and Demir with the exit branch after each convolutional layer of Bolukbasi.  One of ordinary skill in the art would be motivated to do so because convolutional layers are expensive, and skipping some can save time and resources (Bolukbasi, Page 3 Section 3 Para 2:  “computing each convolutional layer takes more than 3 times longer than computing a fully connected layer”).

As per Claims 9-11, these claims are non-transitory computer readable medium claims corresponding to apparatus claims 2-4, respectively.  The difference is that they recite a non-transitory computer readable medium and a processor.  Teerapittayanon, Page 4 Section IV Para 1, discloses a non-transitory computer readable medium and a processor:   “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claims 9-11 are rejected for the same reasons as Claims 2-4, respectively.

As per Claim 20, this claim is a method claim corresponding to apparatus claim 2.  The difference is that it recites a processor.  Teerapittayanon, Page 4 Section IV Para 1, discloses a processor:   “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claim 20 is rejected for the same reasons as Claim 2.


Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Teerapittayanon and Demir, further in view of Benyahia et al. (US 2020/0104688 A1; hereinafter “Benyahia”).
As per Claim 5, the combination of Teerapittayanon and Demir teaches the apparatus of claim 1, including an exit point selector that identifies the exit points (see the rejection of Claim 1).  However, the combination of Teerapittayanon and Demir does not explicitly teach wherein the exit point selector identifies the exit points using cross entropy loss.
Benyahia teaches wherein [the exit point selector] identifies the [exit points] architecture using cross entropy loss. (Recall from above that Demir teaches identifying the exit points.  Benyahia [0004] discloses:  “Systems and/or methods are provided for neural architecture search.”  Benyahia further discloses in [0039] that each candidate architecture is called a “subgraph”: “The controller may be trained to suggest candidate models, e.g., from a subset of options of nodes for the model and possible subgraphs”, and suggests in [0040] training each candidate model architecture until a preferred model is identified: “Identifying the preferred model may be an iterative process in which a plurality of candidate models are obtained which are then sequentially trained and then evaluated. Based on an evaluation (e.g., analysis) of the trained models, the controller may be updated. The updated controller may then suggest a subsequent set of candidate models for sequential training and evaluation. This process may be repeated until a preferred model is identified.”  Benyahia then discloses in [0093-0097] that each architecture has a loss function, comprising cross-entropy loss, which is minimized:  “A loss function custom-character.sub.WPL.sub.mj may be defined, e.g., for each architecture (subgraph)… The weight plasticity loss function for each weighting may also comprise an indication of the cross-entropy loss function for that subgraph… When each of the candidate models is trained, the weight plasticity loss function may be minimized.”  Finally, in [0108-0109], Benyahia discloses that the architecture is chosen based on this loss:  “At step 430, each of the subgraphs is trained in an iterative fashion, as described above. That is, the training uses training data and minimizes a weight plasticity loss function so that updates to weightings of each candidate model are controlled—e.g., based on how important those weightings are to other previously trained candidate models. 
At step 440, the subgraphs are evaluated, e.g., using validation data sets. In this regard, at this stage, each of the candidate models has been trained.. The models are then tested on a validation data set so that a best candidate model may be identified. The best candidate model may then be evaluated (e.g., by a controller)”).
Benyahia and the combination of Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of neural network architecture.
It would have been obvious before the effective filing date of the claimed invention to combine the early exit branching system of Teerapittayanon and Demir with the architecture search of Benyahia.  Searching for the optimal arrangement of exit blocks, as suggested by Demir, is a selection between alternative architectures.  Thus, Benyahia’s method of iteratively testing different architectures for the best cross entropy loss can be applied to the early exit architectures of Teerapittayanon and Demir.  One of ordinary skill in the art would be motivated to do so in order to save time and resources in the use of the final neural network (Benyahia [0021]:  “As the training of a network may be very time-consuming, initially selecting suitable neurons may provide a significant reduction in the time, and resources, taken to provide a neural network that performs satisfactorily for a selected task. In addition to increasing the efficiency of training a neural network, the ability to select a suitable configuration of neurons may make the difference between the provision of a neural network that may solve a technical problem, and one that cannot.”).
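
For illustration only, selecting among alternative exit point arrangements by cross entropy loss can be sketched as follows. This is a simplified sketch under stated assumptions: the candidate names, the validation probabilities, and the helper functions are hypothetical, and the sketch scores each candidate by plain cross entropy on validation data rather than reproducing the full weight plasticity loss or controller of Benyahia.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean negative log-likelihood assigned to the true labels."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probabilities, labels))
    return -total / len(labels)


def select_candidate(candidates, labels):
    """Return (best candidate name, per-candidate losses), lowest loss wins."""
    losses = {name: cross_entropy(probs, labels) for name, probs in candidates.items()}
    best = min(losses, key=losses.get)
    return best, losses


# Hypothetical validation labels and predicted class probabilities for two
# candidate exit point arrangements (made-up numbers, for illustration only).
labels = [0, 1]
candidates = {
    "exits_after_conv1_and_conv2": [[0.9, 0.1], [0.2, 0.8]],
    "exit_after_conv3_only": [[0.6, 0.4], [0.5, 0.5]],
}

best, losses = select_candidate(candidates, labels)
print(best)
```

In this sketch each arrangement of exit points is treated as one candidate architecture, and the arrangement with the lowest validation cross entropy is identified as the preferred one.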

As per Claim 12, this claim is a non-transitory computer readable medium claim corresponding to apparatus claim 5.  The difference is that it recites a non-transitory computer readable medium and a processor.  Teerapittayanon, Page 4 Section IV Para 1, discloses a non-transitory computer readable medium and a processor:  “We use a 3.0GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITAN X (Maxwell) 12GB GPU.”  Claim 12 is rejected for the same reasons as Claim 5.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Teerapittayanon and Demir, further in view of Chen et al. (US 2020/0226459 A1; hereinafter “Chen”).
As per Claim 17, the combination of Teerapittayanon and Demir teaches the non-transitory computer readable medium of Claim 15.  However, the combination of Teerapittayanon and Demir does not explicitly teach wherein the instructions, when executed, further cause the at least one processor to indicate whether an adversarial attack has been detected.
Chen teaches wherein the instructions, when executed, further cause the at least one processor to indicate whether an adversarial attack has been detected. (Chen, Para [0028], discloses using an ensemble of models to detect an adversarial attack:  “In one or more embodiments, different precision neural networks of the same neural network architecture may produce very different responses to adversarial inputs in whereas for normal (e.g., non-adversarial) input data the same neural network may produce substantially similar responses. In one or more embodiments, the system calculates a difference metric between the responses of each neural network and compares the difference metric to a predetermined threshold value. In one or more embodiments, if the difference metric is less than or equal to the predetermined threshold value, the system determines that the input data does not include adversarial data and classifies the input data based upon the outputs of the neural networks. In one or more embodiments, if the difference metric is greater than the predetermined threshold value, the system identifies the input data as including adversarial data and filters out or discards the input data.”)
Chen and the combination of Teerapittayanon and Demir are analogous art because they are both in the field of endeavor of neural network architecture.
It would have been obvious before the effective filing date of the claimed invention to combine the early exit branching system of Teerapittayanon and Demir with the adversarial attack detection of Chen.  The early exit branching system of Teerapittayanon and Demir is functionally an ensemble of several models, in which a final decision is made based on weighting the decisions together.  Chen uses an ensemble of several models and compares their results to detect an adversarial attack based on discrepancies between the results. One of ordinary skill in the art would be motivated to use the ensemble to identify adversarial attacks in order to better detect such attacks and therefore preserve the accuracy of the model (Chen, Para [0028]: “In one or more embodiments, different precision neural networks of the same neural network architecture may produce very different responses to adversarial inputs in whereas for normal (e.g., non-adversarial) input data the same neural network may produce substantially similar responses.”).
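
For illustration only, the threshold comparison described in Chen [0028] can be sketched as follows. The choice of difference metric (largest pairwise L1 distance between output vectors), the threshold value, and all function names are assumptions made for this sketch; Chen does not specify them in the quoted passage.

```python
def difference_metric(outputs):
    """Largest pairwise L1 distance between the models' output vectors.

    The specific metric is an assumption; the quoted passage only refers to
    "a difference metric between the responses of each neural network".
    """
    worst = 0.0
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            dist = sum(abs(a - b) for a, b in zip(outputs[i], outputs[j]))
            worst = max(worst, dist)
    return worst


def is_adversarial(outputs, threshold):
    """Flag the input when the metric exceeds the predetermined threshold.

    A metric less than or equal to the threshold is treated as non-adversarial,
    matching the comparison described in the quoted passage.
    """
    return difference_metric(outputs) > threshold


# Hypothetical outputs of two models (or two exit points) for the same input.
agreeing = [[0.90, 0.10], [0.85, 0.15]]
disagreeing = [[0.90, 0.10], [0.10, 0.90]]

print(is_adversarial(agreeing, threshold=0.5))
print(is_adversarial(disagreeing, threshold=0.5))
```

Applied to the combination, the outputs compared would be those produced at the different exit points for the same input, with a large disagreement indicating that an adversarial attack has been detected.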

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kaya et al. (“Shallow-Deep Networks: Understanding and Mitigating Network Overthinking”) discloses “Internal Classifiers” after some convolutional layers, with confidence-based early exits
Huang et al. (“Multi-Scale Dense Networks for Resource Efficient Image Classification”) discloses “To maximally re-use computation between the classifiers, we incorporate them as early-exits into a single deep convolutional neural network and inter-connect them with dense connectivity”
Park (US 2019/0279115 A1) discloses a “cascade classifier” with an “early fail” operation described in [0073]
Huang et al. (US 2019/0034761 A1) discloses an early-exit decision in [0158]
D’Ercoli et al. (US 2020/0210834 A1) discloses an early exit in a DNN in Para [0050]
Venkatesh et al. (US 2021/0012178 A1) discloses a system and method for an early exit from convolution
Durham et al. (US 2019/0156183 A1), in Para [0014], discloses using an ensemble of models to defend against being fooled by adversarial inputs
Li (US 2019/0080089 A1), in Para [0020-0025], discloses an ensemble output that provides “resiliency against a large class of evasion attacks”
Sharma et al. (US 2019/0220003 A1), Para [0021] and [0069], discloses that “The diversity of neural networks and the stochastic nature of the ensemble of neural networks used for identifying and classifying objects in the immediate environment results in more resiliency to adversarial attacks”, and “Because, in embodiments, an ensemble of neural networks/classifiers associated with the different vehicles 202 have been used that have different topologies, different manufacturers (who purchased the underlying technology from different OEM suppliers) and hence are differently trained, the decision boundaries are different. This makes the distributed and consensus-based classifier approach more resilient to white box and black box attacks.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126