DETAILED ACTION
1.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12 April 2022 has been entered. Furthermore, this action is in response to amendments and arguments filed 11 March 2022 for application 16/433786, filed 6 June 2019. Claims 1, 3-7, and 9-20 are currently pending. Claims 2 and 8 have been canceled.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 11 March 2022 have been fully considered but they are not persuasive. 
Specifically, the Applicants Argue:
Applicant submits that the features in claim 1 are not taught by Panda and Kim, considered individually or in combination for at least the following reasons. First, neither Panda nor Kim appear to teach "wherein execution of the overwrite instruction triggers an overwrite action that causes the instructions of each subsequent block of instructions of the plurality of blocks of instructions to be overwritten with no operation (NOP)". While Kim does teach that "[t]he first conditional branch in the normal branch code is converted to a wish jump instruction", this "wish jump instruction" does not appear to modify subsequent instructions (Kim at Section 3.1). Instead, Kim describes that "[w]hen the processor fetches the wish jump instruction, it generates a prediction for the direction of the wish jump using a branch predictor" and thereafter enters a "high-confidence-mode" or a "low-confidence-mode", in which "the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated" (Id at Section 3. 
Examiner Response
The Examiner respectfully disagrees. Panda and Kim teach “generating, by the compiler, an overwrite instruction for each of the first block of code, the second block of code, and the third block of code, wherein the overwrite instruction, when executed, causes the instructions of subsequent blocks of code in the plurality of blocks of code to be overwritten with no operation (NOP) when the break condition of the while loop is satisfied.” Specifically, Panda teaches “generating, by the compiler, an overwrite instruction for each of the first block of code, the second block of code, and the third block of code, wherein the overwrite instruction, when executed, causes the instructions of subsequent blocks of code in the plurality of blocks of code to be overwritten with no operation … when the break condition of the while loop is satisfied,” because he teaches that a classifier associated with each layer/block of code comprises the overwrite instructions (generated as indicated previously through the Synopsys Power compiler) that, when executed, perform the analysis/linear classification of the output features from each respective layer, including the computation of the confidence measure, which forms predicate information used for the evaluation of the break condition of the while loop (as well as the operations which overwrite the code), such that if the break condition is satisfied for the first layer/block (i.e., the confidence computed by the classification exceeds a threshold) then all subsequent blocks of code (convolutions and classifiers already compiled) are not activated in the sense that the instructions are overwritten/superseded to not perform an operation (or, in other words, the break condition changes a loop execution control flow relative to the blocks subsequent to the first block) (-viz., [p. 478, Section IIIB, p.
478, Section IV, Algorithm 2, Figure 3, Table I, Table II] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists.).  However, Panda does not explicitly teach … (NOP) instructions. 
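For illustration only, the early-termination testing flow quoted above from Panda's Algorithm 2 may be sketched as follows. The sketch is not taken from Panda: the function name `cdln_test`, the representation of stages and classifiers as Python callables, and the use of `None` to mark a stage without a linear classifier are all assumptions made for this example. It shows that stages from i+1 onwards are simply never invoked once the confidence test at stage i succeeds.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types: a stage maps features to features, and a classifier
# maps features to one confidence score per class label.
Features = Sequence[float]
Stage = Callable[[Features], Features]
Classifier = Callable[[Features], Sequence[float]]


def cdln_test(
    x: Features,
    stages: List[Stage],
    classifiers: List[Optional[Classifier]],
    delta: float,
) -> Tuple[int, int]:
    """Return (class_label, number_of_stages_activated)."""
    if not stages or len(stages) != len(classifiers) or classifiers[-1] is None:
        raise ValueError("need one classifier slot per stage and a final output layer")

    features = x
    scores: List[float] = []
    for i, (stage, clf) in enumerate(zip(stages, classifiers), start=1):
        features = stage(features)            # step 1: features of stage i
        if clf is None:                       # step 2: no classifier at this stage
            continue
        scores = list(clf(features))
        confident = [label for label, s in enumerate(scores) if s > delta]
        if len(confident) == 1:               # step 3: terminate at stage i
            return confident[0], i
        # step 4: below threshold, or several confident labels: go to stage i+1

    # Worst case: every stage was activated; use the final output layer.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, len(stages)
```

In this sketch an easy input returns after the first stage, while an input whose first-stage output is confident for more than one label continues to the next stage, consistent with steps 3 and 4 of the quoted algorithm.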
In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, he does not disclose that these instructions are replaced by NOP instructions which are subsequently executed. However, Kim teaches “generating, by the compiler, an overwrite instruction for each of the first block of code, the second block of code, and the third block of code, wherein the overwrite instruction, when executed, causes the instructions of subsequent blocks of code in the plurality of blocks of code to be overwritten with no operation (NOP) instructions when the break condition of the while loop is satisfied” since he teaches that instructions are generated through a compiler to convert while loops into a wish loop having a predicated termination such that blocks of code indicating successive iteration loops are fetched (according to a prediction of the number required), but such that overwrite instructions are also generated by the compiler (1) to form a prediction of which blocks are needed, (2) to determine what confidence mode that prediction corresponds to, (3) to determine (at each iteration) if there are unneeded fetched blocks, and (4) to overwrite/override the blocks of code with NOPs (i.e., second and third blocks) that are determined to no longer be needed as the result of a late-exit in a low-confidence mode; in other words, as exemplified in section 3.2, the predicate evaluation at block X_3 leads to a transition to the block Y, rendering the instructions fetched for blocks X_4 and X_5 extraneous and enabling the replacement of those extraneous blocks by NOPs, which are nonetheless blocks that are executed subsequently, and wherein it is noted that, like Panda, Kim teaches blocks of code that include a common set of instructions in the form of the code that is repeated during each loop (-viz., [p. 1, Section 1, p. 2, Section 2.1, p. 3, Section 3.1, p. 4, Section 3.2, p.
5, Section 3.5.4, Figure 3, Figure 4, Figure 8] We propose a mechanism in which the compiler generates code that can be executed either as predicated code or non-predicated code (i.e., code with normal conditional branches). The hardware decides whether the predicated code or the non-predicated code is executed based on a run-time confidence estimation of the branch’s prediction. The code generated by the compiler is the same as predicated code, except the predicated conditional branches are NOT removed—they are left intact in the program code. These conditional branches are called wish branches…. Hence, wish branches provide the hardware with a way to dynamically choose between conditional branch prediction and predicated execution depending on accurate run-time information about the branch’s behavior., If the predicate is TRUE, the instruction performs the computation and stores the result into the destination register. If the predicate is FALSE, the instruction simply moves the old value of the destination register into its destination register, which is architecturally a NOP operation. Hence, regardless of the predicate value, the instruction always writes into the destination register, allowing the dependent instructions to be renamed correctly., However, in low-confidence-mode, the processor never needs to flush the pipeline, even when the branch prediction is incorrect. Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. 
There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and noexit. To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required.)
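For illustration only, the late-exit behavior quoted above from Kim (fetched blocks X1 to X5, of which X4 and X5 flow through the data path as NOPs) may be sketched as follows. The sketch is a software simplification and is not taken from Kim: the names `predicated_add` and `run_fetched_iterations`, the register names, and the way the predicate is derived from an iteration count are assumptions made for this example. In Kim the predicate is computed by the loop body itself.

```python
from typing import Dict, List, Tuple


def predicated_add(regs: Dict[str, int], dest: str, src: str, predicate: bool) -> None:
    """Predicated instruction that always writes its destination register.

    With a TRUE predicate it performs the computation; with a FALSE
    predicate it writes the old destination value back, which is
    architecturally a NOP.
    """
    regs[dest] = regs[dest] + regs[src] if predicate else regs[dest]


def run_fetched_iterations(fetched: int, actual: int) -> Tuple[int, List[str]]:
    """Run `fetched` loop blocks when only `actual` iterations are needed.

    Assumption: the predicate p1 of block X_k is TRUE only for k <= actual.
    """
    regs = {"acc": 0, "step": 1}
    trace: List[str] = []
    for k in range(1, fetched + 1):
        p1 = k <= actual                      # predicate guarding block X_k
        predicated_add(regs, "acc", "step", p1)
        trace.append(f"X{k}:" + ("exec" if p1 else "nop"))
    return regs["acc"], trace
```

Calling `run_fetched_iterations(5, 3)` models predictions TTTTN for a loop that should iterate three times: the two extra blocks leave the architectural state unchanged, so no pipeline flush is modeled.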
The Applicants Further Argue:
[Appl. No. 16/433,786; Amdt. dated March 11, 2022; Reply to Office Action of January 12, 2022] Second, neither Panda nor Kim appear to teach "adding the overwrite instruction within and at an end of each of the one or more blocks of instructions after the set of common instructions". While Kim does teach that "[w]ish branches require extra branch instructions" (Id. at Section 3.7) and further that "[t]he first conditional branch in the normal branch code is converted to a wish jump instruction" (Id. at Section 3.1), Kim does not appear to teach adding the "wish jump instruction" (assuming, arguendo, that the "wish jump instruction" can be equated with Applicant's "overwrite instruction") either "within" or "at an end of each of the one or more blocks of instructions after the set of common instructions" as recited in Applicant's 

Examiner Response
The Examiner respectfully disagrees. Panda and Kim teach “adding, by the compiler, the overwrite instruction within and at an end of each of the first block of code, the second block of code, and the third block of code after the set of common instructions.” Specifically, Panda teaches “adding, by the compiler, the overwrite instruction within and at an end of each of the first block of code, the second block of code, and the third block of code after the set of common instructions” because he teaches that each block of code has associated with it the overwrite instructions (corresponding to the classification logic for evaluating the output of a layer) such that this is added to the code of each block (including the common set of instructions associated with convolution/feature extraction) by virtue of being associated with each block/layer and acting on the output of each set of common instructions/convolution operations (with, as noted, the compiler forming these instructions to synthesize an integrated design – i.e., the integration of the classifiers into the body/framework of the CNN deep neural network code) such that this code lies within and extends to the end of a layer-specific block of code (i.e., it is put in place by the compiler after the convolution operation/common set of instructions for each layer and extends to the end of computations for that block/layer) (-viz., [p. 478, Section IIIB, p. 478, Section IV, Algorithm 2, Figure 3, Table I, Table II] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. 
For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists.) 
It is noted that Kim also teaches “adding, by the compiler, the overwrite instruction within and at an end of each of the first block of code, the second block of code, and the third block of code after the set of common instructions” because he teaches the inclusion (by a compiler) of instructions for controlling the wish loop execution, including overwrite instructions to form a prediction of which blocks are needed, to determine what confidence mode that prediction corresponds to, to determine (at each iteration) if there are unneeded fetched blocks, and to determine to overwrite/override the blocks of code with NOPs (i.e., second and third blocks) if they are deemed to not be needed; wherein a determination is made if a break occurs based upon the execution of the break condition (e.g., P1 is not zero in Figure 4b) but also by the branch misprediction recovery module (i.e., determination of a divergence between the wish loop flow and the actual execution flow), such that the wish loop instructions lie within each block (executed every loop pass) and are executed after the execution of the common set of instructions for that loop (i.e., are evaluated for each loop in response to the results of the common set of instructions) such that these instructions also lie at the end but are within the block/loop (-viz., [p. 4, Section 3.2, p. 5, Section 3.5.4, Figure 3, Figure 4, Figure 8] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. 
There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and noexit. To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required.) …
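For illustration only, the front-end buffer check quoted above from Kim may be sketched as follows. The sketch is not taken from Kim: the class and method names are hypothetical, and a dictionary keyed by instruction address stands in for the small hardware buffer. The quoted passage states the late-exit test explicitly (actual direction not-taken and last stored prediction not-taken). The early-exit and no-exit branches below are inferred from the quoted definitions of those cases and should be read as assumptions.

```python
from enum import Enum
from typing import Dict


class Misprediction(Enum):
    EARLY_EXIT = "early-exit"
    LATE_EXIT = "late-exit"
    NO_EXIT = "no-exit"


class LastPredictionBuffer:
    """Stores the last prediction made for each static wish loop instruction."""

    def __init__(self) -> None:
        self._last_taken: Dict[int, bool] = {}

    def record(self, pc: int, predicted_taken: bool) -> None:
        """Record the newest front-end prediction for the wish loop at `pc`."""
        self._last_taken[pc] = predicted_taken

    def classify(self, pc: int, actual_taken: bool) -> Misprediction:
        """Classify a wish loop misprediction signalled for the loop at `pc`."""
        if actual_taken:
            # Assumption: the loop should have continued, so it exited too soon.
            return Misprediction.EARLY_EXIT
        if not self._last_taken[pc]:
            # Stated in the quoted text: the front end has already exited.
            return Misprediction.LATE_EXIT
        # Assumption: the front end is still iterating the loop.
        return Misprediction.NO_EXIT
```

For predictions TTTTN followed by an actual not-taken outcome, `classify` returns `Misprediction.LATE_EXIT`, the case for which the quoted passage says no pipeline flush is required.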

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 3-6, 9-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Panda et al. (“Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition”, 2016 Design, Automation, & Test in Europe Conference & Exhibition, IEEE, 2016, pp. 475-480), hereinafter referred to as Panda, in view of Kim et al. (“Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution”, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’05), 2005, pp. 1-12), hereinafter referred to as Kim.

In regards to claim 1, Panda teaches a method for reducing computation in neural network processing, the method comprising: generating, by a compiler, instructions from source code to perform a repeatable set of operations up to a number of iterations based on a break condition of a while loop, wherein the instructions include a plurality of blocks of code, each of the plurality of blocks of code including a set of common instructions corresponding to an iteration of performing the repeatable set of operations, (See at [p. 476, Section II] As mentioned earlier, CNN layers of DLN models that are trained for classification, have been used as feature extractors by removal of the output layer. We exploit the efficacy of the convolutional layer features to develop an architecture in which easy instances can be classified earlier without activating the latter layers of the DLN network., Further at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . 
The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., (Further at [pp. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein operations are reduced (i.e., accelerated) in a deep neural network in which the execution of the deep neural network (CNN) is characterized by a loop (while loop with i less than total number of stages with code within each loop/layer corresponding to a block of code) over each stage i of that neural network (CNN_i) up to the final layer (maximum number of iterations), such that each iteration in that loop corresponds to a different layer of the neural network, such that, within each layer-specific block of code is contained repeatable code corresponding to feature extraction at each layer (e.g., the operation of convolution), such that the reduction (acceleration) in operations is achieved by early exiting of that loop according to a break condition (satisfaction of confidence criteria), such that the set of code within a given loop/block (Algorithm 2) that specifically performs the convolution operation (for a given layer) forms a common set of instructions (common across layers/blocks), and such that instructions for this integrated neural network implementation, classifier, and accelerator/computation reduction framework are generated using a Synopsys design compiler (but also, in a more general sense the associated algorithms – algorithm 2 and the characterization of the neural network 
architectures – are representative/indicative of source code from which the instructions are generated).)  wherein the plurality 7of blocks of code include a first block of code corresponding to a first layer of a neural network, 8a second block of code corresponding to a second layer of the neural network, and a third block 9of code corresponding to a third layer of the neural network;  (See at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein a first block of code (layer i) is executed to generate output features from the CNN associated with the execution of the (repeatable/feature extraction) instructions of the corresponding block/layer/iteration and wherein any two subsequent layers (not necessarily consecutive) form the second and third blocks which, like the first, generate output features from the CNN (i.e., the instructions correspond to common set of code associated with the convolution operation/feature extraction for each layer).) 
10generating, by the compiler, an overwrite instruction  for each of the first block of code, the 11second block of code, and the third block of code, wherein the overwrite instruction, when 12executed, causes the instructions of subsequent blocks of code in the plurality of blocks of code to 13be overwritten with no operation … when the break condition of the while loop 14is satisfied, …; (See at [p. 478, Section IIIB,  p. 478, Section IV, Algorithm 2, Figure 3, Table I, Table II] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. 
Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the classifier associated with each layer/block of code comprises the overwrite instructions (generated as indicated previously through the Synopsys Power compiler) that, when executed, perform the analysis/linear classification of the output features from each respective layer including the computation of the confidence measure which form predicate information used for the evaluation of the break condition of the while loop (as well as the operations which overwrite the code) such that if the break condition is satisfied for the first layer/block (i.e., the confidence computed by the classification exceeds a threshold) then all subsequent blocks of code (convolutions and classifiers already compiled) are not activated in the sense that the instructions are overwritten/superseded to not perform an operation (or, in other words, the break condition changes a loop execution control flow relative to the blocks subsequent to the first block).) 15adding, by the compiler, the overwrite instruction within and at an end of each of the first block of 16code, the second block of code, and the third  of code after the set of common instructions;  (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. 
, Further at [Algorithm 2, Figure 3, Table I, Table II]Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. 
Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein each block of code has associated with it the overwrite instructions (corresponding to the classification logic for evaluating the output of a layer) such that this is added to the code of each block (including the common set of instructions associated with convolution/feature extraction) by virtue of being associated with each block and acting on the output of each common set of instructions/convolution operations(with, as noted, the compiler forming these instructions to synthesize an integrated design – i.e., the integration of the classifiers into the body/framework of the CNN deep neural network code) such that this code lies within and extends to the end of a layer-specific block of code (i.e., it is put in place by the compiler after the convolution operation/common set of instructions for each layer and extends to the end of computations for that block/layer).) 17executing, by a neural network accelerator, the set of common instructions of the first block of 18code;  (See at [p. 478, Section IIIB,  ] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. 
If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the convolution operations/common set of instructions for a given layer of the neural network is executed to generate output features.) after executing the set of common instructions of the first block of code, 19executing, by the neural network accelerator, the overwrite instruction of the first 20block of code;  (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . 
The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the linear classifier associated with the first block/ith layer is executed to determine a confidence value for evaluation of a break condition.) 21determining, by the neural network accelerator, that the break condition of the 22while loop is satisfied; (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein if the confidence level computed using the linear classifier exceeds a threshold then a break condition is satisfied.) in response to executing the overwrite instruction of the first block of code and to 24determining that the break condition of the while loop is satisfied, overwriting the instructions of 25the second block of code and the third block of code; executing, by the neural network accelerator, the … instructions of the second 27block of code and the third block of code  (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein, if the break condition is satisfied (confidence>threshold) based on the execution of the instructions for analyzing/classifying the CNN outputs of the ith layer (first block), the instructions associated with the code for the subsequent layers (second and third blocks) are not activated – i.e., the neural network accelerator implements the instructions to overwrite/supersede the instructions for subsequent blocks by an absence of any operation in the execution control flow.)  
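For illustration only, the early-termination flow quoted above from Panda's Algorithm 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not Panda's implementation: the function name cdln_test, the representation of stages and classifiers as Python callables, and the rule that the final stage reports its highest-scoring label are all hypothetical choices made for this example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Features = List[float]
Stage = Callable[[Features], Features]            # hypothetical stand-in for CNN layer i
Classifier = Callable[[Features], Sequence[float]]  # one confidence score per class label


def cdln_test(
    x: Features,
    stages: Sequence[Stage],
    classifiers: Sequence[Optional[Classifier]],
    delta: float,
) -> Tuple[int, int]:
    """Return (class label, number of stages activated) for one test instance."""
    if not stages or len(stages) != len(classifiers):
        raise ValueError("need one (possibly absent) classifier per stage")
    features = list(x)
    for i, (stage, clf) in enumerate(zip(stages, classifiers)):
        features = stage(features)      # step 1: obtain features for stage i
        if clf is None:                 # step 2: no classifier at this stage
            continue
        scores = list(clf(features))
        confident = [k for k, s in enumerate(scores) if s > delta]
        if len(confident) == 1:
            # step 3: exactly one label beyond the threshold, so terminate;
            # stages i+1 onwards are never activated
            return confident[0], i + 1
        if i == len(stages) - 1:
            # final layer: assumed here to report its highest-scoring label
            best = max(range(len(scores)), key=scores.__getitem__)
            return best, i + 1
        # step 4: below threshold or ambiguous, so activate the next stage
    raise ValueError("the final stage must have a classifier")
```

With a doubling stage and a classifier that scores label 0 by the first feature, an input of [0.3] exits after one stage while [0.1] requires all three, which mirrors the easy-instance and worst-case paths described in the quoted passage.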
However, Panda does not explicitly teach … with no operation (NOP) instructions …, the NOP instructions to be executed during execution of the subsequent blocks of code … with the NOP instructions; and … the NOP instructions of the second block of code and the third block of code. In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, Panda does not disclose that these instructions are replaced by NOP instructions which are subsequently executed.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches wherein 5the instructions include a plurality of blocks of code, each of the plurality of blocks of code including a set of common instructions 6corresponding to an iteration of performing the repeatable set of operations…10generating, by the compiler, an overwrite instruction for each of the first block of code, the 11second block of code, and the third block of code, wherein the overwrite instruction, when 12executed, causes the instructions of subsequent blocks of code in the plurality of blocks of code to 13be overwritten with no operation (NOP) instructions when the break condition of the while loop 14is satisfied; the NOP instructions to be executed during execution of the subsequent blocks of code (See at [p. 1, Section 1] We propose a mechanism in which the compiler generates code that can be executed either as predicated code or non-predicated code (i.e., code with normal conditional branches). The hardware decides whether the predicated code or the non-predicated code is executed based on a run-time confidence estimation of the branch’s prediction. The code generated by the compiler is the same as predicated code, except the predicated conditional branches are NOT removed—they are left intact in the program code. These conditional branches are called wish branches…. Hence, wish branches provide the hardware with a way to dynamically choose between conditional branch prediction and predicated execution depending on accurate run-time information about the branch’s behavior., Further at [p. 2, Section 2.1, p. 3, Section 3.1] If the predicate is TRUE, the instruction performs the computation and stores the result into the destination register. If the predicate is FALSE, the instruction simply moves the old value of the destination register into its destination register, which is architecturally a NOP operation. 
Hence, regardless of the predicate value, the instruction always writes into the destination register, allowing the dependent instructions to be renamed correctly., Further at [p. 3, Section 3.1] However, in low-confidence-mode, the processor never needs to flush the pipeline, even when the branch prediction is incorrect. Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., Further at [p. 4, Section 3.2, Figure 3, Figure 4, Figure 8]  In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4, Figure 8] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and noexit. 
To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required., wherein instructions are generated through a compiler to convert while loops into a wish loop having a predicated termination such that blocks of code indicating successive iteration loops are fetched (according to a prediction of a number required but such that overwrite instructions are also generated by the compiler (1) to form a prediction of which blocks are needed), (2) to determine what confidence mode that prediction corresponds to, (3) to determine (at each iteration) if there are unneeded fetched blocks, and (4) to overwrite/override the blocks of code (i.e., second and third blocks) that are determined to no longer be needed as the result of a late-exit in a low-confidence mode; in other words, as exemplified in section 3.2, the predicate evaluation at block X_3 leads to transition to the block Y, rendering the instructions fetched for blocks X_4 and X_5 to be extraneous and enabling the replacement of those extraneous blocks by nop’s which are nonetheless blocks that are executed subsequently and wherein it is noted that, like Panda, Kim teaches blocks of code that include common set of instructions in the form the code that is repeated during each loop.) 
… adding, by the compiler, the overwrite instruction within and at an end of each of the first block of code, the second block of code, and the third block of code after the set of common instructions; … after executing the set of common instructions of the first block of code, … executing, … the overwrite instruction of the first block of code; determining, … that the break condition of the while loop is satisfied; in response to executing the overwrite instruction of the first block of code and to determining that the break condition of the while loop is satisfied, overwriting the instructions of the second block of code and the third block of code with the NOP instructions; and executing, …, the NOP instructions of the second block of code and the third block of code. (See at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. 
Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4, Figure 8]  If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and noexit. To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required. , wherein the instructions are included within the loop for controlling the wish loop execution including overwrite instructions to form a prediction of which blocks are needed, to determine what confidence mode that prediction corresponds to, to determine (at each iteration) if there are unneeded fetched blocks, and to determine to overwrite/override the blocks of code (i.e., second and third blocks) if they are deemed to not be needed; wherein a determination is made if a break occurs based upon the execution of the break condition (e.g., P1 is not zero in Figure 4b) but by also by the branch misprediction recovery module (i.e., determination of a divergence between the wish loop flow and the actual execution flow), wherein, in response to the execution of the overwrite instructions (confidence mode determination, branch evaluation of flow, response to misprediction) and the determination that the loop break/branch condition has been satisfied, the execution of the overwrite instructions cause the replacement of the extraneous 
subsequent blocks of code by NOPs (e.g., X_4 and X_5 are replaced by NOPs in the late-exit case for a low-confidence mode when exactly 3 iterations are required to transition out of the while loop), and wherein, like Panda, Kim teaches that the wish loop instructions lie within each block and are executed after the execution of the common set of instructions for that loop (i.e., are evaluated for each loop in response to the results of the common set of instructions) such that these instructions lie at the end of, but still within, the block/loop.) …
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to implement a neural network accelerator for a neural network which implements overwrite instructions for overwriting code instructions with NOP instructions and evaluates a break condition, the satisfaction of which at one iteration of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions, with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve the efficiency of executing loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input-data dependent, by fetching instructions for all predicted blocks but replacing the instructions of blocks determined to be extraneous with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
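For illustration only, the late-exit behavior relied upon from Kim (Section 3.2) can be modeled in simplified form as follows. This is a sketch under stated assumptions rather than a description of Kim's hardware: the function name run_predicated_loop, the modeling of the predicate p1 as a Boolean flag, and the representation of each fetched block as one loop iteration are hypothetical.

```python
from typing import Callable, List, Tuple


def run_predicated_loop(
    fetched_iterations: int,
    body: Callable[[int], int],
    cond: Callable[[int], bool],
    state: int,
) -> Tuple[int, List[str]]:
    """Run every fetched loop block; blocks whose predicate is FALSE act as NOPs."""
    predicate = True          # models p1, assumed TRUE on loop entry
    trace: List[str] = []
    for _ in range(fetched_iterations):
        if predicate:
            state = body(state)        # block X_k performs its computation
            trace.append("EXEC")
            predicate = cond(state)    # loop branch condition updates p1
        else:
            trace.append("NOP")        # extra block flows through without effect
    return state, trace
```

When five blocks are fetched (predictions TTTTN) but the loop condition fails after the third iteration, the trace is EXEC, EXEC, EXEC, NOP, NOP, corresponding to X4 and X5 flowing through the data path as NOPs without a pipeline flush.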

In regard to claim 3, the rejection of claim 1 is incorporated, and Panda further teaches wherein an evaluation instruction is added to each of the plurality of blocks of code that, when executed, causes a determination of whether the break condition of the while loop is satisfied, and wherein the evaluation instruction precedes the overwrite instruction in each of the plurality of blocks of code. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. 
Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the code which implements the linear classifier comprises the evaluation instructions since it evaluates the output of the respective layer of the neural network to generate information that may be used in the break condition and for the subsequent determination and action/termination (overwrite instructions) associated with the break condition result.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 1.

In regard to claim 4, the rejection of claim 3 is incorporated, and Panda further teaches 12in response to executing the overwrite instruction of the first block of code, 3causing, by the neural network accelerator, the evaluation instruction of the second block of code 4and the third block of code to be overwritten ….  (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer.,  Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the code that determines the evaluation of the break condition (sufficient confidence) is in place for every layer (see Figure 3b) but is rendered inactive (overwritten with no operation) for layers subsequent to layer i if the break condition (execution flow control condition) for layer i is satisfied (i.e., the instructions for the linear classifier and the break condition become inactive).)
However, Panda does not explicitly disclose … with a NOP instruction … In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, which include the evaluation instructions (e.g., linear classifier) performed in subsequent layers, Panda does not disclose that these subsequent evaluation instructions are replaced by NOP instructions.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches 10in response to executing the overwrite instruction of the first block of code, 3causing, …, the evaluation instruction of the second block of code 4and the third block of code to be overwritten with a NOP instruction;  (See at [p. 3, Section 3.1] Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., Further at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and noexit. 
To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required. , wherein the replacement of the extraneous fetched block by nops also includes replacement of evaluation instructions associated with those blocks in the form of the code that is used to directly determine the predicates to the break condition (e.g., p1 in Figure 4B or Figure 5B) because these instructions are no longer on the correct control flow path.) …
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to implement a neural network accelerator for a neural network which implements overwrite instructions for overwriting code instructions, including the evaluation instructions, with NOP instructions and evaluates a break condition, the satisfaction of which at one iteration of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions, with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve the efficiency of executing loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input-data dependent, by fetching instructions for all predicted blocks but replacing the instructions of blocks determined to be extraneous, as well as any associated code not on the correct control flow path, with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
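For illustration only, the combined behavior addressed in the rejection of claim 4, in which the evaluation instructions of later blocks are overwritten along with the rest of those blocks, can be sketched as follows. The instruction names COMMON, EVAL, OVERWRITE and NOP, the functions make_program and run, and the break_after parameter are hypothetical and appear in neither reference; this is a toy model under those assumptions, not either reference's implementation.

```python
from typing import List, Tuple

NOP = "NOP"


def make_program(num_blocks: int) -> List[List[str]]:
    """One block per loop iteration: common work, evaluation, then overwrite."""
    return [["COMMON", "EVAL", "OVERWRITE"] for _ in range(num_blocks)]


def run(program: List[List[str]], break_after: int) -> Tuple[List[List[str]], List[Tuple[int, str]]]:
    """Execute all blocks in order and return (final program, execution trace).

    The break condition is assumed to become true once `break_after` blocks
    have executed their EVAL instruction.
    """
    program = [list(block) for block in program]   # work on a private copy
    trace: List[Tuple[int, str]] = []
    break_satisfied = False
    for b, block in enumerate(program):
        for instr in block:
            trace.append((b, instr))
            if instr == "EVAL":
                break_satisfied = (b + 1) >= break_after
            elif instr == "OVERWRITE" and break_satisfied:
                # replace every instruction of every later block, EVAL included
                for later in program[b + 1:]:
                    later[:] = [NOP] * len(later)
            # COMMON and NOP have no modeled side effects
    return program, trace
```

Running run(make_program(3), break_after=1) leaves the second and third blocks consisting solely of NOP entries, and the trace shows those NOP entries being executed in place of the original COMMON, EVAL and OVERWRITE instructions.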

In regard to claim 5, the rejection of claim 1 is incorporated, and Panda further teaches 12identifying, …, the repeatable set of operations in the source code, 3wherein the overwrite instruction is generated in response to identifying the repeatable set of 4operations.  (See at [p. 476, Section II] As mentioned earlier, CNN layers of DLN models that are trained for classification, have been used as feature extractors by removal of the output layer. We exploit the efficacy of the convolutional layer features to develop an architecture in which easy instances can be classified earlier without activating the latter layers of the DLN network., Further at [p. 477, Section IIIA] Algorithm 1 shows the pseudo code for training the CDLN. The process takes the original DLN N_orig, training data with the corresponding labels as input and produces a conditional deep learning network N_cdl with the optimized number of stages…. The linear classifiers () are then trained on the same training data using the least mean square rule (steps 4-7).,  Further at [pp. 477-478, Section III, Algorithm 1, Algorithm 2, Figure 3, Table I, Table II] Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., wherein the repeatable source code operations are identified in the deep neural network control flow framework from the given structure of that neural network operations with the formation of the overwrite instructions (i.e., classifier and associated logic) generated in response to the trained neural network structure.) 
However, Panda does not explicitly disclose …by the compiler… In other words, Panda does not explicitly disclose that the compiler identifies the repeatable instructions such as the particular instructions in a control loop.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches 10 identifying, by the compiler, the repeatable set of operations in the source code, 3wherein the overwrite instruction is generated in response to identifying the repeatable set of 4operations (See at [pp. 3-4, Section 3.2] A wish branch can also be used for a backward branch. We call this a wish loop instruction. … The main difference between the normal branch code (Figure 4a) and the wish loop code (Figure 4b) is that in the wish loop code, the instructions in block X (i.e., the loop body) are predicated with the loop branch condition., Further at [p. 6, Section 3.6] A wish branch binary is an object file consisting of a mixture of wish branches, traditional predicated code, and normal branches. The compiler decides which branches are predicated, which are converted to wish branches, and which stay as normal branches based on estimated branch misprediction rates and compile-time heuristics., Further at [p. 7, Section 4.2.1] To generate predicated code, the ORC compiler first checks whether or not the control-flow graph is suitable for if-conversion in a region boundary., Further at [p. 7, Section 4.2.2, Figure 3, Figure 4] If a branch is suitable for if-conversion, we treat that branch as a wish branch candidate. 
If the number of instructions in the fall-through block of a branch is greater than N (we set N to 5), the candidate branch is converted to a wish jump and the necessary wish joins are inserted., wherein the compiler evaluates the code to determine candidates for wish branches and wish loops (repeatable code with a backward branch) such that the compiler first identifies the block of code corresponding to a loop (e.g., Figure 4 a) and converts it into a wish loop code (if it passes compiler-based criteria) as well as generates the associated code logic for performing the overwrite (i.e., low confidence mode, evaluation of status of flow, late-exit instructions).)…
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to implement a neural network accelerator for a neural network in which the overwrite instructions for overwriting code instructions are generated in response to compiler-identified repeatable code. The modification would be obvious because one of ordinary skill would be motivated to improve the efficiency of executing loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input-data dependent, by using the compiler to identify instructions that may be optimized accordingly by replacing the instructions of the identified blocks determined to be extraneous with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
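For illustration only, a compiler step of the kind discussed for claim 5, which identifies a repeatable set of operations and in response emits one block per iteration ending in an evaluation instruction followed by an overwrite instruction, can be sketched as follows. The WhileLoop node, the compile_blocks function, and the EVAL and OVERWRITE instruction names are hypothetical and are not drawn from Panda, Kim, or the ORC compiler.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class WhileLoop:
    body: List[str]       # the repeatable set of operations
    max_iterations: int   # upper bound on the number of iterations


Source = List[Union[str, WhileLoop]]


def compile_blocks(source: Source) -> List[List[str]]:
    """Emit one block per source instruction, unrolling each identified loop."""
    blocks: List[List[str]] = []
    for node in source:
        if isinstance(node, WhileLoop):
            # loop identified: generate one block per possible iteration and
            # append EVAL then OVERWRITE after the common instructions
            for _ in range(node.max_iterations):
                blocks.append(list(node.body) + ["EVAL", "OVERWRITE"])
        else:
            blocks.append([node])   # straight-line code passes through unchanged
    return blocks
```

For a source of LOAD, a three-iteration loop over CONV and POOL, and STORE, the sketch yields three identical blocks of CONV, POOL, EVAL, OVERWRITE between the LOAD and STORE blocks.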

In regard to claim 6, Panda teaches A method comprising: generating, by a compiler for a neural network, a plurality of blocks of instructions from source code, wherein each block of instructions includes a set of common instructions that are common to each block, and wherein during execution of the plurality of blocks of instructions, the set of common instructions are performed up to a number of iterations based on a condition; (See at [p. 476, Section II] As mentioned earlier, CNN layers of DLN models that are trained for classification, have been used as feature extractors by removal of the output layer. We exploit the efficacy of the convolutional layer features to develop an architecture in which easy instances can be classified earlier without activating the latter layers of the DLN network., Further at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . 
The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein operations are reduced (i.e., accelerated) in a deep neural network in which the execution of the deep neural network (CNN) is characterized by a loop (while loop with i less than total number of stages with code within each loop/layer corresponding to a block of code) over each stage i of that neural network (CNN_i) up to the final layer (maximum number of iterations), such that each iteration in that loop corresponds to a different layer of the neural network, such that, within each layer-specific block of code is contained repeatable code corresponding to feature extraction at each layer (e.g., the operation of convolution), such that the reduction (acceleration) in operations is achieved by early exiting of that loop according to a break condition (satisfaction of confidence criteria), such that the set of code within a given loop/block (Algorithm 2) that specifically performs the convolution operation (for a given layer) forms a common set of instructions (common across layers/blocks), and such that instructions for this integrated neural network implementation, classifier, and accelerator/computation reduction framework are generated using a Synopsys design compiler (but also, in a more general sense the associated algorithms – algorithm 2 and the characterization of the neural network 
architectures – are representative/indicative of source code from which the instructions are generated).) generating an overwrite instruction for each of one or more blocks of instructions in the plurality of blocks of instructions, wherein execution of the overwrite instruction triggers an overwrite action that causes the instructions of each subsequent block of instructions of the plurality of blocks of instructions to be overwritten with no operation …; (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. 
Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the classifier associated with each layer/block of code comprises the overwrite instructions (generated as indicated previously through the Synopsys Power compiler) that, when executed, perform the analysis/linear classification of the output features from each respective layer including the computation of the confidence measure which form predicate information used for the evaluation of the break condition of the while loop (as well as the operations which overwrite the code) such that if the break condition is satisfied for the first layer/block (i.e., the confidence computed by the classification exceeds a threshold) then all subsequent blocks of code (convolutions and classifiers already compiled) are not activated in the sense that the instructions are overwritten/superseded to not perform an operation (or, in other words, the break condition changes a loop execution control flow relative to the blocks subsequent to the first block).) and adding the overwrite instruction within and at an end of each of the one or more blocks of instructions after the set of common instructions. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. 
, Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. 
Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein each block of code has associated with it the overwrite instructions (corresponding to the classification logic for evaluating the output of a layer) such that this is added to the code of each block (including the common set of instructions associated with convolution/feature extraction) by virtue of being associated with each block and acting on the output of each block (with, as noted, the compiler forming these instructions to synthesize an integrated design – i.e., the integration of the classifiers into the body/framework of the CNN deep neural network code) such that this code lies within and extends to the end of a layer-specific block of code (i.e., it is put in place by the compiler after the convolution operation/common set of instructions for each layer and extends to the end of computations for that block/layer).)
However, Panda does not explicitly teach … with no operation (NOP) instructions that are to be executed during execution of each subsequent block of instructions. In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, Panda does not disclose that these instructions are replaced by NOP instructions which are subsequently executed.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches wherein each block of instructions includes a set of common instructions that are common to each block, and wherein during execution of the plurality of blocks of instructions… generating an overwrite instruction for a block of instructions in the plurality of blocks, wherein execution of the overwrite instruction triggers an overwrite action that causes the instructions of a subsequent block of the plurality of blocks to be overwritten with no operation (NOP) instructions that are to be executed during execution of the subsequent block; (See at [p. 1, Section 1] We propose a mechanism in which the compiler generates code that can be executed either as predicated code or non-predicated code (i.e., code with normal conditional branches). The hardware decides whether the predicated code or the non-predicated code is executed based on a run-time confidence estimation of the branch’s prediction. The code generated by the compiler is the same as predicated code, except the predicated conditional branches are NOT removed—they are left intact in the program code. These conditional branches are called wish branches…. Hence, wish branches provide the hardware with a way to dynamically choose between conditional branch prediction and predicated execution depending on accurate run-time information about the branch’s behavior., Further at [p. 2, Section 2.1] If the predicate is TRUE, the instruction performs the computation and stores the result into the destination register. If the predicate is FALSE, the instruction simply moves the old value of the destination register into its destination register, which is architecturally a NOP operation. Hence, regardless of the predicate value, the instruction always writes into the destination register, allowing the dependent instructions to be renamed correctly., Further at [p. 
3, Section 3.1] However, in low-confidence-mode, the processor never needs to flush the pipeline, even when the branch prediction is incorrect. Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., Further at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4, Figure 8] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and no-exit. To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. 
… When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required., wherein instructions are generated through a compiler to convert while loops into a wish loop having a predicated termination such that blocks of code indicating successive iteration loops are fetched (according to a prediction of the number required) but such that overwrite instructions are also generated by the compiler (1) to form a prediction of which blocks are needed, (2) to determine what confidence mode that prediction corresponds to, (3) to determine (at each iteration) if there are unneeded fetched blocks, and (4) to overwrite/override the blocks of code (i.e., second and third blocks) that are determined to no longer be needed as the result of a late-exit in a low-confidence mode; in other words, as exemplified in section 3.2, the predicate evaluation at block X_3 leads to transition to the block Y, rendering the instructions fetched for blocks X_4 and X_5 extraneous and enabling the replacement of those extraneous blocks by NOPs, which are nonetheless blocks that are executed subsequently, and wherein it is noted that, like Panda, Kim teaches blocks of code that include a common set of instructions in the form of the code that is repeated during each loop.) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to generate and implement overwrite instructions for a neural network for overwriting code instructions with NOP instructions based on a (break) condition, the satisfaction of which at one iteration/block of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions and with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by fetching instructions for all predicted blocks but replacing instructions of blocks determined to be extraneous with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
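For illustration only, the combined behavior characterized in this rejection (a condition satisfied at the end of one block causes the instructions of every subsequent block to be overwritten with NOPs, which are then still executed) may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Panda, Kim, or the applicant: the `nop`, `make_block`, and `run` names, the additive confidence score, and the placement of the overwrite step in the driver loop are all hypothetical simplifications.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Instruction = Callable[[State], None]


def nop(state: State) -> None:
    """No-operation: it is executed but leaves the state unchanged."""


def make_block(stage_gain: float) -> List[Instruction]:
    # Hypothetical "common instructions": each stage raises a confidence score.
    def common(state: State) -> None:
        state["confidence"] += stage_gain
        state["stages_run"] += 1
    return [common]


def run(blocks: List[List[Instruction]], threshold: float) -> State:
    state: State = {"confidence": 0.0, "stages_run": 0, "nops_run": 0}
    for index, block in enumerate(blocks):
        for instruction in block:
            if instruction is nop:
                state["nops_run"] += 1  # bookkeeping only, to make NOP execution visible
            instruction(state)
        # Hypothetical overwrite step at the end of the block: once the condition
        # is met, every instruction of every later block is replaced by a NOP.
        if state["confidence"] >= threshold:
            for later in blocks[index + 1:]:
                later[:] = [nop] * len(later)
    return state
```

With four blocks of gain 0.4 and a threshold of 0.7, the condition is first met after the second block, so the third and fourth blocks are still traversed but execute only NOPs.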

In regard to claim 9, the rejection of claim 6 is incorporated, and Panda further teaches wherein execution of the overwrite instruction triggers the overwrite action when the condition is satisfied. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. 
Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the classifier associated with each layer/block of code comprises the overwrite instructions (generated as indicated previously through the Synopsys Power compiler) that, when executed, perform the analysis/linear classification of the output features from each respective layer including the computation of the confidence measure which form predicate information used for the evaluation of the break condition of the while loop (as well as the operations which overwrite the code) such that if the break condition is satisfied for the first layer/block (i.e., the confidence computed by the classification exceeds a threshold) then all subsequent blocks of code (and classifiers) are not activated in the sense that the instructions are overwritten/superseded to not perform an operation (or, in other words, the break condition changes a loop execution control flow relative to the blocks subsequent to the first block).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 6.

In regard to claim 10, the rejection of claim 6 is incorporated, and Panda further teaches wherein the common set of instructions corresponds to a repeatable set of operations performed in a node or a layer of the neural network. (See at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein a first block of code (layer i) is executed to generate output features from the CNN associated with the execution of the (repeatable/feature extraction) instructions of the corresponding block/layer/iteration and wherein any two subsequent layers (not necessarily consecutive) form the second and third blocks which, like the first, generate output features from the CNN (i.e., the instructions correspond to a common set of code associated with the convolution operation/feature extraction for each layer which inherently also includes any node within that layer).) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 6.

In regard to claim 11, the rejection of claim 6 is incorporated, and Panda further teaches generating an evaluation instruction that, when executed, causes a determination of whether the condition is satisfied; and adding the evaluation instruction to each of the one or more blocks of instructions. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. 
Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the code that implements the linear classifier is generated (trained/compiled) and comprises evaluation instructions, since it evaluates the output of the respective layer of the neural network to generate information that may be used in the break condition and for the subsequent determination and action (overwrite instructions) associated with the break condition result.) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 6.

In regard to claim 12, the rejection of claim 11 is incorporated, and Panda further teaches wherein the overwrite action causes the evaluation instruction to be overwritten with a … in each subsequent block of instructions. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the code that evaluates the break condition (sufficient confidence) is in place for every layer (see Figure 3b) but rendered inactive (overwritten with no operation) for layers subsequent to layer i if the break condition (execution flow control condition) for layer i is satisfied (i.e., the instructions for the linear classifier and the break condition become inactive).) 
However, Panda does not explicitly disclose … a NOP instruction… In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, which include the evaluation instructions (e.g., linear classifier) performed in subsequent layers, Panda does not disclose that these subsequent evaluation instructions are replaced by NOP instructions.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches wherein the overwrite action causes the evaluation instruction to be overwritten with a NOP instruction in each subsequent block of instructions (See at [p. 3, Section 3.1] Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., Further at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and no-exit. 
To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required., wherein the replacement of the extraneous fetched blocks by NOPs also includes replacement of evaluation instructions associated with those blocks in the form of the code that is used to directly determine the predicates to the break condition (e.g., p1 in Figure 4B or Figure 5B) because these instructions are no longer on the correct control flow path.) …
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to generate and implement evaluation instructions for a neural network which are overwritten with NOP instructions based on a (break) condition, the satisfaction of which at one iteration of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions and with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by fetching instructions for all predicted blocks but replacing instructions of blocks determined to be extraneous, as well as any associated code not on the correct control flow path, with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
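For illustration only, the predicated late-exit behavior quoted above from Kim (Sections 2.1 and 3.2) may be sketched as follows: extra fetched iterations execute with a FALSE predicate, so each instruction writes the old value back to its destination and acts as a NOP. This is a minimal sketch and not Kim's hardware: the register names, the `predicated_add` and `run_fetched_iterations` functions, and the assumption that the predicate update is itself guarded by the predicate are all hypothetical.

```python
from typing import Dict

Registers = Dict[str, int]


def predicated_add(regs: Registers, predicate: bool, dest: str, amount: int) -> None:
    # Predicate TRUE: compute and write the result.
    # Predicate FALSE: write the old value back, which is architecturally a NOP.
    regs[dest] = regs[dest] + amount if predicate else regs[dest]


def run_fetched_iterations(fetched: int, needed: int) -> Registers:
    """Execute `fetched` copies of a loop body when only `needed` are on the correct path."""
    regs: Registers = {"acc": 0, "i": 0}
    p1 = needed > 0  # loop predicate before the first iteration
    for _ in range(fetched):
        predicated_add(regs, p1, "acc", 10)
        predicated_add(regs, p1, "i", 1)
        # Assumption: the predicate update is guarded by p1, so once FALSE it stays FALSE.
        p1 = p1 and regs["i"] < needed
    return regs
```

Running five fetched iterations when only three are needed yields the same register state as running exactly three, which mirrors the quoted example in which extra blocks X4 and X5 flow through as NOPs rather than forcing a pipeline flush.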

In regard to claim 13, the rejection of claim 6 is incorporated, and Panda further teaches identifying, …, a repeatable set of operations in the source code for the neural network, wherein the overwrite instruction is generated in response to identifying the repeatable set of operations. (See at [p. 476, Section II] As mentioned earlier, CNN layers of DLN models that are trained for classification, have been used as feature extractors by removal of the output layer. We exploit the efficacy of the convolutional layer features to develop an architecture in which easy instances can be classified earlier without activating the latter layers of the DLN network., Further at [p. 477, Section IIIA, Algorithm 1, Algorithm 2, Figure 3, Table I, Table II] Algorithm 1 shows the pseudo code for training the CDLN. The process takes the original DLN N_orig, training data with the corresponding labels as input and produces a conditional deep learning network N_cdl with the optimized number of stages…. The linear classifiers () are then trained on the same training data using the least mean square rule (steps 4-7)., Further at [p. 478, Section IIIB, Algorithm 1, Algorithm 2, Figure 3, Table I, Table II] Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., wherein the repeatable source code operations are identified in the deep neural network control flow framework from the given structure of that neural network's operations, with the formation of the overwrite instructions (i.e., classifier and associated logic) generated in response to the trained neural network structure.) 
However, Panda does not explicitly disclose …by the compiler… In other words, Panda does not explicitly disclose that the compiler identifies the repeatable instructions such as the particular instructions in a control loop.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches identifying, by the compiler, a repeatable set of operations in the source code for the neural network, wherein the overwrite instruction is generated in response to identifying the repeatable set of operations (See at [pp. 3-4, Section 3.2] A wish branch can also be used for a backward branch. We call this a wish loop instruction. … The main difference between the normal branch code (Figure 4a) and the wish loop code (Figure 4b) is that in the wish loop code, the instructions in block X (i.e., the loop body) are predicated with the loop branch condition., Further at [p. 6, Section 3.6] A wish branch binary is an object file consisting of a mixture of wish branches, traditional predicated code, and normal branches. The compiler decides which branches are predicated, which are converted to wish branches, and which stay as normal branches based on estimated branch misprediction rates and compile-time heuristics., Further at [p. 7, Section 4.2.1] To generate predicated code, the ORC compiler first checks whether or not the control-flow graph is suitable for if-conversion in a region boundary., Further at [p. 7, Section 4.2.2, Figure 3, Figure 4] If a branch is suitable for if-conversion, we treat that branch as a wish branch candidate. 
If the number of instructions in the fall-through block of a branch is greater than N (we set N to 5), the candidate branch is converted to a wish jump and the necessary wish joins are inserted., wherein the compiler evaluates the code to determine candidates for wish branches and wish loops (repeatable code with a backward branch) such that the compiler first identifies the block of code corresponding to a loop (e.g., Figure 4 a) and converts it into a wish loop code (if it passes compiler-based criteria) as well as generates the associated code logic for performing the overwrite (i.e., low confidence mode, evaluation of status of flow, late-exit instructions).)…
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to generate and implement, for a neural network, overwrite instructions based on a (break) condition, in which the overwrite instructions for overwriting code instructions are generated in response to compiler-identified repeatable code. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by using the compiler to identify instructions that may be optimized accordingly by replacing the instructions of the identified blocks determined to be extraneous with nops instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).

In regard to claim 14, the rejection of claim 6 is incorporated, and Panda further teaches wherein the compiler adds the overwrite instruction within and at the end of each of the plurality of blocks of instructions after the set of common instructions. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein each block of code has associated with it the overwrite instructions (corresponding to the classification logic for evaluating the output of a layer) such that this is added to the code of each block (including the common set of instructions associated with convolution/feature extraction) by virtue of being associated with each block and acting on the output of each block (with, as noted, the compiler forming these instructions to synthesize an integrated design – i.e., the integration of the classifiers into the body/framework of the CNN deep neural network code) such that this code lies within and extends to the end of a layer-specific block of code (i.e., it is put in place by the compiler after the convolution operation/common set of instructions for each layer and extends to the end of computations for that block/layer).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 6.
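For illustration only, the early-exit testing flow quoted above from Panda's Algorithm 2 may be sketched as follows. This is a minimal sketch under stated assumptions and is not Panda's implementation: the function name `classify_cdln`, the representation of each CNN stage and linear classifier as a Python callable, and the use of the highest-scoring class as the output label are all hypothetical choices made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Features = List[float]
# Hypothetical stand-in for one CNN layer/stage: maps features to new features.
Stage = Callable[[Features], Features]
# Hypothetical stand-in for a linear classifier: returns one confidence value
# per class. None means no classifier is present at that stage.
Classifier = Optional[Callable[[Features], Sequence[float]]]


def classify_cdln(
    x: Features,
    stages: List[Stage],
    classifiers: List[Classifier],
    delta: float,
) -> Tuple[int, int]:
    """Return (class label, index of the stage at which testing terminated)."""
    features = x
    last = len(stages) - 1
    for i, (stage, clf) in enumerate(zip(stages, classifiers)):
        features = stage(features)  # step 1: obtain feature vector CNN_i
        if clf is None:
            continue  # no classifier at this stage, so activate the next one
        scores = clf(features)  # step 2: classifier output for CNN_i
        confident = [c for c, s in enumerate(scores) if s > delta]
        # Step 3: terminate when exactly one class exceeds the threshold.
        # Step 4: otherwise (none, or more than one) activate stage i+1.
        # The final stage always produces the label (worst case).
        if len(confident) == 1 or i == last:
            label = max(range(len(scores)), key=lambda c: scores[c])
            return label, i
    raise ValueError("the final stage needs an output classifier")
```

In this sketch, stages after the terminating stage are never called, which corresponds to the statement that the layers from i+1 onwards are not activated once testing terminates at stage i.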

In regard to claim 15, Panda teaches An integrated circuit device for neural network processing, the integrated circuit device comprising: a memory operable to store a plurality of blocks of instructions generated by a compiler from source code implementing a repeatable set of operations, wherein the repeatable set of operations are performed up to a number of iterations based on a condition, wherein each of the plurality of blocks of instructions includes a set of common instructions corresponding to an iteration of the repeatable set of operations, wherein the plurality of blocks of instructions include a first block of instructions and a second block of instructions, (See at [p. 476, Section II] As mentioned earlier, CNN layers of DLN models that are trained for classification, have been used as feature extractors by removal of the output layer. We exploit the efficacy of the convolutional layer features to develop an architecture in which easy instances can be classified earlier without activating the latter layers of the DLN network., Further at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. 
If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein operations are reduced (i.e., accelerated) in a deep neural network in which the execution of the deep neural network (CNN) is characterized by a loop (while loop with i less than total number of stages with code within each loop/layer corresponding to a block of code) over each stage i of that neural network (CNN_i) up to the final layer (maximum number of iterations), such that each iteration in that loop corresponds to a different layer of the neural network, such that, within each layer-specific block of code is contained repeatable code corresponding to feature extraction at each layer (e.g., the operation of convolution), such that the reduction (acceleration) in operations is achieved by early exiting of that loop according to a break condition (satisfaction of confidence criteria), such that the set of code within a given loop/block (Algorithm 2) that specifically performs the convolution operation (for a given layer) forms a common set of instructions (common across layers/blocks), and such that instructions for this integrated neural network implementation, classifier, and accelerator/computation reduction framework are generated using a 
Synopsys design compiler (but also, in a more general sense the associated algorithms – algorithm 2 and the characterization of the neural network architectures – are representative/indicative of source code from which the instructions are generated).) and wherein each of the first block of instructions and the second block of instructions includes an overwrite instruction added by the compiler within and at an end of each of the first block of instructions and the second block of instructions after the set of common instructions; (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 
478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein each block of code has associated with it the overwrite instructions (corresponding to the classification logic for evaluating the output of a layer) such that this is added to the code of each block (including the common set of instructions associated with convolution/feature extraction) by virtue of being associated with each block and acting on the output of each block (with, as noted, the compiler forming these instructions to synthesize an integrated design – i.e., the integration of the classifiers into the body/framework of the CNN deep neural network code) such that this code lies within and extends to the end of a layer-specific block of code (i.e., it is put in place by the compiler after the convolution operation/common set of instructions for each layer and extends to the end of computations for that block/layer).) and one or more execution engines configured to perform operations comprising: executing the set of common instructions of first block of instructions; (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. 
, Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the convolution operations/common set of instructions for a given layer of the neural network is executed to generate output features.) after executing the set of common instructions of the first block of instructions, executing the overwrite instruction of the first block of instructions; (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. 
If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the linear classifier associated with the first block/ith layer is executed to determine a confidence value for evaluation of a break condition.) determining that the condition is satisfied;  (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer. , Further at [Algorithm 2, Figure 3, Table I, Table II]  Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. 
If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., 
wherein if the confidence level computed using the linear classifier exceeds a threshold then a break condition is satisfied.) in response to executing the overwrite instruction of the first block of instructions and to determining that the condition is satisfied, triggering an overwrite action that causes the second block of instructions to be overwritten …. executing the … instructions of the second block of instructions (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein, if the break condition is satisfied (confidence>threshold) based on the execution of the instructions for analyzing/classifying the CNN outputs of the ith layer (first block), the instructions associated with the code for the subsequent layers (second and third blocks) are not activated – i.e., the instructions are overwritten/replaced by an absence of any operation in the execution control flow.) 
However, Panda does not explicitly teach … with no operation (NOP) instructions that are to be executed during execution of the second block of instructions; and executing the NOP instructions of the second block of instructions. In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, Panda does not disclose that these instructions are replaced by NOP instructions which are subsequently executed.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches in response to executing the overwrite instruction of the first block of instructions and to determining that the condition is satisfied, triggering an overwrite action that causes the second block of instructions to be overwritten with no operation (NOP) instructions that are to be executed during execution of the second block of instructions; and executing the NOP instructions of the second block of instructions…. (See at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and no-exit. 
To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required., wherein instructions are generated through a compiler to convert while loops into a wish loop having a predicated termination such that blocks of code indicating successive iteration loops are fetched (according to a prediction of the number required), but such that overwrite instructions are also generated by the compiler (1) to form a prediction of which blocks are needed, (2) to determine what confidence mode that prediction corresponds to, (3) to determine (at each iteration) if there are unneeded fetched blocks, and (4) to overwrite/override the blocks of code (i.e., second and third blocks) that are determined to no longer be needed as the result of a late-exit in a low-confidence mode; in other words, as exemplified in Section 3.2, the predicate evaluation at block X_3 leads to transition to the block Y, rendering the instructions fetched for blocks X_4 and X_5 extraneous and enabling the replacement of those extraneous blocks by nops, which are nonetheless executed subsequently.) …
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to generate and implement overwrite instructions for a neural network which, based on the satisfaction of a break condition at one iteration of a loop, cause the blocks of code in subsequent iterations to be overwritten with nop instructions, with those nop instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by fetching instructions for all predicted blocks but replacing the instructions of blocks determined to be extraneous with nops instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
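For illustration only, the late-exit behavior quoted above from Kim, Section 3.2, may be sketched in software as follows. This is a simplified model under stated assumptions and not Kim's hardware: the name `run_wish_loop`, the callables `body` and `keep_going`, and the initialization of the predicate to TRUE before the first iteration are hypothetical choices made for the example. The sketch shows only that loop bodies fetched beyond the true exit point flow through as NOPs rather than causing a flush.

```python
from typing import Callable, List, Tuple


def run_wish_loop(
    fetched_iterations: int,
    body: Callable[[int], int],
    keep_going: Callable[[int], bool],
    state: int,
) -> Tuple[int, List[str]]:
    """Model a predicated loop whose bodies were all fetched by prediction.

    fetched_iterations is the number of loop bodies the front end fetched
    (e.g. 5 for predictions TTTTN). Each body is guarded by a predicate;
    once the predicate resolves to FALSE, remaining bodies act as NOPs.
    """
    predicate = True  # assumption: the predicate is TRUE on loop entry
    trace: List[str] = []
    for _ in range(fetched_iterations):
        if predicate:
            state = body(state)            # predicated loop body executes
            predicate = keep_going(state)  # loop condition sets the predicate
            trace.append("EXEC")
        else:
            # Extra fetched iteration: everything in it, including the
            # condition computation, is predicated off and has no effect.
            trace.append("NOP")
    return state, trace


# Five bodies fetched (X1..X5), but the loop should run only three times.
final_state, trace = run_wish_loop(5, lambda s: s + 1, lambda s: s < 3, 0)
print(final_state, trace)
```

In this model the fourth and fifth fetched bodies correspond to the extra blocks X4 and X5, which pass through as NOPs with the state left unchanged.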

Claim 16 is also rejected because it is just a computer readable memory implementation of the same subject matter of Claim 6 which can be found in Panda and Kim. It is noted that the claim also recites a storage medium with instructions which may be found in Panda (for example, the hardware implementation of the neural network and classifier code (p. 478, Section IV)). 

Claim 18/16 is also rejected because it is just a computer readable memory implementation of the same subject matter of Claim 11/6 which can be found in Panda and Kim. 

In regard to claim 19, the rejection of claim 18 is incorporated, and Panda further teaches wherein the evaluation instruction precedes the overwrite instruction in each of the one or more blocks of instructions. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. Goto step 1 and repeat this until you reach the final layer of the CDLN., Further at [p. 478, Section IV] For hardware execution, we implemented each classifier at the register transfer logic (RTL) level. Synopsys design compiler was used to synthesize the integrated design to a 45nm SOI process from IBM. 
Finally, Synopsys Power compiler was used to estimate energy consumption of the synthesized netlists., wherein the code which implements the linear classifier comprises the evaluation instructions since it evaluates the output of the respective layer of the neural network to generate information that may be used in the break condition and for the subsequent determination and action/termination (overwrite instructions) associated with the break condition result.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim for the same reasons as pointed out for claim 18.

In regard to claim 20, the rejection of claim 18 is incorporated, and Panda further teaches wherein the overwrite action causes the evaluation instruction of each subsequent block of instructions to be overwritten with …. (See at [p. 478, Section IIIB] Testing the CDLN Algorithm 2 describes the overall testing methodology for the CDLN. Given a test instance I_test, the methodology produces the class label L_test for it using N_cdt. The output from the linear classifier at every stage is monitored to decide if the input can be classified at the current stage or not. For the worst case (very hard instance), all the CNN layers and the corresponding linear classifiers at every stage will be activated and …will be the class label produced by the final output layer., Further at [Algorithm 2, Figure 3, Table I, Table II] Algorithm 2: Methodology to test the CDLN Input: … CDLN N_cdl with the # of linear classifiers or stages in N_cdl. … Obtain the CNN layer feature vectors for I_test (CNN_i) corresponding to a stage/layer i. 2. If a linear classifier … is present at stage i, obtain the output of corresponding to CNN_i. 3. If the confidence value of the output is beyond a certain threshold δ (user defined), then TERMINATE testing at stage i and Output … Class label … . The layers or stages of N_cdl from i+1 onwards are not activated if testing is terminated at stage i 4. If the confidence value of the output is below the threshold δ or output has high confidence value for more than one class label, activate the next stage i+1. 5. 
Goto step 1 and repeat this until you reach the final layer of the CDLN., wherein the code that evaluates the break condition (sufficient confidence) is in place for every layer (see Figure 3b) but is rendered inactive (overwritten with no operation) for layers subsequent to layer i if the break condition (execution flow control condition) for layer i is satisfied (i.e., the instructions for the linear classifier and the break condition become inactive).) 
However, Panda does not explicitly disclose …with the NOP instruction… In other words, although Panda teaches the overwriting of instructions based on a break condition in a loop by not activating the existing instructions, which include the evaluation instructions (e.g., linear classifier) performed in subsequent layers, Panda does not disclose that these subsequent evaluation instructions are replaced by NOP instructions.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches wherein the overwrite action causes the evaluation instruction of the subsequent blocks to be overwritten with the NOP instruction; (See at [p. 3, Section 3.1] Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated., Further at [p. 4, Section 3.2] In low-confidence-mode, the processor still predicts the wish loop according to the loop/branch predictor. However, it does not predict the predicate value. Hence, the iterations of the loop are predicated (i.e., fetched but not executed until the predicate value is known) during low-confidence-mode. There are three misprediction cases in this mode: (1) early-exit: the loop is iterated fewer times than it should be, (2) late-exit: the loop is iterated only a few more times by the processor front end than it should be and the front end has already exited when the wish loop misprediction is signalled, and (3) no-exit: the loop is still being iterated by the processor front end when the wish loop misprediction is signaled… One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline., Further at [p. 5, Section 3.5.4, Figure 3, Figure 4] If a wish loop is mispredicted during low-confidence-mode, the processor needs to distinguish between early-exit, late-exit, and no-exit. 
To support this, the processor uses a small buffer in the front end that stores the last prediction made for each static wish loop instruction that is fetched but not yet retired. … When a wish loop is mispredicted and the actual direction is not-taken, the branch misprediction recovery module checks the latest prediction made for the same static wish loop instruction by reading the buffer in the front end. If the last stored prediction is not taken, it is a late-exit case, because the front end must have already exited the loop, so no pipeline flush is required. , wherein the replacement of the extraneous fetched block by nops also includes replacement of evaluation instructions associated with those blocks in the form of the code that is used to directly determine the predicates to the break condition (e.g., p1 in Figure 4B or Figure 5B) because these instructions are no longer on the correct control flow path.) …
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to implement a neural network accelerator for a neural network which implements overwrite instructions for overwriting code instructions, including the evaluation instructions, with NOP instructions and evaluates a break condition, the satisfaction of which at one iteration of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions, with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by fetching instructions for all predicted blocks but replacing the instructions of blocks determined to be extraneous, as well as any associated code not on the correct control flow path, with NOPs instead of flushing the pipeline. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
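For illustration only, the late-exit example quoted from Kim (predictions TTTTN, fetched blocks X1 through X5, first extra block X4) can be sketched as follows. This is a minimal sketch and not Kim's hardware or the claimed accelerator: the names Block and flow_through_datapath are hypothetical, and the assumption that the loop should really run three iterations is inferred from X4 being the first extra block.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str        # label from Kim's example, e.g. "X4"
    iteration: int   # 1-based loop iteration the block belongs to


def flow_through_datapath(fetched, actual_iterations):
    """Return (name, action) pairs for every fetched loop-body block.

    A block whose predicate resolves to FALSE is not flushed; it simply
    flows through as a NOP (hypothetical model of predicated execution).
    """
    result = []
    for block in fetched:
        # Stand-in for predicate p1: TRUE only while the loop should still run.
        predicate = block.iteration <= actual_iterations
        result.append((block.name, "execute" if predicate else "nop"))
    return result


# Predictions TTTTN: the front end fetches five loop-body blocks, X1..X5.
fetched = [Block(f"X{i}", i) for i in range(1, 6)]

# Assumption: the loop should run three times, so X4 and X5 are extra.
trace = flow_through_datapath(fetched, actual_iterations=3)
```

In this sketch, blocks X4 and X5 remain in the fetched stream and are treated as NOPs rather than being removed, which mirrors the quoted statement that it is more efficient to let them flow through the data path than to flush the pipeline.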

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Panda, in view of Kim, and in further view of Tatsuya Iwamoto (US2006/0075394, 6 April 2006), hereinafter referred to as Iwamoto.

In regard to claim 7, the rejection of claim 6 is incorporated, and Panda does not further teach wherein the overwrite action causes a direct memory access (DMA) engine to overwrite the instructions of each of the subsequent block with the NOP instructions. Panda does not disclose a DMA.
However, Kim, in the analogous art of designing efficient predicated flow execution in loops, teaches wherein the overwrite action causes a … memory access … engine to overwrite the instructions of each of the subsequent block with the NOP instructions (See at [p. 3, Section 3.1] Like predicated code, the instructions that are not on the correct control flow path will become NOPs since all instructions control-dependent on the branch are predicated. Further at [p. 4, Section 3.2] One example of the late-exit case is when the predictions for the loop branch are TTTTN so the front end fetches blocks X1X2X3X4X5Y… In the late-exit case, the fall-through block Y has been fetched before the predicate for the first extra block X4 has been resolved. Therefore it is more efficient to simply allow X4 and subsequent extra block X5 to flow through the data path as NOPs (with predicate value p1 = FALSE) than to flush the pipeline. Further at [p. 3, Section 3.2] A wish branch can also be used for a backward branch. We call this a wish loop instruction. … The main difference between the normal branch code (Figure 4a) and the wish loop code (Figure 4b) is that in the wish loop code, the instructions in block X (i.e., the loop body) are predicated with the loop branch condition. A wish branch binary is an object file consisting of a mixture of wish branches, traditional predicated code, and normal branches. The compiler decides which branches are predicated, which are converted to wish branches, and which stay as normal branches based on estimated branch misprediction rates and compile-time heuristics. Further at [p. 7, Section 4.2.1] To generate predicated code, the ORC compiler first checks whether or not the control-flow graph is suitable for if-conversion in a region boundary. Further at [p. 7, Section 4.2.2, Figure 3, Figure 4] If a branch is suitable for if-conversion, we treat that branch as a wish branch candidate. If the number of instructions in the fall-through block of a branch is greater than N (we set N to 5), the candidate branch is converted to a wish jump and the necessary wish joins are inserted. Further at [p. 5, Section 3.5.3, Figure 3, Figure 4] If both predicate register numbers are the same, the source predicate register of the instruction is assumed to be ready, with a TRUE value when the wish branch is predicted to be taken and with a FALSE value when the wish branch is predicted to be not taken. The special buffer is reset if there is a branch misprediction or if an instruction that writes to the same predicate register is decoded, wherein the replacement of the extraneous fetched blocks by NOPs also includes replacement of evaluation instructions associated with those blocks in the form of the code that is used to directly determine the predicates to the break condition (e.g., p1 in Figure 4B or Figure 5B) because these instructions are no longer on the correct control flow path, wherein the overwrite actions corresponding to instantiation of different branch paths (including jumps) include memory (register) access to detect a misprediction and then decide if a late-exit event has occurred and if a pipeline flush is required, and wherein, in general, the management of the branch instruction cache space is a memory access function associated with the overwriting process.)…
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda to incorporate the teachings of Kim to generate and implement overwrite instructions for a neural network for overwriting code instructions with NOP instructions, using memory access techniques, based on a (break) condition, the satisfaction of which at one iteration/block of a loop causes the blocks of code in subsequent iterations to be overwritten with NOP instructions, with those NOP instructions executed for those subsequent iterations. The modification would be obvious because one of ordinary skill would be motivated to improve efficiency of the execution of loops with predicated exit conditions when the number of blocks/iterations in that loop is difficult to predict, such as when the branch/break conditions are strongly input data dependent, by fetching instructions for all predicted blocks but replacing the instructions of blocks determined to be extraneous with NOPs instead of flushing the pipeline under low confidence conditions. (Kim, [Abstract, p. 4, Section 3.2, p. 9, Section 5.2, Figure 10, Figure 12]).
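For illustration only, the register/buffer lookup that Kim describes for distinguishing the three low-confidence-mode misprediction cases can be sketched as below. The function names are hypothetical. The late-exit branch follows the quoted passage directly; the mapping of the other two cases is inferred from the quoted definitions (an actual-taken misprediction means the front end left the loop too early; a stored taken prediction means the front end is still iterating), and the assumption that those two cases require a flush is not stated in the quoted text.

```python
def classify_wish_loop_misprediction(actual_taken: bool,
                                     last_prediction_taken: bool) -> str:
    """Classify a mispredicted wish loop (hypothetical helper).

    actual_taken          -- resolved direction of the mispredicted loop branch
    last_prediction_taken -- latest prediction stored in the front-end buffer
                             for the same static wish loop instruction
    """
    if actual_taken:
        # Inferred: the loop should have continued but was predicted to exit.
        return "early-exit"
    if not last_prediction_taken:
        # Per the quoted passage: the front end has already exited the loop.
        return "late-exit"
    # Inferred: the front end is still fetching loop iterations.
    return "no-exit"


def needs_pipeline_flush(case: str) -> bool:
    # Only the late-exit result is taken from the quoted passage;
    # treating the other two cases as requiring a flush is an assumption.
    return case != "late-exit"
```

For example, classify_wish_loop_misprediction(False, False) yields "late-exit", the case in which the extra blocks flow through as NOPs and no flush is performed.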
However, Kim and Panda do not explicitly teach … direct memory access (DMA) … Kim does not clearly indicate whether a DMA method is used to change the contents of memory, such as registers, associated with instructions.
However, Iwamoto, in the analogous art of optimizing instruction management on processors, teaches wherein the overwrite action causes a direct memory access (DMA) engine to overwrite the instructions of each of the subsequent block with the NOP instructions (See at [0074] The path of execution branching can be represented by a tree structure. It is the position in the tree structure that determines whether the reference is going to be used or is likely to be used, for example based on a probability ranging from 0% to 100%, wherein a 100% probability means that the reference will definitely be used and a 0% probability means that the reference will not be used. Insertion points should be placed after a branch. Then, in step S706, the module or modules are loaded by, for example, a DMA transfer. Loading is preferably performed in a background process to minimize delays in code execution. Further at [0075, Figure 7A, Figure 7B] Once an insertion point 724 is determined for a second function B as discussed above, a program module containing function B is loaded by, for example, a DMA transfer 726. The DMA transfer 726 takes some period of time, shown as T.A.. If the processor is ready to perform function B, for example due to a program jump 728 in function A, it is determined whether the load of program module B is complete as in step S708. As seen in FIG. 7B, the transfer 726 is not complete by the time the jump 728 occurs. Therefore, a wait period T_wait occurs until the transfer 726 is complete. The processor may, for example, perform one or more “no operations” (“NOPs”) during T_wait, wherein code that is loaded in response to a jump (branch condition satisfaction) includes the writing by the DMA (Figures 7A, 7B) of NOP instructions at least during a wait period before the module to which the program has jumped is fully loaded.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Panda and Kim to incorporate the teachings of Iwamoto to use direct memory access (DMA) to overwrite instructions of subsequent blocks with NOP instructions. The modification would be obvious because one of ordinary skill would be motivated to improve execution efficiency (reduced time) of programs which execute jumps in the instruction code by inserting NOPs during DMA transfer before the start of the next module to reduce the latency time for loading and unloading modules. (Iwamoto, [0075, 0076, Figure 7B]).
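For illustration only, the wait period described in Iwamoto at [0075] can be sketched as a simple cycle count. This is a minimal sketch under stated assumptions, not Iwamoto's implementation: the function name and parameters are hypothetical, time is measured in abstract cycles, and one NOP per waited cycle is assumed.

```python
def nops_while_waiting(jump_cycle: int, transfer_start: int,
                       transfer_cycles: int) -> int:
    """Number of NOPs performed between a program jump and DMA completion.

    Assumes (hypothetically) that the processor performs one NOP per cycle
    while the program module targeted by the jump is still being loaded.
    """
    transfer_done = transfer_start + transfer_cycles
    # No wait is needed if the transfer finished before the jump occurred.
    return max(0, transfer_done - jump_cycle)


# Jump occurs at cycle 8, but the 10-cycle transfer started at cycle 0.
wait_nops = nops_while_waiting(jump_cycle=8, transfer_start=0, transfer_cycles=10)
```

Under these assumptions wait_nops is 2, corresponding to the case shown in FIG. 7B where the transfer is not complete when the jump occurs; a jump at cycle 12 would give 0.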

Claim 17/16 is also rejected because it is merely a computer readable memory implementation of the same subject matter as Claim 7/6, which can be found in Panda, Kim, and Iwamoto.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Xue et al. (“AMASS: Automated Software Mass Customization via Feature Identification and Tailoring”, EAI Endorsed Transactions on Security and Safety, April, 2019) teach the replacement of function blocks identified with a CNN with NOPs.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983. The examiner can normally be reached M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT LEWIS KULP/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126