DETAILED ACTION
 
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
 
Status of Claims
The following claim(s) is/are pending in this Office action: 1-23. 
Claim(s) 1-23 are rejected.  This rejection is NON-FINAL.
 
Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 18, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
 
Claim Objections
Claims 1-2, 5, 13-14, and 17 stand objected to because of the following informalities: 
(a)      Claims 1 and 13: the limitation “weight” in “adjusting weight associated with the at least one higher level of the neural network based on the evaluation of the contribution” should be preceded by an indefinite article.  The examiner suggests amending this limitation to recite “adjusting a weight associated with the at least one higher level of the neural network based on the evaluation of the contribution”.
and wherein influence means that a value of a feature in the lower level layers has an effect on a value of a feature in the higher level layers.”
(c)      Claims 5 and 17: The limitation “performance of Principal Components Analysis (PCA)” is missing an indefinite article before Principal Components Analysis. The examiner suggests amending this limitation to recite “performance of a Principal Components Analysis (PCA)”.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
 
            Claims 1-18, 21, and 23 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
(a)      Claim(s) 1 and 13:

(1) the two recitations of “contribution” in “evaluating contribution of lower level features” and “quantification of contribution” are indefinite as it is unclear whether these refer to the same or different “contribution”.  For the purpose of examination, the two recitations of “contribution” are interpreted as the same “contribution”.
(2) the two recitations of “higher level features” are indefinite as it is unclear whether these refer to the same or different “higher level features”.  For the purpose of examination, the two recitations of “higher level features” are interpreted as the same “higher level features”.
(3) Similarly, claims 1 and 13 further recite “lower level and higher level features” and “lower level features,” which are indefinite. More specifically, the “lower level” in “lower level and higher level features” is interpreted as “lower level features,” which may be confused with the “lower level features” in “quantification of contribution of lower level features”. For the purpose of examination, the two recitations of “lower level features” are interpreted as the same “lower level features”.
(b)      Claim(s) 2-12 and 14-18: Claims 2-12 and 14-18 respectively depend from independent claims 1 and 13 and thus inherit the deficiencies thereof.  Claims 2-12 and 14-18 are thus also rejected under 35 U.S.C. § 112(b) for at least their respective dependency on independent claims 1 and 13. 
(c)      Claim(s) 2 and 14: The two recitations of “influence” are indefinite and fail to observe antecedent basis where the first “influence” is a verb, and the second “influence” is used as a noun (a subject) without observing antecedent basis. Correction is required.
(d)      Claim(s) 4 and 16:

(2) the limitation “higher level features” in “training the neural network to include capabilities for identification of lower level features that have influence on higher level features” may be confused with the claimed “higher level features” in base claim 1.  For the purpose of examination, the two recitations of “higher level features” are interpreted as the same “higher level features”.
(e)      Claim(s) 5 and 17: the limitation “a weight” in “performance of Principal Components Analysis (PCA) to identify a weight …” may be confused with the claimed “weight” in base claim 1.  For the purpose of examination, the two recitations of “weight” are interpreted as the same “weight”.
(f)       Claim(s) 6 and 18:
(1) the respective two recitations of each of “one or more nodes,” “values,” and “an output” without antecedent basis are indefinite. For the purpose of examination, these “one or more nodes,” “values,” and “an output” are interpreted as different “one or more nodes,” “values,” and “an output”. 
(2) the claimed limitation “a substantially different value” is indefinite because the disclosure merely describes “a substantially different output value” as “i.e., more than a nominal change in output value” that is also indefinite in that neither “a nominal change” nor “more than a nominal change” is clearly defined.  Therefore, the claimed “a 
(g)      Claim(s) 7:
(1) the two recitations of “support” in “determining support …” and “determining a strength of support …” are indefinite as it is unclear whether these two recitations of “support” refer to the same or different “support”. For the purpose of examination, these two recitations of “support” are interpreted as the same “support”.
(2) the two recitations of “one or more inference decisions” and the three recitations “the one or more inference decisions” are indefinite because it is unclear whether these “one or more inference decisions” refer to the same or different “one or more inference decisions”. For the purpose of examination, these five recitations of “one or more inference decisions” are interpreted as the same “one or more inference decisions”.
(h)      Claim(s) 10 and 21: The limitation “the inference” lacks antecedent basis as claim 10 and its base claims only recite “one or more inference decisions” that do not provide antecedent basis for “the inference”. 
(i)       Claim(s) 12 and 23: The limitation “the more compute intensive model” is a relative term lacking a basis for comparison and thus renders the scope of the claim unclear. 
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 
Claims 19-23 stand rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more. 
Step 1: claims 19-23 are directed to the statutory category of machines. 

Step 2A – Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04(II) and the October 2019 Update, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
Claim 19 recites the following limitations: 
wherein the system is to: determine support from one or more features associated with a plurality of layers of a neural network for one or more inference decisions by the neural network; (Mental process: The examiner notes that this limitation merely recites determining whether one or more input features (e.g., factors for consideration for an inference decision) provide support for an inference decision.  The broadest reasonable interpretation of this limitation encompasses a mental process. For example, a human can review, either in his mind or with a pen and paper, the respective weights manually assigned to these input features for an inference output and determine which corresponding input values (which are respectively multiplied by these respective weights) are less relevant (e.g., input values associated with small weights) to the inference output and which other input values are more relevant and hence provide support for the inference output.  Therefore, this limitation is directed to a mental process and hence fails to satisfy Step 2A.)
determine a strength of support for each of the one or more inference decisions; (Mental process: The examiner notes that this limitation merely recites determining how much one or more features support an inference decision.  The broadest reasonable interpretation of this limitation encompasses a mental process. For example, a human can, either in his mind or with a pen and paper, compute the products of respective weights and corresponding input values to determine the aforementioned more relevant input features for the inference output and determine that the larger a product value between a weight and a corresponding input value is, the more support the corresponding input value provides.  Therefore, this limitation is directed to a mental process and hence fails to satisfy Step 2A.)
identify one or more inference decisions that have low stability based at least in part on the determined strength of support for the one or more inference decisions; and (Mental process: The examiner notes that this limitation merely recites identifying an inference decision that has fewer input features that provide support for the inference decision.  The broadest reasonable interpretation of this limitation encompasses a mental process. For example, a human can compare, either in his mind or with a pen and paper, the respective numbers of supporting input features for multiple inference decisions via the mental processes delineated above and identify one or more inference decisions having the fewest or fewer supporting input features as one or more low-stability inference decisions. Therefore, this limitation is directed to a mental process and hence fails to satisfy Step 2A.)
reevaluate the one or more inference decisions that are identified as having low stability. (Mental process: The examiner notes that this limitation merely recites reevaluating an inference decision.  The broadest reasonable interpretation of this limitation encompasses a mental process. For example, a human can, in his/her mind or with a pen and paper, re-evaluate an inference decision by, for example, mentally adjusting one or more weights for their corresponding input features. Therefore, this limitation is directed to a mental process and hence fails to satisfy Step 2A.)
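The pen-and-paper reading given above for the limitations of claim 19 can be illustrated with a minimal Python sketch. It assumes, following the examiner's examples, that the support of a feature is the product of its weight and its input value, that strength of support is the number of features whose product exceeds a threshold, and that a decision has low stability when it has few supporting features. The function names, the threshold values, and the sample data are hypothetical and are not drawn from the claims or the specification.

```python
def support_products(weights, inputs):
    """Per-feature support: each weight times its corresponding input value."""
    return [w * x for w, x in zip(weights, inputs)]


def strength_of_support(weights, inputs, min_product=0.1):
    """Count the features whose product exceeds min_product (assumed threshold)."""
    return sum(1 for p in support_products(weights, inputs) if p > min_product)


def low_stability_decisions(decisions, min_supporting=2):
    """Names of decisions with fewer than min_supporting supporting features."""
    return [
        name
        for name, (weights, inputs) in decisions.items()
        if strength_of_support(weights, inputs) < min_supporting
    ]


# Hypothetical data: each decision maps to (weights, input values).
decisions = {
    "A": ([0.5, 0.4, 0.3], [1.0, 1.0, 1.0]),    # three supporting features
    "B": ([0.9, 0.01, 0.02], [1.0, 1.0, 1.0]),  # one supporting feature
}
flagged = low_stability_decisions(decisions)
```

In this sketch, the decisions returned in flagged are the ones that would then be reevaluated under the last limitation.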
Step 2A – Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (a) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (b) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See 2019 PEG Section III(A)(2), 84 Fed. Reg. at 54-55.  
Claim 19 recites the following additional elements: 
one or more processors to process data; and (The examiner notes that this additional element, when analyzed individually and as an ordered combination, merely generally links the mental process of processing data to a particular technological environment or field of use involving one or more processors.  This has been found to be insufficient to integrate the claimed judicial exception to a practical application to satisfy Step 2A Prong Two. See MPEP § 2106.05(h).)
a memory to store data, including data for neural network analysis; (The examiner notes that this additional element, when analyzed individually and as an ordered combination, is directed to an insignificant, well-known extra-solution activity of storing certain data in memory which has been found to be insufficient to integrate the claimed judicial exception into a practical application. See MPEP § 2106.05(g)(1).  Therefore, this limitation also fails to satisfy Step 2A Prong Two.)
 
Step 2B: The examiner asserts that the additional elements do not amount to significantly more than the aforementioned judicial exception because the additional 
Claim 19 recites the following additional elements: 
one or more processors to process data; and (The examiner notes that this additional element merely generally links the mental process of processing data to a particular technological environment or field of use recited at a high level of generality (e.g., one or more processors).  This has been found to be insufficient to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05(h).)
a memory to store data, including data for neural network analysis; (The examiner notes that this additional element is directed to an insignificant, well-known extra-solution activity of storing certain data in memory recited at a high level of generality.  This has also been found to be insufficient to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05(g)(1).)
Therefore, claim 19 is directed to a judicial exception without reciting additional elements that integrate the judicial exception into a practical application or amount to significantly more than the claimed judicial exception.  Claim 19 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons. 
 
With respect to claim 20, claim 20 recites the following limitation and is rejected under 35 U.S.C. § 101 for at least the following reasons. 
(mental process: The examiner notes that the broadest reasonable interpretation of this limitation encompasses a human accounting for or considering a number of factors when the human performs an inference decision in his/her mind or with a pen and paper. Therefore, this limitation is merely directed to a mental process and thus fails to satisfy Step 2A.)
          Further, the examiner notes that this limitation does not recite any additional elements, much less additional elements that integrate the aforementioned judicial exception into a practical application or amount to significantly more than the claimed judicial exception.  Claim 20 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
 
With respect to claim 21, claim 21 recites the following limitation and is rejected under 35 U.S.C. § 101 for at least the following reasons. 
wherein reevaluating the one or more inference decisions that are identified as having low stability includes the system to re-perform the inference for the one or more inference decisions. (Mental process: The examiner notes that but for the additional element of “the system to” that will be analyzed below in Step 2A Prong Two and Step 2B, the broadest reasonable interpretation of this limitation encompasses a human re-performing an inference decision for a previously identified low-stability inference decision (see claim 19, supra) in his/her mind or with a pen and paper.  Therefore, this limitation of claim 21 fails to satisfy Step 2A Prong One.)

          Similarly, the examiner notes that generally linking a judicial exception to a particular technological environment or field of use involving a generically recited system has also been found to be insufficient to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05(h).
          Claim 21 thus recites a judicial exception without reciting additional elements that integrate the aforementioned mental process into a practical application or amount to significantly more than the claimed judicial exception.  Claim 21 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
 
With respect to claim 22, claim 22 recites the following limitation and is rejected under 35 U.S.C. § 101 for at least the following reasons. 
wherein re-performing the inference for the one or more inference decisions includes adding perturbations to input data and sampling weights of neurons from statistical distributions. (Mental process: The examiner notes that but for the additional element of “neurons” that will be analyzed below in Step 2A Prong Two and Step 2B, the broadest reasonable interpretation of this limitation encompasses a human re-performing an inference decision for a previously identified low-stability inference decision (see claim 19, supra) in his/her mind or with a pen and paper by, for example, sampling weight values for input values that are used for the human in making an inference decision in his/her mind or with a physical aid.  Therefore, this limitation of claim 22 fails to satisfy Step 2A Prong One.)
          Further, the examiner notes that this limitation recites the additional element of neurons to link the aforementioned mental process to a particular technological environment or field of use of neural networks. The examiner notes that this has been found to be insufficient to integrate the claimed judicial exception to a practical application to satisfy Step 2A Prong Two. See MPEP § 2106.05(h).
          Similarly, the examiner notes that generally linking a judicial exception to a particular technological environment or field of use has also been found to be insufficient to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05(h).
          Claim 22 thus recites a judicial exception without reciting additional elements that integrate the aforementioned mental process into a practical application or amount to significantly more than the claimed judicial exception.  Claim 22 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
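For illustration only, the re-performance recited in claim 22 (adding perturbations to input data and sampling weights from statistical distributions) can be sketched in Python as follows. The sketch assumes a toy linear classifier in place of a neural network, Gaussian input perturbations, and Gaussian weight distributions centered on the nominal weights. All names and parameter values are hypothetical and are not taken from the application or from the cited art.

```python
import random


def predict(weights, inputs):
    """Toy linear classifier: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


def agreement_rate(mean_weights, inputs, trials=200,
                   input_noise=0.05, weight_std=0.05, seed=0):
    """Fraction of re-performed inferences that match the baseline decision.

    Each trial perturbs the inputs with Gaussian noise and samples every
    weight from a Gaussian centered on its nominal value (assumed model).
    """
    rng = random.Random(seed)
    baseline = predict(mean_weights, inputs)
    agree = 0
    for _ in range(trials):
        noisy_inputs = [x + rng.gauss(0.0, input_noise) for x in inputs]
        sampled_weights = [rng.gauss(w, weight_std) for w in mean_weights]
        if predict(sampled_weights, noisy_inputs) == baseline:
            agree += 1
    return agree / trials


# Hypothetical decision: a rate near 1.0 suggests a stable inference.
rate = agreement_rate([1.0, -0.5], [2.0, 1.0])
```

A low agreement rate would indicate, under these assumptions, that the original inference decision changes easily under small perturbations.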
 
With respect to claim 23, claim 23 recites the following limitation and is rejected under 35 U.S.C. § 101 for at least the following reasons. 
wherein re-performing the inference includes performing the inference with a more compute intensive model of the neural network. (Mathematical principle: The examiner notes that but for the additional element of a neural network that will be analyzed below in Step 2A Prong Two and Step 2B, this limitation merely recites a generic mathematical principle or concept of using a more compute intensive model for inferencing.  Therefore, this limitation of claim 23 fails to satisfy Step 2A Prong One. See MPEP § 2106.04(I).)
          Further, the examiner notes that this limitation recites the additional element of a neural network to link the aforementioned judicial exception to a particular technological environment or field of use of neural networks. The examiner notes that this has been found to be insufficient to integrate the claimed judicial exception to a practical application to satisfy Step 2A Prong Two. See MPEP § 2106.05(h).
          Similarly, the examiner notes that generally linking a judicial exception to a particular technological environment or field of use has also been found to be insufficient to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05(h).
          Claim 23 thus recites a judicial exception without reciting additional elements that integrate the aforementioned judicial exception into a practical application or amount to significantly more than the claimed judicial exception.  Claim 23 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
 
Claim(s) 1-4 and 13-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bach et al. US20180018553A1 with publication date of Jan. 18, 2018 (hereinafter Bach).
(Bach, ¶ [0017]: “Another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method for assigning a relevance score to a set of items”.)
evaluating contribution of lower level features to higher level features in a neural network, (Bach, Abstract: “The task of relevance score assignment to a set of items onto which an artificial neural network is applied is obtained by redistributing an initial relevance score derived from the network output, onto the set of items by reversely propagating the initial relevance score through the artificial neural network so as to obtain a relevance score for each item”; and “for each neuron, preliminarily redistributed relevance scores of a set of downstream neighbor neurons of the respective neuron are distributed on a set of upstream neighbor neurons of the respective neuron according to a distribution function.” The examiner notes that Bach’s redistributing relevance scores from the output or downstream neurons (lower level features) to upstream neurons (higher level features) by reverse propagation teaches this limitation.)

the neural network having a plurality of neural network layers including an input layer, at least one lower level, at least one higher level, and an output layer, (Bach, FIG. 2A (annotated): 

[Image: Bach, FIG. 2A (annotated)]


¶ [0047]: “input neurons of network 10,” “output neurons 12,” and “the layers formed thereby being intermediate neurons or intermediate layers”. The examiner notes that Bach’s plural “intermediate layers” teach at least one lower level and at least one higher level.)

the evaluation including one or more of: identification of links between lower level and higher level features of the neural network, and (Bach, FIG. 2A reproduced immediately above.  The examiner notes that Bach’s distributing relevance scores by traversing the network backwards from the output to the input teaches the above limitations.)

quantification of contribution of lower level features to higher level features of the neural network; and (Bach, ¶ [0088]: “Layer-wise relevance propagation assumes that we have a Relevance score Rd (l+1) for each dimension z(d,l+1) of the vector z at layer l+1. The idea is to find a Relevance score Rd (l) for each dimension z(d,l) of the vector z at the next layer l which is closer to the input layer such that the following equation holds.” 

[Equation image: Bach, ¶ [0088]]

The examiner notes that Bach’s determining a relevance score for each dimension of a vector Z at a higher layer (the next layer l+1) from the relevance score at a lower layer (layer l) teaches the above limitation.)
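The layer-to-layer propagation quoted from Bach ¶ [0088] can be sketched in Python as follows. This is a minimal sketch that assumes a proportional redistribution rule, in which each neuron i at layer l receives a share of the relevance of each neuron j at layer l+1 in proportion to the weighting term zij=xiwij; it is not asserted to be the exact equation shown in Bach. The function name lrp_dense and the sample values are hypothetical.

```python
def lrp_dense(x, w, relevance_out, eps=1e-9):
    """Redistribute relevance from layer l+1 back to layer l for one dense layer.

    x: activations at layer l (length n)
    w: weights, w[i][j] connects neuron i (layer l) to neuron j (layer l+1)
    relevance_out: relevance scores at layer l+1 (length m)
    """
    n, m = len(x), len(relevance_out)
    relevance_in = [0.0] * n
    for j in range(m):
        z = [x[i] * w[i][j] for i in range(n)]  # weighting terms z_ij
        total = sum(z)
        if abs(total) < eps:
            continue  # skip neurons with no net input (their relevance is dropped)
        for i in range(n):
            relevance_in[i] += z[i] / total * relevance_out[j]
    return relevance_in


# Hypothetical two-neuron layers.
x = [1.0, 2.0]
w = [[0.5, 1.0], [0.25, 0.5]]
relevance_out = [0.6, 0.4]
relevance_in = lrp_dense(x, w, relevance_out)
```

When every neuron at layer l+1 has a nonzero net input, the relevance scores at layer l sum to the same total as those at layer l+1 under this rule.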

adjusting weight associated with the at least one higher level of the neural network based on the evaluation of the contribution. (Bach, ¶ [0139]: “The relevance is backpropagated from one layer to another until it reaches the input pixels x(d), and where relevances Rd (1) provide the desired pixel-wise decomposition of the decision ƒ(x).” ¶ [0140]: “Algorithm 2”.  ¶ [0141]: “In such a general case, the weighting terms zij=xiwij from Equation (50) have to be replaced accordingly by a function of hij(xi). ” The examiner notes that Bach’s backpropagation from the output all the way to the input and/or replacing the weighting terms zij=xiwij with hij(xi) based on the relevance score propagation teaches the above limitation.) 

	With respect to claim 2, Bach further teaches the one or more mediums of claim 1, wherein identification of links between lower level and higher level features of the neural network includes examining layers of the neural network from the output layer towards the input layer to identify one or more features in lower level layers of the neural network that influence one or more features in higher level layers of the neural network, (Bach, Abstract: “The task of relevance score assignment to a set of items onto which an artificial neural network is applied is obtained by redistributing an initial relevance score derived from the network output, onto the set of items by reversely propagating the initial relevance score through the artificial neural network so as to obtain a relevance score for each item”; and “for each neuron, preliminarily redistributed relevance scores of a set of downstream neighbor neurons of the respective neuron are distributed on a set of upstream neighbor neurons of the respective neuron according to a distribution function.” See also FIG. 2A –relevance scores of upstream neurons that influence the downstream neurons. 
The examiner notes that Bach’s “downstream neurons” or the “relevance score” thereof teaches the lower level features in lower level layers, that Bach’s “upstream neurons” teach higher level features in higher level layers, and that Bach’s propagating a relevance score from the output to one or more higher layers teach the above limitation.)

wherein influence means that a value of a feature in the lower level layers has an effect on a value of a feature in the higher level layers. (Bach, ¶ [0074]: “Thus, the neural network may, for example, be used to predict success (e.g. number of sold products) of an advertisement campaign (regression task). The relevance scores can be used to identify some influential aspects for the success.” The examiner notes that Bach’s successful prediction teaches a value of a feature in a higher level layer (e.g., output layer), that Bach’s “some influential aspect” identified by using the “relevance scores” teaches “influence” that has an effect on the aforementioned value of a feature in a higher level layer.)

	With respect to claim 3, Bach further teaches the one or more mediums of claim 2, further comprising executable computer program instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: determining a relative level of influence for each lower level feature having an influence on a higher level feature of the neural network. (Bach, FIG. 2A; and ¶ [0088]: “Layer-wise relevance propagation assumes that we have a Relevance score Rd (l+1) for each dimension z(d,l+1) of the vector z at layer l+1. The idea is to find a Relevance score Rd (l) for each dimension z(d,l) of the vector z at the next layer l which is closer to the input layer such that the following equation holds.” 
The examiner first notes that Bach’s aggregating the relevance score for each dimension z(d,l) of the vector z and hence the multiple relevance scores of multiple, respective dimensions teaches a relative level of influence for each lower level feature as the relevance of all these dimensions are normalized into a vector and are thus relative to each other. The examiner further notes that Bach’s “25%, 35%, 20%, 20%, etc.” in FIG. 2A and/or Bach’s determining a lower layer relevance score (Rd (l)) for each dimension of a vector by reversely propagating the high layer relevance score (Rd (l+1)) for each dimension of the corresponding vector teaches the above limitation.) 

	With respect to claim 4, Bach further teaches the one or more mediums of claim 2, further comprising executable computer program instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: training the neural network to include capabilities for identification of lower level features that have influence on higher level features. (Bach, ¶ [0283]: “We have also trained a neural network on the MNIST data set. This data set contains images of numbers from 0 to 9. After training the network is able to classify new unseen images. With back-propagating relevance score assignment we can ask why does the network classify an image of a 3 as class ‘3’, in other words what makes a 3 different from the other numbers.” FIG. 2A cited for claim 1, supra. The examiner notes that Bach’s training the neural network via backpropagating the relevance score assignment includes the capabilities of identifying lower level features (e.g., the neurons with associated percentages in one or more hidden layers shown in FIG. 2A, supra).  The examiner further notes that the percentages associated with respective neurons in FIG. 2A, supra, teach influence on higher level features (e.g., the outputs shown in FIG. 2A, supra).)  
 
With respect to claim 13, Bach teaches a method comprising: (same as claim 1)
evaluating of contribution of lower level features to higher level features in a neural network, (Bach, Abstract: “The task of relevance score assignment to a set of items onto which an artificial neural network is applied is obtained by redistributing an initial relevance score derived from the network output, onto the set of items by reversely propagating the initial relevance score through the artificial neural network so as to obtain a relevance score for each item”; and “for each neuron, preliminarily redistributed relevance scores of a set of downstream neighbor neurons of the respective neuron are distributed on a set of upstream neighbor neurons of the respective neuron according to a distribution function.” The examiner notes that Bach’s redistributing relevance scores from the output or downstream neurons (lower level features) to upstream neurons (higher level features) by reverse propagation teaches this limitation.)

the neural network having a plurality of neural network layers including an input layer, at least one lower level, at least one higher level, and an output layer, (Bach, FIG. 2A (annotated): 

[Image: Bach, FIG. 2A (annotated)]


¶ [0047]: “input neurons of network 10,” “output neurons 12,” and “the layers formed thereby being intermediate neurons or intermediate layers”. The examiner notes that Bach’s plural “intermediate layers” teach at least one lower level and at least one higher level.)

the evaluation including one or more of: identification of links between lower level and higher level features of the neural network, and (Bach, FIG. 2A, supra.  The examiner notes that Bach’s distributing relevance score assignment by traversing the network backwards from the output to the input identifies one or more links between, for example, an output in 18 and one or more lower level neurons in one or more hidden layers and thus teaches the above limitations.)

quantification of contribution of lower level features to higher level features of the neural network; and (Bach, ¶ [0088]: “Layer-wise relevance propagation assumes that we have a Relevance score Rd (l+1) for each dimension z(d,l+1) of the vector z at layer l+1. The idea is to find a Relevance score Rd (l) for each dimension z(d,l) of the vector z at the next layer l which is closer to the input layer such that the following equation holds.” 

    [Image: media_image2.png, equation from Bach ¶ [0088]]

The examiner notes that Bach’s determining a relevance score for each dimension of a vector Z at a higher layer (the next layer l+1) from the relevance score at a lower layer (layer l) teaches the above limitation.)

adjusting weight associated with the at least one higher level of the neural network based on the evaluation of the contribution. (Bach, ¶ [0139]: “The relevance is backpropagated from one layer to another until it reaches the input pixels x(d), and where relevances Rd (1) provide the desired pixel-wise decomposition of the decision ƒ(x).” ¶ [0140]: “Algorithm 2”.  ¶ [0141]: “In such a general case, the weighting terms zij=xiwij from Equation (50) have to be replaced accordingly by a function of hij(xi).” The examiner notes that Bach’s backpropagation from the output all the way to the input and/or replacing the weighting terms zij=xiwij with hij(xi) based on the relevance score propagation teaches the above limitation.) 

With respect to claim 14, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 15, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 16, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying. 

Claim(s) 5-6 and 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bach et al. US20180018553A1 with publication date of Jan. 18, 2018 (hereinafter Bach) in view of Bengio et al., Representation Learning: A Review and New Perspectives (Apr 23 2014) (hereinafter Bengio).
With respect to claim 5, Bach teaches the one or more mediums of claim 1, wherein the quantification of contribution of lower level features to higher level features of the neural network includes performance of an analysis to identify a weight for each of a plurality of combinations of lower level features that contribute to a higher level feature. (Bach at ¶ [0103]: Equation (16)
    [Image: media_image3.png, Bach Equation (16)]
.  ¶ [0104]: “the example given in Equation (16) gives an idea what a message Ri←k (l,l+1) could be, namely the relevance of a sink neuron Rk (l+1) which has been already computed weighted proportionally by the input of the neuron i from the preceding layer l.” The examiner notes that Bach thus teaches that the quantification of contribution of lower level features to higher level features of the neural network includes performance of an analysis to identify a weight for each of a plurality of combinations of lower level features that contribute to a higher level feature.)  
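For illustration only, the proportional redistribution described in the quoted ¶ [0104] can be sketched as follows. This is a minimal sketch and not Bach's implementation: the function name `redistribute_relevance`, the example values, and the omission of any stabilizing or bias terms are assumptions made here, and the image of Equation (16) itself is not reproduced. Each upper-layer neuron k passes its relevance R_k down to the lower-layer neurons i in proportion to the weighted inputs z_ik = x_i * w_ik.

```python
def redistribute_relevance(activations, weights, relevance_next):
    """Redistribute relevance from layer l+1 onto layer l (illustrative sketch).

    activations:    inputs x_i from layer l
    weights:        weights[i][k] connects neuron i (layer l) to neuron k (layer l+1)
    relevance_next: relevance R_k of each neuron k in layer l+1
    """
    n_lower = len(activations)
    n_upper = len(relevance_next)
    relevance = [0.0] * n_lower
    for k in range(n_upper):
        # Weighted inputs z_ik = x_i * w_ik feeding neuron k.
        z = [activations[i] * weights[i][k] for i in range(n_lower)]
        total = sum(z)
        if total == 0.0:
            continue  # this sketch simply skips neurons with no net input
        for i in range(n_lower):
            # Message R_{i<-k}: the share of R_k proportional to z_ik.
            relevance[i] += relevance_next[k] * z[i] / total
    return relevance


# Hypothetical example: three lower-layer neurons, two upper-layer neurons.
x = [1.0, 2.0, 3.0]
w = [[0.5, 0.1], [0.25, 0.2], [0.0, 0.5]]
r_next = [0.4, 0.6]
r = redistribute_relevance(x, w, r_next)
```

In this sketch the total relevance is preserved from one layer to the next whenever every upper-layer neuron has a nonzero net input, so `sum(r)` matches `sum(r_next)` up to rounding.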
Bach does not appear to explicitly teach that the analysis is a principal component analysis (PCA).  
Bengio does, however, teach:
performance of Principal Components Analysis (PCA) to identify a weight for each of a plurality of combinations of lower level features that contribute to a higher level feature. (p. 5, § 3.5, ¶ 3: “It is important to distinguish between the related but distinct goals of learning invariant features and learning to disentangle explanatory factors.”; and “[c]onsiderations such as these lead us to the conclusion that the most robust approach to feature learning is to disentangle as many factors as possible, discarding as little information about the data as is practical. If some form of dimensionality reduction is desirable, then we hypothesize that the local directions of variation least represented in the training data should be first to be pruned out (as in PCA, for example, which does it globally instead of around each example).”  P. 7, left-hand column, ¶ 4: “PCA learns a linear transformation h = f(x) = W^T x + b of input x ∈ R^{d_x}, where the columns of d_x × d_h matrix W form an orthogonal basis for the d_h orthogonal directions of greatest variance in the training data. The result is d_h features (the components of representation h) that are decorrelated.” P. 14, § 7.1, FN15: “Contrary to traditional PCA loading factors, but similarly to the parameters learned by probabilistic PCA, the weight vectors learned by a linear auto-encoder are not constrained to form an orthonormal basis, nor to have a meaningful ordering. They will however span the same subspace.” The examiner notes that Bengio’s use of PCA for disentanglement of features such as weight vectors, when combined with Bach’s teaching above, teaches this limitation.)
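For illustration only, the linear transformation quoted from Bengio can be sketched for two-dimensional data as follows. This is a minimal sketch and is not taken from either reference: the function names `pca_2d` and `project`, the sample points, and the use of a closed-form 2×2 eigen-decomposition are assumptions made here. The offset b in h = W^T x + b is taken to be -W^T times the sample mean, i.e., the data are centered before projection.

```python
import math


def pca_2d(points):
    """Return (mean, components, variances) for a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Rotation angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    w1 = (c, s)    # direction of greatest variance
    w2 = (-s, c)   # orthogonal direction
    var1 = c * c * sxx + 2.0 * c * s * sxy + s * s * syy
    var2 = s * s * sxx - 2.0 * c * s * sxy + c * c * syy
    return (mx, my), (w1, w2), (var1, var2)


def project(points, mean, components):
    """Compute h = W^T (x - mean) for each point."""
    return [
        tuple(w[0] * (p[0] - mean[0]) + w[1] * (p[1] - mean[1]) for w in components)
        for p in points
    ]


# Hypothetical sample data lying roughly along a line.
pts = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.0, 1.2), (4.0, 3.8)]
mean, comps, variances = pca_2d(pts)
h = project(pts, mean, comps)
```

The two columns of W in this sketch are orthonormal, the first captures at least as much variance as the second, and the projected features in `h` have zero sample covariance, which corresponds to the "decorrelated" features in the quoted passage.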
Bach and Bengio are analogous art because both pertain to determining relevant interacting factors or features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach’s apparatus and method for quantifying the contribution of lower level features to higher level features (see Bach, supra) to incorporate Bengio’s performing a principal component analysis to identify a weight for each combination of lower-level features contributing to a higher-level feature.  The modification enables, via a principal component analysis (PCA), learning one or more regions or features (e.g., principal component(s) such as neuron(s) and/or the respective weight(s)) where the probability of relevance concentrates (e.g., manifold(s)) but which have a much smaller dimensionality than the original space of the data, due to the dimensionality reduction provided by PCA, so that the riskier nearest neighbor searches are not required (Bengio, p. 17, § 8.1, last paragraph: “However, basing the modeling of manifolds on training set neighborhood relationships might be risky statistically in high dimensional spaces (sparsely populated due to the curse of dimensionality) as e.g. most Euclidean nearest neighbors risk having too little in common semantically.” p. 17, § 8.2, ¶ 1: “Can we learn a manifold without requiring nearest neighbor searches? Yes, for example, with regularized auto-encoders or PCA. In PCA, the sensitivity of the extracted components (the code) to input changes is the same regardless of position x. The tangent space is the same everywhere along the linear manifold.” P. 3, § 3.1, ¶ 7: “Manifolds: probability mass concentrates near regions that have a much smaller dimensionality than the original space where the data lives. This is explicitly exploited in some of the auto-encoder algorithms and other manifold-inspired algorithms described respectively in Sections 7.2 and 8.”)

With respect to claim 6, Bach modified by Bengio teaches the one or more mediums of claim 5, further comprising executable computer program instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising one or more of the following.  Bach further teaches: 
training the neural network to include one or more nodes to produce values related to generating an output by the neural network; and (Bach, ¶ [0039]: “A neural network is a graph of interconnected nonlinear processing units that can be trained to approximate complex mappings between input data and output data”; and “Each nonlinear processing unit (or neuron) consists of a weighted linear combination of its inputs to which a nonlinear activation function is applied.” The examiner notes that Bach’s nonlinear processing unit, a weighted linear combination of its inputs (or the result of the application of an activation function) of the aforementioned nonlinear processing unit, and output data respectively teaches a node, values related to generating an output, and an output.  The examiner further notes that Bach’s training the neural network for the above teaches the above limitation. )

training the neural network to include one or more nodes to produce values related to causing an output by the neural network to change to a substantially different value. (Bach, ¶ [0283]: “We have also trained a neural network on the MNIST data set. This data set contains images of numbers from 0 to 9. After training the network is able to classify new unseen images. With back-propagating relevance score assignment we can ask why does the network classify an image of a 3 as class ‘3’, in other words what makes a 3 different from the other numbers.” Also see FIG. 2A reproduced for claim 1, supra. 
The examiner notes that Bach’s training its neural network to identify lower-level neurons (e.g., one or more neurons in any of the hidden layers shown in FIG. 2A above) that contribute more significantly to the activation of a lower layer neuron (xj) teaches identifying one or more nodes.  The examiner further notes that Bach’s respectively assigning the relevance score to class 1 (boat) having the relevance percentage of 5% and to class 2 (truck) having the relevance percentage of 95% as well as the subsequent back-propagating these two assignments to their respective lower-level nodes teaches identifying and including one or more nodes to produce value(s) related to causing the output of the neural network to change substantially (e.g., from class 1 boat to class 2 truck) and hence teaches this limitation.)

With respect to claim 17, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 18, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying. 


Claim(s) 7-10 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bach et al. US20180018553A1 with publication date of Jan. 18, 2018 (hereinafter Bach) in view of Shahroudnejad et al., IMPROVED EXPLAINABILITY OF CAPSULE NETWORKS: RELEVANCE PATH BY AGREEMENT (27 Feb. 2018) (hereinafter Shahroudnejad).
With respect to claim 7, Bach teaches one or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: (Bach, ¶ [0017]: “Another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method for assigning a relevance score to a set of items”.)
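determining support from one or more features associated with a plurality of layers of a neural network for one or more inference decisions by the neural network; (Bach, FIG. 2A (annotated): 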

    [Image: media_image4.png, Bach FIG. 2A (annotated)]

The examiner notes that Bach’s re-distributing the relevance percentages (e.g., 5%, 95%, 40%, 55%, 25%, 35%, etc. for neurons in different layers) teaches the claimed support from one or more features associated with a plurality of layers of a neural network, that the classification result (e.g., class 1, class 2, class 3, etc.) teaches one or more inference decisions, and that Bach’s re-distributing the relevance score of a downstream (higher level) feature to one or more of its upstream (lower level) features (such as neurons respectively associated with 5% - 95%, 25% - 35% - 20% - 20%, etc.) teaches the above limitation.) 

determining a strength of support for each of the one or more inference decisions; (Bach, FIG. 2A, supra. The examiner notes that each of Bach’s re-distributed relevance scores such as 5% - 95%, 25% - 35% - 20% - 20%, etc. teaches a strength of support for each upstream feature for the one or more inference decisions.)
Bach does not appear to explicitly teach: 
identifying one or more inference decisions with low stability based at least in part on the determined strength of support for the one or more inference decisions; and 
reevaluating the one or more inference decisions that are identified as having low stability. 

Shahroudnejad does, however, teach: 
identifying one or more inference decisions with low stability based at least in part on the determined strength of support for the one or more inference decisions; and (Shahroudnejad, FIG. 4: 

    [Image: media_image5.png, Shahroudnejad FIG. 4]

¶ 5, § 3.1 and Eq. (3) in § 3.1 “Relevance Path by Agreement”: § 3.1, ¶ 1: “In other words, while the length (magnitude) of the output vector vj corresponding to capsule j in the CC layer is used to make decisions regarding the input image, the length ||ui|| of the output vector ui from the of ith capsule in the PC layer or an intermediate capsule layer can be interpreted as probability of existence of the feature that this capsule has been trained to detect. More specifically, we can assign to each capsule a set consisting of two segments for explanation purposes: (i) Likelihood values which can be used to explain existence probability of the feature that a capsule detects, and; (ii) Instantiation parameter vector values which can be used to explain consistency among the layers. In other words, when all capsules of an object are in an appropriate relationship with consistence parameters, the higher level capsule of that object will have a higher likelihood. Therefore, explanations can be provided to describe why the network did detect an object.”  § 3.1, ¶ 5: “CapsNet applies non-linear squashing function on output vectors (vj) in each iteration. It actually bounds likelihood of these vectors between 0 and 1, which means that it suppresses small vectors and preserves long vectors in the unit length”

    [Image: media_image6.png, Shahroudnejad Eq. (3)]

¶ 6, § 3.1: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.”
The examiner first notes that ¶ [0043] of the present disclosure describes that an inference decision based on few factors has a lower stability score than another inference decision based on more factors.  
The examiner further notes that Shahroudnejad’s identifying an inference decision of “Face prob = 0.1” that exhibits low consistency or inconsistency (e.g., due to the five capsules for the image not being in an appropriate relationship with “consistence parameters”), despite the high probabilities for facial components (e.g., “0.8, 0.9, 0.8, 0.7, and 0.9” shown as the first elements of the facial components in FIG. 4), teaches an inference decision with low stability, and that the low consistency or non-consistency teaches a determined strength of support for an inference decision.)
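For illustration only, the behavior attributed to the squashing function in the quoted § 3.1, ¶ 5 (bounding vector lengths between 0 and 1, suppressing small vectors, and preserving long vectors near unit length) can be sketched as follows. The image of Shahroudnejad’s Eq. (3) is not reproduced in this action, so the formula below is an assumption: it is the squashing form commonly used in capsule networks, v = (||s||^2 / (1 + ||s||^2)) · (s / ||s||). The function names `squash` and `length` and the sample vectors are hypothetical.

```python
import math


def squash(s):
    """Scale vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(c * c for c in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]  # a zero vector stays zero
    norm = math.sqrt(norm_sq)
    # Output length is norm_sq / (1 + norm_sq); dividing by norm gives the unit direction.
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * c for c in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


short_vec = squash([0.1, 0.0])   # small input: output length is driven toward 0
long_vec = squash([10.0, 0.0])   # large input: output length approaches 1
```

Under this assumed form, an input of length 0.1 is mapped to a length of roughly 0.01, while an input of length 10 is mapped to a length of roughly 0.99, which matches the qualitative description in the quoted passage.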

reevaluating the one or more inference decisions that are identified as having low stability. (Shahroudnejad, FIG. 4 and § 3.1, supra. The examiner notes that after determining the low probability “0.1” due to the low consistency or non-consistency among the five facial features, despite the high facial component probabilities, Shahroudnejad suppresses unrelated features’ outputs by iteratively reducing their output vectors to zero (“0”) until the output has related components.  Therefore, the examiner asserts that Shahroudnejad addresses the problem that “CNNs can not preserve spatial relationship between components” (¶ 1, §3) by iteratively suppressing unrelated feature(s) with its relevance path construction, and that Shahroudnejad’s invoking its relevant path construction for an existing inference decision teaches reevaluating the existing inference decision.)
Bach and Shahroudnejad are analogous art because both pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach’s determining strength of support for inference decision(s) (Bach, supra) to incorporate Shahroudnejad’s teaching of identification of an inference decision with low stability and re-evaluating the identified inference decision based on the determined strength of support (e.g., Shahroudnejad’s “consistence parameters” and/or the presence or absence of an “appropriate relationship with consistence parameters”, supra). The modification provides a correct inference decision and explainability for critical decisions where even a single incorrect decision is unacceptable by suppressing less relevant or irrelevant features with Shahroudnejad’s consistency evaluation for instantiation parameter vector values among layers to squash low-stability decisions (e.g., decisions concerning individual feature(s) that is (are) determined not to be in an appropriate relationship with Shahroudnejad’s consistence parameters) in each iteration for Shahroudnejad’s determination of a correct, critical inference decision and the relevance path without requiring a backward process to construct the relevance path (Shahroudnejad, § 3.1, ¶ 4: “Fig. 4 shows the sets computed based on 5 intermediate component capsules referring to class face (j) in the CC layer. Regarding Item (i), the likelihood part of each of these capsules is relatively high explaining that the input contains all the facial components represented by these 5 component capsules with high probability.  However, the network decision is that there is not a face in the input as the likelihood of the face capsule in the CC layer is relatively low. 
This can be explained based on the non-consistency among the instantiation parameters (Item (ii)).” § 3.1, left-hand column, ¶ 2: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.  Consequently, introduction of the squashing function results in the coupling coefficients c*j associated with irrelevant capsules to approach zero while coupling coefficient corresponding to the ones responsible for the jth CC to increase. Hence, CapsNets intrinsically construct a relevance path (we refers [sic] to it as the relevance path by agreement concept) which eliminates the need for a backward process to construct the relevance path.” § 5, ¶ 1: “In this paper, we represented the necessity of explainability in deep neural networks especially in critical decisions where a single incorrect decision is even unacceptable.”)

With respect to claim 8, Bach modified by Shahroudnejad teaches the one or more mediums of claim 7, and Bach further teaches: 
wherein the determination of the strength of support for each of the one or more inference decisions is based at least in part on a number of factors upon which each inference decision is supported. (Bach at ¶ [0200]: “If the output of the neural has highly positive scores, then one can expect that most of neuron relevances are positive, too, simply because most neurons are supporting the highly positive prediction of the neural net, and therefore one can ignore the minor fraction of neurons with negative relevances in practice.” The examiner notes that Bach’s teaching of a higher relevance score for an inference decision results from more neurons supporting the inference decision teaches that the strength of support for the inference decision is based at least in part upon a number of factors such as a number of neurons that positively support the inference decision and hence teaches this limitation.)

With respect to claim 9, Bach modified by Shahroudnejad teaches the one or more mediums of claim 8, and Shahroudnejad further teaches: 
wherein a first inference decision supported by a first number of factors is determined to be more stable than a second inference decision supported by a second (Shahroudnejad, FIG. 4 and § 3.1 cited for claim 7, supra. More specifically, abridged and annotated FIG. 4:

    [Image: media_image7.png, Shahroudnejad FIG. 4 (abridged and annotated)]

The examiner notes that in the above input on the left, all features have high component probabilities as shown in FIG. 4, supra, but three of the five facial features do not agree with their parent (face).  Therefore, Shahroudnejad’s iteratively suppressing the respective outputs of such features to zero results in only one related feature remaining for the low prediction probability of 0.1 and hence teaches a low stability inference decision having fewer factors.  In addition, an input such as the one on the right where more facial components agree with their parents, the inference decision will be high and thus teaches an inference decision having more factors than the aforementioned input and hence a high stability.)
Bach and Shahroudnejad are analogous art because both pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach in view of Shahroudnejad to further incorporate Shahroudnejad’s teaching that a first inference decision supported by a (Shahroudnejad, supra). The modification provides correct inferences even for critical decisions where even one incorrect decision is unacceptable by determining whether an appropriate relationship exists among the factors and by using a squashing function that results in a higher stability inference decision supported by more factors due to the squashing function’s iteratively increasing the coupling coefficients associated with relevant factors and further results in a lower stability inference decision supported by fewer factors due to the squashing function’s iteratively decreasing the coupling coefficients associated with irrelevant factors to zero (Shahroudnejad, § 3.1, ¶ 4: “Regarding Item (i), the likelihood part of each of these capsules is relatively high explaining that the input contains all the facial components represented by these 5 component capsules with high probability.  However, the network decision is that there is not a face in the input as the likelihood of the face capsule in the CC layer is relatively low. This can be explained based on the non-consistency among the instantiation parameters (Item (ii)).”  § 3.1, left-hand column, ¶ 2: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.  Consequently, introduction of the squashing function results in the coupling coefficients c*j associated with irrelevant capsules to approach zero while coupling coefficient corresponding to the ones responsible for the jth CC to increase. 
Hence, CapsNets intrinsically construct a relevance path (we refers [sic] to it as the relevance path by agreement concept) which eliminates the need for a backward process to construct the relevance path.” § 5, ¶ 1: “In this paper, we represented the necessity of explainability in deep neural networks especially in critical decisions where a single incorrect decision is even unacceptable.”)

With respect to claim 10, Bach modified by Shahroudnejad teaches the one or more mediums of claim 7, and Shahroudnejad further teaches: 
wherein reevaluating the one or more inference decisions that are identified as having low stability includes re-performing the inference for the one or more inference decisions. (Shahroudnejad at FIG. 4 and § 3.1 cited above in claim 7. The examiner notes that after determining the low probability of 0.1 despite the high facial component probabilities, Shahroudnejad’s CapsNet suppresses unrelated features’ outputs by iteratively reducing their output vectors to zero (“0”) so that the output has fewer and fewer components.  Therefore, the examiner asserts that Shahroudnejad’s CapsNet addresses the problem that “CNNs can not preserve spatial relationship between components” (¶ 1, §3) by suppressing unrelated feature(s) with its relevance path construction, and that Shahroudnejad’s invoking its relevant path construction for an existing inference decision by agreement between features (e.g., eye, nose, mouth) and their parent feature (e.g., face) teaches re-performing the existing inference decision.)
Bach and Shahroudnejad are analogous art because both pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach in view of Shahroudnejad to further incorporate Shahroudnejad’s teaching of re-performing the inference for an identified inference decision (e.g., Shahroudnejad’s “consistence parameters” and/or the presence or absence of an “appropriate relationship with consistence parameters”, supra). In addition to considering the probabilities of individual features contributing to the inference decision, the modification provides correct inference decision and explainability for critical decisions where even a single incorrect decision is unacceptable by consistency evaluation for instantiation parameter vector values among layers to squash low-stability decisions (e.g., decisions concerning individual feature(s) that is (are) determined not to be in an appropriate relationship with Shahroudnejad’s consistence parameters) in each iteration for Shahroudnejad’s determination of a correct, critical inference decision and the relevance path, without requiring a backward process to reconstruct the relevance path (Shahroudnejad, § 3.1, ¶ 4: “Regarding Item (i), the likelihood part of each of these capsules is relatively high explaining that the input contains all the facial components represented by these 5 component capsules with high probability.  However, the network decision is that there is not a face in the input as the likelihood of the face capsule in the CC layer is relatively low. This can be explained based on the non-consistency among the instantiation parameters (Item (ii)).”  § 3.1, left-hand column, ¶ 2: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.  
Consequently, introduction of the squashing function results in the coupling coefficients c*j associated with irrelevant capsules to approach zero while coupling coefficient corresponding to the ones responsible for the jth CC to increase. Hence, CapsNets intrinsically construct a relevance path (we refers [sic] to it as the relevance path by agreement concept) which eliminates the need for a backward process to construct the relevance path.” § 5, ¶ 1: “In this paper, we represented the necessity of explainability in deep neural networks especially in critical decisions where a single incorrect decision is even unacceptable.”)

With respect to claim 19, Bach teaches a system comprising: (Bach, ¶ [0013]: “According to another embodiment, a system for data processing may have: an apparatus for assigning a relevance score to a set of items as mentioned above, and an apparatus for processing of the set of items or data to be processed and derived from the set of items with adapting the processing depending on the relevance scores.”)
one or more processors to process data; and (Bach, ¶ [0334]: “Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit.”)

(Bach, ¶ [0335]: “The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.”)
wherein the system is to: determine support from one or more features associated with a plurality of layers of a neural network for one or more inference decisions by the neural network; (Bach, FIG. 2A (annotated): 

    [Image: media_image8.png, Bach FIG. 2A (annotated)]

The examiner notes that Bach’s re-distributing the relevance percentages (e.g., 5%, 95%, 40%, 55%, 25%, 35%, etc. for neurons in different layers) teaches the claimed support from one or more features associated with a plurality of layers of a neural network, that the classification result (e.g., class 1, class 2, class 3, etc.) teaches one or more inference decisions, and that Bach’s re-distributing the relevance score of a downstream (higher level) feature to one or more of its upstream (lower level) features (such as neurons respectively associated with 5% - 95%, 25% - 35% - 20% - 20%, etc.) teaches the above limitation.)

determine a strength of support for each of the one or more inference decisions; (Bach, FIG. 2A, supra. The examiner notes that each of Bach’s re-distributed relevance scores such as 5% - 95%, 25% - 35% - 20% - 20%, etc. teaches a strength of support for each upstream feature for the one or more inference decisions.)

Bach does not appear to explicitly teach: 
identify one or more inference decisions that have low stability based at least in part on the determined strength of support for the one or more inference decisions; and 
reevaluate the one or more inference decisions that are identified as having low stability. 

Shahroudnejad does, however, teach: 
identify one or more inference decisions that have low stability based at least in part on the determined strength of support for the one or more inference decisions; and (Shahroudnejad, FIG. 4: 

    [Image: media_image5.png, Shahroudnejad FIG. 4]

¶ 5, § 3.1 and Eq. (3) in § 3.1 “Relevance Path by Agreement”:  “CapsNet applies non-linear squashing function on output vectors (vj) in each iteration. It actually bounds likelihood of these vectors between 0 and 1, which means that it suppresses small vectors and preserves long vectors in the unit length”

    [Image: media_image6.png, Shahroudnejad Eq. (3)]

¶ 6, § 3.1: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.”
The examiner first notes that ¶ [0043] of the present disclosure describes that an inference decision based on few factors has a lower stability score than another inference decision based on more factors.  The examiner further notes that Shahroudnejad’s identifying an inference decision of “Face prob = 0.1” and high probabilities for facial components (e.g., “0.8, 0.9, 0.8, 0.7, and 0.9” shown as the first elements of the facial components in FIG. 4) teaches the above limitation for at least the following reasons.  The examiner notes that CapsNet’s “face prob = 0.1” and/or the component probabilities (e.g., “0.8, 0.9, 0.8, 0.7, and 0.9” above) teach strength of support for an inference decision, and that Shahroudnejad’s CapsNet’s identifying the inference decision (“face prob = 0.1”) based on the aforementioned probability teaches the above limitation.)

reevaluate the one or more inference decisions that are identified as having low stability. (Shahroudnejad, FIG. 4 and § 3.1, supra. The examiner notes that after determining the low probability “0.1” despite the high facial component probabilities, Shahroudnejad suppresses unrelated features’ outputs by iteratively reducing their output vectors to zero (“0”) until the output has related components.  Therefore, the examiner asserts that Shahroudnejad addresses the problem that “CNNs can not preserve spatial relationship between components” (¶ 1, §3) by iteratively suppressing unrelated feature(s) with its relevance path construction, and that Shahroudnejad’s invoking its relevant path construction for an existing inference decision teaches reevaluating the existing inference decision.)
Bach and Shahroudnejad are analogous art because both pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach’s determining strength of support for inference decision(s) (Bach, supra) to incorporate Shahroudnejad’s teaching of identification of an inference decision with low stability and re-evaluating the identified inference decision based on the determined strength of support (Shahroudnejad’s “consistence parameters” and/or the presence or absence of an “appropriate relationship with consistence parameters”, supra). The modification provides a correct inference decision and explainability for critical decisions, where even a single incorrect decision is unacceptable, by evaluating the consistency of instantiation parameter vector values among layers to squash low-stability decisions (e.g., decisions concerning individual feature(s) that is (are) determined not to be in an appropriate relationship with Shahroudnejad’s consistence parameters), and provides Shahroudnejad’s determination of a correct, critical inference decision and the relevance path without requiring a backward process to construct the relevance path (Shahroudnejad, § 3.1, ¶ 4: “Fig. 4 shows the sets computed based on 5 intermediate component capsules referring to class face (j) in the CC layer. Regarding Item (i), the likelihood part of each of these capsules is relatively high explaining that the input contains all the facial components represented by these 5 component capsules with high probability.  However, the network decision is that there is not a face in the input as the likelihood of the face capsule in the CC layer is relatively low. This can be explained based on the non-consistency among the instantiation parameters (Item (ii)).” § 3.1, left-hand column, ¶ 2: “Therefore, during agreement iterations, unrelated capsules will become smaller and smaller and the related ones will be remained unchanged.  
Consequently, introduction of the squashing function results in the coupling coefficients c*j associated with irrelevant capsules to approach zero while coupling coefficient corresponding to the ones responsible for the jth CC to increase. Hence, CapsNets intrinsically construct a relevance path (we refers [sic] to it as the relevance path by agreement concept) which eliminates the need for a backward process to construct the relevance path.” § 5, ¶ 1: “In this paper, we represented the necessity of explainability in deep neural networks especially in critical decisions where a single incorrect decision is even unacceptable.”)

With respect to claim 20, it is substantially similar to claim 8 above and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 21, it is substantially similar to claim 10 above and is rejected in the same manner, the same art and reasoning applying. 

Claim(s) 11-12 and 22-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bach et al. US20180018553A1 with publication date of Jan. 18, 2018 (hereinafter Bach) in view of Shahroudnejad et al., IMPROVED EXPLAINABILITY OF CAPSULE NETWORKS: RELEVANCE PATH BY AGREEMENT (27 Feb. 2018) (hereinafter Shahroudnejad) and further in view of Amini, A., Robust End-to-End Learning for Autonomous Vehicles (June 2018) (hereinafter Amini).
With respect to claim 11, Bach modified by Shahroudnejad teaches the one or more mediums of claim 10, further comprising executable computer program instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising but does not appear to explicitly teach: 
re-performing the inference for the one or more inference decisions including adding perturbations to input data and sampling weights of neurons from statistical distributions. 
Amini does, however, teach: 
re-performing the inference for the one or more inference decisions including adding perturbations to input data and sampling weights of neurons from statistical distributions. (Amini, p. 32, § 4.1.1, ¶ 2: “Thus, variational inference (VI) methods have been used to obtain an approximation, q(W), for the posterior to estimate model certainty [22]. From the approximation of the posterior q(W), we obtain a predictive distribution q(Y|X) = ∫ P(Y|X, W) q(W) dW. However, it has also been shown that VI methods can also suffer from prohibitive computational cost [46, 31, 48, 27]. As a result, dropout sampling has emerged as an accurate, computationally efficient way to estimate model uncertainty [51, 19].” p. 32, § 4.1.1, ¶ 3 – p. 33, ¶ 3: “Formally, at every training step, t, and for every weight in our neural network, we can sample from a Bernoulli random variable such that
$z_t^{(w)} \sim \mathrm{Bernoulli}(p) \quad \forall w \in W$
Thus, we can obtain a stochastic sample of our weights by multiplying our Bernoulli mask by the weights: $W_t = \{ z_t^{(w)} \cdot w \}_{w \in W}$.” Last paragraph, p. 42: “For example, for an image collected from the right camera we perturb the steering wheel angle with a negative number to steer slightly left. On the other hand, if we add a left camera image to the dataset, a positive offset is added to the steering wheel angle to teach the model to steer more to the right. We add all of these three images (center, left, and right) to our dataset for training.” 
The examiner notes that Amini’s sampling every weight in the neural network teaches sampling weights of neurons from statistical distributions, and that Amini’s perturbing the steering wheel slightly left and/or slightly right and adding such image(s) to the training teaches adding perturbations to input data.  The examiner further notes that Shahroudnejad’s § 3.1 and FIG. 4 are cited in claim 10 above to teach re-performing the inference for the inference decision (e.g., the decision having high component probabilities and low face probability as shown in FIG. 4) with its feature suppression in relevance path construction.  Therefore, the examiner asserts that Shahroudnejad modified by Amini teaches the above limitation.)
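For illustration only, the dropout sampling quoted from Amini can be sketched as repeated forward passes in which each weight is multiplied by an independent Bernoulli mask, with the spread of the resulting outputs serving as an uncertainty estimate. The sketch makes several assumptions that are not drawn from the cited reference: the single-neuron linear model, the names `sample_weights`, `predict`, and `mc_dropout`, the keep probability of 0.8, and the sample count of 200 are all hypothetical.

```python
import random
import statistics


def sample_weights(weights, keep_prob, rng):
    """Draw an independent Bernoulli(keep_prob) mask value per weight and
    return the masked weights (mask value times weight)."""
    return [w if rng.random() < keep_prob else 0.0 for w in weights]


def predict(weights, x):
    """Hypothetical single-neuron linear model: dot product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))


def mc_dropout(weights, x, keep_prob=0.8, n_samples=200, seed=0):
    """Re-run the forward pass with freshly sampled weights and report the
    mean and population variance of the outputs."""
    rng = random.Random(seed)
    outputs = [
        predict(sample_weights(weights, keep_prob, rng), x)
        for _ in range(n_samples)
    ]
    return statistics.mean(outputs), statistics.pvariance(outputs)


weights = [0.5, -1.0, 2.0]   # hypothetical trained weights
inputs = [1.0, 2.0, 3.0]     # hypothetical input features

mean, variance = mc_dropout(weights, inputs)
print(mean, variance)
```

With a keep probability of 1.0 every mask value is 1, so the sketch reduces to the deterministic forward pass and the variance is zero; lower keep probabilities spread the outputs, and that spread is what is read as model uncertainty.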
Bach, Shahroudnejad, and Amini are analogous art because all three pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach in view of Shahroudnejad to incorporate Amini’s sampling weights from statistical distributions and perturbing the input data (see Amini, supra). The modification not only provides an accurate, computationally efficient way to estimate model uncertainty through dropout sampling (Amini, p. 32, § 4.1.1, ¶ 2: “As a result, dropout sampling has emerged as an accurate, computationally efficient way to estimate model uncertainty [51, 19].”) but also injects domain knowledge into the neural network via augmenting training dataset(s) by perturbing the input data in order to teach a model how to recover from various positions (Amini, p. 42, ¶ 2: “Finally, we inject domain knowledge into our network by augmenting the training dataset with images collected from cameras placed approximately 2 feet to the left and right of the main center camera. For these images we correspondingly changed the supervised value to teach the model how to recover from these positions. For example, for an image collected from the right camera we perturb the steering wheel angle with a negative number to steer slightly left. On the other hand, if we add a left camera image to the dataset, a positive offset is added to the steering wheel angle to teach the model to steer more to the right. We add all of these three images (center, left, and right) to our dataset for training.”). 

With respect to claim 12, Bach modified by Shahroudnejad teaches the one or more mediums of claim 10, further comprising executable computer program instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising but does not appear to explicitly teach: 

re-performing the inference with a more compute intensive model of the neural network. 

Amini does, however, teach: 
re-performing the inference with a more compute intensive model of the neural network. (Amini, pp. 31-32, § 4.1, ¶ 1: “Integration of these models into deployable or shared control frameworks necessitates a method for determining predictive uncertainty.” p. 32, § 4.1.1, ¶ 3: “In practice, it is computationally intractable to directly compute the posterior, P(W) from vast, and often noisy, observational data since it involves marginalizing over all possible data in P(W|X). Thus, variational inference (VI) methods have been used to obtain an approximation, q(W), for the posterior to estimate model certainty [22]. From the approximation of the posterior q(W), we obtain a predictive distribution q(Y|X) = ∫ P(Y|X, W) q(W) dW. However, it has also been shown that VI methods can also suffer from prohibitive computational cost [46, 31, 48, 27]. As a result, dropout sampling has emerged as an accurate, computationally efficient way to estimate model uncertainty [51, 19].”
The examiner notes that both Amini’s stochastic dropout sampling of weights (which multiplies a Bernoulli mask by the weights) and the prediction of distribution uncertainty are in addition to the eventual inferencing operations to determine an inference decision and are thus more compute intensive than the inference operation by itself.  Therefore, the examiner asserts that Bach modified by Shahroudnejad and Amini teaches the above limitation.)
Bach, Shahroudnejad, and Amini are analogous art because all three pertain to determining relevant features for classification in neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Bach in view of Shahroudnejad to incorporate Amini’s re-performing an inference with a more compute intensive model to estimate certainty of a neural network model that is necessitated by the integration of a neural network model into deployable or shared frameworks (see Amini, supra). The modification not only avoids overfitting but also provides the necessary predictive model uncertainty with a definitive measure of associated confidence with the output (Amini, pp. 31-32, § 4.1, ¶ 1: “While extremely powerful, deep neural networks discussed in Chapter 3 are often trained end-to-end as black boxes, and as such, lack a definitive measure of associated confidence with the output [19, 39]. Integration of these models into deployable or shared control frameworks necessitates a method for determining predictive uncertainty. In the present section, we formally introduce the notion of Bayesian Deep Learning for estimating uncertainty of end-to-end control networks.” P. 32, § 4.1.1, ¶ 4: “Dropout, initially developed to avoid overfitting during training, places independent identically distributed Bernoulli random variables over every neuron to either "drop" or "keep" with some probability [51].”) 

With respect to claim 22, it is substantially similar to claim 11 above and is rejected in the same manner, the same art and reasoning applying. 

With respect to claim 23, it is substantially similar to claim 12 above and is rejected in the same manner, the same art and reasoning applying. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
(a)            Neal, R., BAYESIAN LEARNING FOR NEURAL NETWORKS (1995) teaches, inter alia, various sampling techniques such as Gibbs sampling, sampling using a hybrid Monte Carlo method, sampling using the Langevin and hybrid Monte Carlo methods, etc. for determining relevant and irrelevant inputs of neural networks.  
(b)	Papernot et al., The Limitations of Deep Learning in Adversarial Settings (24 Nov. 2015) teaches, inter alia, formalizing the space of adversaries against deep neural networks (DNNs) and introducing a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between 
(c)	Marino et al., An Adversarial Approach for Explainable AI in Intrusion Detection Systems (2018) teaches an approach to generate explanations for incorrect classifications made by data-driven Intrusion Detection Systems (IDSs). An adversarial approach is used to find the minimum modifications (of the input features) required to correctly classify a given set of misclassified samples. The magnitude of such modifications is used to visualize the most relevant features that explain the reason for the misclassification. The presented methodology generated satisfactory explanations that describe the reasoning behind the misclassifications, with descriptions that match expert knowledge. The advantages of the presented methodology are that it: 1) is applicable to any classifier with defined gradients; 2) does not require any modification of the classifier model; and 3) can be extended to perform further diagnosis (e.g., vulnerability assessment) and gain further understanding of the system.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
 
 
 
/E.C.T./Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126