DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This communication is in response to the Applicant’s submission filed 19 December 2019, where:
Claims 1-12 are pending.
Claims 1-12 are rejected.
Information Disclosure Statement
3.	Information disclosure statements were submitted on 19 December 2019 and 20 November 2020. The submissions comply with the provisions of 37 CFR 1.97. Accordingly, the Examiner considered the information disclosure statements.
Drawings
4.	The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 
Reference 128 in Figure 1.
Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the Examiner, the Applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
5.	Claims 1, 5, and 9 are objected to because of the following informalities:  
Each of claims 1, 5, and 9 recites the acronym “MSE3” without expanding the first instance of the acronym in the claim, followed by a parenthetical containing the acronym.
Appropriate correction is required.
Claim Rejections - 35 U.S.C. § 112(b)
6.	The following is a quotation of 35 U.S.C. § 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
7.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, line 16, claim 5, line 19, and claim 9, line 18, each recite “the delta Failsafe rewards.” There is insufficient antecedent basis for this limitation in these claims.
Claims 2-4 depend directly or indirectly from claim 1. Claims 6-8 depend directly or indirectly from claim 5. Claims 10-12 depend directly or indirectly from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9, respectively.
8.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, lines 20-21, claim 5, lines 23-24, and claim 9, lines 22-23, each recite “the target percent belief trustworthiness.” There is insufficient antecedent basis for this limitation in these claims.
Claims 2-4 depend directly or indirectly from claim 1. Claims 6-8 depend directly or indirectly from claim 5. Claims 10-12 depend directly or indirectly from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9, respectively. 
10.	Claims 3, 4, 7, 8, 11, and 12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 3, 7, and 11 each recite “the two most extreme states' rewards.” There is insufficient antecedent basis for this limitation in these claims. To facilitate examination, the limitation will be considered to read “an identified two of the Failsafe rewards of respective states.”
Claim 4 depends from claim 3. Claim 8 depends from claim 7. Claim 12 depends from claim 11. Claims 4, 8, and 12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 3, 7, and 11, respectively.
11.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, line 4, recites a singular “an initial Failsafe reward parameter,” line 11 recites a plural “the Failsafe rewards,” and line 13 recites a lower-case plural “failsafe rewards.” It is unclear whether “the Failsafe rewards” is intended to draw antecedent basis from the singular “an initial Failsafe reward parameter” or to introduce a plurality of “Failsafe rewards” apart from that parameter. It is further unclear whether the lower-case “failsafe rewards” is intended to draw antecedent basis from the capitalized “Failsafe rewards” or is intended as a separate term.
Claim 5, line 7, recites a singular “an initial Failsafe reward parameter,” line 14 recites a plural “the Failsafe rewards,” and line 16 recites a lower-case plural “failsafe rewards.” It is unclear whether “the Failsafe rewards” is intended to draw antecedent basis from the singular “an initial Failsafe reward parameter” or to introduce a plurality of “Failsafe rewards” apart from that parameter. It is further unclear whether the lower-case “failsafe rewards” is intended to draw antecedent basis from the capitalized “Failsafe rewards” or is intended as a separate term.
Claim 9, line 6, recites a singular “an initial Failsafe reward parameter,” line 13 recites a plural “the Failsafe rewards,” and line 15 recites a lower-case plural “failsafe rewards.” It is unclear whether “the Failsafe rewards” is intended to draw antecedent basis from the singular “an initial Failsafe reward parameter” or to introduce a plurality of “Failsafe rewards” apart from that parameter. It is further unclear whether the lower-case “failsafe rewards” is intended to draw antecedent basis from the capitalized “Failsafe rewards” or is intended as a separate term.
Claims 2-4 depend directly or indirectly from claim 1. Claims 6-8 depend directly or indirectly from claim 5. Claims 10-12 depend directly or indirectly from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9, respectively.
12.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, line 13, claim 5, line 16, and claim 9, line 15, each recite the limitation “a change in failsafe rewards is computed prior to each iteration.” The limitation is indefinite because it is unclear whether the computed “change” is derived with respect to every possible reward of each state (that is, belief) of the POMDP, with respect to the “initial Failsafe reward parameter,” or with respect to some other basis.
Claims 2-4 depend directly or indirectly from claim 1. Claims 6-8 depend directly or indirectly from claim 5. Claims 10-12 depend directly or indirectly from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9, respectively.
13.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, line 22, claim 5, line 25, and claim 9, line 24, each recite the term “a lowest MSE3 value,” which is a relative term that renders the claim indefinite. The term “a lowest MSE3 value” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claims 2-4 depend directly or indirectly from claim 1. Claims 6-8 depend directly or indirectly from claim 5. Claims 10-12 depend directly or indirectly from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9, respectively.
14.	Claims 3, 4, 7, 8, 11, and 12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 3, 7, and 11 each recite the term “the two most extreme states' rewards,” which is a relative term that renders the claim indefinite. The term “the two most extreme states' rewards” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claim 4 depends from claim 3. Claim 8 depends from claim 7. Claim 12 depends from claim 11. Claims 4, 8, and 12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 3, 7, and 11, respectively.
15.	Claims 1-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, lines 15-16, claim 5, lines 18-19, and claim 9, lines 17-18, each recite the following limitation: 
“if any element has a change greater than a first predetermined value ∈1,”
This limitation is considered to be unclear and indefinite because the use of the word “if” creates a rebuttable presumption that the limitation “then the delta Failsafe rewards are modified and the iteration is rerun with the new reward values” is only executed upon “a change greater than a first predetermined value ∈1”; however, if this condition does not occur in the respective claim, then none of the further steps/limitations in the claim are executed. 
For purpose of examination, Examiner interprets the limitation “if any element has a change greater than a first predetermined value ∈1” as the condition having occurred. Clarification is required.
Claims 2-4 depend from claim 1. Claims 6-8 depend from claim 5. Claims 10-12 depend from claim 9. Claims 2-4, 6-8, and 10-12 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 5, and 9.
16.	Claims 5-12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Each of claims 5-12 recites both an apparatus and a process of using the apparatus. For example, claim 5 recites “[a] system comprising a processor and logic stored in one or more nontransitory, computer-readable, tangible media that . . . implement a method of determining a Failsafe iteration solution of a Partially Observable Markov Decision Process (POMDP) model . . . .” Also, claim 9 recites “[a] non-transitory computer readable media comprising instructions stored thereon that, when executed by a system comprising a processor that, when executed by the processor, causes the processor to implement a method of determining a Failsafe iteration solution of a Partially Observable Markov Decision Process (POMDP) model . . . .”
Claims 6-8, which depend from claim 5, and claims 10-12, which depend from claim 9, also recite both the apparatus and the process of using the apparatus of claims 5 and 9, respectively.
When both an apparatus and a method are claimed in the same claim it is unclear whether direct infringement arises when the apparatus is constructed or when the apparatus is used. Therefore the claims have an indefinite scope.
Claim Rejections - 35 U.S.C. § 101
17.	35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
19.	Claims 1-12 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 recites a “computer-implemented method,” which is a process, and thus one of the four statutory categories of patentable subject matter. However, claim 1 further recites the limitations of “defining an initial Failsafe reward parameter,” “defining a Failsafe Percent Belief Trustworthiness Target parameter,” “analyzing the resulting policy . . . for each state,” “iteratively adjusting the Failsafe rewards,” “wherein, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration and if any element has a change greater than a first predetermined value ∈1, then the delta Failsafe rewards are modified,” “wherein the method continues until a change in each state's percent belief trustworthiness is less than a second predetermined value ∈2,” and “wherein an iteration achieving a lowest MSE3 value is selected as the Failsafe iteration solution.” These limitations recite a mental process, one of the groupings of abstract ideas (MPEP § 2106.04(a)(2)), which includes observation, evaluation, judgment, and opinion (see MPEP § 2106.04(a)(2), subsection III). The claim additionally recites the limitations “wherein a change in failsafe rewards is computed prior to each iteration,” and “wherein, at each iteration, an MSE3 value of each state's distance from the target percent belief trustworthiness is calculated,” each of which falls within the mathematical concepts grouping, which includes mathematical relationships, mathematical formulas or equations, and mathematical calculations (see MPEP § 2106.04(a)(2), subsection I). Thus, claim 1 recites an abstract idea.
The abstract idea of claim 1 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are those of a computer-implemented method. Applying the abstract idea on a generic computer component (i.e., a computer-implemented method) does not integrate the abstract idea into a practical application (MPEP § 2106.04(d)). Also, the additional limitations are directed to “executing the POMDP model . . . resulting in a policy,” “re-executing the POMDP model a predetermined number M of iterations,” and “if any element has a change greater than a first predetermined value ∈1, . . . the iteration is rerun with the new reward values,” in which the terms “executing,” “re-executing,” and “rerun,” as applied to a POMDP model, are merely words to “apply it” with the judicial exception, or merely use a computer as a tool to perform an abstract idea (MPEP § 2106.05(f)). Further, generally linking the abstract idea to the intended use of determining a Failsafe iteration solution in a POMDP model representing “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter” is a field-of-use limitation. “Linking the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)) cannot integrate the judicial exception into a practical application. Therefore, claim 1 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, specifying the intended use of the defined “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter”) does not provide an inventive concept (MPEP § 2106.05(h)). Also, execution on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Further, there is no nexus between the field of use and the generic computer components that, when taken in combination, could provide an inventive concept or significantly more than an abstract idea. Therefore, claim 1 is subject-matter ineligible.
Claim 5 recites a “system,” which is an apparatus, and thus one of the four statutory categories of patentable subject matter. However, claim 5 further recites the limitations of “defining an initial Failsafe reward parameter,” “defining a Failsafe Percent Belief Trustworthiness Target parameter,” “analyzing the resulting policy . . . for each state,” “iteratively adjusting the Failsafe rewards,” “wherein, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration and if any element has a change greater than a first predetermined value ∈1, then the delta Failsafe rewards are modified,” “wherein the method continues until a change in each state's percent belief trustworthiness is less than a second predetermined value ∈2,” and “wherein an iteration achieving a lowest MSE3 value is selected as the Failsafe iteration solution.” These limitations recite a mental process, one of the groupings of abstract ideas (MPEP § 2106.04(a)(2)), which includes observation, evaluation, judgment, and opinion (see MPEP § 2106.04(a)(2), subsection III). The claim additionally recites the limitations “wherein a change in failsafe rewards is computed prior to each iteration,” and “wherein, at each iteration, an MSE3 value of each state's distance from the target percent belief trustworthiness is calculated,” each of which falls within the mathematical concepts grouping, which includes mathematical relationships, mathematical formulas or equations, and mathematical calculations (see MPEP § 2106.04(a)(2), subsection I). Thus, claim 5 recites an abstract idea.
The abstract idea of claim 5 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are those of a system comprising (a) a processor, (b) logic configured to store a plurality of instructions that, when executed by the processor, cause the processor to implement a method, and (c) one or more nontransitory, computer-readable, tangible media. Instructions to apply the abstract idea on these generic computer components do not integrate the abstract idea into a practical application (MPEP § 2106.04(d)). Also, the additional limitations are directed to “executing the POMDP model . . . resulting in a policy,” “re-executing the POMDP model a predetermined number M of iterations,” and “if any element has a change greater than a first predetermined value ∈1, . . . the iteration is rerun with the new reward values,” in which the terms “executing,” “re-executing,” and “rerun,” as applied to a POMDP model, are merely words to “apply it” with the judicial exception, or merely use a computer as a tool to perform an abstract idea (MPEP § 2106.05(f)). Further, generally linking the abstract idea to the intended use of determining a Failsafe iteration solution in a POMDP model representing “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter” is a field-of-use limitation. “Linking the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)) cannot integrate the judicial exception into a practical application. Therefore, claim 5 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, specifying the intended use of the defined “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter”) does not provide an inventive concept (MPEP § 2106.05(h)). Also, execution on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Further, there is no nexus between the field of use and the generic computer components that, when taken in combination, could provide an inventive concept or significantly more than an abstract idea. Therefore, claim 5 is subject-matter ineligible.
Claim 9 recites a “computer readable media,” which is a product, and thus one of the four statutory categories of patentable subject matter. However, claim 9 further recites the limitations of “defining an initial Failsafe reward parameter,” “defining a Failsafe Percent Belief Trustworthiness Target parameter,” “analyzing the resulting policy . . . for each state,” “iteratively adjusting the Failsafe rewards,” “wherein, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration and if any element has a change greater than a first predetermined value ∈1, then the delta Failsafe rewards are modified,” “wherein the method continues until a change in each state's percent belief trustworthiness is less than a second predetermined value ∈2,” and “wherein an iteration achieving a lowest MSE3 value is selected as the Failsafe iteration solution.” These limitations recite a mental process, one of the groupings of abstract ideas (MPEP § 2106.04(a)(2)), which includes observation, evaluation, judgment, and opinion (see MPEP § 2106.04(a)(2), subsection III). The claim additionally recites the limitations “wherein a change in failsafe rewards is computed prior to each iteration,” and “wherein, at each iteration, an MSE3 value of each state's distance from the target percent belief trustworthiness is calculated,” each of which falls within the mathematical concepts grouping, which includes mathematical relationships, mathematical formulas or equations, and mathematical calculations (see MPEP § 2106.04(a)(2), subsection I). Thus, claim 9 recites an abstract idea.
The abstract idea of claim 9 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are those of (a) a non-transitory computer readable media, and (b) a system comprising a processor that executes instructions stored on the media to implement a method. Instructions to apply the abstract idea on these generic computer components do not integrate the abstract idea into a practical application (MPEP § 2106.04(d)). Also, the additional limitations are directed to “executing the POMDP model . . . resulting in a policy,” “re-executing the POMDP model a predetermined number M of iterations,” and “if any element has a change greater than a first predetermined value ∈1, . . . the iteration is rerun with the new reward values,” in which the terms “executing,” “re-executing,” and “rerun,” as applied to a POMDP model, are merely words to “apply it” with the judicial exception, or merely use a computer as a tool to perform an abstract idea (MPEP § 2106.05(f)). Further, generally linking the abstract idea to the intended use of determining a Failsafe iteration solution in a POMDP model representing “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter” is a field-of-use limitation. “Linking the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)) cannot integrate the judicial exception into a practical application. Therefore, claim 9 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, specifying the intended use of the defined “Failsafe reward parameters” and “Failsafe Percent Belief Trustworthiness Target parameter”) does not provide an inventive concept (MPEP § 2106.05(h)). Also, execution on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Further, there is no nexus between the field of use and the generic computer components that, when taken in combination, could provide an inventive concept or significantly more than an abstract idea. Therefore, claim 9 is subject-matter ineligible.
Claims 2-4, which depend from claim 1, claims 6-8, which depend from claim 5, and claims 10-12, which depend from claim 9, recite limitations that are a mental process, one of the groupings of abstract ideas (MPEP § 2106.04(a)(2)) (claims 2, 6, and 10: adjusting all states' Failsafe rewards only on the first two iterations; claims 3, 7, and 11: . . . only modifying the two most extreme states' rewards on each iteration; and claims 4, 8, and 12: . . . , modifying the delta Failsafe rewards by dividing by a predetermined value). Claims 4, 8, and 12 also recite elements that are a mathematical concept (see MPEP § 2106.04(a)(2), subsection I) (claims 3, 7, and 11: after the first two iterations, . . . ; claims 4, 8, and 12: . . . , modifying . . . by dividing by a predetermined value). Other additional limitations merely recite more details or specifics of the abstract idea, such as conditionally applying the limitation (claims 4, 8, and 12: when any element has a change greater than the first predetermined value ∈1, . . . ). Also, there are no additional elements set out that amount to an inventive concept (also known as “significantly more”) beyond the recited judicial exception. Thus, these claims are directed to a judicial exception as discussed in detail above with respect to claims 1, 5, and 9.
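For illustration only, and not as a construction of the claims, the iterative procedure quoted above from claims 1, 5, and 9 can be sketched in Python as follows. Every name in the sketch is hypothetical. The `solve` callable stands in for executing the POMDP model and returning a realized percent belief trustworthiness per state; the sign-based adjustment rule is an assumption; the mean squared distance is an assumed stand-in for the unexpanded “MSE3” metric; and the “rerun” step is simplified to shrinking the delta for the next pass.

```python
from typing import Callable, List, Sequence, Tuple


def mean_squared_distance(realized: Sequence[float], target: float) -> float:
    """Assumed stand-in for the unexpanded "MSE3" metric."""
    return sum((r - target) ** 2 for r in realized) / len(realized)


def failsafe_iteration_sketch(
    solve: Callable[[List[float]], List[float]],  # hypothetical POMDP execution
    rewards: Sequence[float],
    target: float,
    delta: float,
    eps1: float,
    eps2: float,
    divisor: float = 2.0,
    max_iters: int = 50,
) -> Tuple[float, int, List[float]]:
    """Return (best score, iteration index, rewards) over all iterations."""
    current = list(rewards)
    prev = solve(current)
    best = (mean_squared_distance(prev, target), 0, list(current))
    for iteration in range(1, max_iters + 1):
        # Assumed rule: nudge each state's reward toward the target.
        current = [
            r + delta if p < target else r - delta if p > target else r
            for r, p in zip(current, prev)
        ]
        realized = solve(current)
        # Largest per-state change relative to the prior iteration.
        change = max(abs(a - b) for a, b in zip(realized, prev))
        score = mean_squared_distance(realized, target)
        if score < best[0]:
            best = (score, iteration, list(current))
        if change > eps1:
            delta /= divisor  # modify the delta (compare claims 4, 8, 12)
        if change < eps2:
            break  # every state's change is below the second threshold
        prev = realized
    return best


def toy_solver(rewards: List[float]) -> List[float]:
    """Toy stand-in for a POMDP solver, clamped to the range [0, 1]."""
    return [min(1.0, max(0.0, 0.5 + 0.1 * r)) for r in rewards]


best_score, best_iteration, best_rewards = failsafe_iteration_sketch(
    toy_solver, [0.0, 5.0], target=0.8, delta=1.0, eps1=0.05, eps2=1e-3
)
```

The sketch tracks the best-scoring iteration separately from the stopping test, mirroring the separate “lowest MSE3 value” and ∈2 limitations; how the claims intend these to interact is among the points raised in the § 112(b) rejections above.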
Claim Rejections - 35 U.S.C. § 103
20.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
21.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
22.	This application currently names joint inventors. In considering patentability of the claims the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
23.	Claims 1-12 are rejected under 35 U.S.C. § 103 as being unpatentable over Dressel et al., “Efficient Decision-Theoretic Target Localization,” ICAPS (2017) [hereinafter Dressel] in view of US Published Application 20140195475 to Levchuk et al. [hereinafter Levchuk] and US Published Application 20180012137 to Wright et al. [hereinafter Wright].
Regarding claims 1, 5, and 9, Dressel teaches [a] computer-implemented method (Dressel, left column of p. 77, “6 Jammer Localization,” last partial paragraph, teaches [a] UAV also carries a small ODROID computer to execute SARISA and POMDP-lite policies) of determining a Failsafe iteration solution of a Partially Observable Markov Decision Process (POMDP) model of claim 1, [a] system comprising a processor and logic stored in one or more nontransitory, computer-readable, tangible media that are in operable communication with the processor (Dressel, left column of p. 77, “6 Jammer Localization,” last partial paragraph, teaches [a] UAV also carries a small ODROID computer to execute SARISA and POMDP-lite policies. The ODROID passes the selected action to the Pixhawk flight controller over serial. The flight controller executes the action (that is, a system comprising a processor and logic stored in one or more nontransitory, computer-readable, tangible media)), the logic configured to store a plurality of instructions that, when executed by the processor, causes the processor to implement a method of determining a Failsafe iteration solution of a Partially Observable Markov Decision Process (POMDP) model of claim 5, [a] non-transitory computer readable media comprising instructions stored thereon that, when executed by a system comprising a processor that, when executed by the processor (Dressel, left column of p. 77, “6 Jammer Localization,” last partial paragraph, teaches [a] UAV also carries a small ODROID computer to execute SARISA and POMDP-lite policies (that is, the “ODROID” computer includes a non-transitory computer readable media)), causes the processor to implement a method of determining a Failsafe iteration solution of a Partially Observable Markov Decision Process (POMDP) model of claim 9, the method comprising:
defining an initial Failsafe2 reward parameter (Dressel, right column of p. 70, “1. Introduction,” second full paragraph, teaches belief-dependent rewards (that is, Failsafe reward parameter), providing compact representations for these rewards without adding many actions or α-vectors (that is, “belief-dependent rewards” are defining an initial Failsafe reward parameter));
defining a Failsafe Percent Belief Trustworthiness Target parameter (Dressel, left column of p. 74, “4.2 Upper Bound,” first partial paragraph, teaches [a] dominant α-vector is used to generate a set of belief-value pairs, initializing the upper bound . . . [that] requires no iteration (that is, the “upper bound” is defining a Failsafe Percent Belief Trustworthiness Target parameter)
[Examiner notes that though the “reward parameter” and “target parameter” pertain to a “Failsafe” context, these parameters relate to an a transition from belief b to belief b’ given an action a and observation o, which “parameters” have a BRI covering one of a plurality of beliefs (that is, states) of a POMDP; also a “percent belief” is a normalized unit value of a belief state relative to a plurality of belief states (see, e.g., Dressel, Fig. 1 (referring to “b(s1)”)]);
executing  the POMDP model  with the initial Failsafe reward parameter and the Failsafe Percent Belief Trustworthiness Target parameter as input parameters resulting in a policy (Dressel, left column of p. 71, “2.1 POMDPs,” second paragraph, teaches [t]he solution to a POMDP (that is, executing the POMDP model) is a mapping from belief to action [(that is, a POMDP action is “b(s)”)]. The Bellman update for POMDPs is

[Image omitted: media_image1.png (equation, Bellman update for POMDPs)]

[which] is similar to [the MDP equation], where the states are now beliefs. The transition function τ describes the probability of transitioning from b (that is, initial Failsafe reward parameter) to b’ given action a and observation o); (Dressel, left column of p. 71, “2.1 POMDPs,” second paragraph, teaches [i]n a [POMDP], a policy π maps [beliefs] to actions. The expected discounted reward starting from [belief b] and following policy π is called the value of the [belief b] and is denoted Vπ(b). The goal is to find an optimal policy π* that maximizes the value from every [belief] (that is, resulting in a policy)
[Examiner notes that (b, a, o, b’) is a parameter tuple, which is executing the POMDP model with the initial Failsafe reward parameter and the Failsafe Percent Belief Trustworthiness Target parameter]);
analyzing the resulting policy for Failsafe selection at the Failsafe Percent Belief Trustworthiness Target parameter for each state (Dressel, Table 2 & caption, teaches reward comparison according to “policy π” (that is, analyzing the resulting policy for Failsafe selection):

[Image omitted: media_image2.png (Dressel, Table 2 and caption, reward comparison)]

Dressel, left column of p. 75, “5.2 RockSample and Rock Diagnosis,” first paragraph, teaches RockSample is commonly used to test the effectiveness of POMDP solvers (that is, analyzing the resulting policy for Failsafe selection at the Failsafe Percent Belief Trustworthiness Target parameter for each state));
iteratively adjusting the Failsafe rewards (Dressel, right column of p. 71, “2.3 POMDP-lite,” third paragraph, teaches [that] POMDP-lite . . . encourages exploration by augmenting the reward (that is, adjusting the Failsafe rewards) . . . with an exploration reward R∈. This reward is the expected l1 divergence between the current belief b and the next belief b’ . . . . At the current belief b, the MDP has . . . [a] reward function RMDP = Rb + λR∈, where λ is a scale factor encoding the preference of information rewards over original rewards. This MDP is solved and the best action from the observed state is taken. An observation is received, the belief is updated, and a new MDP is solved (that is, “updated” is iteratively adjusting)); and
re-executing the POMDP model (Dressel, right column of p. 71, “2.3 POMDP-lite,” third paragraph, teaches [the] MDP is solved and the best action from the observed state is taken. An observation is received, the belief is updated, and a new MDP is solved (that is, “solved” is re-executing the POMDP model)) . . . ,
wherein a change in failsafe rewards is computed prior to each iteration (Dressel, right column of p. 71, “2.3 POMDP-lite,” third paragraph, teaches [the] reward is the expected l1 divergence (that is, “divergence” is a change in failsafe rewards) between the current belief b and the next belief b’:

[Image omitted: media_image3.png (equation, expected l1 divergence between belief b and next belief b’)]
[and subsequently] a new MDP is solved (that is, a change . . . is computed prior to each iteration)),
Though Dressel teaches a POMDP providing a Failsafe condition that suppresses any other policy action while awaiting either a trustworthy belief state or human intervention, Dressel does not explicitly teach -
* * *
wherein, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration and if any element has a change greater than a first predetermined value  ∈1, then the delta Failsafe rewards (are modified and the iteration is rerun with the new reward values,
wherein the method continues until a change in each state's percent belief trustworthiness is less than a second predetermined value  ∈2;
* * *
But Levchuk teaches -
* * *
wherein, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration and if  any element has a change greater than a first predetermined value ∈1, then the delta Failsafe rewards (are modified and the iteration is rerun with the new reward values,
wherein the method continues until (Levchuk ¶ 0100 teaches [t]he model can be made to stop iterating when a threshold is met such as stopping when no improvement in the objective function of expected reward is obtained (that is, the method continues until); see also Levchuk ¶ 0116 “Stopping the Process”) a change in each state's percent belief trustworthiness is less than a second predetermined value ∈2 (Levchuk ¶ 0114 teaches comparing new belief state (that is, after each iteration, a realized percent belief trustworthiness for each state is compared to that of a prior iteration) to a threshold, where the belief subspace can be defined as “high expertise achieved with probability between 50% (that is, a first predetermined value ∈1) and 80% (that is, a second predetermined value ∈2).” The thresholds of 50% and 80% define the boundaries of the subspace;
[Examiner notes that the BRI of the “first predetermined value ∈1” and the “second predetermined value ∈2” also covers these values when “first predetermined value ∈1” equals the “second predetermined value ∈2”)]),
* * *
Dressel and Levchuk are from the same or similar field of endeavor. Dressel teaches a carryover of MDP characteristics in a POMDP environment for handling belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. Levchuk teaches modeling a subject's state and the influence of training treatments, or actions, on that state to create a training policy using Partially Observable Markov Decision Process (POMDP) techniques. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine the teachings of Dressel pertaining to POMDP processes for generating belief / untrustworthiness states with the iteration thresholds of Levchuk.
The motivation for doing so is that both the state and the effects of actions are modeled as probabilistic using Partially Observable Markov Decision Process (POMDP) techniques, which are well suited to decision-theoretic planning under uncertainty. (Levchuk ¶ 0038).
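For illustration only, the iteration-control limitations quoted above (comparing each state's realized percent belief trustworthiness against the prior iteration, using the thresholds ∈1 and ∈2) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Dressel, Levchuk, or the Applicant's specification: the function name `classify_iteration`, the list-of-floats representation, and the three-way outcome labels are all hypothetical.

```python
def classify_iteration(prev, curr, eps1, eps2):
    """Compare per-state percent belief trustworthiness across two iterations.

    prev, curr: per-state values from the prior and current iteration
                (hypothetical representation).
    Returns "modify_rewards" if any state changed by more than eps1,
    "stop" if every state changed by less than eps2, otherwise "continue".
    """
    changes = [abs(c - p) for p, c in zip(prev, curr)]
    if any(change > eps1 for change in changes):
        # At least one state moved too much: the delta rewards would be modified.
        return "modify_rewards"
    if all(change < eps2 for change in changes):
        # Every state has settled: the stopping condition is met.
        return "stop"
    return "continue"
```

Under these assumptions, a call such as `classify_iteration([0.80, 0.60], [0.95, 0.61], 0.10, 0.01)` would fall in the first branch, because the first state changes by more than the assumed ∈1 of 0.10.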
Though Dressel and Levchuk teach the features of a POMDP environment with a Failsafe condition that suppresses any other policy action while awaiting either a trustworthy belief state or human intervention, and that is optimized / converged through iterations, the combination of Dressel and Levchuk does not explicitly teach “a predetermined number M of iterations,” and also does not explicitly teach -
* * *
wherein, at each iteration, an MSE3 value of each state's distance from the target percent belief trustworthiness is calculated, and
wherein an iteration achieving a lowest MSE3  value is selected as the . . . iteration solution.
But Wright teaches “iteration” occurs through a predetermined number M of iterations (Wright ¶¶ 0017-18 teaches value iteration whereby [substituting the state of π(b)] into the calculation of V(s) gives the combined step:

[Image omitted: media_image4.png (equation, combined value-iteration step)]

where i is the iteration number. Value iteration starts at i=0 and V0 as a guess [(such as a belief)] of the value function. It then iterates, repeatedly computing Vi+1 for all states s, until V converges).
Wright also teaches -
* * *
wherein, at each iteration, an MSE3 value of each state's distance from the target percent belief trustworthiness is calculated, and
wherein an iteration achieving a lowest MSE3  value is selected as the . . . iteration solution (Wright, Fig. 2, teaches an average MSE of the learned Q functions (that is, at each iteration, an MSE3 value of each state’s distance . . . is calculated). The behavior policy is varied from 90% to 50% (that is, the target percent belief trustworthiness) of optimal simulating conditions of increasing bias. Error bars are standard error:

[Image omitted: media_image5.png (Wright, Fig. 2, average MSE of the learned Q functions)]

Wright ¶ 0386 teaches [e]ach approach was evaluated based on the average MSE of the Q functions after 50 iterations of learning, comparing to the true Q* function, after 50 iterations of learning (sufficient to ensure convergence (that is, “convergence” is an iteration achieving a lowest MSE3 value is selected as the . . . iteration solution))).
Dressel, Levchuk, and Wright are from the same or similar field of endeavor. Dressel teaches a carryover of MDP characteristics in a POMDP environment for handling belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. Levchuk teaches modeling a subject's state and the influence of training treatments, or actions, on that state to create a training policy using Partially Observable Markov Decision Process (POMDP) techniques. Wright teaches a control system and method, in a Markov decision process, that iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine the teachings of the combination of Dressel and Levchuk pertaining to POMDP processes for generating belief / untrustworthiness states based on iteration thresholds with the mean squared error determinations of Wright.
The motivation to do so is the ability to assess the effectiveness of a state-action resulting in a new state. (Wright ¶ 0388).
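For illustration only, the selection limitation quoted above (calculating, at each iteration, an error value for each state's distance from the target percent belief trustworthiness, and selecting the iteration with the lowest value) might be sketched as follows. The claims do not expand the acronym “MSE3,” so this sketch assumes an ordinary mean squared error; the names `mse_to_target` and `select_best_iteration` and the sample values are hypothetical and are not drawn from Wright or the specification.

```python
def mse_to_target(realized, target):
    """Mean squared distance of each state's realized value from the target."""
    return sum((r - target) ** 2 for r in realized) / len(realized)


def select_best_iteration(history, target):
    """Return (index, error) of the iteration with the lowest error.

    history: one list of per-state realized values per iteration
             (hypothetical representation).
    """
    errors = [mse_to_target(realized, target) for realized in history]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# Three hypothetical iterations over two states, with an assumed target of 0.80.
history = [[0.70, 0.90], [0.78, 0.82], [0.60, 0.95]]
best_index, best_error = select_best_iteration(history, 0.80)
```

With these assumed values, the second iteration (index 1) has the smallest error and would be the one selected.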
Examiner notes that the terms “computer-implemented,” “processor,” “computer-readable, tangible media” or “computer readable media,” and “logic” recited in Applicant's claims are interpreted to be well-known hardware structures.
Examiner notes that the Applicant’s preambles, respectively, do not afford patentable weight to the Applicant’s claims because the claim preambles are not “necessary to give life, meaning, and vitality” to the claim. Moreover, because the Applicant’s preambles merely state the purpose or intended use of the invention rather than any distinct definition of any of the claimed invention’s limitations, the preambles are not considered a limitation and are of no significance to claim construction.
Regarding claims 2, 6, and 10, the combination of Dressel, Levchuk, and Wright teaches all of the limitations of claims 1, 5, and 9, respectively, as described in detail above. 
Wright teaches -
adjusting all states' Failsafe rewards only on the first two iterations (Wright ¶¶ 0017-18 teaches value iteration whereby [substituting the state s of a belief b(s)] into the calculation of V(s) gives the combined step:

[Image omitted: media_image4.png (equation, combined value-iteration step)]

where i is the iteration number (that is, when “i” is a value of “two,” then Wright teaches adjusting all states’ Failsafe rewards only on the first two iterations)).
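For illustration only, the value iteration that the Wright passage describes (start from a guess V0 and repeatedly compute Vi+1 for all states until V converges) might be sketched as follows. The equation image is not reproduced here, so the update in the code is the standard textbook form and is an assumption about what the image shows; the function name `value_iteration`, the nested-list layout of `P` and `R`, and the tolerance are hypothetical.

```python
def value_iteration(P, R, gamma, tol=1e-9, max_iters=10_000):
    """Textbook value iteration on a small finite MDP.

    P[a][s][s2]: probability of moving from s to s2 under action a.
    R[a][s][s2]: reward received for that transition.
    Returns (V, iterations_used).
    """
    n_states = len(P[0])
    V = [0.0] * n_states  # V_0: initial guess of the value function
    for i in range(max_iters):
        # V_{i+1}(s) = max over a of sum over s2 of P * (R + gamma * V_i(s2))
        V_next = [
            max(
                sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2])
                    for s2 in range(n_states))
                for a in range(len(P))
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V_next, V)) < tol:
            return V_next, i + 1
        V = V_next
    return V, max_iters


# Hypothetical two-state, one-action MDP: state 0 moves to absorbing state 1
# and earns a reward of 1; state 1 earns nothing.
P = [[[0.0, 1.0], [0.0, 1.0]]]
R = [[[0.0, 1.0], [0.0, 0.0]]]
values, iterations = value_iteration(P, R, gamma=0.9)
```

In this toy model the values settle after the second sweep, which shows that the iteration index i counts update sweeps rather than states or rewards.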
Regarding claims 3, 7, and 11, the combination of Dressel, Levchuk, and Wright teaches all of the limitations of claims 2, 6, and 10, respectively, as described in detail above. 
Dressel teaches, as set out above, “adjusting the Failsafe rewards” (Dressel, right column of p. 71, “2.3 POMDP-lite,” third paragraph, teaches [that] POMDP-lite . . . encourages exploration by augmenting the reward (that is, adjusting the Failsafe rewards) . . . with an exploration reward R∈. This reward is the expected l1 divergence between the current belief b and the next belief b’ . . . . At the current belief b, the MDP has . . . [a] reward function RMDP = Rb + λR∈, where λ is a scale factor encoding the preference of information rewards over original rewards. This MDP is solved and the best action from the observed state is taken. An observation is received, the belief is updated, and a new MDP is solved (that is, “updated” is iteratively adjusting)).
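For illustration only, the reward augmentation quoted from Dressel (RMDP = Rb + λR∈, with R∈ the expected l1 divergence between the current belief b and the next belief b’) might be sketched as follows. The expectation is taken here over a hypothetical list of (observation probability, next belief) pairs; that representation, the function names, and the sample numbers are assumptions and are not taken from Dressel.

```python
def l1_divergence(b, b_next):
    """l1 distance between two belief vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(b, b_next))


def exploration_reward(b, next_beliefs):
    """Expected l1 divergence between b and the possible next beliefs.

    next_beliefs: list of (probability, belief) pairs, one per observation
                  (hypothetical representation).
    """
    return sum(p * l1_divergence(b, b_next) for p, b_next in next_beliefs)


def augmented_reward(r_b, b, next_beliefs, lam):
    """R_MDP = R_b + lambda * R_eps, as quoted from Dressel."""
    return r_b + lam * exploration_reward(b, next_beliefs)


# Uniform two-state belief with two equally likely, informative observations.
b = [0.5, 0.5]
next_beliefs = [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])]
r_mdp = augmented_reward(1.0, b, next_beliefs, lam=0.5)
```

Each candidate next belief differs from b by 0.8 in l1 distance, so the assumed exploration reward is 0.8 and the augmented reward is 1.0 + 0.5 × 0.8 = 1.4.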
Wright teaches -
after the first two iterations (Wright ¶¶ 0017-18 teaches value iteration whereby [substituting the state s of a belief b(s)] into the calculation of V(s) gives the combined step:

[Image omitted: media_image4.png (equation, combined value-iteration step)]

where i is the iteration number (that is, “i” is a value of “two,” where Wright teaches adjusting all states’ Failsafe rewards only on the first two iterations)), only modifying the two most extreme states' rewards on each iteration (Wright ¶ 0396 teaches [s]etting l=2 performs comparably to the default full length setting, [Complex Fitting Q-Iteration]-Bγ, which are representatives of the two extremes of the parameter's range (that is, at l=2, the two most extreme states’ rewards on each iteration)).
Dressel, Levchuk, and Wright are from the same or similar field of endeavor. Dressel teaches a carryover of MDP characteristics in a POMDP environment for handling belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. Levchuk teaches modeling a subject's state and the influence of training treatments, or actions, on that state to create a training policy using Partially Observable Markov Decision Process (POMDP) techniques. Wright teaches a control system and method, in a Markov decision process, that iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine the teachings of the combination of Dressel and Levchuk pertaining to POMDP processes for generating belief / untrustworthiness states, including rewards, based on iteration thresholds with the extremes of the parameter's range determinations of Wright.
The motivation to do so is because of the ability to assess an effectiveness of a state-action resulting in a new state with regard to other forms of iteration. (Wright ¶ 0388).
Regarding claims 4, 8, and 12, the combination of Dressel, Levchuk, and Wright teaches all of the limitations of claims 3, 7, and 11, respectively, as described in detail above.
Levchuk, as noted above, teaches “when any element has a change greater than the first predetermined value ∈1, . . . .”
Dressel teaches -
. . . modifying the delta Failsafe rewards by dividing by a predetermined value (Dressel, right column of p. 72, “3.2 Threshold Reward,” first paragraph, teaches [a] disadvantage of the max-norm is that the agent always receives some reward, even at uniform belief. Sometimes, we want an agent to reach a highly concentrated belief as quickly as possible, but the agent might be driven by the max-norm reward to collect rewards at less-concentrated beliefs in the near-term. Spaan, Veiga, and Lima suggested thresholded rewards in the POMDP-IR framework, but this requires an additional guess action per state (2014). Our [thresholded rewards in the] ρPOMDP [framework] does not [require an additional guess action per state]:

[Image omitted: media_image6.png (equation, thresholded max-norm reward with cutoff cρ)]

where cρ is the max-norm cutoff (that is, a predetermined value). A belief max-norm below cρ induces no reward. Above cρ, the reward increases linearly until it reaches a maximum value of 1 (that is, the “max-norm cutoff cρ” is modifying the delta Failsafe rewards by dividing by a predetermined value)).
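For illustration only, a thresholded max-norm reward matching the description quoted from Dressel (no reward at or below the cutoff cρ, then a linear increase to a maximum of 1) might be sketched as follows. The equation image is not reproduced here, so the specific linear form, including the division by (1 − cρ), is an assumption chosen to be consistent with that description; the function name `thresholded_reward` is hypothetical.

```python
def thresholded_reward(belief, cutoff):
    """Reward that is zero up to the cutoff and rises linearly to 1.

    belief: probabilities over states; its max-norm is the largest entry.
    cutoff: the max-norm cutoff, assumed to satisfy 0 <= cutoff < 1.
    """
    if not 0.0 <= cutoff < 1.0:
        raise ValueError("cutoff must lie in [0, 1)")
    max_norm = max(belief)
    if max_norm <= cutoff:
        return 0.0  # belief not concentrated enough to earn any reward
    # Linear ramp: equals 0 at the cutoff and 1 when the belief is certain.
    return (max_norm - cutoff) / (1.0 - cutoff)
```

Under this assumed form, a uniform two-state belief with a cutoff of 0.6 earns no reward, while a fully concentrated belief earns the maximum of 1.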
Conclusion
24.	The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
(US Published Application 20110214006 to Meek et al.) teaches automated learning of failure recovery policies based upon existing information regarding previous policies and actions, in which the learning mechanism builds a partially observable Markov decision process (POMDP) model, and computes the new policy based upon the learned model. The new policy may perform automatic fault recovery, e.g., on a machine in a datacenter corresponding to the controlled process.
(Chen et al, “POMDP-Based Decision Making for Fast Event Handling in VANETs,” AAAI (2018)) teaches a Partially Observable Markov Decision Process (POMDP) based approach to balance the trade-off between information gathering and exploiting actions resulting in faster responses. Our model copes with malicious behavior by maintaining it as part of a small state space, thus is scalable for large VANETs.
25.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/K.L.S./
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122

        1 See "Odroid XU4 - Octacore computer inc PSU" WayBack Machine (2017) <https://web.archive.org/web/20170701195526/https://www.odroid.co.uk/odroid-xu4>
        2 “Failsafe [is] defined as: a decision to suspend any policy decision other than itself for prespecified belief trustworthiness rank. In other words, the Failsafe condition suppresses any other policy action while either awaiting a trustworthy belief state or human intervention.” (Specification ¶ 0017); Examiner points out that in general, to execute a policy decision is to “suspend any policy decision other than itself,” and accordingly, the Failsafe is one action (that is, a = π(b)) of many actions in a POMDP.