DETAILED ACTION
This office action is in response to the application filed on 01/31/2020.
Claims 1-20 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim 9, as written, does not include “instructions” as a part of the claimed system and is therefore anticipated by any reference disclosing a system having at least one memory and at least one processor. It is suggested that Applicant amend the claim to include the “instructions” by replacing “configured to store” with --storing--.

Information Disclosure Statement
The information disclosure statement filed on 01/31/2020 has been placed in the application file, and the information referred to therein has been considered.

Drawings
The drawings filed on January 31, 2020 are accepted by the Examiner.

Examiner’s Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Coppa (Coppa et al., US2021/0011837A1) in view of Spieker (Spieker et al., “Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration”, 2017).
With respect to claims 1, 9, and 17, Coppa discloses:
A system (i.e., Fig.1:104-128, “Fuzzer”, “Processor”, “memory”, etc.), a non-transitory computer readable medium with instructions (i.e., Fig.1:108-116 – “Processing circuit”, “Memory”) to perform a method comprising: 
selecting a fuzzer (i.e., “Fuzzer”, see Fig.2, item 104) for execution by each of multiple fuzzing clients (i.e., “electronic Device” 128 – Coppa discloses one fuzzing client as an example and also discloses that fuzzing can be extended to multiple fuzzing clients/UUTs, see Fig.2, item 128; also see paragraph [0018], “systems and methods in accordance with the inventive concepts disclosed herein can extend the use of fuzzing to a variety of UUTs”) during a first time interval of a fuzzing test of computer software code (i.e., “predetermined duration of time” – see paragraph [0076], “The UUT can be monitored for feedback for a predetermined duration of time subsequent to the test signal being provided to the UUT” and paragraph [0018], “Some fuzzers operate in an open loop manner, in which the fuzzer generates tests cases and provides those tests cases to the UUT”); 
selecting a feedback type (i.e., “type of parameter”) for statistics to be reported by the fuzzing clients at an end of the first time interval of the fuzzing test (see Fig.5:510 – “Monitor Feedback from UUT” and paragraph [0018], “extend the use of fuzzing to a variety of UUTs, such as embedded software, UUTs that are limited in the types of parameters that can be measured…detect feedback regarding electronic devices even if the electronic devices are operating as black box devices” and paragraph [0016], “monitor at least one parameter of the electronic device during a time period subsequent to the test signal being provided to the electronic device, determine, based on the at least one parameter, a detected response of the electronic device to the first test signal”); 
providing an identification of the fuzzer and the feedback type to each of the fuzzing clients (i.e., Fig.2:216 and 220, “Test Signal” and “Feedback” and Fig.5:505-515. Note: the feedback sent in a message from the fuzzing client/device to the fuzzer shows that the fuzzing client has been provided with the identification/location of the fuzzer, since the message is sent to the fuzzer identified by that identification/location);
obtaining the statistics at the end of the first time interval of the fuzzing test (i.e., Fig.2, step 220 “Feedback” – statistical information about the electronic device, and paragraph [0032], “the fuzzer 104, which can include a response model 204 …The response model 204 can receive feedback 220 regarding the electronic device 128”); 
determining [one or more rewards] based on the statistics (i.e., Coppa discloses a “detected response” in the feedback, but does not explicitly disclose a reward. See Fig.5:515-520 – generating an expected response based on the detected response in the feedback; and paragraph [0083], “determine an expected response of the UUT to the test signal. In some embodiments, the response model applies one or more rules, policies, filters, functions, weights, or heuristics to the test signal to generate the expected response. The response model can determine the expected response to include at least one parameter of the UUT that the fuzzer monitors using the feedback received from the UUT”); and 
adjusting multiple weights in multiple stochastic policies based on [the one or more rewards], wherein the weights are used to determine the fuzzer and the feedback type in a subsequent interval of the fuzzing test (i.e., weights – difference between detected response and expected response of the monitored parameters. See paragraph [0018], “Systems and methods in accordance with the inventive concepts disclosed herein can identify feedback in addition to crash events, and can effectively use such feedback to identify interesting test cases and determine how to generate further test case signals based on the feedback”. Also see Fig.5:525 – “Update Test Signal Based on Expected Response and Detected Response”, and paragraph [0087], “updating the test signal based on at the detected response and the expected response. For example, the test signal can be updated (e.g., a new test signal generated) based on a difference between the detected response and the determined response.”. Note: determining and updating/generating a further test case based on the feedback, i.e., the detected and expected responses, discloses the generation of the subsequent/further fuzzing test, and the weights can be the difference between the detected response and the determined response).  
Coppa discloses receiving and determining the detected response/feedback, but does not explicitly disclose determining one or more rewards from the detected response/feedback.
Spieker discloses a reward as the feedback from a previous test and uses the reward/feedback for test case prioritization and selection (i.e., p.14, right column, section 3.1, “In RL, an agent interacts with its environment by perceiving its state and selecting an appropriate action, either from a learned policy or by random exploration of possible actions. As a result, the agent receives feedback in terms of rewards, which rate the performance of its previous action.”).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Spieker into Coppa for determining and rating the previous test/actions. One would have been motivated to do so to receive and determine the feedback/rewards and for adapting experience and policy for future action as suggested by Spieker (see, p.14, right column, “With the test execution results, i.e., the test verdicts, a reward is calculated and fed back to the agent. From this reward, the agent adapts its experience and policy for future actions”).
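For illustration only, the select/report/reward/adjust steps recited in claims 1, 9, and 17 can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not taken from the application, Coppa, or Spieker: the function names, the softmax sampling over weights, the caller-supplied statistics and reward functions, and the simple additive weight update are all hypothetical.

```python
import math
import random


def softmax(weights):
    """Turn a dict of weights into selection probabilities (assumed policy form)."""
    peak = max(weights.values())
    exps = {name: math.exp(w - peak) for name, w in weights.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def sample(weights, rng):
    """Stochastically pick one option according to its softmax probability."""
    probs = softmax(weights)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def run_interval(fuzzer_weights, feedback_weights, collect_stats, reward_fn, rng, step=0.1):
    """One interval: select, collect statistics, compute a reward, adjust weights."""
    fuzzer = sample(fuzzer_weights, rng)           # first stochastic policy
    feedback_type = sample(feedback_weights, rng)  # second stochastic policy
    # Hypothetical callback standing in for the statistics the fuzzing
    # clients report at the end of the interval.
    stats = collect_stats(fuzzer, feedback_type)
    reward = reward_fn(stats)
    # Assumed update rule: raise the weight of the options that were used.
    fuzzer_weights[fuzzer] += step * reward
    feedback_weights[feedback_type] += step * reward
    return fuzzer, feedback_type, reward
```

In this sketch the adjusted weights feed directly into the next call to run_interval, which corresponds to using the weights to determine the fuzzer and the feedback type in a subsequent interval.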


With respect to claims 2, 10, and 18, Coppa discloses:
wherein the multiple stochastic policies comprise: 
a first stochastic policy associated with multiple fuzzers including the selected fuzzer (i.e., paragraph [0033], “the response model 204 applies one or more rules, policies, filters, functions, weights, or heuristics to the test signal 216 to generate the expected response”); and 
a second stochastic policy associated with multiple feedback types including the selected feedback type (i.e., selected feedback type – “type of parameters”. See paragraph [0018], “UUTs that are limited in the types of parameters that can be measured” and paragraph [0033], “the response model 204 applies one or more rules, policies, filters, functions, weights, or heuristics to the test signal 216 to generate the expected response. The response model 204 can determine the expected response to include at least one parameter of the electronic device 128 that the fuzzer 104 monitors using feedback 220”).  

With respect to claims 3, 11, and 19, Coppa discloses:
wherein: each of the multiple fuzzers is associated with one of the multiple weights in the first stochastic policy (i.e., weights – difference between detected response and expected response of the monitored parameters. See paragraph [0047], “the test signal generator 208 can compare one or more parameters of the detected response to one or more corresponding parameters of the expected response, and generate the test signal 216 based on the difference”); and 
each of the multiple feedback types is associated with one of the multiple weights in the second stochastic policy (i.e., feedback types – types of parameter to be monitored by fuzzer. See paragraph [0039], “The at least one parameter can include various parameters regarding the electronic device 128 described herein, including direct or indirect observables such as message replies, state information, RF emissions, power draw, temperature, heat generation, processor loading, message latencies, reboot events, and system side-effects, among others. The fuzzer 104 can perform various parsing processes to identify the observables from the feedback 220.”).
 
With respect to claims 4, 12, and 20, Coppa discloses:
the fuzzer is selected based on a first fuzzer weight among the multiple weights in the first stochastic policy (i.e., a first fuzzer weight – one of difference between detected response and expected response. See paragraph [0047], “The test signal generator 208 can generate the test signal 216 based on at least one of the detected response determined via feedback 220 and the expected response determined by the response model 204, such as based on a difference between the detected response and the determined response.”); and 
the feedback type is selected based on a first feedback type weight among the multiple weights in the second stochastic policy (i.e., a first feedback type weight – monitored parameter difference. See paragraph [0018], “UUTs that are limited in the types of parameters that can be measured” and  paragraph [0033], “the response model 204 applies one or more rules, policies, filters, functions, weights, or heuristics to the test signal 216 to generate the expected response. The response model 204 can determine the expected response to include at least one parameter of the electronic device 128 that the fuzzer 104 monitors using feedback 220”).  

With respect to claims 5 and 13, Coppa discloses:
wherein the multiple fuzzers comprise two or more of a random data generator, a data mutator, and a generational fuzzer (i.e., paragraph [0032], “the fuzzer 104, which can include a response model 204 and a test signal generator 208” and paragraph [0074], “The test signal can be provided by a fuzzer…The test signal can include invalid, unexpected, or random data…”).

With respect to claims 6 and 14, Coppa discloses:
wherein the multiple feedback types comprise two or more of node coverage, edge coverage, branch taken/not taken coverage, number of tests executed, number of observed errors or exceptions during execution, and average length of a test (i.e., paragraph [0055], “the fuzzer 104 determines a coverage metric, such as a metric of code coverage, using the feedback 220.”, and paragraph [0039], “the response model 204 can identify at least one parameter of the electronic device 128 from the feedback 220 to determine the detected response. The at least one parameter can include various parameters regarding the electronic device 128 described herein, including direct or indirect observables such as message replies, state information, RF emissions, power draw, temperature, heat generation, processor loading, message latencies, reboot events, and system side-effects, among others.”).

With respect to claims 7 and 15, Coppa discloses:
during the subsequent interval of the fuzzing test, selecting the fuzzer, selecting the feedback type, providing an identification of the fuzzer and the feedback type to each of the fuzzing clients, obtaining the statistics, determining the one or more rewards, and adjusting the multiple weights in the multiple stochastic policies again (i.e., Fig.5, steps 505-525, “providing a test signal to a UUT”, “determining a detected response from the feedback”, “monitoring feedback from the UUT”, “applying the test signal to a response model to determine an expected response of the UUT to the test signal”, and “updating the test signal based on at the detected response and the expected response”. Also see paragraphs [0074]-[0087], “The test signal can be provided by a fuzzer…The test signal can include data to be used by the UUT to execute operations or instructions to be executed by the UUT…the test signal can be updated (e.g., a new test signal generated) based on a difference between the detected response and the determined response…”).

With respect to claims 8 and 16, Coppa discloses:
wherein the multiple weights are adjusted in the multiple stochastic policies [using a policy gradient reinforcement learning algorithm] (i.e., paragraph [0087], “updating the test signal based on at the detected response and the expected response. For example, the test signal can be updated (e.g., a new test signal generated) based on a difference between the detected response and the determined response.”).
Coppa does not explicitly disclose using a policy gradient reinforcement learning algorithm.
However, Spieker discloses the policy gradient reinforcement learning algorithm for test case prioritization and selection (i.e., p.13, left column, section “Reinforcement Learning” and p.14-15, section 3.1-3.12 – Reinforcement learning for test case prioritization, “using reinforcement learning (RL), called Reinforced Test Case Selection (RETECS)” and Fig.2, “RETECS uses test execution results for learning test case prioritization (solid boxes: Included in RETECS, dashed boxes: Interfaces to the CI environment)” – blocks: “Reinforcement Learning Policy”, “Selection & Scheduling”, “Test Execution”, “Evaluation”).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Spieker’s policy gradient reinforcement learning algorithm into Coppa. One would have been motivated to do so to use the well-tuned policy gradient reinforcement learning algorithm to “learn from its experience of the execution environment” and to make progressive improvements, as suggested by Spieker (i.e., p.13, left column, “Reinforcement Learning” – “Reinforcement learning is well-tuned to design an adaptive method capable to learn from its experience of the execution environment. By adaptive, it is meant, that our method can progressively improve its efficiency from observations of the effects its actions have”).
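For illustration only, one generic form of a policy gradient weight adjustment of the kind recited in claims 8 and 16 can be sketched as follows. This is a minimal Python sketch of a REINFORCE-style update for a softmax policy; it is an assumption chosen for clarity and does not reproduce Spieker's RETECS method or the applicant's implementation. The function names, option names, and learning rate are hypothetical.

```python
import math


def softmax(weights):
    """Selection probabilities for a softmax policy over named options."""
    peak = max(weights.values())
    exps = {name: math.exp(w - peak) for name, w in weights.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def reinforce_update(weights, chosen, reward, learning_rate=0.1):
    """Return new weights after one REINFORCE-style step.

    For a softmax policy, the gradient of log pi(chosen) with respect to
    weight k is (1 if k == chosen else 0) - pi(k), so each weight moves by
    learning_rate * reward * that gradient term.
    """
    probs = softmax(weights)
    return {
        name: w + learning_rate * reward * ((1.0 if name == chosen else 0.0) - probs[name])
        for name, w in weights.items()
    }
```

Under these assumptions, a positive reward raises the probability of re-selecting the option that was used and lowers the others, while a negative reward does the opposite.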

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Panikkar et al., (US2022/0156168A1) discloses a method to select test script/case for testing using weights and rewards and fuzzy.
Lin et al., (US2021/0141715A1) discloses a method for producing fuzzing data based on weights.
Dhillon et al., “Reinforcement Learning for Fuzzing Testing Techniques”, discloses reinforcement learning for fuzzing testing including using machine-learned model to generate fuzz test input data and provide fuzz test input data for fuzz test.
Patra et al., “Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data” discloses a method for fuzz testing for application using learning model generated data as inputs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENG WEI whose telephone number is (571)270-1059 and Fax number is (571) 270-2059.  The examiner can normally be reached on M-F 9:00AM-5:00PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached on 571-272-6799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the TC 2100 Group receptionist whose telephone number is 571-272-1000.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
 
/Z.W/Examiner, Art Unit 2192                                                                                                                                                                                                        
/S. SOUGH/SPE, AU 2192/2194