DETAILED ACTION
This action is in response to claims filed 31 August, 2021 for application 16/120802 filed 04 September, 2018. Currently claims 1, 2, 4-8, 10-12, 14-16, 18, 19, and 21-25 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
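3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.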

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4-8, 11, 12, 14-16, 18, 19, 21, 22, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Bournenane et al. (Reinforcement Learning in Multi-Agent Environment and Ant Colony for Packet Scheduling in Routers) in view of Keyhanipour et al. (Integration of data fusion and reinforcement learning techniques for the rank-aggregation problem) and Jung et al. (Performance Models for Large Scale Multiagent Systems: Using Distributed POMDP Building Blocks).


Regarding claims 1, 11 and 18, Bournenane discloses: A computer-implemented method, comprising: 
receiving an initial decision from each of a plurality of agents based on the same criteria, wherein at least a subset of the plurality of agents are computing nodes, and wherein the criteria pertain to one or more resource allocation requests… (“The structure of the model identifies two kinds of agents, their responsibilities, and the way they interact. The structure consists of a scheduler agent that deals with management of queues on the basis of available information (resource capacity) and a resource agent that measures the resource amount and gives this information to the scheduler.” P139 §4.2 ¶1, “Scheduling in this system is done as follows: Before performing action selection and then scheduling the different queues based on their QoS, each scheduler agent sends ants moving downstream to control actual situation. They gather information about the availability of the resource and then return to the sender agent with the information. On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone.” P139 §4.2 ¶2, “In the dynamic environment the scheduler take the actual evolution of the process into account. It is allowed to make the decisions as the scheduling process actually evolves and more information becomes available. For that, we consider at each router an agent that can make decision. This decision-maker collects information gathered by mobile agents and then decides which action to perform after learning the current situation.” P137 §2 ¶1, note: the agent at each router is interpreted to be a computing node); 
assigning an equal weight to computing resources specific to each of the plurality of agents (“On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while. From time to time, the scheduler agent sends ants to survey the possible new (and better) amount of this resource. If they find a better measurement, the scheduler agent reserves the resource that is needed for the new schedule and the old reserving information evaporates.” P139 §4.2 ¶2, “The three most important resources allocated by the service discipline are bandwidth (which packets get transmitted), speed (when do the packets get transmitted) and buffer space (which packets are discarded).” P137 §1 ¶1, note: resources are computing resources); 
generating, for each of the initial decisions, a revised decision based at least in part on (i) comparing the initial decisions to each other, (ii) modeling aggregate group behavior represented by the initial decisions of the plurality of agents, and (iii) modeling a behavior of each respective one of the agents in view of the initial decisions of the plurality of agents by applying an interactive … Markov Decision Process (“They gather information about the availability of the resource and then return to the sender agent with the information. On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while. From time to time, the scheduler agent sends ants to survey the possible new (and better) amount of this resource. If they find a better measurement, the scheduler agent reserves the resource that is needed for the new schedule and the old reserving information evaporates. Ants are used also to distribute information and make their current state known throughout the system. The scheduler agent is not restricted to send the ants in one direction only. The ants are sent towards various directions which are directly connected with the router containing this scheduler agent. The scheduler agent waits until all the ants arrive back with the gathered information and than decides to keep only the ant with the best information, the others terminate. Also, each agent sends ants to distribute information about its current state to the other agents. Every time an ant arrives at a scheduler agent, it gives a reward according to the information that was investigated. 
This reward is in form of a belief factor which will allow each scheduler agent in the multi-agent system to make update on their scheduling policies. The belief factor is a function of the synthetic pheromone concentration. It reflects the degree of confidence that an agent will consider on the information established by other agents from the same cooperating group. The belief factor might be useful in situations where the information is not reliable due to changes in the environment.” P139 §4.2 ¶2, “The single MDP and Q-learning are defined for the case where only one action is selected with each iteration. In this case the existence of optimal policy π* is guaranteed. Nevertheless, the formalism can be extended to problems where multiple actions can be carried out simultaneously by several agents [14]. In this way, we consider n agents each one of them having learned the optimal solution from its own MDP. The effort of these n agents in individual learning is combined to learn the joint optimal policy of this multi-agent MDP.” P138 §3.3 ¶1, note: the initial pheromone levels are compared for each agent and the best information is preserved (i), an MDP is an aggregate model (ii), and a multi-agent MDP models behavior (iii)); 
updating the weights assigned to the computing resources specific to each respective one of the plurality of agents, based at least in part on an accuracy of the modeling of the behavior with respect to the revised decisions of each of the other agents (“On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while. From time to time, the scheduler agent sends ants to survey the possible new (and better) amount of this resource. If they find a better measurement, the scheduler agent reserves the resource that is needed for the new schedule and the old reserving information evaporates…Every time an ant arrives at a scheduler agent, it gives a reward according to the information that was investigated. This reward is in form of a belief factor which will allow each scheduler agent in the multi-agent system to make update on their scheduling policies. The belief factor is a function of the synthetic pheromone concentration. It reflects the degree of confidence that an agent will consider on the information established by other agents from the same cooperating group. The belief factor might be useful in situations where the information is not reliable due to changes in the environment.” P139 §4.2 ¶2); and 
dynamically allocating computing resources in response to the one or more resource allocation requests based at least in part on the updated weights (“They gather information about the availability of the resource and then return to the sender agent with the information. On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while.” P139 §4.2 ¶2); 
wherein the method is carried out by at least one computing device (“The packet scheduling in router plays an important role in the sense to achieve QoS differentiation and to optimize the queuing delay, in particular when this optimization is accomplished on all routers of a path between source and destination. In a dynamically changing environment a good scheduling discipline should be also adaptive to the new traffic conditions.” abstract).

However, Bournenane does not explicitly disclose: for performing a rank aggregation process comprising a plurality of rounds;
for each of the plurality of rounds of the ranking aggregation process, performing:
an interactive partially observable Markov Decision Process;
wherein the allocating comprises allocating additional computing resources to a given one of the agents in a next round of the rank aggregation process to improve the accuracy of the modeling for the given agent with respect to one or more of the other agents; and 
outputting a finalized decision of the rank aggregation process.

Keyhanipour teaches: for performing a rank aggregation process comprising a plurality of rounds (“The proposed method uses the exploration and exploitation capabilities of reinforcement learning methods besides the compressing ability of data fusion models to provide a novel approach in dealing with the rank-aggregation problem. Generally, in its first step, the proposed method provides a combination of local rankers by utilizing data fusion operators such as min, max, average, weighted average, and optimistic/pessimistic exponential ordered weighted averaging. In the second step, our algorithm builds a MDP representation of the rank-aggregation task.” P1136 §3.5 ¶1);
for each of the plurality of rounds of the ranking aggregation process (p1139 fig 4 “number of iterations = 10,000”), performing:
wherein the allocating comprises allocating additional computing resources to a given one of the agents in a next round of the rank aggregation process to improve the accuracy of the modeling for the given agent with respect to one or more of the other agents (“Considering the dynamic nature of Web information resources as well as the constantly changing users’ information needs, the behavior and quality of the local rankers could be completely dynamic over time. The proposed rank-aggregation method, takes advantage of a merit-based aggregation of the decision of the local rankers based on their MAP values, which indicates their average precision in local ranked lists. The proposed method introduces a Markov decision process model for the rank-aggregation problem and applies reinforcement learning methods in order to learn the above mentioned dynamism of the Web environment. Experimental results show that the proposed approach outperforms baseline algorithms, especially in the high ranks, which are the most attractive part of the ranked lists for Web users.”); and 
outputting a finalized decision of the rank aggregation process (“In this setting, every query is associated with a number of ranked lists. Each ranked list is assumed to be the output of a search engine. The task of rank-aggregation is to prepare a better final ranked list by aggregating the multiple input lists. A row in this dataset indicates a query-document pair. Some sample rows of the utilized benchmark dataset are presented in Table 3.” P1138 ¶2).

Bournenane and Keyhanipour are both in the same field of endeavor of Markov Decision Processes for accessing resources and are analogous. Bournenane discloses an MDP with agents for assigning computing resources. Keyhanipour teaches rank aggregation with an MDP. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the MDP with agents for computing resources as disclosed by Bournenane with the rank aggregation MDP implementation as taught by Keyhanipour to yield predictable results. One would have been motivated to combine as Keyhanipour states that the two are complementary systems that are good for exploring an environment and ranking agents (Introduction).

Jung teaches: an interactive partially observable Markov Decision Process (“Distributed POMDP-based model is an appropriate formal framework to model strategy performance in DCSPs since it has distributed agents and the agentView in DCSPs (other agents’ values, priorities, etc.) can be modeled as observations. In DCSPs, the exact state of the system is only partially observable to an agent since the received information for the agent is limited to its neighboring agents.” P301 §4.1 ¶1).

Bournenane, Keyhanipour and Jung are in the same field of endeavor of multi-agent Markov decision process (MDP) and are analogous. Bournenane teaches an MDP where each agent can make a decision and weight a resource. Keyhanipour teaches rank aggregation with an MDP. Jung teaches that the decisions can be a ranking and that the multi-agent MDP can be a partially observable MDP (POMDP). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the decision as taught by Bournenane and Keyhanipour with the ranking decisions as taught by Jung and to implement the MDP of Bournenane with the POMDP as taught by Jung. One would have been motivated as the ranking provides greater flexibility to the agents (Jung p298) and the POMDP lets an agent see what its neighbors are doing, which is useful for a constrained/shared resource system (Jung p297 §1).

Regarding claims 2, 12 and 19, Bournenane discloses: The computer-implemented method of claim 1, wherein said modeling aggregate group behavior comprises applying a Markov Decision Process model to the initial decisions (“The single MDP and Q-learning are defined for the case where only one action is selected with each iteration. In this case the existence of optimal policy π* is guaranteed. Nevertheless, the formalism can be extended to problems where multiple actions can be carried out simultaneously by several agents [14]. In this way, we consider n agents each one of them having learned the optimal solution from its own MDP. The effort of these n agents in individual learning is combined to learn the joint optimal policy of this multi-agent MDP.” P138 §3.3 ¶1).

Regarding claims 4, 15, and 24, Bournenane discloses: The computer-implemented method of claim 1, wherein the plurality of agents are anonymous to each other (“Ants have interaction only with the environment in a standard and simple way and not with each other.” P139 §4.1 ¶2).

Regarding claims 5, 16 and 25, Bournenane does not explicitly disclose: The computer-implemented method of claim 1, wherein the initial decision of each of the plurality of agents comprises a ranking of a set of items.

Jung teaches: wherein the initial decision of each of the plurality of agents comprises a ranking of a set of items ([image: media_image1.png, excerpt reproduced from Jung] p298 last ¶).

Regarding claims 6 and 14, Bournenane discloses: The computer-implemented method of claim 1, wherein said updating comprises calculating a new weight assigned to the computing resources specific to each of the plurality of agents (“On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while. From time to time, the scheduler agent sends ants to survey the possible new (and better) amount of this resource. If they find a better measurement, the scheduler agent reserves the resource that is needed for the new schedule and the old reserving information evaporates.” P139 §4.2 ¶2).

Regarding claims 7 and 21, Bournenane discloses: The computer-implemented method of claim 6, wherein said updating comprises calculating the new weight based on a weighting value assigned to each computing resource (“On the basis of this information the scheduler agent chooses a schedule and sends ants to reserve the needed resources by deposing pheromone. After that the scheduler agent regularly sends ants to reserve the previously found best resource capacity because if the reservation is not refreshed, the pheromone evaporates after a while. From time to time, the scheduler agent sends ants to survey the possible new (and better) amount of this resource. If they find a better measurement, the scheduler agent reserves the resource that is needed for the new schedule and the old reserving information evaporates.” P139 §4.2 ¶2).

Regarding claims 8 and 22, Bournenane discloses: The computer-implemented method of claim 1, wherein said receiving the initial decision comprises receiving the initial decision via a device of each respective agent (“In the dynamic environment the scheduler take the actual evolution of the process into account. It is allowed to make the decisions as the scheduling process actually evolves and more information becomes available. For that, we consider at each router an agent that can make decision. This decision-maker collects information gathered by mobile agents and then decides which action to perform after learning the current situation.” P137 §2 ¶1).

Claims 10 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Bournenane in view of Keyhanipour and Jung and further in view of Jones et al. (US 2008/0253613).

Regarding claims 10 and 23, Bournenane does not explicitly disclose: The computer-implemented method of claim 1, wherein one or more subsets of the plurality of agents comprise human users. 

Jones teaches: wherein one or more subsets of the plurality of agents comprise human users (“It is vital for a cohesive team to have convenient, natural, and quick communication.  In stressful situations, where fast paced coordination of actions is required, humans cannot be encumbered with clumsy communication devices and endless streams of communication from the remote vehicles.  This differs from most multi-agent teams which contain no humans and the agents are able to transmit large amounts of data at will.” [0065], “In order to meet these requirements, the present teachings contemplate borrowing from multi-agent systems (MAS), human-robot interaction, and gesture-based communication.” [0070], Fig 12). 

Bournenane, Keyhanipour, Jung and Jones are in the same field of endeavor of multi-agent Markov decision process (MDP) and are analogous. Bournenane teaches an MDP where each agent can make a decision and weight a resource and wherein the agents are software “ants”. Jones teaches that some of the agents can be human. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the agents as taught by Bournenane, Keyhanipour and Jung with the human agents as taught by Jones. One would have been motivated as a multi-agent system with humans as taught by Jones allows humans to be integrated into and make decisions in a multi-agent system (Jones [0065]).

Response to Arguments
Applicant’s arguments, see pp11-13, filed 31 August, 2021, with respect to the rejection of claims 1-20 under 35 USC §101 have been fully considered and are persuasive.  The rejection of claims 1-20 under 35 USC §101 has been withdrawn. 
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The amended limitations are taught by new reference Keyhanipour as detailed in the rejection above. 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246. The examiner can normally be reached M-F: 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571)272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ERIC NILSSON/Primary Examiner, Art Unit 2122