DETAILED ACTION

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 12 - 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by VAN SEIJEN et al (US 2018/0165603).
As to claim 12, VAN SEIJEN et al teaches a system for performing machine learners (paragraph [0071]...agent configurations that decompose tasks in different ways. These agent configurations can reduce an overall state space and allow for improved machine learning performance), comprising:
a central parameter server (paragraph [0156]... aggregator 1604 and an environment 1606 ; paragraph [0062]...an Integrated Learning System (ILS), which integrates heterogeneous learning agents (such as search-based and knowledge-based) under a central controller through which the agents critique each other's proposals, may be employed ; paragraph [0331]... at least one processing unit 2402 (e.g., a central processing unit) ; paragraph [0348]...the server device 602 may provide data to and from a client computing device) configured to asynchronously assign processing jobs and manage a set of parameters (paragraph [0150]...a multi-advisory model, a single-agent reinforcement learning task can be partitioned into a multi-agent problem (e.g., using a divide and conquer paradigm). All agents can be placed at a same level and be given advisory roles that include providing an aggregator with local Q-values for each available action. A multi-advisory model can be a generalization of reinforcement learning with ensemble models, allowing for both the fusion of several weak reinforcement learners and the decomposition of a single-agent reinforcement learning problem into concurrent subtasks. In some techniques for combining reinforcement learning and ensemble methods, agents are trained independently and greedily to their local optimality, and are aggregated into a global policy by voting or averaging); and
a plurality of model learners (paragraph [0074]...Agent 1 and agent 2 ; paragraph [0156]... advisors 1602 ; paragraph [0348]...a general computing device 604 (e.g., personal computer), tablet computing device 606, or mobile computing device 608, as described above) in communication with the central parameter server and configured to receive the assigned processing jobs and the set of parameters and to solve a gradient therefrom (paragraph [0062]... In a first approach, each agent learns its own network parameters, while treating the other agents as part of the environment. A second approach uses centralized learning and passes gradients between agents. For fully competitive tasks, which are typically a two-agent case, the agents have opposing goals (e.g., the reward function of one agent is the negative of the reward function of the other)),
wherein the central parameter server is further configured to set a learning rate for each of the assigned processing jobs that is inversely proportional to a corresponding degree of staleness (paragraph [0151]...local greedy bootstrapping method, called local-max, presents theoretical shortcoming of inverting a max Σ into a Σ max into the global Bellman equation. In practice, this inversion causes some states to become attractors. An attractor is a state where advisors are attracting in every direction equally and where the local-max aggregator's optimal behavior is to remain static).

As to claim 13, VAN SEIJEN et al teaches the system, wherein the central parameter server is further configured to calculate the degree of staleness based on a difference between a state of the set of parameters when each processing job is assigned and a state of the set of parameters when each processing job is completed (paragraph [0193]...for all advisors j, X_j = X, using a state-action-reward-state-action (SARSA) update rule for each advisor with respect to the aggregator's maximizing action can be equivalent to applying Q-learning update rule on the global agent. See Rummery et al. On-line Q-learning using connectionist systems, University of Cambridge, Department of Engineering (1994); and Watkins, Learning from Delayed Rewards, PhD thesis, Cambridge University (1989), both of which are incorporated herein by reference. For example, let .sub.x, denote the aggregator's policy in state x'. The Q-learning update rule for the global agent can be decomposed as follows).
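
For illustration only, and not drawn from the application or from VAN SEIJEN et al, the staleness-dependent learning rate recited in claims 12 and 13 might be sketched as follows. The sketch assumes that staleness is counted as the number of parameter updates applied between the assignment and the completion of a job, that the learning rate is a base rate divided by that count, and that a staleness of zero is treated as one to avoid division by zero. The class name, method names, and base rate are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterServer:
    """Hypothetical central parameter server (all names are illustrative)."""
    params: list[float]
    base_lr: float = 0.1   # assumed base learning rate
    version: int = 0       # number of updates applied to params so far
    assigned_at: dict[int, int] = field(default_factory=dict)

    def assign(self, job_id: int) -> list[float]:
        # Record the parameter state (version) at the time of assignment.
        self.assigned_at[job_id] = self.version
        return list(self.params)

    def staleness(self, job_id: int) -> int:
        # Difference between the state at completion and at assignment.
        return self.version - self.assigned_at[job_id]

    def complete(self, job_id: int, grad: list[float]) -> float:
        # Learning rate inversely proportional to staleness; a staleness
        # of 0 is treated as 1 (an assumption of this sketch).
        lr = self.base_lr / max(self.staleness(job_id), 1)
        self.params = [p - lr * g for p, g in zip(self.params, grad)]
        self.version += 1
        del self.assigned_at[job_id]
        return lr
```

Under these assumptions, three jobs assigned against the same parameter state and completed one after another would receive staleness values of 0, 1, and 2, so the third job's gradient is applied at half the base rate.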

As to claim 14, VAN SEIJEN et al teaches the system, wherein the central parameter server includes a memory module (paragraph [0331]...system memory 2404. Depending on the configuration and type of computing device, the system memory 2404 can comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories) for storing the state of the set of parameters when each processing job is assigned (paragraph [0150]...a multi-advisory model, a single-agent reinforcement learning task can be partitioned into a multi-agent problem (e.g., using a divide and conquer paradigm). All agents can be placed at a same level and be given advisory roles that include providing an aggregator with local Q-values for each available action. A multi-advisory model can be a generalization of reinforcement learning with ensemble models, allowing for both the fusion of several weak reinforcement learners and the decomposition of a single-agent reinforcement learning problem into concurrent subtasks. In some techniques for combining reinforcement learning and ensemble methods, agents are trained independently and greedily to their local optimality, and are aggregated into a global policy by voting or averaging).

As to claim 15, VAN SEIJEN et al teaches the system, wherein each of the plurality of model servers (paragraph [0348]...computing system from a remote source, such as a general computing device 604 (e.g., personal computer), tablet computing device 606, or mobile computing device 608) includes a memory module (paragraph [0331]...system memory 2404. Depending on the configuration and type of computing device, the system memory 2404 can comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories) storing the state of the set of parameters when each processing job is assigned (paragraph [0150]...a multi-advisory model, a single-agent reinforcement learning task can be partitioned into a multi-agent problem (e.g., using a divide and conquer paradigm). All agents can be placed at a same level and be given advisory roles that include providing an aggregator with local Q-values for each available action. A multi-advisory model can be a generalization of reinforcement learning with ensemble models, allowing for both the fusion of several weak reinforcement learners and the decomposition of a single-agent reinforcement learning problem into concurrent subtasks. In some techniques for combining reinforcement learning and ensemble methods, agents are trained independently and greedily to their local optimality, and are aggregated into a global policy by voting or averaging).

As to claim 16, VAN SEIJEN et al teaches the system, wherein each of the plurality of model learners (paragraph [0348]...computing system from a remote source, such as a general computing device 604 (e.g., personal computer), tablet computing device 606, or mobile computing device 608) is a separate computer.

As to claim 17, VAN SEIJEN et al teaches the system, wherein each of the plurality of model learners (paragraph [0348]...computing system from a remote source, such as a general computing device 604 (e.g., personal computer), tablet computing device 606, or mobile computing device 608) is a virtual machine hosted on a computer (paragraph [0335]... Such a system-on-a-chip device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality all of which are integrated (or "burned") onto the chip substrate as a single integrated circuit. When operating via an SOC, the functionality, described herein, with respect to the capability of client to switch protocols may be operated via application-specific logic integrated with other components of the computing device 2400 on the single integrated circuit (chip)).

As to claim 18, VAN SEIJEN et al teaches the system, wherein the computer is in communication with the central parameter server (paragraph [0156]... aggregator 1604 and an environment 1606 ; paragraph [0062]...an Integrated Learning System (ILS), which integrates heterogeneous learning agents (such as search-based and knowledge-based) under a central controller through which the agents critique each other's proposals, may be employed ; paragraph [0331]... at least one processing unit 2402 (e.g., a central processing unit) ; paragraph [0348]...the server device 602 may provide data to and from a client computing device) over the Internet (paragraph [0346]... Data/information generated or captured by the mobile computing device 500 and stored via the system 502 may be stored locally on the mobile computing device 500, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio interface layer 572 or via a wired connection between the mobile computing device 500 and a separate computing device associated with the mobile computing device 500, for example, a server computer in a distributed computing network, such as the Internet).

Allowable Subject Matter
Claims 1 – 11, 19, and 20 are allowable over the prior art of record because the art of record does not disclose, suggest, or render obvious assigning a second processing job to a second model learner, using the central parameter server, wherein the second processing job includes solving a second gradient based on the set of parameters of the first state; performing the first processing job in the first model learner; iterating the set of parameters from the first state to a second state based on the results of the 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached on Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 


/BRANDON S COLE/Primary Examiner, Art Unit 2122