DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This action is in response to the claims filed 23 August 2022 for application 16/168,266. Claims 1, 8, and 15 have been amended. Claims 1-20 are currently pending and have been examined.
The objection to the abstract has been withdrawn in view of the amendments made.

Response to Arguments

Applicant’s arguments, see pages 8-10, filed 23 August 2022, with respect to the feature “wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label” as recited in independent claim 1 (and similarly in independent claims 8 and 15) have been considered but are moot because the new ground of rejection (citing new reference Kong for teaching the new limitation) does not rely on any reference combination applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Furthermore, Applicant’s arguments, see pages 9 and 10, with respect to the rejection of the dependent claims under 35 USC § 103 have been fully considered but are not persuasive, because these claims depend from one of independent claims 1, 8, or 15, and the combination of cited references teaches every element of the amended claims, as shown in detail below.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over McGill et al (Deciding How to Decide: Dynamic Routing in Artificial Neural Networks, 2017) in view of Kong et al (Collaborative Deep Reinforcement Learning for Joint Object Search, 2017).
Regarding claim 1
McGill teaches: A system, comprising: a memory; and a processor, operably coupled to the memory and that ([Page 7, column 1, last paragraph] saving memory. [Page 8, Column 2, Last paragraph] computational constraints, memory constraints): 
creates a neural network that routes the neural network to a first layer of neurons that comprises a plurality of neurons ([Abstract] We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks. [Page 2, Column 2, Section 4] We propose three approaches to training dynamically routed networks. [Page 3, Column 1, Section 4.1] over the course of training, converging the training routing policy towards the inference routing policy); 
and performs a plurality of successive training iterations on the neural network, a first iteration of the plurality of successive training iterations comprising both training a router to route among the plurality of neurons of the first layer of neurons, and training a first neuron of the plurality of neurons of the first layer of neurons to produce a given output from a given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. In our experiments, the training routing policy. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector).
However, McGill does not explicitly disclose: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label.
Kong teaches, in an analogous system: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1699, Section 3.2.3, Paragraph 1] Intuitively the joint sampling idea can be implemented via simultaneously forward and backward passes through all Q-networks. However in practice, we adopted an alternative implementation with a concept of virtual agents. For each Q-network of an object class, we assign an actual agent detector. Meantime, for each cross network connection we assign a what we call virtual agent. The virtual agents share weights of the corresponding layers with the actual agents. Figure 3 illustrates this idea for the example of Figure 2. [Page 1699, Section 3.2.3, Paragraph 3] For example, suppose we would like to jointly train person and bicycle detectors. Note: Actual agent detector corresponds to the router and virtual agents correspond to the function blocks.), wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1697, Column 2, Section 3.2.1, Paragraph 1] where r = R(a, s → s′) is the specific reward by taking action a to move state s to s′. [Page 1702, Column 1, Section 4.5, Last Paragraph] In this case, our joint model correctly detects. Note: Also see Figure 2 showing 2 agents jointly training and determining the labels "person" and "bicycle").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Kong wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label. One would have been motivated to do this modification because doing so would give the benefit of the virtual agents sharing weights of the corresponding layers with the actual agents as taught by Kong [Page 1699, Section 3.2.3, Paragraph 1].

Regarding claim 2
The system of McGill and Kong teaches: The system of claim 1 (as shown above).
McGill further teaches: wherein the processor further performs a second iteration of the plurality of successive training iterations comprising training the router to route among the plurality of neurons of the first layer of neurons, and training the first neuron or a second neuron of the plurality of neurons of the first layer of neurons to produce a second given output from a second given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not).
Regarding claim 3
The system of McGill and Kong teaches: The system of claim 1 (as shown above).
McGill further teaches: wherein the processor performs iterative training on the neural network with a plurality of data pairs, each data pair comprising an input to the neural network, and an intended output from the neural network that corresponds to the input ([Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not. [Page 5, Column 2, Paragraph 1] Our dataset includes the classes “0”, “1”, “2”, “3”, and “4” from MNIST and “airplane”, “automobile”, “deer”, “horse”, and “frog” from CIFAR-10 (see Fig. 4). See also Figure 6 [Page 6] and Equation 1 [Page 2]).
Regarding claim 4
The system of McGill and Kong teaches: The system of claim 1 (as shown above).
McGill further teaches: wherein the processor operates on a first data instance and a second data instance, wherein the router is trained to route the first data instance through a first path of the neural network, and wherein the router is trained to route the second data instance through a second path of the neural network ([Abstract] We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. [Page 8, Column 1, Last paragraph] We find that dynamic routing is more beneficial when the task involves many low-difficulty decisions, allowing the network to route more data along shorter paths).
Regarding claim 15
McGill teaches: A computer program product for training a neural network, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: create, by the processor, the neural network that routes the neural network to a first layer of neurons that comprises a plurality of neurons ([Page 2, Column 2, Section 4] We propose three approaches to training dynamically routed networks. [Page 3, Column 1, Section 4.1] over the course of training, converging the training routing policy towards the inference routing policy); 
and perform, by the processor, a plurality of successive training iterations on the neural network, a first iteration of the plurality of successive training iterations comprising both training a router to route among the plurality of neurons of the first layer of neurons, and training a first neuron of the plurality of neurons of the first layer of neurons to produce a given output from a given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. In our experiments, the training routing policy. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. [Page 2, Column 2, Section 4] We propose three approaches to training dynamically routed networks).
However, McGill does not explicitly disclose: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label.
Kong teaches, in an analogous system: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1699, Section 3.2.3, Paragraph 1] Intuitively the joint sampling idea can be implemented via simultaneously forward and backward passes through all Q-networks. However in practice, we adopted an alternative implementation with a concept of virtual agents. For each Q-network of an object class, we assign an actual agent detector. Meantime, for each cross network connection we assign a what we call virtual agent. The virtual agents share weights of the corresponding layers with the actual agents. Figure 3 illustrates this idea for the example of Figure 2. [Page 1699, Section 3.2.3, Paragraph 3] For example, suppose we would like to jointly train person and bicycle detectors. Note: Actual agent detector corresponds to the router and virtual agents correspond to the function blocks), wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1697, Column 2, Section 3.2.1, Paragraph 1] where r = R(a, s → s′) is the specific reward by taking action a to move state s to s′. [Page 1702, Column 1, Section 4.5, Last Paragraph] In this case, our joint model correctly detects. Note: Also see Figure 2 showing 2 agents jointly training and determining the labels "person" and "bicycle").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Kong wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label. One would have been motivated to do this modification because doing so would give the benefit of the virtual agents sharing weights of the corresponding layers with the actual agents as taught by Kong [Page 1699, Section 3.2.3, Paragraph 1].

Regarding claim 16
The system of McGill and Kong teaches: The computer program product of claim 15, wherein the program instructions are further executable by the processor to cause the processor to: perform, by the processor (as shown above).
McGill further teaches: a second iteration of the plurality of successive training iterations comprising training the router to route among the plurality of neurons of the first layer of neurons, and training the first neuron or a second neuron of the plurality of neurons of the first layer of neurons to produce a second given output from a second given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not).

Regarding claim 17
The system of McGill and Kong teaches: The computer program product of claim 15, wherein the program instructions are further executable by the processor to cause the processor to: perform, by the processor (as shown above).
McGill further teaches: iterative training on the neural network with a plurality of data pairs, each data pair comprising an input to the neural network, and an intended output from the neural network that corresponds to the input ([Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not. [Page 5, Column 2, Paragraph 1] Our dataset includes the classes “0”, “1”, “2”, “3”, and “4” from MNIST and “airplane”, “automobile”, “deer”, “horse”, and “frog” from CIFAR-10 (see Fig. 4). See also Figure 6 [Page 6] and Equation 1 [Page 2]).
Regarding claim 18
The system of McGill and Kong teaches: The computer program product of claim 15, wherein the program instructions are further executable by the processor to cause the processor to: operate, by the processor (as shown above).
McGill further teaches: on a first data and a second data, wherein the router is trained to route the first data through a first path of the neural network, and wherein the router is trained to route the second data through a second path of the neural network ([Abstract] We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. [Page 8, Column 1, Last paragraph] We find that dynamic routing is more beneficial when the task involves many low-difficulty decisions, allowing the network to route more data along shorter paths).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over McGill et al (Deciding How to Decide: Dynamic Routing in Artificial Neural Networks, 2017) in view of Kong et al (Collaborative Deep Reinforcement Learning for Joint Object Search, 2017) and further in view of Devlin et al (Dynamic Potential-Based Reward Shaping, 2012).
Regarding claim 5
The system of McGill and Kong teaches: The system of claim 1 (as shown above), wherein the training component trains the router ([Page 2, Column 2, Section 4] We propose three approaches to training dynamically routed networks).
However, McGill does not explicitly disclose: using a compression reward that is positive based on a determination that a routing decision being made has historically been made by other agents in the past.
Devlin teaches, in an analogous system: using a compression reward that is positive based on a determination that a routing decision being made has historically been made by other agents in the past ([Page 3, Column 2, Section 2.2, Paragraph 1] provide an additional reward representative of prior knowledge. Note: Additional reward corresponds to compression reward that is positive and prior knowledge corresponds to routing decision historically made by other agents in the past).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of McGill and Kong to incorporate the teachings of Devlin to use prior knowledge to provide reward. One would have been motivated to do this modification because doing so would give the benefit of reducing the number of suboptimal actions made and so reducing the time needed to learn as taught by Devlin [Page 3, Column 2, Section 2.2, Paragraph 1].

Claims 6, 7, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over McGill et al (Deciding How to Decide: Dynamic Routing in Artificial Neural Networks, 2017) in view of Kong et al (Collaborative Deep Reinforcement Learning for Joint Object Search, 2017) and further in view of Chen et al (Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition).
Regarding claim 6
The system of McGill and Kong teaches: The system of claim 1 (as shown above).
McGill further teaches: wherein the processor trains a plurality of neural network layers that comprise the first layer of neurons using stochastic gradient descent ([Page 5, Column 1, Section 4.7] We perform stochastic gradient descent with initial learning).
However, McGill does not explicitly disclose: and back propagation.
Chen teaches, in an analogous system: and back propagation ([Page 4, Column 2, Section: Optimization, Paragraph 3] backpropagation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill and Kong to incorporate the teachings of Chen to use back propagation. One would have been motivated to do this modification because doing so would give the benefit of leveraging the REINFORCE algorithm from the reinforcement learning community to estimate gradient utilizing sample approximation to compute the gradients as taught by Chen [Page 4, Column 2, Section: Optimization, Paragraph 3].

Regarding claim 7
The system of McGill and Kong teaches: The system of claim 1 (as shown above).
McGill further teaches: wherein the processor trains the first neuron using stochastic gradient descent ([Page 5, Column 1, Section 4.7] We perform stochastic gradient descent with initial learning).
However, McGill does not explicitly disclose: and back propagation.
Chen teaches, in an analogous system: and back propagation ([Page 4, Column 2, Section: Optimization, Paragraph 3] backpropagation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill and Kong to incorporate the teachings of Chen to use back propagation. One would have been motivated to do this modification because doing so would give the benefit of leveraging the REINFORCE algorithm from the reinforcement learning community to estimate gradient utilizing sample approximation to compute the gradients as taught by Chen [Page 4, Column 2, Section: Optimization, Paragraph 3].
Regarding claim 19
The system of McGill and Kong teaches: The computer program product of claim 15, wherein the program instructions are further executable by the processor to cause the processor to: train, by the processor, the router (as shown above).
However, McGill does not explicitly disclose: using reinforcement learning.
Chen teaches, in an analogous system: using reinforcement learning ([Abstract] this paper proposes a recurrent attention reinforcement learning framework).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill and Kong to incorporate the teachings of Chen to use reinforcement learning. One would have been motivated to do this modification because doing so would give the benefit of sequential decision-making as taught by Chen [Page 4, Column 1, Paragraph 2].
Regarding claim 20
The system of McGill and Kong teaches: The computer program product of claim 15, wherein the program instructions are further executable by the processor to cause the processor to: train, by the processor (as shown above).
McGill further teaches: a plurality of network layers that comprises the first layer of neurons using stochastic gradient descent ([Page 5, Column 1, Section 4.7] We perform stochastic gradient descent with initial learning).
However, McGill does not explicitly disclose: and back propagation.
Chen teaches, in an analogous system: and back propagation ([Page 4, Column 2, Section: Optimization, Paragraph 3] backpropagation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill and Kong to incorporate the teachings of Chen to use back propagation. One would have been motivated to do this modification because doing so would give the benefit of leveraging the REINFORCE algorithm from the reinforcement learning community to estimate gradient utilizing sample approximation to compute the gradients as taught by Chen [Page 4, Column 2, Section: Optimization, Paragraph 3].

Claims 8-11 are rejected under 35 U.S.C. 103 as being unpatentable over McGill et al (Deciding How to Decide: Dynamic Routing in Artificial Neural Networks, 2017) in view of Kong et al (Collaborative Deep Reinforcement Learning for Joint Object Search, 2017) and further in view of Maes et al (Sequence labeling with Reinforcement Learning and Ranking Algorithms, 2007).
Regarding claim 8
McGill teaches: A computer-implemented method, comprising: creating, by a system operatively coupled to a processor, a neural network that routes the neural network to a first layer of neurons that comprises a plurality of neurons ([Abstract] We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks. [Page 2, Column 2, Section 4] We propose three approaches to training dynamically routed networks. [Page 3, Column 1, Section 4.1] over the course of training, converging the training routing policy towards the inference routing policy); 
and performing, by the system, a plurality of successive training iterations on the neural network, a first iteration of the plurality of successive training iterations comprising both training a router to route among the plurality of neurons of the first layer of neurons, and training a first neuron of the plurality of neurons of the first layer of neurons to produce a given output from a given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. In our experiments, the training routing policy. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector).
However, McGill does not explicitly disclose: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label and supplies a penalty based on a determination that the model did not correctly predict the label.
Kong teaches, in an analogous system: wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1699, Section 3.2.3, Paragraph 1] Intuitively the joint sampling idea can be implemented via simultaneously forward and backward passes through all Q-networks. However in practice, we adopted an alternative implementation with a concept of virtual agents. For each Q-network of an object class, we assign an actual agent detector. Meantime, for each cross network connection we assign a what we call virtual agent. The virtual agents share weights of the corresponding layers with the actual agents. Figure 3 illustrates this idea for the example of Figure 2. [Page 1699, Section 3.2.3, Paragraph 3] For example, suppose we would like to jointly train person and bicycle detectors. Note: Actual agent detector corresponds to the router and virtual agents correspond to the function blocks.), wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label ([Page 1696, Column 1, Paragraph 2] We present a collaborative multi-agent deep RL algorithm. [Page 1697, Column 2, Section 3.2.1, Paragraph 1] where r = R(a, s → s′) is the specific reward by taking action a to move state s to s′. [Page 1702, Column 1, Section 4.5, Last Paragraph] In this case, our joint model correctly detects. Note: Also see Figure 2 showing 2 agents jointly training and determining the labels "person" and "bicycle").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Kong wherein the processor trains the router using a collaborative multi-agent reinforcement learning that jointly trains the router and function blocks, wherein the collaborative multi-agent reinforcement learning generates a performance reward that supplies a reward based on a determination that a model correctly predicts a label. One would have been motivated to do this modification because doing so would give the benefit of the virtual agents sharing weights of the corresponding layers with the actual agents as taught by Kong [Page 1699, Section 3.2.3, Paragraph 1].
Maes teaches, in an analogous system: and supplies a penalty based on a determination that the model did not correctly predict the label ([Page 5, Paragraph 2] Each time the agent fails to predict the correct label it receives a penalty of 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of McGill and Kong to incorporate the teachings of Maes to supply a penalty based on a determination that the model did not correctly predict the label. One would have been motivated to do this modification because doing so would give the benefit of directly decomposing the Hamming Loss over individual actions as taught by Maes [Page 5, Paragraph 2].
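For illustration only, the reward-and-penalty scheme discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code taken from Maes, Kong, McGill, or the application: the function names, the example labels, the default reward of 1 for a correct prediction, and the choice of a zero reward for a correct step in the Hamming-loss check are all hypothetical. The only value drawn from the cited text is the penalty of 1 for an incorrect label. The sketch shows why a per-decision penalty of 1 decomposes the Hamming loss over individual actions: summing the penalties over a sequence gives the number of mislabeled positions.

```python
from typing import Sequence


def performance_reward(predicted: str, gold: str,
                       reward: float = 1.0, penalty: float = 1.0) -> float:
    """Hypothetical performance reward: +reward if the label is correct,
    -penalty otherwise. The magnitudes are illustrative assumptions."""
    return reward if predicted == gold else -penalty


def hamming_loss(predicted: Sequence[str], gold: Sequence[str]) -> int:
    """Number of positions at which the predicted label differs from the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have the same length")
    return sum(p != g for p, g in zip(predicted, gold))


def total_penalty(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Sum of per-decision penalties when a correct step earns 0 (assumption)
    and an incorrect step costs 1 (the penalty quoted from Maes)."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have the same length")
    return sum(-performance_reward(p, g, reward=0.0, penalty=1.0)
               for p, g in zip(predicted, gold))


if __name__ == "__main__":
    gold = ["person", "bicycle", "person", "dog"]   # hypothetical labels
    pred = ["person", "person", "person", "cat"]
    # The per-step penalties add up to the sequence-level Hamming loss.
    assert total_penalty(pred, gold) == hamming_loss(pred, gold)
    print(hamming_loss(pred, gold), total_penalty(pred, gold))
```

With a nonzero reward for correct predictions, as in the claimed performance reward, the sum over a sequence is no longer equal to the Hamming loss but still decreases by a fixed amount for each additional mislabeled position.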
Regarding claim 9
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: performing, by the system (as shown above).
McGill further teaches: a second iteration of the plurality of successive training iterations comprising training the router to route among the plurality of neurons of the first layer of neurons, and training the first neuron or a second neuron of the plurality of neurons of the first layer of neurons to produce a second given output from a second given input ([Page 5, Column 1, Section 4.7] In all of our experiments, we use a mini-batch size, nex, of 128, and run 80,000 training iteration. [Page 3, section 4.1] We can then learn the routing parameters and classification parameters simultaneously. [Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not).

Regarding claim 10
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: performing, by the system (as shown above).
McGill further teaches: iterative training on the neural network with a plurality of data pairs, each data pair comprising an input to the neural network, and an intended output from the neural network that corresponds to the input ([Page 2, Column 1, Section 3] In a statically-routed, feedforward artificial neural network, every layer transforms a single input feature vector into a single output feature vector. The output feature vector is then used as the input to the following layer (which we’ll refer to as the current layer’s sink), if it exists, or as the output of the network as a whole, if it does not. [Page 5, Column 2, Paragraph 1] Our dataset includes the classes “0”, “1”, “2”, “3”, and “4” from MNIST and “airplane”, “automobile”, “deer”, “horse”, and “frog” from CIFAR-10 (see Fig. 4). See also Figure 6 [Page 6] and Equation 1 [Page 2]).

Regarding claim 11
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: operating, by the system (as shown above).
McGill further teaches: on a first data and a second data, wherein the router is trained to route the first data through a first path of the neural network, and wherein the router is trained to route the second data through a second path of the neural network ([Abstract] We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. [Page 8, Column 1, Last paragraph] We find that dynamic routing is more beneficial when the task involves many low-difficulty decisions, allowing the network to route more data along shorter paths).

Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over McGill et al (Deciding How to Decide: Dynamic Routing in Artificial Neural Networks, 2017) in view of Kong et al (Collaborative Deep Reinforcement Learning for Joint Object Search, 2017) and further in view of Maes et al (Sequence labeling with Reinforcement Learning and Ranking Algorithms, 2007) and Chen et al (Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition).
Regarding claim 12
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: training, by the system, the router (as shown above).
However, McGill does not explicitly disclose: using reinforcement learning.
Chen teaches, in an analogous system: using reinforcement learning ([Abstract] this paper proposes a recurrent attention reinforcement learning framework).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Chen to use reinforcement learning comprising a performance reward that supplies a reward based on a determination that a model correctly predicts a label. One would have been motivated to do this modification because doing so would give the benefit of sequential decision-making as taught by Chen [Page 4, Column 1, Paragraph 2].
Regarding claim 13
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: training, by the system, the router (as shown above).
McGill further teaches: using stochastic gradient descent ([Page 5, Column 1, Section 4.7] We perform stochastic gradient descent with initial learning).
However, McGill does not explicitly disclose: and back propagation.
Chen teaches, in an analogous system: and back propagation ([Page 4, Column 2, Section: Optimization, Paragraph 3] backpropagation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Chen to use back propagation. One would have been motivated to do this modification because doing so would give the benefit of leveraging the REINFORCE algorithm from the reinforcement learning community, which uses sample approximation to estimate the gradients, as taught by Chen [Page 4, Column 2, Section: Optimization, Paragraph 3].

Regarding claim 14
The system of McGill, Kong, and Maes teaches: The computer-implemented method of claim 8, further comprising: training, by the system (as shown above).
McGill further teaches: the first neuron using stochastic gradient descent ([Page 5, Column 1, Section 4.7] We perform stochastic gradient descent with initial learning).
However, McGill does not explicitly disclose: and back propagation.
Chen teaches, in an analogous system: and back propagation ([Page 4, Column 2, Section: Optimization, Paragraph 3] backpropagation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of McGill to incorporate the teachings of Chen to use back propagation. One would have been motivated to do this modification because doing so would give the benefit of leveraging the REINFORCE algorithm from the reinforcement learning community, which uses sample approximation to estimate the gradients, as taught by Chen [Page 4, Column 2, Section: Optimization, Paragraph 3].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhang et al (2007) discloses Conditional Random Fields for Multi-agent Reinforcement Learning.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.R.J./ Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/ Supervisory Patent Examiner, Art Unit 2128