Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 21, 27 and 33 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
The prior art of record, Donnarumma et al. (“A Programmer-Interpreter neural network architecture for prefrontal cognitive control”), teaches a programmable neural network based on programs that the programmer sends to the interpreter. (secs 2-3)
Rossum et al. (Python Frequently Asked Questions) teaches keys for identifying programs and their corresponding functions. (sec 2.3.6 and sec 3.14)
Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition) teaches determining if a called program is terminated and control returns to a calling program. (sec 3.2, pp. 239-240)
Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory) teaches determining next programs to be called. (sec “Stack Control” and sec “Training of the NNPDA”)
However, the prior art of record, taken either alone or in combination, fails to teach or fairly suggest claims 21, 27 and 33, in combination with the remaining features and elements of the claimed invention.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
Such claim limitation(s) is/are: 
“a subsystem configured to, for each neural network output: determine, from the neural network output, whether or not to end a currently invoked program and to return to a calling program” in claim 19; “the subsystem is configured to return a hidden state of the core neural network to a hidden state of the core neural network when the calling program was selected and provide an embedding for the calling program as part of the next neural network input” in claim 21; “the subsystem … have been trained using execution traces as training data” in claim 24; “the subsystem determines whether or not to end the currently invoked program based on the probability” in claim 37. (Note that pars 58-62 of the present application describe sufficient structure for performing the claimed function. In addition, Donnarumma teaches determining an action based on the neural network output in simulations, which inherently implies performance on a computer.)
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 19-24 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 4, 3, 6 of U.S. Patent No. 10,635,974 B2 (“Reference Patent”). Although the claims at issue are not identical, they are not patentably distinct from each other. This is a nonprovisional nonstatutory double patenting rejection because the patentably indistinct claims have been patented.
Instant application
Reference Patent


a core neural network configured to receive one or more neural network inputs and to generate a respective neural network output for each of the neural network inputs; 

a memory storing, for each program in the set of programs, a key identifying the program and an embedding for the program, 

wherein the embedding for a program is a collection of numeric values that represents the program; and 





a subsystem configured to, for each neural network output: 

determine, from the neural network output, whether or not to end a currently invoked program and to return to a calling program; 

determine, from the neural network output, a next program to be called; and 

determine, from the neural network output, contents of arguments to the next program to be called. 



a core recurrent neural network configured to receive a sequence of neural network inputs and to generate a sequence of neural network outputs; 

a memory storing, for each program in the set of programs, a key identifying the program and an embedding for the program, 

wherein: the embedding for a program is a collection of numeric values that represents the program, and 

the embeddings for the program have been determined through training on a set of training data; and 

a subsystem configured to, for each neural network output: 

determine, from the neural network output, whether or not to end a currently invoked program and to return to a calling program; 

determine, from the neural network output, a next program to be called;

determine, from the neural network output, contents of arguments to the next program to be called; 

receive a representation of a current state of the environment¹; and

generate a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment².


receive a representation of a current state of the environment¹; and

generate a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment².

21. (New) The neural network system of claim 20, 
wherein, in response to determining to end a currently invoked program and to return to a calling program, the subsystem is configured to return a hidden state of the core neural network to a hidden state of the core neural network when the calling program was selected and provide an embedding for the calling program as part of the next neural network input.
2. The neural network system of claim 1, 

wherein, in response to determining to end a currently invoked program and to return to a calling program, the subsystem is configured to return a hidden state of the core recurrent neural network to a hidden state of the core recurrent neural network when the calling program was selected and provide an embedding for the calling program as part of the next neural network input.
22. (New) The neural network system of claim 20, wherein generating the next neural network input comprises: 
extracting a fixed-length state encoding from the representation of the current state of the environment using a domain-specific encoder; and 
combining the fixed-length state encoding and the embedding for the next program to be called to generate the next neural network input.

extracting a fixed-length state encoding from the representation of the current state of the environment using a domain-specific encoder; and 
combining the fixed-length state encoding and the embedding for the next program to be called to generate the next neural network input.

wherein determining the next program to be called comprises: 
determining, from the neural network output, a program key; and 
selecting a program from the set of programs having a key that is most similar to the program key.
3. The neural network system of claim 1, 

wherein determining the next program to be called comprises: 
determining, from the neural network output, a program key; and 
selecting a program from the set of programs having a key that is most similar to the program key.
24. (New) The neural network system of claim 19, 
wherein the subsystem and the core recurrent neural network have been trained using execution traces as training data.
6. The neural network system of claim 1, 

wherein the subsystem and the core recurrent neural network have been trained using execution traces as training data.

* The superscripts indicate corresponding subject matter between the instant application and the reference patent.

Similarly, Claims 25-30 are rejected on the ground of nonstatutory double patenting, mutatis mutandis, as being unpatentable over claims 7, 8, 10, 9, 12 of U.S. Patent No. 10,635,974 B2 (“Reference Patent”). 

Similarly, Claims 31-36 are rejected on the ground of nonstatutory double patenting, mutatis mutandis, as being unpatentable over claims 13, 14, 16, 15, 18 of U.S. Patent No. 10,635,974 B2 (“Reference Patent”). 

Claims 37-38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 19-20 of U.S. Patent No. 10,635,974 B2 (“Reference Patent”). Although the claims at issue are not identical, they are not patentably distinct from each other. This is a nonprovisional nonstatutory double patenting rejection because the patentably indistinct claims have been patented.
Instant application
Reference Patent
37. (New) The neural network system of claim 19, wherein:
the neural network output comprises a first portion that is a probability with which the currently invoked program should be ended, and 
the subsystem determines whether or not to end the currently invoked program based on the probability.
19. The neural network system of claim 1, wherein: 
the neural network output comprises a first portion that is a probability with which the currently invoked program should be ended, and 
the subsystem determines whether or not to end the currently invoked program based on the probability.
38. (New) The neural network system of claim 19, wherein, for each neural network input, the core recurrent neural network is further configured to perform operations comprising: 
processing the neural network input to update a current hidden state of the core neural network; and 

applying a first function to the updated hidden state to generate a probability that the currently invoked program should be ended; 
applying a second function to the updated hidden state to generate a key that identifies the next program to be invoked; and 
applying a third function to the updated hidden state to generate arguments for the next program.


processing the neural network input to update a current hidden state of the core neural network; and 

applying a first function to the updated hidden state to generate a probability that the currently invoked program should be ended; 
applying a second function to the updated hidden state to generate a key that identifies the next program to be invoked; and 
applying a third function to the updated hidden state to generate arguments for the next program.



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 19-20, 22-23, 25-26, 28-29, 31-32, 34-35, 37 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 19
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machine.
Step 2A Prong 1:
The limitations of 
“a core neural network configured to receive one or more neural network inputs and to generate a respective neural network output for each of the neural network inputs; 
a memory storing, for each program in the set of programs, a key identifying the program and an embedding for the program, wherein the embedding for a program is a collection of numeric values that represents the program; and 
a subsystem configured to, for each neural network output: 
determine, from the neural network output, whether or not to end a currently invoked program and to return to a calling program; 
determine, from the neural network output, a next program to be called; and 
determine, from the neural network output, contents of arguments to the next program to be called”, as drafted, recite a system that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “neural network” and “receiving”, nothing in the claim elements precludes the system from practically being performed in the mind. For example, but for the “neural network” and “receiving” language, the limitations in the context of this claim encompass the user mentally thinking of receiving data and generating output based on the received data; storing a key for a rule and its representation; determining to end a current rule and return to a previous rule; determining a next rule; and determining some inputs for the next rule.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.



Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
In particular, the claim recites an additional element: the act of receiving data. The claim adds insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). The act of receiving data is recited at a high level of generality (i.e., as a generic act of receiving data) such that it amounts to no more than a mere act of applying the exception using a generic act of receiving. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In addition, the claim recites an additional element: the use of a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B
The claim appends a well-understood, routine, conventional activity previously known to the industry, specified at a high level of generality, to the judicial exception (see MPEP 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data” is a well-understood, routine, and conventional activity). As discussed above with respect to integration of the abstract idea into a practical application, the additional element of receiving/transmitting data amounts to no more than a mere act of applying the exception using a generic act of receiving/transmitting. A mere act of applying an exception using a generic act of receiving/transmitting cannot provide an inventive concept. The claim is not patent eligible.
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Similarly, Claims 25 and 31 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.

Regarding claim 20
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machine.
Step 2A Prong 1
The limitations of 
“receive a representation of a current state of the environment; and 
generate a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment”, as drafted, recite a system that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “neural network” and “receive”, nothing in the claim elements precludes the system from practically being performed in the mind. For example, but for the “neural network” and “receive” language, the limitations in the context of this claim encompass the user mentally thinking of receiving data about an environment; and generating input data from a representation of a next rule and the data about the environment.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
In particular, the claim recites an additional element: the act of receiving data. The claim adds insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). The act of receiving data is recited at a high level of generality (i.e., as a generic act of receiving data) such that it amounts to no more than a mere act of applying the exception using a generic act of receiving. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In addition, the claim recites an additional element: the use of a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
The claim appends a well-understood, routine, conventional activity previously known to the industry, specified at a high level of generality, to the judicial exception (see MPEP 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data” is a well-understood, routine, and conventional activity). As discussed above with respect to integration of the abstract idea into a practical application, the additional element of receiving/transmitting data amounts to no more than a mere act of applying the exception using a generic act of receiving/transmitting. A mere act of applying an exception using a generic act of receiving/transmitting cannot provide an inventive concept. The claim is not patent eligible.
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Similarly, Claims 26 and 32 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.

Regarding claim 22
Claim 22 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machine.
Step 2A Prong 1:
The limitations of 
“extracting a fixed-length state encoding from the representation of the current state of the environment using a domain-specific encoder; and 
combining the fixed-length state encoding and the embedding for the next program to be called to generate the next neural network input.”, as drafted, recite a system that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “neural network”, nothing in the claim elements precludes the system from practically being performed in the mind. For example, but for the “neural network” language, the limitations in the context of this claim encompass the user mentally thinking of generating a representation of the current state of an environment; and combining the environment representation and the next rule representation to generate input data.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.



Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
In particular, the claim recites an additional element: the use of a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Similarly, Claims 28 and 34 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.


Regarding claim 23
Claim 23 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machine.
Step 2A Prong 1:
The limitations of 
“determining, from the neural network output, a program key; and 
selecting a program from the set of programs having a key that is most similar to the program key.”, as drafted, recite a system that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “neural network”, nothing in the claim elements precludes the system from practically being performed in the mind. For example, but for the “neural network” language, the limitations in the context of this claim encompass the user mentally thinking of determining a key for a rule from data; and selecting the rule whose key is closest to that key.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element: the use of a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Similarly, Claims 29 and 35 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.


Regarding claim 37
Claim 37 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machine.
Step 2A Prong 1:
The limitations of 
“the neural network output comprises a first portion that is a probability with which the currently invoked program should be ended, and 
the subsystem determines whether or not to end the currently invoked program based on the probability.”, as drafted, recite a system that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “neural network”, nothing in the claim elements precludes the system from practically being performed in the mind. For example, but for the “neural network” language, the limitations in the context of this claim encompass the user mentally thinking of a probability of ending a rule; and determining to end a current rule based on the probability.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
In particular, the claim recites an additional element: the use of a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 19-20, 22, 24-26, 28, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Rossum et al. (Python Frequently Asked Questions), further in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition), further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory).

Regarding claim 19
Donnarumma teaches 
A neural network system for invoking one or more programs selected from a set of programs to cause an environment to transition into a different state, the neural network system comprising: 

a core neural network configured to receive one or more neural network inputs and to generate a respective neural network output for each of the neural network inputs; 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY).”; e.g., “programmer” with CTRNN may read on “core neural network”. In addition, e.g., “program” along with “feeds it to the bottom (pre-)motor layer” may read on “output”.)

a memory storing, for each program in the set of programs, a key identifying the program and an embedding for the program, wherein the embedding for a program is a collection of numeric values that represents the program; and 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … the term [equation image not reproduced] constitutes the values of the programs, that are input signals sent from PFC weighted by the sparse C’’mj matrix. We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “The task has an underlying hierarchical structure and has been designed to test the participants’ (or computational model’s) ability to selectively update the content of PFC working memory (here, the program to be used) and use it to select the most appropriate response. … Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer. The PNN implementation includes a hierarchy structured in a top layer and a bottom layer. The top (programmer) layer of 14 neurons is meant to detect and keep in memory the four programs (I = {I1AX, I2BY, I3AX, I4BY}) and its implementations refer to Equations (2).”; e.g., “maintain or switch a program in working memory” may read on “a memory storing, for each program in the set of programs, a key identifying the program” since each program is saved into a specific address. In addition, it is appreciated by one of ordinary skill in the art that e.g., each program is a vector containing binary numbers and thus each vector reads on “embedding for a program”.)

a subsystem configured to, for each neural network output:
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY).”) 

(Note: Hereinafter, claim language enclosed in brackets (i.e., [ ]) indicates limitations that are not taught by the reference currently being discussed but that are taught by a subsequently cited reference.)

determine, from the neural network output, whether or not to [end a currently invoked program and to return to a calling program]; 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … the term [equation image not reproduced] constitutes the values of the programs, that are input signals sent from PFC weighted by the sparse C’’mj matrix. We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer.”; Note that Donnarumma teaches determining, from the neural network output, whether or not to select R or L.)

determine, from the neural network output, a [next program to be called]; and 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer. … The bottom (interpreter) layer of 32 neurons acts as an interpreter of sequences and uses the inputs I provided by the programmer layer as well as the sensory input (e.g., 1AABCCXA) to output one of two motor commands {R, L}, represented here as two output neurons.”; Note that Donnarumma teaches determining, from the neural network output, a next command, R or L.)

determine, from the neural network output, contents [of arguments to the next program to be called].
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 3] as cited above; Note that Donnarumma teaches determining, from the neural network output, a next command, R or L.)

(Note: Hereinafter, underlined claim language indicates limitations taught by the reference currently being discussed, while non-underlined claim language indicates limitations already taught by one or more previously cited references.)

In the alternative, Rossum can also be interpreted to teach the following limitation:
Rossum teaches 
a memory storing, for each program in the set of programs, a key identifying the program and an embedding for the program, wherein the embedding for a program is a collection of numeric values that represents the program; 
(Rossum, [sec 2.3.6] “The best is to use a dictionary that maps strings to functions.”; [sec 3.14] “The details of Python memory management depend on the implementation.”; e.g., each string may read on the “key identifying the program” and each function may read on the “embedding for the program” since it is appreciated by one of ordinary skill in the art that each program function is a vector containing binary numbers and thus each vector may read on “embedding”.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma with the key-to-function mapping of Rossum. 
Doing so would provide an efficient technique for calling functions based on keys by mapping keys to functions.
(Rossum, [sec 2.3.6] “There are various techniques. • The best is to use a dictionary that maps strings to functions. The primary advantage of this technique is that the strings do not need to match the names of the functions. This is also the primary technique used to emulate a case construct:”).
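For illustration only, the string-to-function dispatch technique quoted from Rossum (sec 2.3.6) can be sketched in Python as follows. The function names and keys are hypothetical; the sketch shows only that a dictionary pairs each key with a callable program, which is the correspondence relied on above.

```python
from typing import Callable, Dict


# Hypothetical "programs"; the names are illustrative, not taken from Rossum.
def move_left(steps: int) -> str:
    return f"left {steps}"


def move_right(steps: int) -> str:
    return f"right {steps}"


# A dictionary that maps string keys to functions (Rossum, sec 2.3.6).
# As the FAQ notes, the keys need not match the function names.
PROGRAMS: Dict[str, Callable[[int], str]] = {
    "L": move_left,
    "R": move_right,
}


def call_program(key: str, steps: int) -> str:
    """Look up the program identified by `key` and invoke it."""
    return PROGRAMS[key](steps)


print(call_program("R", 2))  # prints "right 2"
```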

However, Donnarumma and Rossum do not teach
determine, from the neural network output, whether or not to [end a currently invoked program and to return to a calling program]; 
determine, from the neural network output, a [next program to be called]; and 
determine, from the neural network output, contents [of arguments to the next program to be called].

Dietterich teaches 
determine, from the neural network output, whether or not to end a currently invoked program and to return to a calling program; 
(Dietterich, [table 1] “If any subtask on Kt is terminated in st+1”; [sec 3.2, pp. 239-240] “When a subroutine is invoked, its name and actual parameters are pushed onto the stack. When a subroutine terminates, its name and actual parameters are popped off the stack. Notice (line 16) that if any subroutine on the stack terminates, then all subroutines below it are immediately aborted, and control returns to the subroutine that had invoked the terminated subroutine.”).
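The stack discipline quoted from Dietterich (sec 3.2) can be sketched as below. This is a simplified illustration under stated assumptions: subroutines are represented as hypothetical (name, parameters) tuples and termination is tested by a caller-supplied predicate; it is not Dietterich's MAXQ learning algorithm itself.

```python
from typing import Callable, List, Optional, Tuple

Frame = Tuple[str, tuple]  # (subroutine name, actual parameters)


def invoke(stack: List[Frame], name: str, params: tuple) -> None:
    """Invoking a subroutine pushes its name and parameters onto the stack."""
    stack.append((name, params))


def handle_terminations(stack: List[Frame],
                        is_terminated: Callable[[Frame], bool]) -> Optional[Frame]:
    """Check frames from the outermost caller to the most recent call.

    The first terminated frame is popped together with every subroutine it
    invoked (directly or indirectly), so control returns to the frame that
    invoked it. Returns the frame that now has control, or None if empty.
    """
    for depth, frame in enumerate(stack):
        if is_terminated(frame):
            del stack[depth:]  # pop the terminated frame and all of its callees
            break
    return stack[-1] if stack else None


# Hypothetical call chain; the names are chosen for illustration.
stack: List[Frame] = []
invoke(stack, "Root", ())
invoke(stack, "Get", ())
invoke(stack, "Navigate", ("R",))

# Suppose "Get" terminates: "Navigate" is aborted and control returns to "Root".
current = handle_terminations(stack, lambda frame: frame[0] == "Get")
print(current)  # ('Root', ())
```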

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma and Rossum with the subroutine termination and the control return to a calling program of Dietterich. 
Doing so would allow a task in a given environment to be learned faster and represented more compactly by decomposing the task into a set of sub-tasks. 
 (Dietterich, [sec 1] “The decomposition into subproblems has many advantages. First, policies learned in subproblems can be shared (reused) for multiple parent tasks. Second, the value functions learned in subproblems can be shared, so when the subproblem is reused in a new task, learning of the overall value function for the new task is accelerated. Third, if state abstractions can be applied, then the overall value function can be represented compactly as the sum of separate terms that each depends on only a subset of the state variables. This more compact representation of the value function will require less data to learn, and hence, learning will be faster.”)

However, Donnarumma, Rossum and Dietterich do not teach
determine, from the neural network output, a [next program to be called]; and 
determine, from the neural network output, contents [of arguments to the next program to be called].


Das teaches
determine, from the neural network output, a next program to be called; and 
(Das, [fig 1] [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; e.g., “push” and “pop” may read on the “next program to be called” and they are determined based on Sa which is a neural network output.)

determine, from the neural network output, contents of arguments to the next program to be called.
(Das, [tables 1-3]; [sec “Stack Control”] “Therefore, for the stack shown in Table 1, Sa = 0.6 and the current input is c, then, after the operation, the stack would appear as shown in Table 2”; e.g., “Sa” may read on the “contents of arguments to the next program to be called”.)
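The threshold-based stack control quoted from Das (“Stack Control”) can be sketched as follows, assuming a continuous stack in which each entry carries a symbol and a real-valued length. The push threshold of 0.1 comes from the quoted passage; the pop threshold of -0.1 and the partial-pop rule are assumptions made for illustration, not details confirmed by the passage cited above.

```python
from typing import List, Tuple

Stack = List[Tuple[str, float]]  # (symbol, length); the top of the stack is the last entry

PUSH_THRESHOLD = 0.1   # from the quoted passage: push when Sa > 0.1
POP_THRESHOLD = -0.1   # assumption: "sufficiently negative" taken as Sa < -0.1


def stack_action(stack: Stack, s_a: float, current_input: str) -> str:
    """Apply the stack operation selected by the action-neuron activation s_a."""
    if s_a > PUSH_THRESHOLD:
        # Push the current input symbol with a length equal to s_a.
        stack.append((current_input, s_a))
        return "push"
    if s_a < POP_THRESHOLD:
        # Pop a total length of |s_a| from the top, shortening the last
        # entry touched if needed (assumed continuous-stack rule).
        remaining = -s_a
        while remaining > 0 and stack:
            symbol, length = stack[-1]
            if length <= remaining:
                stack.pop()
                remaining -= length
            else:
                stack[-1] = (symbol, length - remaining)
                remaining = 0.0
        return "pop"
    return "no-op"


# Loosely mirrors the quoted example of Sa = 0.6 with current input c.
stack: Stack = [("a", 0.5), ("b", 0.25)]
print(stack_action(stack, 0.6, "c"))  # push
print(stack)  # [('a', 0.5), ('b', 0.25), ('c', 0.6)]
```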

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma, Rossum and Dietterich with the next programs of Das. 
Doing so would enhance the network's learning capabilities and, through the increased degrees of freedom of higher-order networks, improve generalization. 
 (Das, [sec Abstract] “We further show an enhancement of the network's learning capabilities by providing hints. In addition, an initial comparative study of simulations with first, second and third order recurrent networks has shown that the increased degree of freedom in a higher order networks improve generalization but not necessarily learning speed.”)

Regarding claim 20
Donnarumma, Rossum, Dietterich and Das teach claim 19.

the subsystem is further configured to, for each neural network output: (see the rejections of claim 19)

Donnarumma further teaches 
receive a representation of a current state of the environment; and 
(Donnarumma, [figs 1-3] “Task Input”, “Sensorial Input”, “programmer” and “interpreter” [fig 4] “Input Sequences” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer.”; e.g., “input” may read on “current state of the environment”.)

[generate] a next neural network input [from an embedding for the next program to be called] and the representation of the current state of the environment.
(Donnarumma, [figs 1-3] “Task Input”, “Sensorial Input”, “programmer” and “interpreter” [fig 4] “Input Sequences” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer.”; e.g., “input” may read on “current state of the environment”.)

Das further teaches 
generate a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment.
(Das, [fig 1]; [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols”; e.g., “push” and “pop” may read on the “embedding for the next program”, and a next neural network input is generated from them. It is appreciated by one of ordinary skill in the art that each program is a vector containing binary numbers, so each vector may read on “embedding”. In addition, “external inputs to the system” may read on the “current state of the environment” as well.).

Donnarumma, Rossum, Dietterich and Das are combinable for the same rationale as set forth above with respect to claim 19.

Regarding claim 22


Donnarumma, Rossum, Dietterich and Das teach claim 20.

generating the next neural network input comprises: (see the rejections of claim 20)

Das further teaches 
extracting a fixed-length state encoding from the representation of the current state of the environment using a domain-specific encoder; and 
(Das, [sec “Neural Network Pushdown Automaton (NNPDA)”] “These external inputs consist of sequences of characters of strings fed in one character at a time.”; [sec “Training of the NNPDA”] “We use a localized representation for Input and Read symbols (thus, a symbol is uniquely represented by a vector which has only one 1 and all other elements 0).”; e.g., “a symbol is uniquely represented by a vector which has only one 1 and all other elements 0” may read on the “domain-specific encoder”. In addition, “sequences of characters of strings” and “localized representation for Input” may read on “fixed-length state encoding from the representation of the current state of the environment”.)

combining the fixed-length state encoding and the embedding for the next program to be called to generate the next neural network input.
(Das, [fig 1]; [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols”; e.g., “push” and “pop” may read on the “next program to be called”. In addition, it is appreciated by one of ordinary skill in the art that each program is a vector containing binary numbers, so each vector may read on “embedding”. Furthermore, fig 1 showing that “Input(t-1)” and “Top-of-Stack(t-1)” are used together as input to NNPDA may read on “combining the fixed-length state encoding and the embedding for the next program to be called to generate the next neural network input”.)
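A minimal sketch of the combination mapped above, assuming that the one-hot (“localized”) encoding quoted from Das serves as the fixed-length state encoding and that “combining” is realized as concatenation with a program embedding. The alphabet and embedding values are hypothetical.

```python
from typing import List, Sequence


def one_hot(symbol: str, alphabet: Sequence[str]) -> List[float]:
    """Localized representation: a single 1 at the symbol's index, 0 elsewhere."""
    vec = [0.0] * len(alphabet)
    vec[list(alphabet).index(symbol)] = 1.0
    return vec


def next_network_input(state_symbol: str, alphabet: Sequence[str],
                       program_embedding: Sequence[float]) -> List[float]:
    """Concatenate the fixed-length state encoding with the program embedding."""
    return one_hot(state_symbol, alphabet) + list(program_embedding)


ALPHABET = ["a", "b", "c"]   # hypothetical input alphabet
PUSH_EMBEDDING = [1.0, 0.0]  # hypothetical embedding for the "push" program
print(next_network_input("b", ALPHABET, PUSH_EMBEDDING))
# [0.0, 1.0, 0.0, 1.0, 0.0]
```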

Donnarumma, Rossum, Dietterich and Das are combinable for the same rationale as set forth above with respect to claim 19.

Regarding claim 24
Donnarumma, Rossum, Dietterich and Das teach claim 19.

Das further teaches
the subsystem and the core recurrent neural network have been trained using execution traces as training data.
(Das, [sec “Neural Network Pushdown Automaton (NNPDA)”] “These external inputs consist of sequences of characters of strings fed in one character at a time.”; [sec “Training of the NNPDA”] “We use a localized representation for Input and Read symbols”; e.g., “sequences of characters” may read on the “execution traces”.).

Donnarumma, Rossum, Dietterich and Das are combinable for the same rationale as set forth above with respect to claim 19.

Regarding claim 25
Claim 25 is a computer storage media claim corresponding to the system claim 19, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 19.

Regarding claim 26
Donnarumma, Rossum, Dietterich and Das teach claim 25.
Claim 26 is a computer storage media claim corresponding to the system claim 20, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 20.

Regarding claim 28
Donnarumma, Rossum, Dietterich and Das teach claim 26.
Claim 28 is a computer storage media claim corresponding to the system claim 22, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 22.

Regarding claim 30
Donnarumma, Rossum, Dietterich and Das teach claim 25.
Claim 30 is a computer storage media claim corresponding to the system claim 24, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 24.

Claims 23 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Rossum et al. (Python Frequently Asked Questions), further in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition), further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory), further in view of Graves et al. (Neural Turing Machines).

Regarding claim 23
Donnarumma, Rossum, Dietterich and Das teach claim 19.

determining the next program to be called comprises: (see the rejections of claim 19)

Donnarumma, Rossum, Dietterich and Das do not teach
determining, from the neural network output, a program key; and 
selecting a program from the set of programs having a key that is most similar to the program key.

Graves teaches 
determining, from the neural network output, a program key; and 
(Graves, [fig 1]; [sec 3, line 1] “A Neural Turing Machine (NTM) architecture contains two basic components: a neural network controller and a memory bank.”; [sec 3.3.1, line 1] “For content-addressing, each head (whether employed for reading or writing) first produces a length M key vector kt that is compared to each vector Mt(i) by a similarity measure K[·, ·] .”; [sec 4.1, para 5, line 1] “The preceding analysis suggests that NTM, unlike LSTM, has learned some form of copy algorithm. To determine what this algorithm is, we examined the interaction between the controller and the memory (Figure 6). We believe that the sequence of operations performed by the network can be summarised by the following pseudocode:”; e.g., the key vector may read on the “program key” since NTM is learning a form of copy algorithm based on it.)


selecting a program from the set of programs having a key that is most similar to the program key.
(Graves, [sec 3.3.1, line 1] “For content-addressing, each head (whether employed for reading or writing) first produces a length M key vector kt that is compared to each vector Mt(i) by a similarity measure K[·, ·] .”; [sec 3.1, line 5] “The length M read vector rt returned by the head is defined as a convex combination of the row-vectors Mt(i) in memory: Equation (2)”; [sec 4.1, para 5, line 1] “The preceding analysis suggests that NTM, unlike LSTM, has learned some form of copy algorithm. To determine what this algorithm is, we examined the interaction between the controller and the memory (Figure 6). We believe that the sequence of operations performed by the network can be summarised by the following pseudocode:”; e.g., the “length M read vector” may read on “selecting a program”, and the operations may read on “the set of programs”. In addition, the similarity measure may read on the “key that is most similar to the program key”.)
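For illustration, Graves' content-based addressing compares the emitted key with each memory row using a similarity measure K (cosine similarity in Graves), sharpens the scores with a key strength beta, and normalizes them with a softmax; the read vector is the resulting convex combination of rows. The sketch below follows that description. The `select_program` step, which takes the single most similar row, is an illustrative hard-selection variant added to track the claim language and is not an operation taken from Graves; the memory contents, key, and beta are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the measure K[u, v]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_weights(key: Sequence[float], memory: Sequence[Sequence[float]],
                    beta: float) -> List[float]:
    """Softmax over beta * K[key, M(i)] for every memory row i."""
    scores = [beta * cosine(key, row) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_vector(weights: Sequence[float],
                memory: Sequence[Sequence[float]]) -> List[float]:
    """Convex combination of the memory rows under the given weights."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def select_program(key: Sequence[float], memory: Sequence[Sequence[float]]) -> int:
    """Illustrative hard selection: index of the row most similar to the key."""
    sims = [cosine(key, row) for row in memory]
    return max(range(len(memory)), key=sims.__getitem__)


memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # hypothetical stored program keys
key = [0.9, 0.1]                               # hypothetical emitted key
weights = content_weights(key, memory, beta=5.0)
print(select_program(key, memory))  # 0
print(read_vector(weights, memory))
```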

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma, Rossum, Dietterich and Das with the program selection based on a key of Graves. 
Doing so would yield a system capable of learning simple algorithms from example data and of using those algorithms to generalize well outside its training regime.
(Graves, [sec 5] “Our experiments demonstrate that it is capable of learning simple algorithms from example data and of using these algorithms to generalise well outside its training regime.”)

Regarding claim 29
Donnarumma, Rossum, Dietterich and Das teach claim 25.
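Claim 29 is a computer storage media claim corresponding to the system claim 23, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 23.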

Claims 31-32, 34, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition), further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory).

Regarding claim 31
Donnarumma teaches 
A method of invoking one or more programs selected from a set of programs to cause an environment to transition into a different state, the method comprising: 

processing a neural network input using a core neural network to generate a neural network output; 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY).”; e.g., “programmer” with CTRNN may read on “core neural network”. In addition, e.g., “program” along with “feeds it to the bottom (pre-)motor layer” may read on “output”.)

determining, from the neural network output, whether or not to [end a currently invoked program and to return to a calling program from the set of programs]; 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … the term [equation image not reproduced] constitutes the values of the programs, that are input signals sent from PFC weighted by the sparse C’’mj matrix. We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer.”; Note that Donnarumma teaches determining, from the neural network output, whether or not to select R or L.)

determining, from the neural network output, a [next program to be called]; and 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer. … The bottom (interpreter) layer of 32 neurons acts as an interpreter of sequences and uses the inputs I provided by the programmer layer as well as the sensory input (e.g., 1AABCCXA) to output one of two motor commands {R, L}, represented here as two output neurons.”; Note that Donnarumma teaches determining, from the neural network output, a next command, R or L.)

determining, from the neural network output, contents [of arguments to the next program to be called].
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 3] as cited above; Note that Donnarumma teaches determining, from the neural network output, a next command, R or L.)

However, Donnarumma does not teach
determining, from the neural network output, whether or not to [end a currently invoked program and to return to a calling program from the set of programs]; 
determining, from the neural network output, a [next program to be called]; and 
determining, from the neural network output, contents [of arguments to the next program to be called].


Dietterich teaches 
determining, from the neural network output, whether or not to end a currently invoked program and to return to a calling program from the set of programs; 
(Dietterich, [table 1] “If any subtask on Kt is terminated in st+1”; [sec 3.2, pp. 239-240] “When a subroutine is invoked, its name and actual parameters are pushed onto the stack. When a subroutine terminates, its name and actual parameters are popped off the stack. Notice (line 16) that if any subroutine on the stack terminates, then all subroutines below it are immediately aborted, and control returns to the subroutine that had invoked the terminated subroutine.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma with the subroutine termination and the control return to a calling program of Dietterich. 
Doing so would allow a task in a given environment to be learned faster and represented more compactly by decomposing the task into a set of sub-tasks. 
 (Dietterich, [sec 1] “The decomposition into subproblems has many advantages. First, policies learned in subproblems can be shared (reused) for multiple parent tasks. Second, the value functions learned in subproblems can be shared, so when the subproblem is reused in a new task, learning of the overall value function for the new task is accelerated. Third, if state abstractions can be applied, then the overall value function can be represented compactly as the sum of separate terms that each depends on only a subset of the state variables. This more compact representation of the value function will require less data to learn, and hence, learning will be faster.”)

However, Donnarumma and Dietterich do not teach
determining, from the neural network output, a [next program to be called]; and 
determining, from the neural network output, contents [of arguments to the next program to be called].

Das teaches
determining, from the neural network output, a next program to be called; and 
(Das, [fig 1] [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; e.g., “push” and “pop” may read on the “next program to be called”, and both are determined based on Sa, which is a neural network output.)

determining, from the neural network output, contents of arguments to the next program to be called.
(Das, [tables 1-3]; [sec “Stack Control”] “Therefore, for the stack shown in Table 1, Sa = 0.6 and the current input is c, then, after the operation, the stack would appear as shown in Table 2”; e.g., “Sa” may read on the “contents of arguments to the next program to be called”.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma and Dietterich with the determination of the next program to be called, and of its arguments, taught by Das. 
Doing so would enhance the network's learning capabilities and, through the increased degree of freedom of higher order networks, improve generalization. 
 (Das, [sec Abstract] “We further show an enhancement of the network's learning capabilities by providing hints. In addition, an initial comparative study of simulations with first, second and third order recurrent networks has shown that the increased degree of freedom in a higher order networks improve generalization but not necessarily learning speed.”)

Regarding claim 32
Donnarumma, Dietterich and Das teach claim 31.

Claim 32 is a method claim corresponding to the system claim 20, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 20. 

Donnarumma further teaches 
the embedding for a program is a collection of numeric values that represents the program.
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … the term [equation image not reproduced] constitutes the values of the programs, that are input signals sent from PFC weighted by the sparse C’’mj matrix. We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “The PNN implementation includes a hierarchy structured in a top layer and a bottom layer. The top (programmer) layer of 14 neurons is meant to detect and keep in memory the four programs (I = {I1AX, I2BY, I3AX, I4BY}) and its implementations refer to Equations (2).”; One of ordinary skill in the art would appreciate that, e.g., each program is a vector of binary values, and thus each such vector reads on the “embedding for a program”.)

Regarding claim 34
Donnarumma, Dietterich and Das teach claim 32.
Claim 34 is a method claim corresponding to the system claim 22, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 22.

Regarding claim 36
Donnarumma, Dietterich and Das teach claim 31.
Claim 36 is a method claim corresponding to the system claim 24, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 24.

Claim 35 is rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition) further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory) further in view of Graves et al. (Neural Turing Machines).

Regarding claim 35

Claim 35 is a method claim corresponding to the system claim 23, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 23.

Claim 37 is rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition) further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory) further in view of Deoras et al. (US 2015/0066496 A1) further in view of Canoy et al. (US 2017/0045894 A1).

Regarding claim 37
Donnarumma, Rossum, Dietterich and Das teach claim 19.

Donnarumma further teaches 
the neural network output comprises a first portion that is a [probability with which the currently invoked program should be ended], and 
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY).”; Note that Donnarumma teaches a neural network output having different portions corresponding to different programs.)

the subsystem determines whether or not to [end the currently invoked program based on the probability].
(Donnarumma, [figs 1-2, 4] “input”, “programmer” and “interpreter” [sec 2, pp. 4-5] “Consequently, we implemented the PFC area as a standard CTRNN network, see Equation (2). … This is achieved in the PNN framework, implementing the (pre-)motor area as an interpreter of CTRNNs, by the Equations expressed in (3). … the term [equation image not reproduced] constitutes the values of the programs, that are input signals sent from PFC weighted by the sparse C’’mj matrix. We stress that in our modelization, all the connections of the (pre-)motor areas are fixed connections and thus, the dynamic behaviors that the (pre- )motor areas exhibit are due only to the change of its input, i.e. sensorial data (xlSensor) and programs from PFC (yjPFC).” [sec 3] “Figure 4 shows a depiction of the PNN architecture that solves the task: here the top (PFC) layer sets and maintains the program to be used (1AX, 2BY, 3AX, or 4BY) based on the current context (1, 2, 3 or 4) and feeds it to the bottom (pre-)motor layer, which then executes it to select the right motor response (R or L) based on the current stimulus (AX or BY). Note that both layers receive the input stream, but the top layer uses it to maintain or switch a program in working memory while the bottom layer uses it to select the motor response based on the current program received by the top layer.”; Note that Donnarumma teaches determining, from the neural network output, whether to select R or L.)

Dietterich further teaches 
the neural network output comprises a first portion that is a [probability] with which the currently invoked program should be ended, and 
(Dietterich, [table 1] “If any subtask on Kt is terminated in st+1”; [sec 3.2, pp. 239-240] “When a subroutine is invoked, its name and actual parameters are pushed onto the stack. When a subroutine terminates, its name and actual parameters are popped off the stack. Notice (line 16) that if any subroutine on the stack terminates, then all subroutines below it are immediately aborted, and control returns to the subroutine that had invoked the terminated subroutine.”; Note that Donnarumma teaches “the neural network output comprises a first portion that is a [probability with which the currently invoked program should be ended]”.)

the subsystem determines whether or not to end the currently invoked program based on the [probability].
(Dietterich, [table 1] “If any subtask on Kt is terminated in st+1”; [sec 3.2, pp. 239-240] as cited above; Note that Donnarumma teaches “the subsystem determines whether or not to [end the currently invoked program based on the probability]”.)

Donnarumma, Rossum, Dietterich and Das are combinable with Dietterich for the same rationale as set forth above with respect to claim 1.

However, Donnarumma, Rossum, Dietterich and Das do not teach
the neural network output comprises a first portion that is a [probability] with which the currently invoked program should be ended, and 


Deoras teaches
the neural network output comprises a first portion that is a probability with which the currently invoked program should be ended, and 
(Deoras, [figs 1-3]; [pars 28-35] “The labeler component 124 may further comprise an output component 130 that receives output (e.g., probability distribution over assignable labels) from the at least one of the DNN 126 or the RNN 128, and assigns at least one semantic label to at least one word in the sequence of words based upon such output. For instance, in the exemplary sentence set forth above, the output of the at least one of the DNN 126 or the RNN 128 can indicate that the word “Cleveland” has a relatively high probability of being associated with a semantic tag “departure city.” The output component 130 can cause such semantic tag to be assigned to the word “Cleveland” based upon the output of the at least one of the DNN 126 or the RNN 128.”; Note that Dietterich teaches “[the neural network output comprises a first portion that is a probability] with which the currently invoked program should be ended”.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma, Rossum, Dietterich and Das with the probability output of Deoras. 
Doing so would allow a context to be interpreted and understood effectively by analyzing incoming information about the current situation.
 (Deoras, [pars 4-7] “a sequence of words set forth in natural language can be received, as well as (optionally) an indication of a domain corresponding to the sequence of words (e.g., airline travel) and (optionally) an intent corresponding to the sequence of tokens (e.g., purchase a ticket). Responsive to receiving the sequence of words, a respective plurality of features can be ascertained for each word in the sequence of words.”)

However, Donnarumma, Rossum, Dietterich, Das and Deoras do not teach
the subsystem determines whether or not to end the currently invoked program based on the [probability].

Canoy teaches
the subsystem determines whether or not to end the currently invoked program based on the probability.
(Canoy, [pars 108-117] “In block 702, the processor of the UAV may halt performance of the flight plan in response to identifying an exception condition based on the continuous real-time sensor data. For example, when determining that continuing to move toward the target landing bay in the same manner as defined by the flight plan could cause a collision (e.g., a calculated probability of colliding with other UAVs or other objects is above a safety threshold, etc.), the UAV may stop executing the instructions of the flight plan at least until the exception is no longer present”; Note that Donnarumma teaches “the subsystem determines whether or not to [end the currently invoked program based on the probability]”.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Donnarumma, Rossum, Dietterich, Das and Deoras with the probability-based halting of Canoy. 
Doing so would provide safe, efficient and precise control of a system in a given environment.
 (Canoy, [pars 28-38] “Various embodiments provide methods, UAVs, systems, and non-transitory process-readable storage media for safely and efficiently controlling autonomous landings of UAVs, particularly in locations having a plurality of landing bays and populated with a plurality of other UAVs flying independently. In general, a UAV may be configured with sensor data-guided autonomous landing procedures that enable high-precision positioning and orienting within a multi-bay landing zone occupied by a plurality of concurrently active UAVs”)

Claim 38 is rejected under 35 U.S.C. 103 as being unpatentable over Donnarumma et al. (A Programmer-Interpreter neural network architecture for prefrontal cognitive control) in view of Dietterich et al. (Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition) further in view of Das et al. (Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory) further in view of Deoras et al. (US 2015/0066496 A1).

Regarding claim 38
Donnarumma, Rossum, Dietterich and Das teach claim 19.

for each neural network input, the core recurrent neural network is further configured to perform operations comprising: (see the rejections of claim 19)

Dietterich further teaches 
[applying a first function to the updated hidden state to generate a probability] that the currently invoked program should be ended; 
(Dietterich, [table 1] “If any subtask on Kt is terminated in st+1”; [sec 3.2, pp. 239-240] “When a subroutine is invoked, its name and actual parameters are pushed onto the stack. When a subroutine terminates, its name and actual parameters are popped off the stack. Notice (line 16) that if any subroutine on the stack terminates, then all subroutines below it are immediately aborted, and control returns to the subroutine that had invoked the terminated subroutine.”).

Donnarumma, Rossum, Dietterich and Das are combinable with Dietterich for the same rationale as set forth above with respect to claim 1.

Das further teaches 
processing the neural network input to update a current hidden state of the core neural network; and 
(Das, [fig 1] multiple layers; [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The network consists of a set of fully recurrent neurons, called State Neurons which represent the states and permit classification and training of the NNPDA. … The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols … The end symbol facilitates computation by effectively constructing an extra hidden layer”; e.g., “hidden layer” may read on “hidden state”.).

applying functions to the updated hidden state to generate the respective neural network output, the applying comprising: 
(Das, [fig 1] “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” [sec “Stack Control”] “PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The network consists of a set of fully recurrent neurons, called State Neurons which represent the states and permit classification and training of the NNPDA. … The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols … The end symbol facilitates computation by effectively constructing an extra hidden layer”; e.g., “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” may read on “functions”.).

applying a first function to the updated hidden state to generate a [probability] that the currently invoked program should be ended; 
(Das, [fig 1] “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” [sec “Stack Control”] “Operations on the stack are determined by the activation of Action Neuron, Sa. The value of Sa is allowed to vary between +1 and -1. The operations will be described as follows: PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The network consists of a set of fully recurrent neurons, called State Neurons which represent the states and permit classification and training of the NNPDA. … The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols … The end symbol facilitates computation by effectively constructing an extra hidden layer”; e.g., “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” may read on “function”.)

applying a second function to the updated hidden state to generate a key that identifies the next program to be invoked; and 
(Das, [fig 1] “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” [sec “Stack Control”] “Operations on the stack are determined by the activation of Action Neuron, Sa. The value of Sa is allowed to vary between +1 and -1. The operations will be described as follows: PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The network consists of a set of fully recurrent neurons, called State Neurons which represent the states and permit classification and training of the NNPDA. … The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols … The end symbol facilitates computation by effectively constructing an extra hidden layer”; e.g., “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” may read on “function”. In addition, e.g., the value of “Sa” may read on “key” because that value is used to identify the next program to be invoked.)


(Das, [fig 1] “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” [sec “Stack Control”] “Operations on the stack are determined by the activation of Action Neuron, Sa. The value of Sa is allowed to vary between +1 and -1. The operations will be described as follows: PUSH: If the activation of Action Neuron, Sa is significantly positive the action taken is push. In our simulations we performed push when the magnitude of Sa > 0.1 … Therefore, for the stack shown in Table 1, Sa = 0.6 and the current input is c, then, after the operation, the stack would appear as shown in Table 2. POP: If activation of Action Neuron is sufficiently negative, the action taken is pop”; [sec “Neural Network Pushdown Automaton (NNPDA)”] “The network consists of a set of fully recurrent neurons, called State Neurons which represent the states and permit classification and training of the NNPDA. … The Input Neurons register external inputs to the system”; [sec “Training of the NNPDA”] “I is the activation of the Input Neurons and R is the activation of the Read Neuron and W is the weight matrix of the network. We use a localized representation for Input and Read symbols … The end symbol facilitates computation by effectively constructing an extra hidden layer”; e.g., “Each weight relates the product of Input(t-1), State(t-1) and Top-of-Stack information to the State(t)” may read on “function”. In addition, e.g., the values of “Sa” may read on “arguments for the next program” because those values are used, together with push/pop, to operate the stack.)

Donnarumma, Rossum, Dietterich and Das are combinable with Das for the same rationale as set forth above with respect to claim 1.

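However, Donnarumma, Rossum, Dietterich and Das do not teach
applying a first function to the updated hidden state to generate a [probability] that the currently invoked program should be ended; 

Deoras teaches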
applying a first function to the updated hidden state to generate a probability that the currently invoked program should be ended; 
(Deoras, [figs 1-3]; [pars 28-35] “The labeler component 124 may further comprise an output component 130 that receives output (e.g., probability distribution over assignable labels) from the at least one of the DNN 126 or the RNN 128, and assigns at least one semantic label to at least one word in the sequence of words based upon such output. For instance, in the exemplary sentence set forth above, the output of the at least one of the DNN 126 or the RNN 128 can indicate that the word “Cleveland” has a relatively high probability of being associated with a semantic tag “departure city.” The output component 130 can cause such semantic tag to be assigned to the word “Cleveland” based upon the output of the at least one of the DNN 126 or the RNN 128.”; Note that Dietterich teaches “[applying a first function to the updated hidden state to generate a probability] that the currently invoked program should be ended”.)

Donnarumma, Rossum, Dietterich and Das are combinable with Deoras for the same rationale as set forth above with respect to claim 37.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Donnarumma
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129