Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is in response to Applicant’s application 16/946,586, filed on 06/29/2020.
Claims 1-18 are currently pending for consideration.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/29/2020 was filed before the mailing date of the non-final office action. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
During patent examination, pending claims must be “given their broadest reasonable interpretation consistent with the specification.” MPEP 2111; see also MPEP 2173.02. Limitations appearing in the specification but not recited in the claim are not read into the claim. In re Prater, 415 F.2d 1393, 1404-05, 162 USPQ 541, 550-551 (CCPA 1969). See also In re Zletz, 893 F.2d 319, 321-22, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989) (“During patent examination the pending claims must be interpreted as broadly as their terms reasonably allow.”). The reason is simply that during patent prosecution, when claims can be amended, ambiguities should be recognized, scope and breadth of language explored, and clarification imposed. An essential purpose of patent examination is to fashion claims that are precise, clear, correct, and unambiguous. Only in this way can uncertainties of claim scope be removed, as much as possible, during the administrative process.

Examiner Note
The Examiner cites particular columns, line numbers and/or paragraph numbers in the references as applied to the claims below for the convenience of the Applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Medalion et al. (US 20210272559 A1, “Medalion”) in view of Mnih et al. (US 9679258 B2, “Mnih”).
As to claim 1, Medalion discloses a computer system comprising:
a processing unit operatively coupled to memory; (Medalion: [0144] Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data are communicated and/or accessed by multiple processes).
an artificial intelligence (AI) platform operatively coupled to the processing unit, the AI platform configured with one or more tools to support random action replay for natural language (NL) learning (Medalion: [0078] Generating the action vector is performed by providing natural language (NL) action statements as input to a second deep learning machine learning (i.e. AI) model. The natural language action statements are related to the natural language issue statements in that each of the natural language action statements correlate to at least one call (i.e. random action replay) from which the natural language issue statements were derived), the one or more tools comprising:
a training manager to train a neural network (Medalion: [0124] training of the deep learning neural network issue machine learning model (904) and the deep learning neural network action machine learning model), the training further comprising the training manager to:
explore a NL conversation, the exploration to leverage one or more tuples associated with the NL conversation, each tuple representing an input action, a vector, an output action, and a reward value; (Medalion: [0025-27, 0122] training machine learning models (MLMs) for natural language data center conversations… a data structure with a tuple of one or more data entries including a vector… an action vector composed of numbers output (i.e. output action) by an action machine learning model or a trained action machine learning model, which took as input (i.e. input action) the natural language action statements… a scalar value (i.e. reward value) of the match for convergence).
The examiner notes that paragraph [0041] of the specification describes the term “a reward value” as follows: “the reward value is scalar and identifies if the second output action matches the input term. For example, a reward value of 0 is indicative that the second output action does not match the input action, while a reward value of 1 is indicative that the second output action matches the input action”. Accordingly, the scalar value of the match for convergence disclosed by Medalion corresponds to the claimed “reward value”.
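For illustration only, the four-element tuple and the scalar reward value described in paragraph [0041] of the specification can be sketched as below. The class, field and function names are hypothetical and are not drawn from the application, Medalion or Mnih; the sketch simply encodes the 0/1 match rule quoted above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ConversationTuple:
    """Hypothetical record of one explored NL exchange (not taken from the references)."""
    input_action: str
    vector: Tuple[float, ...]  # scores over candidate actions
    output_action: str
    reward: float


def reward_value(output_action: str, input_action: str) -> float:
    """Scalar reward per spec [0041]: 1.0 when the output matches the input, else 0.0."""
    return 1.0 if output_action == input_action else 0.0


def make_tuple(input_action: str, vector: Sequence[float], output_action: str) -> ConversationTuple:
    """Bundle the four tuple elements, deriving the reward from the match test."""
    return ConversationTuple(
        input_action=input_action,
        vector=tuple(float(v) for v in vector),
        output_action=output_action,
        reward=reward_value(output_action, input_action),
    )
```

For example, make_tuple("reset_password", [0.7, 0.3], "reset_password") carries a reward of 1.0, while a mismatched output action yields 0.0.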
sample a first action from the vector; (Medalion: [0027] the action vector (120) is an embedded representation of the natural language action statements).
However, Medalion may not explicitly disclose all aspects of the following limitations:
assess the sampled first action and calculate a first gradient representing a distance of the sampled first action from the vector; and
apply the first gradient to selectively adjust the neural network;
a language manager operatively coupled to the training manager, the language manager to receive and apply NL input to the selectively adjusted neural network, and identify an output corresponding to the received NL input; and
the language manager to execute an identified action corresponding to the identified output.
Mnih discloses the training manager to assess the sampled first action and calculate a first gradient representing a distance of the sampled first action from the vector; and (Mnih: [col 4 ln 57-67, col 5 ln 1-5] training on the modulus difference between the target generated from the first neural network and the action-value parameter output from the second neural network, adjusting the weights by (stochastic) gradient descent based on the calculated adjustments for faster convergence… to calculate a gradient for updating the weights… performed once with each action).
Mnih discloses the training manager to apply the first gradient to selectively adjust the neural network; (Mnih: [col 4 ln 57-67] the adjustments to the weights are varied based on the history of calculated adjustments for faster convergence using the RMS-Prop procedure… the second neural network is trained by incrementally updating (i.e. adjusting) its weights).
Mnih discloses a language manager operatively coupled to the training manager, the language manager to receive and apply NL input to the selectively adjusted neural network, and identify an output corresponding to the received NL input; and (Mnih: [col 7 ln 33-67] a control system employing natural language comprising: a data input (i.e. NL input) to receive sensor data; a data output to provide action control data; and a deep neural network having an input layer coupled to data input and an output layer; and an action selector… wherein the input layer defines a sensor data field in one or more dimensions, wherein the output layer defines a value for an action-value function associated with each of a plurality of possible actions for control system to control; and an action selector to select an action responsive to action-value function and to provide corresponding action to data output).
Mnih discloses the language manager to execute an identified action corresponding to the identified output. (Mnih: [col 7 ln 60-67] the output layer defines a value for an action-value function associated with each of a plurality of possible actions for control system to control; and an action selector to select an action responsive to action-value function and to provide corresponding action to data output). 
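The receive/identify/execute sequence mapped to Mnih's action selector above can be illustrated by the following hypothetical Python sketch: NL input is featurized and applied to a scoring model, an output is identified by greedy selection over action values, and the corresponding action is executed. The vocabulary, weights, action names and handlers are assumptions, and the linear bag-of-words scorer merely stands in for the adjusted neural network; none of it is taken from Medalion or Mnih.

```python
from typing import Callable, Dict, List

# Hypothetical action handlers; names and return strings are illustrative only.
HANDLERS: Dict[str, Callable[[], str]] = {
    "reset_password": lambda: "password reset link sent",
    "check_status": lambda: "ticket status retrieved",
}
ACTIONS: List[str] = list(HANDLERS)
VOCAB: List[str] = ["reset", "password", "status", "ticket"]

# Hypothetical adjusted weights: one row per action, one column per vocabulary term.
WEIGHTS: List[List[float]] = [
    [1.0, 1.0, 0.0, 0.0],  # reset_password
    [0.0, 0.0, 1.0, 1.0],  # check_status
]


def featurize(text: str) -> List[float]:
    """Bag-of-words counts over the fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


def action_values(text: str) -> List[float]:
    """Apply the NL input to the scoring model: one value per action."""
    x = featurize(text)
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]


def identify_and_execute(text: str) -> str:
    """Identify the highest-valued output and execute the corresponding action."""
    values = action_values(text)
    best = max(range(len(values)), key=values.__getitem__)  # greedy action selector
    return HANDLERS[ACTIONS[best]]()
```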
Thus, before the effective filing date of the claimed invention, one of ordinary skill in the art would have recognized that Medalion and Mnih, both of which disclose training machine learning models to generate output with natural language issue statements, are analogous art from the “same field of endeavor,” and that combining Mnih's training data, generated by operating with a succession of actions identified using an action-value function and providing the corresponding action to the data output, with Medalion's generation of an action matrix from issue/action vectors with scalar values would have rendered obvious the claimed limitations:
assess the sampled first action and calculate a first gradient representing a distance of the sampled first action from the vector; and
apply the first gradient to selectively adjust the neural network;
a language manager operatively coupled to the training manager, the language manager to receive and apply NL input to the selectively adjusted neural network, and identify an output corresponding to the received NL input; and
the language manager to execute an identified action corresponding to the identified output.
The motivation to combine Medalion and Mnih is to provide a system that facilitates scaling to very large data sets, because the computation involved in training the second neural network is reduced and the network is effectively continuously updated by stochastic gradient descent with a low computational cost per iteration (see Mnih [col 3 ln 6-11]).
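Purely to illustrate one way the sample/assess/apply limitations of claim 1 could be realized (this is neither a construction of the claim nor code from Medalion or Mnih), the sketch below treats the vector as a softmax distribution over candidate actions, samples a first action from it with a weighted random choice, takes the cross-entropy gradient (vector minus one_hot(action)) as the distance of the sampled action from the vector, and applies that gradient in a reward-scaled step. The linear model, learning rate and all names are assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights: List[List[float]], features: Sequence[float]) -> List[float]:
    """Return the action vector: one probability per candidate action."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def sample_action(vector: Sequence[float], rng: random.Random) -> int:
    """Sample a first action index with probability given by the vector."""
    return rng.choices(range(len(vector)), weights=vector, k=1)[0]


def distance_gradient(vector: Sequence[float], action: int) -> List[float]:
    """Gradient of -log(vector[action]) w.r.t. the logits: vector minus one_hot(action)."""
    return [p - (1.0 if i == action else 0.0) for i, p in enumerate(vector)]


def apply_gradient(weights: List[List[float]], features: Sequence[float],
                   grad: Sequence[float], reward: float, lr: float = 0.5) -> None:
    """Reward-scaled gradient step; a zero reward leaves the weights unchanged."""
    for i, g in enumerate(grad):
        for j, x in enumerate(features):
            weights[i][j] -= lr * reward * g * x


rng = random.Random(0)
weights = [[0.0, 0.0], [0.0, 0.0]]   # two candidate actions, two input features
features = [1.0, 0.0]                # hypothetical encoding of the input action
vector = forward(weights, features)  # uniform before training
action = sample_action(vector, rng)
apply_gradient(weights, features, distance_gradient(vector, action), reward=1.0)
```

With a reward of 1.0, the probability of the sampled action rises above its initial 0.5; with a reward of 0.0, the weights are left unchanged, which is one possible reading of “selectively adjust.”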

As to claim 2, Medalion in view of Mnih discloses the computer system of claim 1, further comprising an interaction manager operatively coupled to the training manager, the interaction manager to create the one or more tuples in an interactive environment with corresponding first and second agents, the interactive environment to identify one or more actions from a distribution of actions as a response to receipt of the input action. (Mnih: [col 64-67, col 9 ln 24-33, ln 65-67] an agent interacts with an environment E (i.e. interactive environment), the Atari emulator, in a sequence of actions, observations and rewards (i.e. tuples). The action is passed to the emulator and modifies its internal state and the game score. The environment E may be stochastic… The goal of each agent is to interact with the emulator in the interactive environment by selecting actions in a way that maximizes future rewards using the optimal action-value function Q*(s, a) as the maximum expected return achievable by a policy mapping sequences to actions… identifying the actions for iteration in response to receipt of the input action with a probability distribution over sequences s and actions a that we refer to as the behavior distribution).
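As a hypothetical illustration of the tuple creation recited in claim 2, the following sketch lets a first agent respond to an input action, lets a second agent respond to that output by drawing from a distribution of actions, and records the resulting tuple using the 0/1 reward rule of specification paragraph [0041]. The Agent class, the policy dictionaries and the seeded random generator are assumptions, not features of Mnih's Atari emulator.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Agent:
    """Hypothetical agent answering with an action drawn from a distribution."""
    name: str
    policy: Dict[str, Dict[str, float]]  # input action -> {response action: probability}

    def respond(self, input_action: str, rng: random.Random) -> Tuple[str, List[float]]:
        dist = self.policy[input_action]
        actions = sorted(dist)
        vector = [dist[a] for a in actions]
        choice = rng.choices(actions, weights=vector, k=1)[0]
        return choice, vector


def create_tuples(first: Agent, second: Agent, inputs: List[str],
                  seed: int = 0) -> List[Tuple[str, List[float], str, float]]:
    """Record (input action, vector, output action, reward) tuples from agent exchanges."""
    rng = random.Random(seed)
    tuples = []
    for input_action in inputs:
        first_output, _ = first.respond(input_action, rng)
        second_output, vector = second.respond(first_output, rng)
        # Reward 1.0 when the second agent's output matches the original input action.
        reward = 1.0 if second_output == input_action else 0.0
        tuples.append((input_action, vector, second_output, reward))
    return tuples
```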
As to claim 3, Medalion in view of Mnih discloses the computer system of claim 1, further comprising the training manager to re-train the neural network and incorporate a second sampled action from the vector, calculate a second gradient representing a distance of the sampled second action from the vector, and apply the second gradient to selectively adjust the neural network. (Mnih: [col 4 ln 57-67] reinforcement learning (i.e. re-training) in which the second neural network is trained on the modulus difference between the target actions (i.e. second sampled actions) generated from the first neural network and the action-value parameter output from the second neural network, the weights of the second neural network being adjusted by calculating a (stochastic) gradient (i.e. second gradient) for gradient descent… the second neural network being adjusted using the RMS-Prop procedure with minibatches).
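The Mnih teaching relied on for claim 3 (a target generated by one network, the action-value output of a second network, and a stochastic gradient update using RMS-Prop over a minibatch) can be sketched as follows. This is an illustrative sketch under stated assumptions, not Mnih's implementation: the action-value function is linear, the error is squared rather than taken as a modulus, and the hyperparameter values are placeholders.

```python
import math
from typing import List, Sequence, Tuple

# (state features, action index, reward, next-state features, terminal flag)
Transition = Tuple[List[float], int, float, List[float], bool]


def q_values(weights: List[List[float]], state: Sequence[float]) -> List[float]:
    """Linear action-value estimates, one per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def rmsprop_minibatch_step(online: List[List[float]], target: List[List[float]],
                           cache: List[List[float]], batch: Sequence[Transition],
                           gamma: float = 0.99, lr: float = 0.01,
                           rho: float = 0.95, eps: float = 1e-6) -> None:
    """Update the online weights toward targets computed from the earlier (target) copy."""
    grads = [[0.0] * len(row) for row in online]
    for state, action, reward, next_state, done in batch:
        y = reward if done else reward + gamma * max(q_values(target, next_state))
        error = y - q_values(online, state)[action]
        for j, x in enumerate(state):
            # Gradient of 0.5 * error**2 w.r.t. online[action][j], averaged over the batch.
            grads[action][j] += -error * x / len(batch)
    for i, row in enumerate(online):
        for j in range(len(row)):
            g = grads[i][j]
            cache[i][j] = rho * cache[i][j] + (1.0 - rho) * g * g  # running mean of g**2
            row[j] -= lr * g / (math.sqrt(cache[i][j]) + eps)
```

In use, the target weights would be a periodically refreshed copy of the online weights (for example, target = [row[:] for row in online]), which is one way to realize the time-shifted network versions described in the Mnih passage cited for claim 5.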
As to claim 4, Medalion in view of Mnih discloses the computer system of claim 3, further comprising the training manager to assess the first and second gradients, and responsive to identification of a convergence of the first and second gradients terminate training of the neural network. (Mnih: [col 1 ln 62-66, col 3 ln 7-28] model-free reinforcement learning algorithms such as Q-learning with non-linear function approximators such as a neural network could cause the Q-network to diverge… better convergence guarantees by stochastic gradient descent… the second neural network is effectively continuously updated with a low computational cost per iteration by employing a stochastic gradient update with first and second gradients… training directly on visual images and/or sound, and thus the reinforcement learning is applied ‘end to end’, from this input to the output actions).
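Claim 4's termination on convergence of the first and second gradients can be illustrated with the hypothetical sketch below, which treats two successive gradients as converged when their Euclidean distance falls below a tolerance. The tolerance, step limit and callback interface are assumptions; the claim language does not specify a particular convergence test.

```python
import math
from typing import Callable, Sequence


def gradients_converged(first: Sequence[float], second: Sequence[float],
                        tol: float = 1e-3) -> bool:
    """True when two successive gradients differ by less than tol (Euclidean norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second))) < tol


def train_until_converged(compute_gradient: Callable[[], Sequence[float]],
                          apply_gradient: Callable[[Sequence[float]], None],
                          max_steps: int = 1000, tol: float = 1e-3) -> int:
    """Apply gradients until two successive ones converge; return the stopping step."""
    previous = compute_gradient()
    apply_gradient(previous)
    for step in range(1, max_steps):
        current = compute_gradient()
        if gradients_converged(previous, current, tol):
            return step  # terminate training
        apply_gradient(current)
        previous = current
    return max_steps


# Toy usage: minimize (w - 3)**2 with plain gradient descent.
state = {"w": 0.0}


def grad() -> Sequence[float]:
    return [2.0 * (state["w"] - 3.0)]


def apply_step(g: Sequence[float]) -> None:
    state["w"] -= 0.1 * g[0]


steps = train_until_converged(grad, apply_step)
```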
As to claim 5, Medalion in view of Mnih discloses the computer system of claim 1, further comprising the training manager to utilize a random choice function to select the first action from the vector for sampling. (Mnih: [col 7 ln 41-49] storing a set of weights of the neural network to create two versions of the neural network, one time-shifted with respect to the other, wherein said determining of the values of the set of action-value functions (i.e. random choice function) for selecting the action from the vector is performed using a later one of the neural network versions, and wherein the determining of the target action-value function is performed using an earlier one of the neural network versions).
As to claim 6, Medalion in view of Mnih discloses the computer system of claim 1, wherein the vector represents actions in an operatively coupled knowledge base and proximity of each of the represented actions to the input action. (Mnih: [col 2 ln 16-20, col 9 ln 65-67] a neural network is trained based on the stored experience (i.e. knowledge base), and when the experience is updated with a new (initial state – action vector – resulting state) triple, an entirely new neural network is trained on the updated experience in place of the previous neural network… using a probability (i.e. proximity) distribution over sequences and actions that we refer to as the behavior distribution for iteration of the actions to the input action).
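The claim 6 language (a vector representing knowledge-base actions and the proximity of each action to the input action) can be illustrated by scoring embedded knowledge-base actions with cosine similarity. The embeddings, the action names and the choice of cosine similarity are assumptions made only for illustration.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def proximity_vector(input_embedding: Sequence[float],
                     knowledge_base: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Map each knowledge-base action to its cosine proximity to the input action."""
    return {action: cosine(input_embedding, emb) for action, emb in knowledge_base.items()}
```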
Claims 7-12 and 13-18 recite, respectively, a computer program product and a method corresponding to the system of claims 1-6; therefore, the same rationale of rejection applies.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENQ-KANG (Kang) CHU whose telephone number is (571)270-7396. The examiner can normally be reached M-F 8-6 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell can be reached on 571-270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JENQ-KANG CHU/Examiner, Art Unit 2176

/ARIEL MERCADO/Primary Examiner, Art Unit 2176