DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
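For illustration only, the three-prong test and the two rebuttable presumptions set out above can be sketched as a simple decision function. The `Limitation` fields and the function name are hypothetical; each field stands in for a determination made by a person reading the claim in light of the specification, and the sketch does not replace the analysis described in MPEP § 2181.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of the determinations made for one claim limitation."""
    text: str
    uses_means_or_step: bool            # literal "means" or "step"
    uses_generic_placeholder: bool      # e.g. "unit" used as a substitute for "means"
    has_functional_language: bool       # e.g. linked by "for" or "configured to"
    recites_sufficient_structure: bool  # structure, material, or acts for the function


def interpreted_under_112f(lim: Limitation) -> bool:
    # Prong (A): "means"/"step" or a generic placeholder is used.
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    # Prong (B): the term is modified by functional language.
    prong_b = lim.has_functional_language
    # Prong (C): the term is NOT modified by sufficient structure.
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# No "means", but a generic placeholder coupled with functional language:
# the presumption against 112(f) treatment is rebutted.
policy_unit = Limitation(
    text="policy unit configured to set a score ...",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)

# "means" is used, but sufficient structure is recited:
# the presumption in favor of 112(f) treatment is rebutted.
structural_means = Limitation(
    text="(hypothetical) means comprising a named structure for ...",
    uses_means_or_step=True,
    uses_generic_placeholder=False,
    has_functional_language=True,
    recites_sufficient_structure=True,
)
```

In this sketch the presence or absence of the word "means" only determines which presumption applies at the outset; the outcome turns on whether sufficient structure is recited for the claimed function.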
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
“policy unit” and “policy parameter updating unit” in claim 1.
“policy unit” in claim 2.
“policy parameter updating unit” in claim 3.
“input acceptance unit”, “dialog state updating unit” and “response candidate generation unit” in claim 4.
“unit” is a generic placeholder, the words preceding “unit” are functional, “configured to” is a transition phrase, and the words following “configured to” are functional.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

As per Claim 1:
	“the state of a dialog being performed with a user”/“the state of a dialog being performed with a user and a policy parameter” in lines 4-5 of claim 1 lack antecedent basis.  Dialogs commonly have states, but “the state” has no preceding “a state” to refer to.  At a minimum, it is not clear if Applicant meant to refer to an inherent “the state” characteristic of a dialog.
	“based on the state of a dialog being performed with the user and a policy parameter” in lines 4-5 of claim 1 is unclear because it is not clear if this phrase is supposed to refer to:
	1. “based on the state of a dialog being performed with the user and based on a policy parameter”
	Or
	2. “based on the state of a dialog”, where the dialog is “being performed with the user” and is performed with “a policy parameter”.
	“the response candidates” in lines 5-6 of claim 1 is ambiguous when “response candidates” in line 3 of claim 1 refers to a subset of “a set of response candidates” in line 4 of claim 1. “response candidates included in a set of response candidates” can refer to every response candidate in the set of response candidates, but can also refer to a subset of response candidates which are included in the set of response candidates.  When “response candidates included in a set of response candidates” refers to a subset of the response candidates in the set of response candidates, it is not clear which plurality of response candidates (the set of response candidates, or the subset of the set of response candidates) is the one that “the response candidates” in lines 5-6 of claim 1 is supposed to refer to.
	“the state of the dialog” in lines 7-8 of claim 1 lacks antecedent basis (same issue as discussed above pertaining to “the state of a dialog being performed with a user”/“the state of a dialog being performed with a user and a policy parameter” in lines 4-5 of claim 1).

	As per Claim 2:
	“the state of the dialog” in line 2 of claim 2 lacks antecedent basis (same issue as discussed above pertaining to “the state of a dialog being performed with a user”/“the state of a dialog being performed with a user and a policy parameter” in lines 4-5 of claim 1).
	“the structure of a logical expression” in lines 3-4 of claim 2 lacks antecedent basis.
	“based on the structure of a logical expression that each includes, and sets the score using the state of the dialog after encoding and the response candidates after encoding” in lines 3-5 of claim 2 is confusing claim language and it is not clear what this phrase is supposed to mean.  Is “that each includes” in line 4 of claim 2 supposed to refer to “vectors” or to “a logical expression”?  What does either the vectors or the logical expression “each include”?  Which of “the policy unit” or “a logical expression” or “vectors” “sets the score”?
	“the score” in line 4 of claim 2 is ambiguous (one score is set for each response candidate in the “response candidates included in a set of response candidates” in line 3 of claim 1)
	“the state of the dialog” in lines 4-5 of claim 2 lacks antecedent basis.
	“the response candidates” in line 5 of claim 2 is ambiguous if “response candidates included in a set of response candidates” in claim 1 refers to a subset of the “set of response candidates”.
	“the response candidates after encoding” at the end of claim 2 is unclear because it is not clear if this phrase refers to encoded response candidates or to where the original/unencoded response candidates are used to set a score “after encoding”.

	As per Claim 3:
	“the state of the dialog” in line 2 of claim 3 lacks antecedent basis.
	“the structure of a logical expression that the dialog includes” in line 3 of claim 3 lacks antecedent basis.
	“the state of the dialog” in line 4 of claim 3 lacks antecedent basis.
	“the state of the dialog after encoding” in line 4 of claim 3 is unclear because it is not clear if this phrase refers to the encoded “state of the dialog” or to where reinforcement learning processing is executed using the original/unencoded state of the dialog at a time that is “after encoding”.
	Due to the issue discussed in the previous paragraph, “and the obtained reward” in lines 4-5 of claim 3 can refer to where the obtained reward is also used to execute reinforcement learning processing (which is likely what Applicant meant to claim) or to something else that the use of the state of the dialog is “after” (in addition to being “after encoding”), and as claimed, it is not clear which interpretation Applicant meant to claim.

	As per Claim 4:
	“the state of a dialog being performed with the user” in lines 4-5 of claim 4 lacks antecedent basis.
	“the user” in line 5 of claim 4 is ambiguous (claim 1 recites “a user” in line 1 of claim 1 and claim 4 recites “a user” in line 3 of claim 4, and it is not clear which user “the user” in line 5 of claim 4 is supposed to refer to when the two “a user” recitations refer to different users).
	“the response candidates” at the end of claim 4 is ambiguous if “response candidates included in a set of response candidates” in claim 1 refers to a subset of the “set of response candidates”.

	Claims 5-6 include the issues of claim 1.
	Additionally, as claimed “referring to the set scores, selecting one of the response candidates as a dialog act” in step (a) of claims 5-6 appears intended to be substantively the same as “referring to the set scores, to select one of the response candidates as a dialog act of the apparatus” in lines 4-5 of claim 1.  As claimed, however, “selecting one of the response candidates as a dialog act” in step (a) of claims 5-6 can be interpreted as a separate step relative to “referring to the set scores”, and so it is not clear if, in claims 5-6, “referring to the set scores” is supposed to be a separate step relative to “selecting one of the response candidates as a dialog act”, or if Applicant meant to claim something like “referring to the set scores in order to select one of the response candidates as a dialog act”.

	Claims 7-9 include the issues of claims 2-4, respectively.
	Claims 10-12 include the issues of claims 2-4, respectively.

The dependent claims include the issues of their respective parent claims.

Allowable Subject Matter
Claims 1 and 5-6 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action.
Claims 2-4 and 7-12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

	As per Claim(s) 1 (and similarly claim[s] 5-6, and consequently claim[s] 2-4 and 7-12 which depend on claim[s] 1, 5, and 6), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1, including (i.e. in combination with the remaining limitations in claim[s] 1) A dialog apparatus for responding to a dialog act of a user, the dialog apparatus comprising: a policy unit configured to set a score to each of response candidates included in a set of response candidates based on the state of a dialog being performed with the user and a policy parameter, and referring to the set scores, to select one of the response candidates as a dialog act of the apparatus; and a policy parameter updating unit configured to obtain a reward in the state of the dialog using a reward function that, as the reward, returns an evaluation of a behavior performed in a specific circumstance as a quantitatively represented numeric value, and to update the policy parameter based on the obtained reward (i.e. where each of a plurality of response candidates is scored based on dialog/conversation state and a policy parameter, and where the policy parameter is updated based on a numeric value reward which is obtained in the state of the dialog using a reward function).
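For illustration only, the combination summarized above (one score per response candidate from the dialog state and a policy parameter, selection by reference to the set scores, and a policy parameter update from a numeric reward) might be sketched as follows. Every name here is hypothetical, the feature function and reward values are invented for the example, and the reward-scaled update is a generic stand-in rather than the method disclosed in the application.

```python
def features(dialog_state, candidate):
    # Hypothetical features: whether the candidate mentions the pending topic,
    # plus a constant bias term.
    return [1.0 if dialog_state["topic"] in candidate else 0.0, 1.0]


def set_scores(candidates, dialog_state, policy_param):
    # One score per response candidate, based on the dialog state and the policy parameter.
    return {
        c: sum(p * f for p, f in zip(policy_param, features(dialog_state, c)))
        for c in candidates
    }


def select_dialog_act(scores):
    # Refer to the set scores and select one candidate as the dialog act.
    return max(scores, key=scores.get)


def reward_function(dialog_state, dialog_act):
    # Hypothetical reward: a numeric evaluation of the behavior in this circumstance.
    return 1.0 if dialog_state["topic"] in dialog_act else -1.0


def update_policy_param(policy_param, dialog_state, dialog_act, reward, lr=0.5):
    # Assumed simple update: move the parameter along the selected act's features,
    # scaled by the reward.
    feats = features(dialog_state, dialog_act)
    return [p + lr * reward * f for p, f in zip(policy_param, feats)]


state = {"topic": "budget"}
candidates = ["What is your budget?", "The weather is nice."]
param = [0.5, 0.0]

scores = set_scores(candidates, state, param)
act = select_dialog_act(scores)
reward = reward_function(state, act)
new_param = update_policy_param(param, state, act, reward)
```

The point of the sketch is that the scoring is per response candidate and conditioned on the state of the dialog itself, which is the feature the cited references are discussed against below.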
JP 2012-038287 (X reference in Search Report) teaches “The score calculation unit 15 normally calculates the score before the dialog text output unit 17 outputs the dialog text (not necessarily immediately before). In addition, it is suitable for the score calculation part 15 to calculate a score whenever the user input information reception part 14 receives user input information. Here, the score calculation unit 15 is to calculate a score by, for example, an arithmetic expression “score = f (user state information, weight vector)”. For example, f is “score = user state information × weight vector”. That is, the score calculation unit 15 uses the evaluation information managed in association with the sentence pattern information and the user state information that dynamically changes in order to determine the sentence pattern information of the sentence to be output next by the dialogue apparatus 1. Are used to calculate a score for each information recommendation method” and “The reward calculation unit 273 selects a spot included in the user input information using the expected value of the degree of match calculated by the random selection match value calculation unit 271 and the match degree calculated by the selected spot match level calculation unit 272. To calculate the reward” and “The learning unit 28 uses the reward to update the weight vector corresponding to the method identifier of the dialog device 1 and the information recommendation method storage unit 12 of the dialog device 1. For example, when the reward is a positive number, the learning unit 28 updates the weight vector corresponding to the method identifier of the dialog device 1 so that the information recommendation method included in the dialog text information is more easily selected. This weight vector is a weight vector of the information recommendation method storage unit 12. 
Here, updating means that the learning unit 28 may directly rewrite the weight vector in the information recommendation method storage unit 12 or may instruct the dialog device 1 to update. When the interactive device 1 receives an update instruction, the interactive device 1 rewrites the weight vector. The method and degree by which the learning unit 28 updates the weight vector does not matter. Usually, the larger the reward is, the learning unit 28 updates the weight vector corresponding to the technique identifier of the dialogue apparatus 1 according to the magnitude of the reward so that the information recommendation technique included in the dialog sentence information is more easily selected. To do. For example, when the reward is a negative number, the learning unit 28 updates the weight vector corresponding to the technique identifier of the dialogue apparatus 1 so that the information recommendation technique included in the dialogue sentence information is more difficult to be selected. For example, the learning unit 28 updates the weight vector according to a later-described Natural Actor Critic (NAC) algorithm, which is one of natural policy gradient methods. NAC is described in “Otake Yatsuya, Masaru Sugiyama: How to make a strong robotic game player, Mainichi Communications (2008).” Since it is a well-known technique, detailed explanation is omitted. NAC is a method for optimizing policies and is one of natural policy gradient methods. In the policy gradient method, instead of directly estimating the value function for the state S or estimating the action value function Q (S, A), the reward of the dialogue episode obtained by the policy before the update is used. Update policy π directly by natural gradient method to increase” (see PE2E translation).  This reference appears to describe where a score is calculated based on a weight vector and user state information and where a reward is used to update the weight vector.  
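For illustration only, the quoted arithmetic expression “score = user state information × weight vector” and the described effect of a positive or negative reward on the weight vector might be sketched as follows. The function names, the learning rate, and the numeric values are hypothetical, and the reward-scaled update shown is a simple stand-in; it is not the Natural Actor Critic algorithm named in the reference.

```python
def score(user_state, weights):
    # score = user state information x weight vector (inner product)
    return sum(s * w for s, w in zip(user_state, weights))


def update_weights(weights, user_state, reward, learning_rate=0.1):
    # Assumed simplification: shift the weight vector along the user state,
    # scaled by the reward, so that a positive reward raises this method's
    # score for this state and a negative reward lowers it.
    return [w + learning_rate * reward * s for w, s in zip(weights, user_state)]


user_state = [1.0, 0.5]   # hypothetical user state information
weights = [0.2, 0.4]      # hypothetical weight vector for one recommendation method

before = score(user_state, weights)
after_positive = score(user_state, update_weights(weights, user_state, reward=1.0))
after_negative = score(user_state, update_weights(weights, user_state, reward=-1.0))
```

Under these assumptions the score changes by the learning rate times the reward times the squared norm of the user state, so its direction follows the sign of the reward.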
This reference describes “The user state information storage unit 13 is information indicating a user's state, a preference vector that is information indicating a user's preference with respect to one or more determinants, and a knowledge vector indicating user's knowledge with respect to one or more determinants The user status information including The user state information may include an attribute vector that is information indicating one or more attribute values of the user. User attribute values include, for example, sex (male or female), age group (10's, 20's, 30's, baby boom junior, etc.), occupation, hometown, supporting political party, and the like” (see PE2E translation) which appears to suggest where user state information is information about a user and not information about a conversation/dialog state.  It is also not entirely clear that one score is set for each of a plurality of response candidates based on the dialog state and the weight vector.  The scores are calculated for each of “information recommendation methods” that possess “sentence pattern information” (see Google Translation) but it is not clear that the information recommendation methods or sentence pattern information are candidate responses.
M. -H. Su, K. -Y. Huang, T. -H. Yang, K. -J. Lai and C. -H. Wu, "Dialog State Tracking and action selection using deep learning mechanism for interview coaching," 2016 International Conference on Asian Language Processing (IALP), 2016, pp. 6-9, doi: 10.1109/IALP.2016.7875922. teaches receiving a dialog state and reward, updating a policy during learning, and outputting an action according to a learnt policy (Section IV B., particularly first paragraph).  This reference also describes where reward is a numerical value (page 8, left column).  This reference does not appear to specifically describe using a dialog state and the policy to set scores for each of a plurality of response candidates and selecting one of the response candidates based on the set scores.
2018/0232436 (62/459820 filed February 16, 2017 supports Specification) teaches “When the dialog mixer 130 is called, it accepts the base dialog states provided in the input. When the triggering event is new input, the dialog mixer 130 determines if the user is triggering a new dialog. A new dialog corresponds to a new dialog manager, e.g., a new schema or a new search in a dialog schema. If the user is triggering a new dialog, the dialog mixer 130 fetches the corresponding schema and initializes the dialog manager for the schema. The dialog mixer 130 then distributes the output of the natural language parser, also referred to as an analyzer, to all dialog managers. When the triggering event is a backend response, the dialog mixer 130 loads the dialog manager that corresponds with the backend response and applies the backend response to the dialog managers that request them, respectively. The dialog mixer 130 may solicit the dialog managers for backend requests and new state tokens. Each dialog manager solicited generates some kind of response, even if it is an error or failure response. In some implementations, the dialog manager 130 may also issue a backend request. The dialog mixer 130 rolls up each dialog manager's output, whether a system response or a backend request, into a response candidate. Each candidate has some combination of a system response(s) and/or a backend request(s), and a provisional dialog state. In some implementations, the dialog mixer 130 may perform second phase candidate generation. In second phase candidate generation the dialog mixer 130 may derive a composite candidate response from two or more individual schemas. 
The dialog mixer 130 provides the candidate response(s), a respective dialog state for each candidate response, and annotations for each candidate response back to the dialog host 120, where the responses are ranked, pruned, and potentially a response is triggered and provided to the input/output devices 110” (paragraph 35, see paragraph 30 of Provisional 62/459820).
2014/0272884 teaches “Thus, using the array of rankers, each ranker, or subsets of rankers, may be tuned or trained for representing a particular domain or area of knowledge. The combination of the array of rankers may thus be used to provide a question and answer ranking mechanism that is applicable to multiple domains or areas of knowledge. This leads to a question and answer system that provides high quality answer results in a multiple-domain or even open-domain environment. One key advantage in a multiple-domain or open-domain QA system is improved performance. Such improved performance is achieved by the QA system of the illustrative embodiments in that multiple rankers are utilized which have been iteratively trained based on a reward value basis where the reward value is based on the ranks of candidate answers rather than the confidence scores associated with the candidate answers. This is important in that when different rankers are used in a heterogeneous array of rankers, the confidence scores may be computed differently by each ranker and thus, the confidence scores are not comparable across the different rankers. Hence, it is more accurate to base the reward value, indicative of the correctness of the operation of the ranker, based on the ranks of the candidate answers, their correspondence with the golden answer set, and the computed quality of the ranker itself over multiple iterations of the training” (paragraph 26).
2020/0142888 teaches “In some implementations, the control model and/or the generative model can be trained at least in part based on reinforcement learning. In some of those implementations, the control model and the generative model are trained separately, but in combination with one another. In training the control model and/or the generative model based on reinforcement learning, generated variants may be submitted to a search system, and responses (and optionally lack of responses) from the search system can indicate rewards. For example, for a response, to a query variant, that is an “answer” response, a reward can be assigned that is proportional (or otherwise related) to a quality of the answer response (e.g., as indicated by a response score, provided by the search system, for the “answer” response). In some of those examples, where no response is provided in response to a query variant and/or when the response is deemed (e.g., based on output from the search system) to not be an “answer” response, no reward will be assigned. In other words, only the last “answer” response will be rewarded and intermediate actions updated based on such reward (e.g., with a Monte-Carlo Q learning approach). In this manner, Q function learning, or other reinforcement function learning, can occur based on rewards that are conditioned on responses provided by a search system that is interacted with during the reinforcement learning. In implementations of reinforcement learning described herein, the state at a given time step is indicated by one or more of the state features (e.g., such as those described above), and the action can be either a query variant (i.e., generate a further query variant) or provide an “answer” response. Each action of the action space can be paired with a string that defines the corresponding question or “answer” response” (paragraph 17).  This reference does not qualify as prior art.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00PM -8:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





EY 5/20/2022
/ERIC YEN/Primary Examiner, Art Unit 2658