DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
A claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, when it meets the following three-prong test:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
The “collecting unit”, “searching unit”, “providing unit”, and “update unit” in claims 11, 13, 14, 15-16, and 19-20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  

Claim Objections
Claims 6, 16, 9, and 19 are not formally objected to.  Claims 6 and 16 appear to be method/apparatus equivalent claims and claims 9 and 19 also appear to be method/apparatus equivalent claims.  The dependencies of these claims differ relative to their parent claims (6 depends on 1, 16 depends on 15, 9 depends on 8, 19 depends on 17).  Applicant may, at Applicant’s discretion, amend the dependencies to be consistent.  

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5, 6, 8, 9, 10, 13, 15, 16, 17, 18, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

As per Claim 13: 
“the spoken utterances responsive to the query text” (plural utterances responsive to the query text) lacks antecedent basis.  Claims 1 and 11 only recite one spoken response utterance for the query text.

As per Claim 5 (and similarly claim 15), “the spoken response utterance in the query text-spoken response utterance sets” (a singular utterance in plural sets) and “the query text-spoken response utterance sets” (plural sets) lack antecedent basis.

	As per Claims 6 and 16: 
“the query text-spoken response utterance sets stored in the database” (lines 3-4 of claim 6 and lines 3-4 of claim 16) lacks antecedent basis.  This phrase in claim 16 lacks antecedent basis because “the query text-spoken response utterance sets” in claim 15 lacks antecedent basis.  This phrase in claim 6 lacks antecedent basis because claim 6 depends on claim 1, which recites only a single query text-spoken response utterance set.
In claim 6, “the query text-spoken response utterance sets” (last 2 lines) also lacks antecedent basis.

	As per Claim 17, “the searching result at the searching unit” (lines 3-4) lacks antecedent basis.

of a spoken utterance of a user who utters the query text” lacks antecedent basis (claims 7 and 17 recite “an utterance intention of the query text”)
	Claims 8 and 18 also recite “a user who utters the query text”, which is unusual.  Users typically utter speech that is converted into text, and do not typically utter text (which is information/data, not uttered speech).

	As per Claims 9 and 19, “the spoken response utterance” (in line 4 of claim 9 and in line 5 of claim 19) is ambiguous, because it can refer to either of two things.  It can refer to the “new spoken response utterance”, in which case there should be no frequency of the “new spoken utterance” for a predetermined period of time because the new spoken utterance is new.  Alternatively, it can refer to “a spoken response utterance” which is included in the query text-spoken response utterance set (in line 2 of claim 7 and lines 3-4 of claim 17), which also should not have a providing frequency for a predetermined period of time because the “new spoken utterance” is not generated unless “there is no query text-spoken response utterance set including a spoken response utterance for the query text in the database” [i.e. there logically should be no frequency if the spoken response utterance does not exist].  “the spoken response utterance” (in line 4 of claim 9 and in line 5 of claim 19) is, therefore, ambiguous, and is also unclear because neither possible interpretation appears to be capable of having the providing frequency for a predetermined period of time claimed in claims 9 and 19.
	“the spoken response utterance” in “the providing frequency of the spoken response utterance in the database” at the end of claims 9 and 19 is also ambiguous.
in the database” at the end of claims 9 and 19 also lack antecedent basis because claims 9 and 19 recite “a providing frequency of the spoken response utterance for a predetermined period of time”, and do not specify that “the spoken response utterance” is “in the database” (and as discussed in the previous paragraph, “the spoken response utterance” appears to be unable to be “in the database” because it is either “new” [did not previously exist] or does not exist [because the “new spoken response utterance” is only generated if “there is no query text-spoken response utterance set including a spoken response utterance for the query text in the database”]).

	As per Claims 10 and 20:
“the query text-new spoken response utterance set in the database” lacks antecedent basis for three reasons.  First, “in the database” at the end of claims 9 and 19 appears to refer to “the spoken utterance” and not to “a query text-new spoken response utterance set”.  Second, “a query text-spoken response utterance set including a spoken response utterance for the query text in a database which is constructed in advance” in claims 1 and 11 is not a “query text-new spoken response utterance set”.  Third, a “query text-new spoken response utterance set in the database” does not seem to be something that should exist such that it can be updated, because a new spoken response utterance logically is something that did not previously exist.
“the spoken response utterance” (line 5 of claim 10 and lines 4-5 of claim 20) is ambiguous and is unclear (same issue as discussed in the rejection of claims 9 and 19).



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Binder et al. (US 2014/0222436), hereafter Binder, in view of Guo et al. (US 2019/0354630), hereafter Guo, and Lang et al. (US 2017/0242651), hereafter Lang.

As per Claims 11 and 1, Binder suggests (along with its method equivalent) a speech processing apparatus, comprising: a collecting unit configured to collect a user's spoken utterance including a query; a first processor configured to generate a query text as a text conversion result for the user's spoken utterance including the query; a… unit configured to… a… response… for the query text… ; and a providing unit configured to, when there is a… response… for the query text… provide… response… (paragraphs 42-43, 50, 62-63, 71, 73, 76-78; [all paragraphs and Figures are cited for each limitation with “key” paragraphs and Figures pertaining to each limitation identified below, i.e. all other paragraphs and Figures not specifically referenced for any particular limitation are eligible to provide context and additional support]

“comprising: a collecting unit configured to collect a user's spoken utterance including a query;”: paragraphs 42-43, 71; a microphone [“collecting unit”], receives/”collects” a spoken natural language question from a user [“a user’s spoken utterance including a query”].  
“a first processor configured to generate a query text as a text conversion result for the user's spoken utterance including the query;”: paragraphs 42-43, 62, 71, 73, 76-77, 105; a hardware speech-to-text processing module [i.e. “a first processor”] generates a text version of the user’s spoken question received by the microphone of the user device [“query text” which is a “text conversion result” of performing speech-to-text “for the user’s spoken utterance including the query”].  Paragraphs 73 and 76-77 describe the speech-to-text processing module and where a request containing speech input is forwarded to the speech-to-text processing module for speech-to-text conversions, and where speech-to-text processing generates a sequence of words as a result.  Paragraph 62 describes where various functions of the user device 104 may be implemented in hardware, and paragraph 105 describes where modules [particularly sound detectors] may include hardware, software, and/or any combinations thereof [suggesting a hardware embodiment of the speech-to-text processing module].

“and a providing unit configured to, when there is a… response… for the query text… provide… response…”: paragraphs 42-43, 62, 71, 73, 76-78; a hardware and/or software module/”unit” audibly outputs/”provides”, to the user, the determined response to the user’s spoken question [determined based on processing the text version of the user’s question to determine what the user is asking for].  Providing/outputting a response logically occurs “when there is a… response” because a response logically cannot be provided if it does not exist.).
Binder does not, but Guo suggests a searching unit configured to search whether there is a query text-… response…set including a… response… for the query text in a database which is constructed in advance; and a providing unit configured to, when there is a query text-… response… set including a… response… for the query text in the database, provide the… response… included in the query text-… response… set (paragraphs 7, 21, 59, 62-67; Figure 5;
Binder describes receiving a spoken question from a user and providing an informational answer including information that a user asked for [paragraphs 42-43], and as discussed above one embodiment in Binder is where the digital assistant which answers a user’s question is implemented on a client user device [paragraphs 42-43, 50, 63].
Guo similarly describes providing an answer to a user’s question, and suggests where the answer to the user’s question is an answer of a question/answer pair matching the user’s question.  Paragraph 62 describes “In 506, Q&A application 135 receives a query including a question from user 136. In 508, Q&A application 135 determines whether the question and an associated answer is found in local cache 306. If a match is found between the question included in the query and a question/answer pair in cache 306, Q&A application 135 answers the query with the matching answer and process 500 ends”, and paragraph 21 describes “Upon receiving a question from a user, the Q&A system first checks to determine whether the question matches a question in the local cache. If the question is found in the local, an associated answer stored in the local cache is provided to the user”, and paragraph 59 describes “Question/answer cache 308 is configured to locally store a number of question/answer pairs including frequently asked questions and associated answers”.  These paragraphs suggest where the “associated answer”/”matching answer” is the answer in the 
Guo suggests “a searching unit configured to search whether there is a query text-… response…set including a… response… for the query text in a database which is constructed in advance;”: where the hardware and/or software module/”unit” which processes the “query text” produced by performing speech-to-text processing on the user’s spoken question in order to determine an appropriate answer/”response” that corresponds to the “query text” [suggested by Binder] more specifically determines the appropriate answer/”response” by “searching whether there is a” question/answer pair [“query text”-“response” “set”, where the question is suggested to be a “text” question/”query” since questions are commonly expressed using words, and words, in data form, are commonly text] in a local-cache-of-question/answer-pairs/“a database which is constructed in advance” whose question matches the “query text” produced by performing speech-to-text processing on the user’s spoken question, where determining that there is a question/answer pair matching the “query text” determines that there is a pair including a response for the “query text” [at least suggested to be the case because a question is logically paired with its associated answer, and if the user’s question 
“and a providing unit configured to, when there is a query text-… response… set including a… response… for the query text in the database, provide the… response… included in the query text-… response… set”: a hardware and/or software module/”unit” that audibly outputs/”provides”, to the user, the determined response to the user’s spoken question provides the associated-answer/”response” included in the question/answer pair [the matching “query text”-“response” “set”] whose question matches the speech-to-text-generated “query text” of the user’s spoken question “when there is a” matching “query text… response… set including a… response… for the query text in the database” [a question is logically paired with its associated answer, and if the user’s question matches the question in a pair, then the associated answer is logically an answer for the user’s question which matches the pair’s question].)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of question answering with another because the prior art teaches the claimed invention except for the substitution of question answering which does not necessarily match a user’s question to a question of a question/answer pair of a plurality of question/answer pairs in a database and which does not necessarily output the answer of the question/answer pair whose question matches the user’s question with question answering which does.  Guo teaches that question answering which matches a user’s question to a question of a question/answer pair of a plurality of question/answer pairs in a database and which outputs the answer of the question/answer pair whose question matches the user’s 
	Binder, in view of Guo, does not, but Lang suggests a searching unit configured to search whether there is a query text-spoken response utterance set including a spoken response utterance for the query text in a database which is constructed in advance; and a providing unit configured to, when there is a query text-spoken response utterance set including a spoken response utterance for the query text in the database, provide the spoken response utterance included in the query text-spoken response utterance set (paragraph 82;
	Guo does not specifically describe that the answers in the question/answer pairs in the local cache are “spoken response utterances”.
Lang [paragraph 82] describes processing a voice command and returning a response to be played back, and describes where a response may be “text to be converted to speech” [which suggests where an output response can be in the form of 
Lang suggests “a searching unit configured to search whether there is a query text-spoken response utterance set including a spoken response utterance for the query text in a database which is constructed in advance; and a providing unit configured to, when there is a query text-spoken response utterance set including a spoken response utterance for the query text in the database, provide the spoken response utterance included in the query text-spoken response utterance set”: the associated answers in the question/answer pairs [“query text-… response… set[s]”] in the local cache of the Binder/Guo combination are pre-recorded speech responses [e.g. instead of text answers to be converted to speech], such that the question/answer pairs are “query text-spoken response utterance” sets/”pairs” that each include “spoken response utterance”, and such that the audible output of an answer to the user’s question audibly outputs the pre-recorded speech response of the matching question/answer pair.)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of locally stored response with another because the prior art teaches the claimed invention except for the substitution of a locally stored response which is not necessarily speech with a locally stored response which is.  Lang teaches that a locally stored response which is speech was known in the art.  One of ordinary skill in the art could have substituted one type of locally stored response with another to obtain the predictable results of a user device including a digital assistant which receives a spoken question from a user, performs speech-to-text processing on the user’s spoken question, determines an .
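For illustration only, the local lookup discussed in this rejection (matching speech-to-text output against stored question/answer pairs and returning a pre-recorded spoken response) can be sketched in Python.  This is a minimal sketch under stated assumptions and is not code from Binder, Guo, or Lang: the names `ResponseCache`, `normalize`, `add`, and `lookup` are hypothetical, the pre-recorded speech is represented as raw bytes, and exact matching after case and whitespace normalization merely stands in for whatever matching the references actually use.

```python
from typing import Dict, Optional


def normalize(query_text: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query_text.lower().split())


class ResponseCache:
    """Hypothetical store of (query text, pre-recorded spoken response) pairs."""

    def __init__(self) -> None:
        self._pairs: Dict[str, bytes] = {}

    def add(self, query_text: str, spoken_response: bytes) -> None:
        """Store one query text / spoken response pair in advance."""
        self._pairs[normalize(query_text)] = spoken_response

    def lookup(self, query_text: str) -> Optional[bytes]:
        """Return the stored audio for a matching query, or None if no pair matches."""
        return self._pairs.get(normalize(query_text))


cache = ResponseCache()
cache.add("What time is it in Tokyo?", b"<audio bytes>")

# Query text as it might come out of speech-to-text processing.
hit = cache.lookup("what time is it  in tokyo?")
miss = cache.lookup("How tall is Mount Fuji?")
```

In this sketch a non-`None` result corresponds to the case "when there is" a matching set in the database, and `None` corresponds to the case where no matching set exists.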

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Binder, in view of Guo and Lang, as applied to claims 1 and 11, above, and further in view of Kushida et al. (US 2002/0065652), hereafter Kushida.

As per Claims 12 and 2, Binder, in view of Guo and Lang, does not, but Kushida suggests (along with its method equivalent) wherein the first processor is configured to transmit the user's spoken utterance including the query to an external server, and receive, from the external server, a conversion result corresponding to the query text for the user's spoken utterance including the query (paragraphs 38, 40, 43; Figure 2;
As discussed in the rejection of claim 11, “the first processor” is a hardware speech-to-text processing module [i.e. “a first processor”] that generates a text version of the user’s spoken question received by the microphone of the user device [“query text” which is a “text conversion result” of performing speech-to-text “for the user’s spoken utterance including the query”].

Kushida thus suggests “wherein the first processor is configured to transmit the user's spoken utterance including the query to an external server, and receive, from the external server, a conversion result corresponding to the query text for the user's spoken utterance including the query”: where, instead of locally performing the speech recognition that converts the user’s spoken question into a text version of the user’s spoken question, the “first processor” of the Binder/Guo/Lang combination’s client user device sends the user’s spoken question to a server, and the server performs speech-to-text recognition and returns the speech-to-text recognition result [“conversion result corresponding to the query text for the user’s spoken utterance including the query” in the sense that the “result” of speech-to-text “conversion” is “the query text for the user’s spoken utterance including the query”] to the user device.)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speech recognition with another because the prior art teaches the claimed invention except for the .

Claims 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Binder, in view of Guo and Lang, as applied to claims 1 and 11, above, and further in view of Rossides (US 5,359,508).

As per Claims 14 and 4, Binder suggests an… unit… (paragraphs 42-43, 50, 62-63, 71, 73, 76-78;
Paragraph 62 describes where various functions of the user device 104 may be implemented in hardware, and paragraph 105 describes where modules [particularly sound detectors] may include hardware, software, and/or any combinations thereof [suggesting a hardware embodiment of the speech-to-text processing module].)
Binder, in view of Guo and Lang, does not, but Rossides suggests (along with its method equivalent) an update unit configured to update information about the query text-spoken response utterance set in the database, after the spoken response utterance has been provided (col. 12, lines 12-19;
Rossides [col. 12, lines 12-19] describes where any answer that has not been requested or any question that has not been asked for a period of time is automatically deleted, which at least suggests tracking the amount of time since an answer has been requested or the amount of time since a question has been asked.
Rossides thus suggests “an update unit configured to update information about the query text-spoken response utterance set in the database, after the spoken response utterance has been provided”: where one of the hardware and/or software “units” in the user device in the Binder/Guo/Lang combination is a hardware and/or software “unit” that tracks, for each of the question-pre-recorded-speech-response pairs 
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods 

As per Claims 15 and 5, Binder, in view of Guo and Lang, does not, but Rossides suggests (along with its method equivalent) wherein the update unit is configured to update the information about the query text-spoken response utterance set in the database or delete the information about the query text-spoken response utterance set from the database, based on at least one of a relative providing frequency of the spoken response utterance included in the query text-spoken response utterance sets, an available storage capacity of the database, or a predetermined number of updated query text-spoken response utterance sets (col. 12, lines 12-19;
Same combination as discussed in the rejection of claims 14 and 4.
Rossides further suggests “wherein the update unit is configured to update the information about the query text-spoken response utterance set in the database or delete the information about the query text-spoken response utterance set from the database, based on at least one of a relative providing frequency of the spoken response utterance included in the query text-spoken response utterance sets, an available storage capacity of the database, or a predetermined number of updated query text-spoken response utterance sets”: As claimed, “based on at least one of a relative providing frequency… or a predetermined number of updated query text-spoken response utterance sets” can be interpreted as referring to the “delete the information about the query text-spoken response utterance set from the database” only [not to both 
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Binder, Guo, and Lang suggest the limitations of claims 1 and 11, and Rossides teaches a function which deletes answers based on a period of time and deletes questions that have not been asked for a period of time).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding the time-based question and answer deleting function of Rossides to the set of functions performed by the Binder/Guo/Lang combination user device), and that in combination, each element merely performs the same function as it does separately (the time-based deletion is a separate function relative to the collecting of user speech, the speech-to-text processing, the matching of a user’s question to a question-answer pair, and the providing of the answer of the matching question-answer pair to the user).  The combination is the predictable results of a user device including a digital assistant which .
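For illustration only, the time-based deletion attributed to Rossides above (tracking when each stored pair was last provided and deleting pairs that have not been requested for a period of time) can be sketched in Python.  This is a minimal sketch under stated assumptions and is not code from Rossides or the other references: the names `UsageTracker`, `record_provided`, `purge_stale`, and `tracked` are hypothetical, times are plain numbers of seconds supplied by the caller, and the idle threshold is an arbitrary parameter.

```python
from typing import Dict, List


class UsageTracker:
    """Hypothetical tracker of when each stored query text was last answered."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle_seconds = max_idle_seconds
        self._last_provided: Dict[str, float] = {}

    def record_provided(self, query_text: str, now: float) -> None:
        """Update the stored information after a response has been provided."""
        self._last_provided[query_text] = now

    def purge_stale(self, now: float) -> List[str]:
        """Delete and return entries idle for longer than the allowed period."""
        stale = [
            query
            for query, last in self._last_provided.items()
            if now - last > self.max_idle_seconds
        ]
        for query in stale:
            del self._last_provided[query]
        return stale

    def tracked(self) -> List[str]:
        """Return the query texts that are still tracked, in sorted order."""
        return sorted(self._last_provided)


tracker = UsageTracker(max_idle_seconds=60.0)
tracker.record_provided("what is the weather", now=0.0)
tracker.record_provided("what time is it", now=50.0)

# At time 100, the first entry has been idle for 100 seconds, the second for 50.
removed = tracker.purge_stale(now=100.0)
remaining = tracker.tracked()
```

Here `record_provided` plays the role of updating information after a response has been provided, and `purge_stale` plays the role of deleting based on elapsed time.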

Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Binder, in view of Guo and Lang, as applied to claims 1 and 11, above, and further in view of Taubman et al. (US 9,679,568), hereafter Taubman.

As per Claims 17 and 7, Binder suggests (along with its method equivalent) a… processor configured to,… analyze an utterance intention of the query text,… a… response… for the analyzed utterance intention of the query text,… and transmit the… response… to the providing unit (paragraphs 42-43, 50, 62-63, 71, 73, 76-78;
The combination [thus far] is as discussed in the rejection of claims 11 and 1.
As discussed in the rejection of claim 11, Binder suggests “a… unit configured to… a… response… for the query text… ;”: a hardware and/or software module/”unit” which processes the “query text” produced by performing speech-to-text processing on 
Paragraph 62 describes where various functions of the user device 104 may be implemented in hardware, and paragraph 105 describes where modules [particularly sound detectors] may include hardware, software, and/or any combinations thereof.  These portions suggest where differing functions can be implemented using respective hardware and/or software.
Binder thus suggests “a… processor configured to,… analyze an utterance intention of the query text,… a… response… for the analyzed utterance intention of the query text,… and transmit the… response… to the providing unit”: a hardware “processor” that interprets the “query text” produced by performing speech-to-text processing on the user’s spoken question to deduce the user’s intent [i.e. to determine what information the user vocally asked for], that determines a response for the deduced/”analyzed” user-intent [”utterance intention of the query text” since it is an 
To be clear, this portion of this rejection of claims 17 and 7 is applied to assert that Binder generally suggests where the functions discussed in the previous paragraph can be performed by a hardware element [“processor”], and does not change the combination applied to reject claims 11 and 1.  Guo will be discussed next to describe how Guo suggests where the suggested processor discussed in the previous paragraph is, instead, a “second processor” relative to the elements of the Binder/Guo/Lang combination applied to reject claims 11 and 1.)
Binder does not, but Guo suggests a second processor configured to, when there is no query text-…response… set including a… response… for the query text in the database as the searching result at the searching unit, analyze an utterance intention of the query text,…a… response… for the analyzed utterance intention of the query text… and transmit the… response… to the providing unit (paragraphs 7, 21, 59, 62-67; Figure 5;
The combination [thus far] is as discussed in the rejection of claims 11 and 1.
As discussed in the rejection of claims 11 and 1, above, Guo suggests where the response determination suggested to be performed by “a searching” hardware and/or software “unit” [in Binder], is, instead, response determination that matches the text 
As discussed in the portion of this rejection of claims 17 and 7 based on Binder, Binder generally suggests a hardware “processor” that interprets the “query text” produced by performing speech-to-text processing on the user’s spoken question to deduce the user’s intent [i.e. to determine what information the user vocally asked for], that determines a response for the deduced/”analyzed” user-intent [”utterance intention of the query text” since it is an intent expressed by a user via uttering speech, and which is an intention determined from the text produced by performing speech-to-text processing on the user’s spoken question], and that provides/”transmits” the determined response to the hardware and/or software module/”unit” that audibly outputs/”provides”, to the user, the determined response to the user’s spoken question [i.e. the “providing unit” as discussed in the rejection of claims 11 and 1].
Paragraphs 62-67 and Figure 5 of Guo further describe where, “If a match is not found between the question included in the query” [paragraph 63, suggested to be where no match is found between the question in the query and questions of question/answer pairs in the local cache given that paragraph 62 describes “If a match is found between the question included in the query and a question/answer pair in cache”], Guo’s system performs another kind of response determination, specifically sending a request to cloud service[s] and providing an answer received from cloud server[s] to a user [paragraph 64-65, 67].  Guo thus suggests where another “backup” 
Guo thus suggests “a second processor configured to, when there is no query text-…response… set including a… response… for the query text in the database as the searching result at the searching unit, analyze an utterance intention of the query text,…a… response… for the analyzed utterance intention of the query text… and transmit the… response… to the providing unit”: where the processor suggested by Binder [as discussed in the portion of this rejection of claims 17 and 7 based on Binder] is another hardware element [i.e. a “second processor” relative to the speech-to-text processing “first processor”] in the Binder/Guo/Lang combination user device [a separate element because it performs a different question answering function relative to the question/answer pair matching, speech-to-text processing, receiving of a user’s question, and audible output of a response to the user’s question], and is a “second processor” whose functions are performed when there is no question/answer pair in the local cache that matches the text produced by performing speech-to-text processing on the user’s spoken question [“when there is no query text-…response… set including a… response… for the query text in the database as the searching result at the searching unit”])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of question answering with another because the prior art teaches the claimed invention except for the substitution of question answering which is not performed when a user’s question does not match any question/answer pairs in a database with question answering which is.  
Binder, in view of Guo, do not, but Lang suggests a second processor configured to, when there is no query text-spoken response utterance set including a spoken response utterance for the query text in the database as the searching result at the searching unit, analyze an utterance intention of the query text,…a… response text for the analyzed utterance intention of the query text, generate a new spoken response utterance as a speech conversion result for the… response text, and transmit the new spoken response utterance to the providing unit (paragraph 82;
As discussed in the rejection of claims 11 and 1, Lang suggests where the associated answers in the question/answer pairs [“query text-… response… set[s]”] in the local cache of the Binder/Guo combination are pre-recorded speech responses [e.g. instead of text answers to be converted to speech], such that the question/answer pairs are “query text-spoken response utterance” sets/“pairs” that each include a “spoken response utterance”, and such that the audible output of an answer to the user’s question audibly outputs the pre-recorded speech response of the matching question/answer pair.
Lang thus suggests “when there is no query text-spoken response utterance set including a spoken response utterance for the query text in the database as the searching result at the searching unit”: determining that there is no matching question/answer pair determines that there is no question-pre-recorded-speech-response pair among the question-pre-recorded-speech-response pairs in the local cache.
Binder also does not specifically teach where the answer determined in paragraphs 42-43 for the user’s question is in the form of text to be converted into speech, and Lang further describes where responses can be “text to be converted to speech” [paragraph 82].
Lang thus further suggests “…a… response text for the analyzed utterance intention of the query text, generate a new spoken response utterance as a speech conversion result for the… response text, and transmit the new spoken response utterance to the providing unit”: where the response determined by the “second processor” suggested by Binder and Guo [the “backup” question answering hardware module that determines an intent for a user’s question and an answer for the user’s question when the user’s question does not match any question/answer pairs] is more specifically a text response which is converted into speech, and where the speech response generated by performing text-to-speech on the text response is the response that is provided to the “providing unit” for audible output.  Converting text to speech is at least suggested to “generate a new spoken response utterance” because text to speech commonly/conventionally produces/generates speech data and if the speech data already existed, then text to speech conversion would be unnecessary.)
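For illustration only, the control flow attributed above to the Binder/Guo/Lang combination can be sketched as follows.  This sketch is not taken from any reference of record; the function name `answer_query`, the dictionary-based cache, and the three callables standing in for intent analysis, response determination, and text-to-speech are all hypothetical placeholders chosen for the example.

```python
from typing import Callable, Dict


def answer_query(
    query_text: str,
    cache: Dict[str, bytes],
    analyze_intent: Callable[[str], str],
    respond_to_intent: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Return the audio that would be handed to the providing unit.

    All names are hypothetical; the cache maps query text to a
    pre-recorded spoken response (an assumption for this sketch).
    """
    # Searching step: look for a matching query text / spoken response set.
    stored = cache.get(query_text)
    if stored is not None:
        # A match exists, so the pre-recorded speech is output as-is.
        return stored

    # No match: the "backup" path analyzes the intent of the query text,
    # determines a text response, and converts that text into new speech.
    intent = analyze_intent(query_text)
    response_text = respond_to_intent(intent)
    return text_to_speech(response_text)
```

In this sketch the fallback branch runs only when the lookup finds nothing, which mirrors the “when there is no query text-spoken response utterance set…” condition discussed above.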
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of locally stored response with another because the prior art teaches the claimed invention except for the substitution of a locally stored response which is not necessarily speech with a locally stored response which is.  Lang teaches that a locally stored response which is speech was known in the art.  One of ordinary skill in the art could have substituted one type of locally stored response with another to obtain the predictable results of a user device including a digital assistant which receives a spoken question from a user, performs speech-to-text processing on the user’s spoken question, determines an answer for the user’s spoken question based on the speech-to-text processing results, and audibly provides the answer to the user (as suggested by Binder) where the answer for the user’s spoken question is determined by matching the speech-to-text processing 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of response with another because the prior art teaches the claimed invention except for the substitution of a response which is not necessarily text which is converted into speech with a response which is.  Lang teaches that a response which is text which is converted into speech was known in the art.  One of ordinary skill in the art could have substituted one type of response with another to obtain the predictable results of a user device including a digital assistant which receives a spoken question from a user, performs speech-to-text processing on the user’s spoken question, determines an answer for the user’s spoken question based on the speech-to-text processing results, and audibly provides the answer to the user (as suggested by Binder) where the answer for the user’s spoken question is determined by matching the speech-to-text processing results to the question of one of a plurality of question/answer pairs stored in a local cache of the user device, and determining the answer of the matching question/answer pair to be the 
	Binder, in view of Guo and Lang, do not, but Taubman suggests generate a new response text for the analyzed utterance intention of the query text, generate a new spoken response utterance as a speech conversion result for the new response text (col. 9, line 3 – col. 11, line 2;
	The combination [thus far] is as discussed in the portion of this rejection of claims 17 and 7 based on Lang, including where the “second processor” determines a text response/answer to the user’s spoken question and converts the text response/answer into speech which is to be audibly output.
	Taubman [col. 9, line 3 – col. 11, line 2] describes determining an answer to a user-spoken question, including performing speech to text on a user’s spoken question [col. 9, lines 20-40] and also describes generating an answer based on question-answer pairs “in addition to, or in lieu of, generating an answer based on search results” [similar to Guo, see col. 10, lines 9-60].  Taubman more specifically describes where answers are “generated” which at least suggests where the generated answer is “new” [because 
	Taubman thus suggests “generate a new response text for the analyzed utterance intention of the query text, generate a new spoken response utterance as a speech conversion result for the new response text”: where the text response which is determined by the Binder/Guo/Lang combination and which is converted into speech is more specifically a “generated” response [suggested to be a “new” response because the word “generate”, based on its plain meaning, suggests that what is “generated” did not previously exist and is thus “new”])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of response with another because the prior art teaches the claimed invention except for the substitution of a response which is not necessarily generated with a response which is.  Taubman suggests that a response which is generated was known in the art.  One of ordinary skill in the art could have substituted one type of response with another to obtain the predictable results of a user device including a digital assistant which receives a spoken 

	As per Claims 18 and 8, Binder suggests (along with its method equivalent) wherein the second processor is configured to analyze the utterance intention of a spoken utterance of a user who utters the query text by performing syntactic analysis or semantic analysis on the query text (paragraphs 42-43, 50, 62-63, 71, 73, 76-78;

	Paragraphs 42-43 of Binder describe deducing a user’s intent and also where a user’s spoken question is accurately answered, which at least suggests that the digital assistant accurately determines the meaning of what the user intended to ask.  Also, in Binder, paragraph 77 describes where speech-to-text processing obtains a sequence of words as a result, paragraph 78 describes associating the token sequence with one or more “actionable intents”, paragraph 42 describes interpreting natural language input in spoken and/or textual form to deduce user intent, and paragraph 43 describes where the digital assistant determines and outputs an appropriate answer to the user’s informational request [suggesting an embodiment where the user’s spoken question is converted into text via speech-to-text processing, where the text version of the user’s question is interpreted to determine what the user is asking for, and where the digital assistant determines an appropriate response based on the interpretation of the text version of the user’s question].
Binder thus suggests “wherein the second processor is configured to analyze the utterance intention of a spoken utterance of a user who utters the query text by performing syntactic analysis or semantic analysis on the query text”: where the “second processor” deduces/“analyzes” the intent of the user’s spoken question [“the utterance intention of a spoken utterance of a user who utters the query text”] by .
	
Allowable Subject Matter
Claim 3 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
As per Claim(s) 13 (and similarly claim 3), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 11 and 13 together, including (i.e. in combination with the remaining limitations in claim[s] 11 and 13) wherein the searching unit is configured to search a query text-spoken response utterance set group in which the spoken response utterances responsive to the query text are clustered in the database, and randomly determine a query text-spoken response utterance set in the query text-spoken response utterance set group to be provided.
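For illustration only, the limitation quoted above can be read as a lookup into a group of clustered responses followed by a random pick.  The sketch below is an assumption-laden example and is not drawn from the application or any reference of record; `pick_response`, the dictionary of response groups, and the optional `rng` parameter are hypothetical names.

```python
import random
from typing import Dict, List, Optional


def pick_response(
    query_text: str,
    groups: Dict[str, List[bytes]],
    rng: Optional[random.Random] = None,
) -> Optional[bytes]:
    """Randomly choose one spoken response from the group for a query.

    `groups` (hypothetical) maps a query text to the clustered spoken
    responses that are all responsive to that query text.
    """
    candidates = groups.get(query_text)
    if not candidates:
        # No group stored for this query text.
        return None
    # Uniform random choice among the clustered responses.
    chooser = rng if rng is not None else random
    return chooser.choice(candidates)
```

Passing a seeded `random.Random` instance makes the choice repeatable for testing, while the default uses the module-level generator.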
2020/0065389 (continuation of PCT/CN2018/109471, filing date is after effective filing date of this application, PCT filing date is before effective date of this application, subject matter supported by PCT Specification [see Google Patents translation]) teaches “In one embodiment, when a prestored question similar to the sentence vector of the test sentence is matched from the question library, an answer corresponding to the prestored question is further obtained and sent to the user rd to last full paragraph]).  This reference does not describe where the answers for the prestored question are clustered.
2009/0259642 teaches “In the topic identification phase, answers are clustered using any of a wide variety of relatively simple clustering techniques. For instance, cosine measure, which is a known similarity measure, can be used to estimate answer similarity and an empirically determined threshold can be used to merge answers into clusters. This is indicated in step 1 of Table 3” (paragraph 42)
2009/0012926 teaches “According to this example, the score calculation means 305 determines a matching degree between the group of the style and the topic of the inputted query and the group of the style and the topic of the query of the question-answer pairs. The search result presentation means 306 narrows the question-answer pairs on the basis of the matching degree”.  This reference describes narrowing down question-answer pairs but does not teach or suggest randomly selecting one of the narrowed-down set of question-answer pairs.  Additionally, it would not be obvious to apply random selection to the narrowed down question-answer pairs because this would appear to randomly select between one of a plurality of questions, all of which would be a question of the same topic but at least some of which would most likely not 
In 2011/0153312, paragraph 153 describes questions which are mapped to a prototypical answer, matching, and then providing the relevant prototypical answer (not specifically one to one match of question and answer).  This reference describes question and answer pairs, but appears to group/”cluster” questions, and not answers.
2021/0049195 (continuation of PCT/2018/018616, filed 5/14/18) teaches “Whereas, when the answering device 10 may not retrieve the answer corresponding to the inputted inquiry from the database, the answering device 10 executes chat handling (S3), and causes transition to the inquiry reception again (S4). For example, the answering device 10 identifies a category of the content inputted as the inquiry, randomly selects a message corresponding to the category, and outputs the message to continue the conversation. More specifically, for example, the answering device 10 prepares in advance a plurality of categories of "greeting type" such as "Hello", "anger type" such as "Don't be silly", and the like, and response messages corresponding to the categories. Then, the answering device 10 executes morphological analysis or the like of the content inputted as the inquiry, judges which category the inquiry corresponds to, and outputs a message corresponding to the category” (paragraph 27) and “Specifically, for example, when "Hello" is inputted to the chatbot screen 30, which is an unexpected inquiry that is registered in the keyword of the FAQ list DB 13, the chat handling unit 26 refers to the chat handling DB 14 to identify the category "greeting 
2016/0247068 teaches “conversations between the existing chatting systems and users lack personality. For each of the users, answers to one question are always the same or randomly selected from several answers, regardless of context of the users and their individual factors. Embodiments of the application take full advantage of contexts in the user models and the users' individual factors, so that answers to the same questions proposed by different users may be different. Therefore, conversations between users and the chatting robots are more real and flexible” (paragraph 67).  This reference appears to teach away from random selection from several answers.
2011/0191099 teaches “As indicated, each trigger is associated with at least one response. All possible constructions of a sentence from a trigger are considered to be equivalent to each other. How a trigger and its associated responses may be prepared will be described in greater detail later in reference to FIG. 2. Once server 110 receives a user input, it may queue the input in an event queue and process the input when the event reaches the top of the queue; server 110 may also process the input immediately. The reasoning and response module 118 analyses the user input and attempts to find a matching trigger from the collection of triggers stored in server database 122. If a match is found, the reasoning and response module 118 finds from server database 122 all corresponding responses. In case there is only one response, the response is sent to client 112. If multiple responses are found, a response may be selected based on a pre-
2013/0323689 teaches “receiving multiple audible answers for a single blank within a prerecorded fill-in-the-blank story; randomly selecting a specific audible answer from the multiple audible answers for the single blank;”
2019/0081980 teaches “FIG. 5 illustrates an architecture of the IoTLearner module in accordance with some embodiments. Specifically, the architecture of the IoTLearner module is depicted in FIG. 5 to fetch raw responses 110 from the database 104 and record each transaction to the database 104. Every incoming request to the honeypot is forwarded to this module, and the selected response is returned to the client based on the Req_Rsp Mapping 508. A core part of the module is a selection engine shown as a selector component 504, which normalizes the request and fetches the potential responses list 110 from the scanning result. In random selection mode, it just randomly selects one from the candidate list and returns it using a select response component 506. In MDP selection mode (as further described below), it first locates the state in the graph from the normalized request using a state locator component 502, 

As per Claim 6 (and similarly claim 16), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1 and 6 together, including (i.e. in combination with the remaining limitations in claim[s] 1 and 6) after the providing of the spoken response utterance, analyzing a relative providing frequency history of the query text-spoken response utterance sets stored in the database for every predetermined period of time; and deleting, from the database, a query text-spoken response utterance set having a relatively low relative providing frequency among the query text-spoken response utterance sets, as an analysis result.
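For illustration only, one possible reading of the “relative providing frequency” analysis and deletion is sketched below.  The claim language does not specify how “relatively low” is measured, so the share-of-total threshold `min_share`, the function name `prune_low_frequency`, and the per-period count dictionary are assumptions made solely for this example.

```python
from typing import Dict, List


def prune_low_frequency(counts: Dict[str, int], min_share: float) -> List[str]:
    """Delete sets whose share of all provisions in a period is too low.

    `counts` (hypothetical) maps each stored query text to the number of
    times its spoken response was provided during the period.  Returns
    the query texts that were removed.
    """
    total = sum(counts.values())
    if total == 0:
        # Nothing was provided in this period, so nothing is compared.
        return []
    # Relative frequency = this set's count divided by all provisions.
    removed = [q for q, n in counts.items() if n / total < min_share]
    for q in removed:
        del counts[q]
    return removed
```

Such a function would be called once per predetermined period, with the counts reset afterward if each period is to be analyzed independently.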
As per Claim 19 (and similarly claim 9, and consequently claims 10 and 20 which depend on claims 9 and 19), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 11, 17, and 19 together, including (i.e. in combination with the remaining limitations in claim[s] 11, 17, and 19) an update unit configured to, after the providing of the new spoken response utterance, relatively compare a providing frequency of the new spoken response utterance and a providing frequency of the spoken response utterance for a predetermined period of time, and update a query text-new spoken response utterance set for the new spoken response utterance having a higher providing frequency than the providing frequency of the spoken response utterance in the database.
Rossides (col. 12, lines 14-19) describes deleting answers that have not been requested and questions that have not been asked for a period of time, and where answers and data-requests “whose demand rate drops too low” are deleted.
2005/0289164 teaches “wherein the sorting the files and/or lower folders further comprises deleting one of the files or the lower folders in response to the frequency of the one of the files or the lower folders being smaller than a predetermined value and not being used for a predetermined amount of time” and “sorting and displaying a file or a folder (hereinafter referred to as "a file") based on frequency of use by a user, and, more particularly, to a method of, and an apparatus to perform, sorting files based on 
2013/0260352 teaches “In some implementations, the test engine 210 may calculate the confidence of a received answer as follows. For each answer x, let N(x) be a number of times that the answer x was provided by an entity from the entity group in response to the question 105. Let N be the total number of answers 106 provided to the question 105, and N(y) be the number of times that a different answer y was provided. Let f(.epsilon., N) be a confidence function where .epsilon. is the confidence threshold. If N(x)-N(y)>f(.epsilon., N), then the answer x exceeds the confidence threshold. Otherwise, the answer x does not exceed the confidence threshold” (paragraph 30)
2010/0191759 teaches “In some embodiments, a query response includes one or more bits to associate it with a particular query. For example, the query response may include the same identifier that was transmitted in the query to which the query response corresponds. In some embodiments, the information provided in the query response is sufficient to determine whether the detected query response is a response to the query of step 610. In some embodiments the first communications device maintains a count of the query responses which are determined to be a reply to its transmitted query of step 610. This count or the number of query response which are determined to be a reply to its transmitted query, may and sometimes is, used by the first communications device in making one or more decisions. Consider as an example, that the transmitted query of step 610 is a search query for 
	10083213 teaches “Using traditional customer support question and answer databases, the determination as to which previously answered question and answer pair, or pairs, are most likely to result in the searching user being satisfied with the answer provided is made largely, if not entirely, based on the feedback data, or ranking data, associated with the previously answered question and answer pair data provided by the original asking user and/or subsequent searching users as discussed above. As 
2005/0268160 teaches “A method adapted for storing products within a data retrieval system, the products comprising subscriber purchasable items, the method comprising: for a first publication of a product, storing in a primary storage device promotional information and meta data associated with the first published product; for all products, storing in the primary storage device product elements expected to be viewed frequently; and adapting a use of the primary storage device in response to viewing frequency of the stored product elements, wherein at least a portion of a play track of an infrequently viewed product is removed from the primary storage device”


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
20190026647 teaches “Programmers would help with developing programming code and computer logic (i.e., the rules for human/computer interaction) for determining the intent behind learner questions and generating a response to the questions”
2019/0370629 teaches “among the question, the second chatbot, the second amount of confidence, and the second intent in a memory cache included in the first chatbot, (ii) generating a response to the question based on the second intent, and (iii) presenting the response to the user from the first chatbot”
2012/0143895 teaches “The search engine 214 can be a conventional search engine that receives queries and returns lists of responsive documents. Queries from a user device 206 can be passed first through search engine 214, or the queries can be transferred directly to query and answer matching service 216 for identification of an answer to a query. The query and answer matching service 216 identifies an answer that corresponds to a query. This can be done by matching the query to a query format previously identified by query format identification component 218. After matching a query to a query format, a corresponding answer from answers database 212 can be identified based on the entity and the attribute from the query format. Alternatively, since the query format is determined ahead of time, answers database 212 can include a list of queries for matching with a potential query. The various entries in the list of queries can be associated with an answer, so that matching of a query in the query list automatically identifies the corresponding answer. The corresponding answer can then be provided to a user via user device 206” (paragraph 51)
2015/0294002 teaches “In a specific implementation, the content of the query includes the contents of the HTTP header and possibly the body. In analyzing headers to define the query, there are certain headers which can be ignored for the purposes of caching such as the authorization or referrer headers as these do not uniquely identify a request, rather some extra data that is unique to the client. For example to uniquely 
2002/0059164 teaches “A communication center has a system for managing agent-hosted sessions including systems for storing queries and responses to queries in sessions in associated pairs, and for comparing newly-arriving queries with stored query-response pairs. In the event of a match or near match, stored responses to oft-repeated queries are provided to agents for use, relieving agents of the tasks of responding manually to often-repeated queries. Parsing and sentence structure tools are used, and in some cases aid stations with knowledge workers are provided to allow editing and extra help for agents” (Abstract).  This reference describes matching/near-matching new queries with stored query-response pairs, but provides the responses to agents (not to users).
2005/0283474 teaches “A method of providing information to a user from a knowledge database containing question and answer pairs, the method comprising: extracting questions from a multiplicity of electronic communications originating from client terminals; for each of said electronic communications, displaying the extracted question on a display of an originating client terminal and enabling the sender of the communication to select or deselect a question using input means of the client terminal; classifying each selected question based upon the content of the selected question; entering the questions into the database together with the respective classifications; receiving answers corresponding to the entered questions from client computer terminals belonging to recipients of the electronic communications, and entering the 
2006/0069546 teaches “To the chagrin of many in the Artificial Intelligence community, the programs that have been most successful in the Loebner Contest, and come closest to imitating a human conversation exchange, do not try to understand human utterances in any meaningful way. Instead the most successful programs operate on a very straight forward principle known as Case Based Reasoning. This consists of simply having a large database of queries and stored responses to those queries. When asked a question, the program looks in its database for a match to that query and, if it finds one, responds with the prepackaged, associated response. By having a large enough database such programs can, under the right circumstances, appear to be responding like a real person. The winner of the 2003 Loebner Contest and currently holder of the title "most human computer program" is a software package called Jabberwock. Jabberwock has a database of about 1.8 million responses to questions. However, even with such a large database of responses the winning Jabberwock was still judged to be "probably a machine", falling well short of passing the Turing Test” (paragraph 5).  This reference appears to teach away from matching a question and providing a prepackaged, associated response.
2007/0094032 teaches “The result set 1015 of candidate questions corresponding to the user query utterance are presented to NLE 190 for further processing as shown in FIG. 4D to determine a "best" matching question/answer pair. An NLE/DBProcessor interface module coordinates the handling of user queries, analysis of noun-phrases (NPs) of retrieved questions sets from the SQL query based on the user query, comparing the retrieved question NPs with the user query NP, etc. between NLE 190 and DB Processor 188”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249.  The examiner can normally be reached on M-F 12:00PM -8:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  






EY 8/28/2021
/ERIC YEN/Primary Examiner, Art Unit 2658