DETAILED ACTION
This Office Action is in response to Applicant’s arguments filed in the reply on 11/5/2021.  Claims 1-5, 7-9, and 12-16 were amended. As such, claims 1-16 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. JP2017-153883, filed on 8/9/2017.

Response to Arguments

	Applicant’s remarks with respect to the claim objections raised in the previous Office Action were persuasive in view of the amendment. The objections are withdrawn.
Applicant's arguments filed with respect to the 35 USC 101 rejections raised in the previous Office Action have been fully considered, but they are not persuasive. The amendments adding “circuitry configured to” and “computer executable command” amount to insignificant additions, since mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, are not indicative of integration into a practical application (see MPEP 2106.05(f)). Furthermore, the reference to “circuitry” is attributed to a generic computer and, as such, also amounts to an insignificant addition.
Applicant’s arguments with respect to the prior art rejections raised in the previous office action have been considered but are moot because the new ground of rejection does not rely on the combination of references that are currently applied. 
Applicant, at the end of page 10 and on page 11 of the remarks filed on 11/5/2021, argues:
The Office Action alleges that Kamatani describes voiceless sections (V). Kamatani appears to describe,  [0044] Assume that a speaker continuously utters the contents”(Tokyotonaide (V) hoteruwo yoyakushi (V) tainodesuga yoihoteruwo mitsukete (V) moraemasenka)" and that the simultaneous speech processing apparatus 100 acquires this utterance. A symbol (V) indicates a position recognized as a voiceless section (pause) by the simultaneous processing apparatus 100. However, Kamatani is silent with regard to "speech signals being taken for the parts of the utterance of the user but not for the pause." The alleged descriptions Hakkani-Tur of a matching degree, in Weng of "before and after the pause in terms of a semantic unit," in Minamino of a matching degree, in Binder of a sensor, in Moniz of an amount of information obtained for each of the input hypotheses, in Kuo of a second score, in Jang of an interval between the pauses, in Iwata of only an effective utterance history out of the past utterance histories, in Mochida of information regarding a content of the utterance of the user and a time at which the utterance is made, in Lloyd of sensor information, in Divay of keeping track of user utterances, in Yasavur of extraction information, and in Zhao of a maximum number of past utterance histories would not have cured that deficiency of Kamatani. 

Examiner respectfully disagrees because the new ground of rejection does not rely on the combination of references that are currently applied. Please see prior art section for more detail including updated citations and obviousness rationale.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter without significantly more. The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea.
	Independent claims 1 and 12 recite: “An information processing apparatus comprising circuitry configured to couple parts of an utterance of a user before and after a pause in accordance with a matching degree between the parts of the utterance of the user before and after the pause in terms of a semantic unit, the pause being included in the utterance of the user, speech signals being taken for the parts of the utterance of the user but not for the pause; and convert the coupled parts of the utterance of the user into a computer-executable command.” Also, claims 13 and 16 recite: “An information processing apparatus comprising: circuitry configured to extract, from past utterance histories of a user, a past utterance history that matches spoken-language understanding processing on an utterance including a pause; transmit the extracted past utterance history to an information processing apparatus along with a speech signal corresponding to a current utterance of the user, the speech signal not including the pause, the information processing apparatus performing the spoken-language understanding processing and converting the current utterance of the user into a computer-executable command.”

This judicial exception is not integrated into a practical application. Even though claims 1, 12, 13, and 16 do not recite any dependency on processors, a storage device, or programs, the as-filed specification relies on executing the controller via a 
Likewise, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. References made to “circuitry configured to” or “computer executable command” amount to insignificant additions, since mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, are not indicative of integration into a practical application (see MPEP 2106.05(f)). Furthermore, the reference to “circuitry” is attributed to a generic computer and, as such, also amounts to an insignificant addition. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer which, due to lack of specificity, is considered a general purpose computer (or processor) -see Par. 0260 
Claim 2 recites “wherein the circuitry is configured to calculate, on a basis of a current utterance and a past utterance history, the matching degree between the parts of the utterance before and after the pause in terms of the semantic unit, determine, on a basis of the calculated matching degree, whether or not to couple the parts of the utterance before and after the pause, and couple the parts of the utterance before and after the pause in a case where it is determined that the parts of the utterance before and after the pause are coupled.” Calculating, on the basis of a current utterance and a past utterance history, is a mental process that can be carried out by a human with pen and paper. Once the degree of similarity is determined (by the human mind), it can be decided whether or not the parts of the utterance are to be coupled, which is also a mental process.  The claim does not include additional elements that are sufficient to 
	Claim 3 is directed toward human activity. It recites: “wherein the processor calculates the matching degree between the utterances before and after the pause in terms of the semantic unit, using sensor information along with the current utterance and the past utterance history, the sensor information being obtained from a sensor.” Calculating a matching degree is a mental process based on a predetermined formula that a human can carry out with pen and paper. The “sensor” information is not specifically defined; using it amounts to pre-solution activity (data gathering) and, as such, does not by itself constitute an additional element that is sufficient to integrate the judicial exception into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
	Claim 4 is directed toward human activity. It recites “wherein the circuitry is configured to calculate a score for each of input hypotheses, the input hypotheses being each obtained by hypothetically coupling the parts of the utterance before and after the pause, calculate a matching degree for each of the input hypotheses between the parts of the utterance before and after the pause on a basis of the calculated score for each of the input hypotheses, and select one input hypothesis from among a plurality of input hypotheses on a basis of the calculated matching degree for each of the input hypotheses.”. Calculating and assigning a particular score for each hypothesis to signify the suitability of the utterances for 
Claim 5 is directed toward human activity. It recites “wherein the circuitry is configured to calculate the score in accordance with an amount of information obtained for each of the input hypotheses”. Calculating and assigning a particular score for each hypothesis based on secondary information is a mental process. The scores can be calculated by a human with pen and paper from a predetermined formula that takes other sources of information into account. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
Claim 6 is directed toward human activity. It recites “wherein the score includes, for each of the input hypotheses, at least one or more of a first score that is obtained from a ratio of use of a function parameter corresponding to an intention of the user, a second score that is obtained from a language model related to information regarding a phrase in the utterance of the user, a third score that is obtained from a temporal interval between the current utterance and the past utterance history, or a temporal interval between the past utterance histories, or a fourth score that is obtained from a number of combinations of the current utterance and the 
Claim 7 is directed toward human activity. It recites “wherein the matching degree includes at least one of a relevance or a grammatical connection between the parts of the utterance before and after the pause”. Having considered a new factor in defining the similarity 
Claim 8 is directed toward human activity. It recites “wherein the circuitry is configured to determine, on a basis of an interval between the pauses, whether or not to couple the parts of the utterance before and after the pause”. As mentioned before, the interval of a given pause can be measured mentally by a human, which is a mental process. Once the pause interval is determined, a human can decide, based on a predetermined agreement or definition, whether or not to couple the utterances, which is also a mental process. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
Claim 9 is directed toward human activity. It recites “wherein the circuitry is configured to use only an effective utterance history out of the past utterance histories”. Deciding whether or not an utterance is effective is a judgment that a human can apply to the utterances at hand, using the one considered effective for further action. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
Claim 10 is directed toward human activity. It recites “wherein the past utterance history includes information regarding a content of the utterance of the user and a time at 
Claim 11 is directed toward human activity. It recites “wherein the sensor information includes image data or position information, the image data being obtained by imaging an object, the position information indicating a position of the user”. Considering other factors in view of the utterance to be chosen is a simple agreement that can be performed by a human. Observing an object to determine certain of its attributes (size, weight, etc.), or determining the position of the user, can likewise be carried out by a human.  The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
Claim 14 is directed toward human activity. It recites “wherein the circuitry is configured to extract the past utterance history on a basis of extraction information including a time interval between pauses,  information regarding a falter in wording,  information regarding a speaker, or line-of-sight information regarding the user”. Considering other factors in the utterance extraction process is a simple agreement that can be performed by a human. The time interval between pauses was already discussed above as something a human can determine. Using a filler or falter to make such a selection is also a simple task that a human can perform by listening attentively. Also any particularity of the speaker can by definition be used 
Claim 15 is directed toward human activity. It recites “wherein the circuitry is configured to transmit the past utterance history on a basis of a maximum number of the past utterance histories for transmission or a maximum data size of the past utterance history for transmission”. The transmission, which can be carried out by a human with paper, was already discussed above.  Counting the number of past utterances is a mental process and can be carried out by a human. Also, the size of an utterance, in terms of how long it is, is something a human can assess by simply counting the duration of the speech in his or her mind. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
	
Therefore, claims 1 - 16 are not patent eligible under 35 USC 101.



Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:


(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Maas et al. (US10854192B1)(hereinafter "Maas").

Regarding claims 1 and 12, Maas teaches an information processing apparatus comprising circuitry configured to couple parts of an utterance of a user before and after a pause in accordance with a matching degree between the parts of the utterance of the user before and after the pause in terms of a semantic unit, the pause being included in the utterance of the user, speech signals being taken for the parts of the utterance of the user but not for the pause; and (Maas, Col. 23, lines 20 – 48:”As illustrated in FIGS. 8A and 8B, the first portion of the audio data may identify the desired domain/command [music, messaging, etc.] whereas the second portion of the audio data may identify what should be executed/i.e., the subject of the command. Thus, the first portion may be referred to as the carrier and the second potion may be referred to as the payload. In certain circumstances, however, the carrier and the payload may arrive as a result of different distinct utterances or exchanges with the system. For example, as illustrated in FIG. 8C, a first utterance may correspond to first audio data 830 received by the system. The system may process that first audio data to determine first text 832 and may determine a domain based on the first text which the system has determined is the carrier portion. The system may then prompt the user to provide second audio corresponding to a second utterance that includes the payload corresponding to the carrier. The system may process second audio data corresponding to the second utterance using the domain specific endpointing configuration 710 of the domain determined in the carrier utterance to determine the end of the second utterance and determine second text 844 corresponding to the second utterance. Thus, the system may identify a domain corresponding to a first utterance and apply domain specific endpointing to a second utterance that relates to the first utterance. 
In this example, the first utterance and second utterance may be considered part of a single command [that may actually involve even more than two utterances depending on the command and system configuration].”, and Col. 24, lines 15 – 23:”The domain-specific endpointing configuration may correspond to a threshold pause length during a particular utterance after which an endpoint is declared. The pause length may correspond, for example, to a number of consecutive non-speech frames immediately preceding an endpoint or a number of consecutive non-speech nodes immediately prior to an endpoint.”, and Col. 15, line 56 – Col. 16, line 9:”In the present system, however, the system may be configured to make a domain determination based on a first portion of an utterance, and may then make a later endpointing determination for a second portion of the utterance based on the domain determined in the first portion of the utterance. This may result in improved endpointing as utterances for one domain may be expected to have different endpointing/pause characteristics of a different domain. For example, “what is the weather” may be a complete utterance by itself, but it also may be followed by additional words, such as “in Seattle,” “in January,” “tomorrow,” etc. where certain pauses may be included between different portions of the utterance. Similarly, “play music” may be a complete utterance by itself, but it also may be followed by additional words, such as “by Sting,” “something fun,” etc. where again certain pauses may be included between different portions of the utterance. Further, commands from send a message to mom” may typically be followed by additional speech with the message contents, but those contents may follow a pause between “mom” and the remainder of the message.”). Note: In the process of speech recognition and while going thru intent determination, slots are filled based on semantically/rules related where it can be equated with “matching degree”.
convert the coupled parts of the utterance of the user into a computer-executable command. (Maas, Col. 7, lines 46 – 58:”The NLU process takes textual input [such as processed from ASR 250 based on the utterance 11] and attempts to make a semantic interpretation of the text. That is, the NLU process determines the meaning behind the text based on the individual words and then implements that meaning. NLU processing 260 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text that allow a device [e.g., device 110] to complete that action. For example, if a spoken utterance is processed using ASR 250 and outputs the text “call mom” the NLU process may determine that the user intended to activate a telephone in his/her device and to initiate a call with a contact matching the entity ‘mom.’”, and Col. 3, lines 31 – 39:” The system may then cause a command to be executed [162] using some combination of the first text and second text. As discussed below, the first audio data and second audio data may correspond, respectively, to a first portion of audio data and a second portion of audio data of a single utterance. Alternatively, the first audio data and second audio data may correspond, respectively, to a first utterance and second utterance that correspond to a single command to be executed by the system.”). Note: With respect to executable command, once the intent is executed then the executable command limitation is met.
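For illustration only, the threshold pause length quoted above from Maas (an endpoint being declared after a number of consecutive non-speech frames) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Maas's implementation: the function name `find_endpoint`, the per-frame boolean speech flags, and the domain names and threshold values in `DOMAIN_THRESHOLDS` are all hypothetical.

```python
from typing import Optional, Sequence


def find_endpoint(is_speech: Sequence[bool], pause_threshold: int) -> Optional[int]:
    """Return the index of the frame at which an endpoint is declared, or None.

    An endpoint is declared once `pause_threshold` consecutive non-speech
    frames have been seen after at least one speech frame.
    """
    if pause_threshold < 1:
        raise ValueError("pause_threshold must be at least 1")
    seen_speech = False
    silent_run = 0
    for index, speech in enumerate(is_speech):
        if speech:
            seen_speech = True
            silent_run = 0  # a shorter pause was bridged; keep listening
        elif seen_speech:
            silent_run += 1
            if silent_run >= pause_threshold:
                return index
    return None


# Hypothetical domain-specific thresholds (in frames); the values are assumptions.
DOMAIN_THRESHOLDS = {"weather": 3, "messaging": 6}

# Two speech frames, a 3-frame pause, two more speech frames, then silence.
frames = [True, True, False, False, False, True, True,
          False, False, False, False, False, False]

weather_endpoint = find_endpoint(frames, DOMAIN_THRESHOLDS["weather"])
messaging_endpoint = find_endpoint(frames, DOMAIN_THRESHOLDS["messaging"])
```

With the shorter threshold the utterance is cut at the first pause, whereas the longer threshold bridges that pause, so the speech on both sides of it is treated as a single utterance.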



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 2, 4 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Maas, as applied to claims 1, 2, and 1, respectively, and further in view of Minamino (JP4465564B2).

Minamino was applied in the previous office action.
Regarding claim 2, Maas teaches an information processing apparatus.
Maas does not teach the information processing apparatus according to claim 1, wherein the circuitry is configured to calculate, on a basis of a current utterance and a past utterance history, the matching degree between the parts of the utterance before and after the pause in terms of the semantic unit, determine, on a basis of the calculated matching degree, whether or not to couple the parts of the utterance before and after the pause, and couple the parts of the utterance before and after the pause in a case where it is determined that the parts of the utterance before and after the pause are coupled.
Minamino teaches wherein the circuitry is configured to calculate, on a basis of a current utterance and a past utterance history, the matching degree between the parts of the utterance before and after the pause in terms of the semantic unit (Minamino, Par. 0141:” In addition, the word preliminary selection unit 13 can calculate the language score of each word based on a bigram that defines the probability that the word is linked to the immediately preceding word by referring to the word connection information.”) Note. Words are part of the utterance not the entire utterance.
determine, on a basis of the calculated matching degree, whether or not to couple the parts of the utterance before and after the pause, and (Minamino, Par. 0017:” … for example, in the process of obtaining an acoustic score, when the acoustic score obtained in the middle is equal to or lower than a predetermined threshold, the score calculation is terminated [no coupling]. There are a pruning technique and a linguistic pruning technique for narrowing down words to be scored based on a language score.”, and Par. 0150:” Furthermore, the matching unit 14 calculates the language score of the selected word from the word preliminary selection unit 13 by referring to the grammar database 19B. That is, the matching unit 14 recognizes the word immediately before the selected word from the word preliminary selection unit 13 and the previous word by referring to the word connection information, for example, and the probability based on the bigram or trigram. Then, the language score of the selected word from the word preliminary selection unit 13 is obtained.”, and Par. 0010:” for example, in the case of the HMM method, the acoustic score is determined based on the probability [probability of appearance] that a series of feature amounts output from the feature extraction unit 3 is observed from the acoustic model constituting the word model. Calculated every time. Further, for example, in the case of bigram, the language score is obtained based on the probability that the word of interest and the word immediately preceding the word are linked [connected]. Then, a speech recognition result is determined based on a final score [hereinafter, referred to as a final score as appropriate] obtained by comprehensively evaluating the acoustic score and the language score for each word.” Note, each word is considered part of the utterance not the whole utterance.).
couple the parts of the utterance before and after the pause in a case where it is determined that the parts of the utterance before and after the pause are coupled. (Minamino, Par. 0010:” Further, for example, in the case of bigram, the language score is obtained based on the probability that the word of interest and the word immediately preceding the word are linked [connected]. Then, a speech recognition result is determined based on a final score [hereinafter, referred to as a final score as appropriate] obtained by comprehensively evaluating the acoustic score and the language score for each word.”, and Par. 0124:” That is, when there is only one halfway node in the information connection information, the control unit 11 selects that halfway node as the node of interest. Further, when there are a plurality of intermediate nodes in the information connection information, the control unit 11 selects one of the plurality of intermediate nodes as a node of interest. Specifically, for example, the control unit 11 refers to the time information that each of the plurality of intermediate nodes has, and the time indicated by the time information is the oldest (starting voice section) or the newest (The one at the end of the speech segment) is selected as the node of interest. Alternatively, for example, the control unit 11 accumulates an acoustic score and a language score included in an arc that forms a path from the initial node to each of a plurality of intermediate nodes, and the accumulated value (hereinafter, as appropriate, a partial accumulation). The node that is the end of the path having the largest score or the smallest path is selected as the node of interest.”). Note, each word is considered part of the utterance not the whole utterance.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas in view of Minamino to calculate, on a basis of a current utterance and a past utterance history, the matching degree between the utterances before and after the pause in terms of the semantic unit, determine, on a basis of the calculated matching degree, whether or not to couple the utterances before and after the pause, and couple the utterances before and after the pause in a case where it is 

Regarding claim 4, Maas does not teach the information processing apparatus according to claim 2, wherein the circuitry is configured to calculate a score for each of input hypotheses, the input hypotheses being each obtained by hypothetically coupling the parts of the utterance before and after the pause, calculate a matching degree for each of the input hypotheses between the parts of the utterance before and after the pause on a basis of the calculated score for each of the input hypotheses, and select one input hypothesis from among a plurality of input hypotheses on a basis of the calculated matching degree for each of the input hypotheses.  
Minamino teaches wherein the circuitry is configured to calculate a score for each of input hypotheses, (Minamino, Par. 0008:” the matching unit 4 accumulates the appearance probabilities of the feature quantities for the word strings corresponding to the connected word models, uses the accumulated value as a score, and sets the word string having the highest score as the speech recognition result.”)
the input hypotheses being each obtained by hypothetically coupling the parts of the utterance before and after the pause, (Minamino, Par. 0010:” for example, in the case of the HMM method, the acoustic score is determined based on the probability [probability of appearance] that a series of feature amounts output from the feature extraction unit 3 is observed from the acoustic model constituting the word model. Calculated every time. Further, for example, in the case of bigram, the language score is obtained based on the probability that the word of interest and the word immediately preceding the word are linked [connected]. Then, a speech recognition result is determined based on a final score [hereinafter, referred to as a final score as appropriate] obtained by comprehensively evaluating the acoustic score and the language score for each word.”)
calculate a matching degree for each of the input hypotheses between the parts of the utterance before and after the pause on a basis of the calculated score for each of the input hypotheses, (Minamino, Par. 0141:” In addition, the word preliminary selection unit 13 can calculate the language score of each word based on a bigram that defines the probability that the word is linked to the immediately preceding word by referring to the word connection information. ”).
and select one input hypothesis from among a plurality of input hypotheses on a basis of the calculated matching degree for each of the input hypotheses. (Minamino, Par. 0150:” Furthermore, the matching unit 14 calculates the language score of the selected word from the word preliminary selection unit 13 by referring to the grammar database 19B. That is, the matching unit 14 recognizes the word immediately before the selected word from the word preliminary selection unit 13 and the previous word by referring to the word connection information, for example, and the probability based on the bigram or trigram. Then, the language score of the selected word from the word preliminary selection unit 13 is obtained.”, and Par. 0157:” … for example, a word string corresponding to an arc constituting a path having the largest final score is output as a speech recognition result for the user's utterance”, and Par. 0158:” the word preliminary selection unit 13 selects one or more words following the already obtained word in the word string that is a candidate for the speech recognition result, and the matching unit 14 selects the selected word (For the selected word), a score is calculated, and based on the score, a word string that is a candidate for the speech recognition result is constructed. Then, the reevaluation unit 15 corrects the word connection relationship between the words in the word string that is a candidate for the speech recognition result, and the control unit 11 uses the corrected word connection relationship as a word as the speech recognition result.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas in view of Minamino to calculate a score for each of input hypotheses, the input hypotheses being each obtained by hypothetically coupling the utterances before and after the pause, calculate a matching degree for each of the input hypotheses between the utterances before and after the pause on a basis of the calculated score for each of the input hypotheses, and select one input hypothesis from among a plurality of input hypotheses on a basis of the calculated matching degree for each of the input hypotheses, in order to select, from an enormous number of word strings, the most probable speech recognition result efficiently in terms of the amount of calculation and the memory capacity used, since efficient determination is a very important issue, as evidenced by Minamino (see Par. 0016).
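For illustration only, the bigram-based language score and the selection of the highest-scoring word string described in the cited paragraphs of Minamino can be sketched as follows. This is a minimal sketch under stated assumptions and is not Minamino's implementation, which also evaluates an acoustic score: the names `bigram_log_score` and `select_hypothesis`, the probability floor, and every probability in `TOY_BIGRAM` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bigram = Dict[Tuple[str, str], float]


def bigram_log_score(words: Sequence[str], bigram: Bigram, floor: float = 1e-6) -> float:
    """Sum the log-probabilities that each word is linked to the preceding word.

    Word pairs absent from the table receive the small `floor` probability.
    """
    return sum(math.log(bigram.get(pair, floor)) for pair in zip(words, words[1:]))


def select_hypothesis(hypotheses: List[List[str]], bigram: Bigram) -> List[str]:
    """Return the hypothesis (word string) with the largest language score."""
    return max(hypotheses, key=lambda words: bigram_log_score(words, bigram))


# Hypothetical bigram probabilities; all values are assumptions for illustration.
TOY_BIGRAM: Bigram = {
    ("play", "music"): 0.5,
    ("music", "by"): 0.4,
    ("by", "sting"): 0.3,
    ("music", "buy"): 0.01,
    ("buy", "sting"): 0.001,
}

# Two candidate couplings of "play music" (before the pause) with the words after it.
hypotheses = [
    ["play", "music", "by", "sting"],
    ["play", "music", "buy", "sting"],
]

best = select_hypothesis(hypotheses, TOY_BIGRAM)
```

Working in log-probabilities keeps the accumulated score numerically stable for long word strings; the choice of floor for unseen word pairs is a design assumption of this sketch.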


Minamino teaches wherein the matching degree includes at least one of a relevance or a grammatical connection between the parts of the utterance before and after the pause.  (Minamino, Par. 0139:” Further, the word preliminary selection unit 13 calculates the language score of the word corresponding to each word model based on the grammatical rules stored in the grammar database 19A. That is, the word preliminary selection unit 13 obtains the language score of each word based on, for example, a unigram.", and Par. 0140: "The word preliminary selection unit 13 refers to the word connection information to calculate the acoustic score of each word to the word immediately before the word [the word corresponding to the arc whose terminal node is terminated]. It can be done using a dependent crossword model.", and Par. 0141:” In addition, the word preliminary selection unit 13 can calculate the language score of each word based on a bigram that defines the probability that the word is linked to the immediately preceding word by referring to the word connection information.”, and Par. 0010:” Further, for example, in the case of bigram, the language score is obtained based on the probability that the word of interest and the word immediately preceding the word are linked [connected]. Then, a speech recognition result is determined based on a final score [hereinafter, referred to as a final score as appropriate] obtained by comprehensively evaluating the acoustic score and the language score for each word.” Note, each word is considered part of the utterance not the whole utterance.).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over  Maas and Minamino, as applied to claim 2, and in further view of Binder (US20210151070A1).

Binder was applied in the previous Office Action.
Regarding claim 3, Maas and Minamino teach an information processing apparatus.
Maas and Minamino do not teach the information processing apparatus according to claim 2, wherein the processor calculates the matching degree between the utterances before and after the pause in terms of the semantic unit, using sensor information along with the current utterance and the past utterance history, the sensor information being obtained from a sensor.
Binder teaches using sensor information along with the current utterance and the past utterance history, the sensor information being obtained from a sensor. (Binder, Par. 0097:” Alternatively, or in addition, disambiguation module 350 determines a noun to which the pronoun refers as a name of an entity, an activity, or a location identified in a previous speech input associated with a previously tagged digital photograph. Alternatively, or in addition, disambiguation module 350 determines a noun to which the pronoun refers as a name of a person identified based on a previous speech input associated with a previously tagged digital photograph.”, and Par. 0098:” In some implementations, disambiguation module 350 accesses information obtained from one or more sensors (e.g., proximity sensor 214, light sensor 212, GPS receiver 213, temperature sensor 215, and motion sensor 210) of a handheld electronic device (e.g., user device 104) for determining a meaning of one or more of the terms. In some implementations, disambiguation module 350 identifies two terms each associated with one of an entity, an activity, or a location. For example, a first of the two terms refers to a person, and a second of the two terms refers to a location. In some implementations, disambiguation module 350 identifies three terms each associated with one of an entity, an activity, or a location.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas and Minamino in view of Binder so that the processor calculates the matching degree between the utterances before and after the pause in terms of the semantic unit, using sensor information along with the current utterance and the past utterance history, the sensor information being obtained from a sensor, in order to tune a voice trigger to improve its accuracy in recognizing the voice of a particular user, as evidenced by Binder (See Par. 0140).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Maas and Minamino, as applied to claim 4, and in further view of Moniz (US10446147B1).

Moniz was applied in the previous Office Action.
Regarding claim 5, Maas and Minamino teach an information processing apparatus.
Maas and Minamino do not teach the information processing apparatus according to claim 4, wherein circuitry is configured to calculate the score in accordance with an amount of information obtained for each of the input hypotheses.
Moniz teaches wherein circuitry is configured to calculate the score in accordance with an amount of information obtained for each of the input hypotheses. (Moniz, Col. 12, line 64 – Col. 13, line 11:” The output of each recognizer may be an N-best list of intents and slots representing the particular recognizer's top choices as to the meaning of the utterance represented in the text data 300, along with scores for each item in the N-best list. For example, for text data 300 of “tell me why you told me the weather for Seattle,” the analysis domain recognizer 263-A may output an N-best list in the form of:
[0.95] DeterminePreviousProcessingIntent PreviousIntent: GetWeatherIntent Location: Seattle, Wash.
[0.02] DeterminePreviousProcessingIntent PreviousIntent: GetSongIntent ArtistName: Seattle
[0.01] DeterminePreviousProcessingIntent PreviousIntent: GetSongIntent AlbumName: Seattle
[0.01] DeterminePreviousProcessingIntent PreviousIntent: GetSongIntent SongName: Seattle”, and Col. 13, lines 12-32:” where the NER component 262-A of recognizer 263-A has determined that for different items in the N-best list the word “Seattle” corresponds to a slot. [Though different items in the N-best list interpret those slots differently, for example labeling “Seattle” as a location in one choice, an artist name in another choice, an album name in another choice, and a song name in another choice.] The IC component 264-A of recognizer  intent of the utterance represented in the text data 300 is a DeterminePreviousProcessingIntent [and selected that as the intent for each item on the analysis N-best list]. The recognizer 263-A also determined a score for each item in the list representing the recognizer's confidence that the particular item is correct. As can be seen in the example, the top item has the highest score. Each recognizer of the recognizers 335 may operate on the text data 300 substantially in parallel, resulting in a number of different N-best lists, one for each domain [e.g., one N-best list for music, one N-best list for video, etc.]. The size of any particular N-best list output from a particular recognizer is configurable and may be different across domains.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas and Minamino in view of Moniz to calculate the score in accordance with an amount of information obtained for each of the input hypotheses, in order to execute specific functionality so as to provide data or produce some other output called for by a user, as evidenced by Moniz (See Col. 11, lines 23-24).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Maas, Minamino, and Moniz, as applied to claim 5, and in further view of Kuo (US20160253989A1).
Kuo was applied in the previous office action.

Maas, Minamino, and Moniz do not teach the information processing apparatus according to claim 5, wherein the score includes, for each of the input hypotheses, at least one or more of a first score that is obtained from a ratio of use of a function parameter corresponding to an intention of the user, a second score that is obtained from a language model related to information regarding a phrase in the utterance of the user, a third score that is obtained from a temporal interval between the current utterance and the past utterance history, or a temporal interval between the past utterance histories, or a fourth score that is obtained from a number of combinations of the current utterance and the past utterance history.
Kuo teaches a second score that is obtained from a language model related to information regarding a phrase in the utterance of the user (Kuo, Par. 0099:” … and performs one or more language model [LM] scoring calculations at 944 for both the reference results [e.g., test utterances 902] and for the recognition results provided by the speech recognition component [or “build”]. As describe above, a language model typically determines a probability [or score] that an associated segment of speech is a particular word or sequence of words.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas, Minamino, and Moniz in view of Kuo to incorporate a second score that is obtained from a language model related to information regarding a phrase in the utterance of the user, in order to provide substantial .

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Maas and Minamino, as applied to claim 2, and in further view of Jang (US20170213569A1).

Jang was applied in the previous office action.
Regarding claim 8, Maas and Minamino teach an information processing apparatus.
Maas and Minamino do not teach the information processing apparatus according to claim 2, wherein the circuitry is configured to determine, on a basis of an interval between the pauses, whether or not to couple the parts of the utterance before and after the pause.
Jang teaches wherein the circuitry is configured to determine, on a basis of an interval between the pauses, whether or not to couple the parts of the utterance before and after the pause. (Jang, Par. 0140:” The real utterance and the recognized sentence may not be complete because the electronic device 200 performs speech recognition over a long interval. In this case, the user may split the sentence into a sequence of chunks to change the EPD period of the electronic device 200 and, in this case, the electronic device 200 may automatically present candidate split positions in the sentence. Although the utterance has been recognized as one sentence, the electronic device 200 may check the pause periods between the phrases or words constituting the sentence. Accordingly, the electronic device 200 may show some short pause periods between the phrases or words as the candidate split positions.”, and Par. 0147:” Referring to part [a] of FIG. 11, the electronic device 200 may recognize speech, convert the speech to text, and display the text on the display 540. The electronic device 200 may recognize the utterances as two sentences of “Annyeonghaseyo” 1110 and “Mannaseo bangapsimnida” 1120 and may display the sentences separately. The electronic device 200 may display the text phrases with the corresponding pause periods of the displayed speech. If the EPD period of the electronic device 200 is shorter than the pause period between the utterances “Annyeonghaseyo” 1110 and “Mannaseo bangapsimnida” 1120, the electronic device 200 recognizes the utterances as two sentences [couple]. Although the two sentences are separated from each other, they may be handled as one sentence. Accordingly, the user may set the EPD period of the electronic device 200 to the pause period following the latter one of the two sentences to adapt the speech recognition function to the personal utterance characteristic.”, and Par. 0152:” The electronic device 200 may display the emerged sentence on the display 540 along with the operation of changing the EPD period.”, and Par. 0129:” Part [b] of FIG. 8 shows a screen display in which the sentences are displayed separately according to the user's selection. The electronic device 200 may display “Annyeonghaseyo” 830 and “Mannaseo bangapsimnida” 840 as separate sentences. The electronic device 200 may also display the pause periods of “100” 835 following “Annyeonghaseyo” and the pause period “300” 845 following “Mannaseo bangapsimnida” 840. The electronic device 200 may delete the session concerning “Annyeonghaseyo Mannaseo bangapsimnida” and manage the sessions of 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas and Minamino in view of Jang to determine, on a basis of an interval between the pauses, whether or not to couple the utterances before and after the pause, in order to improve the accuracy of the speech recognition result, as evidenced by Jang (see Par. 0068).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Maas and Minamino, as applied to claim 2, and in further view of Iwata (JP2009053303A).

Iwata was applied in the previous Office Action.
Regarding claim 9, Maas and Minamino teach an information processing apparatus.
Maas and Minamino do not teach the information processing apparatus according to claim 2, wherein the circuitry is configured to use only an effective utterance history out of the past utterance histories.
Iwata teaches wherein the circuitry is configured to use only an effective utterance history out of the past utterance histories. (Iwata, Page 12, lines 9-20:"The effective utterance extraction unit 41 extracts, from the conversation history acquired by the information acquisition unit 30, an utterance added with an annotation as an effective utterance. The reason why an annotated utterance can be considered as an effective utterance is that the annotated utterance is an utterance that has helped solve the history, knowledge useful for problem solving [hereinafter referred to as problem solving knowledge] is not necessarily expressed by only one statement. One problem solving knowledge may be explained by a plurality of statements. In such a case, there is a possibility that the utterance to which the annotation is added differs for each learning participant. In order to cope with such a situation, the speech group extraction unit 40 includes a speech number average value calculation unit 42, an effective speech determination unit 43, and an effective speech group extraction unit 44.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas and Minamino in view of Iwata to use only an effective utterance history out of the past utterance histories, in order to provide a hint for problem solving, as evidenced by Iwata (See Abstract).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Maas, Minamino, and Iwata, as applied to claim 9, and in further view of Mochida (US 20190026266 A1).

Mochida was applied in the previous Office Action.
Regarding claim 10, Maas, Minamino, and Iwata teach an information processing apparatus.

Mochida teaches wherein the past utterance history includes information regarding a content of the utterance of the user and a time at which the utterance is made.  (Mochida, Par. 0090:” In the present exemplary embodiment, history database D1 manages a “start time,” an “end time,” an “input voice,” and an “operation content” in association with one another as shown in FIG. 7. In the example of FIG. 3A, in step S16, registration unit 142 records start time t1 and end time t2 in data items of “start time” and “end time” of history database D1, respectively.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas, Minamino, and Iwata in view of Mochida so that the past utterance history includes information regarding a content of the utterance of the user and a time at which the utterance is made, in order to provide an electronic device or a system in which it is easy to communicate the content intended by the user who is speaking a speech as a target of the above speech processing, as evidenced by Mochida (See Par. 0165).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Maas, Minamino, and Binder, as applied to claim 3, and in further view of Lloyd (US9076445B1).

Lloyd was applied in the previous Office Action.

Maas, Minamino and Binder do not teach the information processing apparatus according to claim 3, wherein the sensor information includes image data or position information, the image data being obtained by imaging an object, the position information indicating a position of the user.
Lloyd teaches wherein the sensor information includes image data or position information, the image data being obtained by imaging an object, the position information indicating a position of the user. (Lloyd, Col. 12, Lines 10 – 18:”The client device encodes audio for the yet unrecognized term, “pill” as audio data 305. The client device 102 also determines context information 307 that describes the context of audio data 305, for example, the date and time the audio data was recorded, the geographical location in which the audio data was recorded, and the active application and active document type when the audio data was recorded. The client device 302 sends the audio data 305 and the context information 307 to the speech recognition system 304.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Maas, Minamino, and Binder in view of Lloyd so that the sensor information includes image data or position information, the image data being obtained by imaging an object, the position information indicating a position of the user, in order for a speech recognition system to identify a location where a speaker is currently speaking. A language model of the speech recognition system can be adjusted so that words that were previously typed or spoken near that location can be .

Claims 13 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kamatani, Weng, Divay, and Mintz et al. (US20100161604A1) (hereinafter "Mintz").

Divay was applied in the previous Office Action.
Note: Each of the functional elements of claim 13 is identical to the corresponding element of claim 16; therefore, claims 13 and 16 are mapped together.
Regarding claims 13 and 16, Kamatani teaches a [[past utterance history ]]that matches spoken-language understanding processing on an utterance including a pause; (Kamatani, Par. 0003:” ... a technique of dividing an utterance including a pause given as a voiceless section into processing pieces is available.”, and Par. 0044:” Assume that a speaker continuously utters the contents “[Image Omitted] [V] [Image Omitted] [V] [Image Omitted] [Image Omitted] [Image Omitted] [V] [Image Omitted] [Tokyotonaide [V] hoteruwo yoyakushi [V] tainodesuga yoihoteruwo mitsukete [V] moraemasenka]” and that the simultaneous speech processing apparatus 100 acquires this utterance. A symbol [V] indicates a position recognized as a voiceless section [pause] by the simultaneous processing apparatus 100.”).
converting the current utterance of the user into a computer-executable command. (Kamatani, Par. 0060:” The second embodiment is different from the first embodiment in that information about a processing piece character string is updated depending on an utterance processing piece which correctly reflects the intention of an original utterance can be output.”). Note: Taking an action on the intention is equivalent to an executable command.
Kamatani does not teach that extracts, from past utterance histories of a user, a past utterance history that [[matches spoken-language understanding processing on an utterance including a pause]]; transmit the extracted past utterance history to an information processing apparatus along with a speech signal corresponding to a current utterance of the user, the speech signal not including the pause, the information processing apparatus performing the spoken-language understanding processing.
Weng teaches extracts, from past utterance histories of a user, (Weng, Par. 0022:” In one embodiment, dialog strategy component 114 keeps track of the user utterances, semantic content and data obtained from the user utterances in the past to recognize the current utterance during the interaction.”)
a past utterance history that [[matches spoken-language understanding processing on an utterance including a pause]] (Weng, Par. 0022:” In one embodiment, dialog strategy component 114 keeps track of the user utterances, semantic content and data obtained from the user utterances in the past to recognize the current utterance during the interaction. Confidence levels are utilized to measure the accuracy of the recognition. One or more threshold confidence levels may be defined to implement the process. Specifically, if the confidence score of the current recognized utterance is high, the recognized utterance, 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kamatani in view of Weng to extract a past utterance history from past utterance histories of a user, in order to improve the recognition accuracy of names in the user utterance, as evidenced by Weng (See Par. 0016).
Neither Kamatani nor Weng teaches transmit the extracted past utterance history to an information processing apparatus along with a speech signal corresponding to a current utterance of the user, the speech signal not including the pause, the information processing apparatus performing the spoken-language understanding processing.
Divay teaches transmit the extracted past utterance history to an information processing apparatus along with a speech signal corresponding to a current utterance of the user, (Divay, Par. 0068:” When information for a particular insertion point is not all within a current utterance 505, the information from a previous utterance 510 and next utterance 515 also may be needed as defined by the dictation module 304 and then transmitted to the AP server 302.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kamatani and Weng in view of Divay to transmit the extracted past utterance history to an information processing apparatus along with a speech signal corresponding to a current utterance of the user, in order to represent an acoustic model of silence and various types of environmental noise, as evidenced by Divay (See Par. 0006).

Mintz teaches the speech signal not including the pause, the information processing apparatus performing the spoken-language understanding processing (Mintz, Par. 0039: “On optional step 205, the interactions received on step 200, and in particular the vocal interactions or the vocal part of interactions, optionally undergo preprocessing, such as speaker separation, noise reduction, silence removal, or the like.”, Par. 0042:”The engines used during advanced analysis step 220 may include but are not limited to data mining, text mining, root cause analysis, link analysis, contextual analysis, text clustering, pattern recognition, hidden pattern recognition, prediction algorithms, semantic mapping, Natural Language Processing [NLP], Online analytical processing [OLAP] cube analysis, or others.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kamatani, Weng, and Divay in view of Mintz to employ a speech signal not including the pause, the information processing apparatus performing the spoken-language understanding processing, in order to provide the data and relationship for the query engine, as evidenced by Mintz (See Par. 0053).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Kamatani, Weng, Divay, and Mintz, as applied to claim 13, and in further view of Yasavur (US20180144738A1).

Yasavur was applied in the previous Office Action.

Kamatani, Weng, Divay, and Mintz do not teach the information processing apparatus according to claim 13, wherein the circuitry is configured to extract the past utterance history on a basis of extraction information including a time interval between pauses, information regarding a falter in wording, information regarding a speaker, or line-of-sight information regarding the user. 
Yasavur teaches wherein the circuitry is configured to extract the past utterance history on a basis of extraction information including a time interval between pauses, information regarding a falter in wording, information regarding a speaker, or line-of-sight information regarding the user. (Yasavur, Par. 0014:” In some embodiments, the first utterance, the second utterance or both are a sentence or a paraphrase.”, and Par. 0087:” The method can also involve identifying at least one of a goal of the user, a piece of information needed to satisfy a goal [e.g., slot], or a dialogue act from the user utterance [Step 520]. The utterance can be determined to be a goal, slot and/or dialogue act by parsing [extracting] the utterance and/or recognizing the intent of the utterance. The parsing can be based on identifying patterns in the utterance by comparing the utterance to pre-defined patterns.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kamatani, Weng, Divay, and Mintz in view of Yasavur to extract the past utterance history on a basis of extraction information including information regarding a speaker, in order to determine similarity 

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Kamatani, Weng, Divay, Mintz, and Yasavur, as applied to claim 14, and in further view of Zhao (US 20180075842A1).

Zhao was applied in the previous Office Action.
Regarding claim 15, Kamatani, Weng, Divay, Mintz, and Yasavur teach an information processing apparatus.
Kamatani, Weng, Divay, Mintz, and Yasavur do not teach the information processing apparatus according to claim 14, wherein the circuitry is configured to transmit the past utterance history on a basis of a maximum number of the past utterance histories for transmission or a maximum data size of the past utterance history for transmission.  
Zhao teaches wherein the circuitry is configured to transmit the past utterance history on a basis of a maximum number of the past utterance histories for transmission or a maximum data size of the past utterance history for transmission.  (Zhao, Par. 0047:” At step 340, the quantity of speech received from the vehicle occupant is determined. If the quantity of speech is above a particular threshold, the received speech can be processed at the vehicle 12. If the quantity of speech is below a particular threshold, the method proceeds to step 380 and can be processed at the remote speech processing facility. The ASR system 210 can measure the amount of speech received using a timer maintained by the processor 52. The ASR system 210 sending longer speech segments to the remote facility thereby minimizing the amount of data sent via the wireless carrier system 14 and reducing data communications costs.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kamatani, Weng, Divay, Mintz, and Yasavur in view of Zhao to transmit the past utterance history on a basis of a maximum number of the past utterance histories for transmission or a maximum data size of the past utterance history for transmission, in order to identify conditions when it is more advantageous to send speech to the remote facility rather than performing speech recognition at the vehicle, as evidenced by Zhao (see Par. 0002).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Komuma (US 20160217783 A1) teaches a voice recognition processing apparatus that includes a voice acquirer, a first voice recognizer, a storage device, and a recognition result determiner. The first voice recognizer converts the voice information into first information. The recognition result determiner compares the first information with the exclusion vocabulary to .
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/Examiner, Art Unit 2656
/HUYEN X VO/Primary Examiner, Art Unit 2656