DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and  In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969). A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. 
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).
Claims 21-25 and 28-30 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1 and 2, respectively, of U.S. 11043214.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 21-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea (i.e., a mental activity) without significantly more. Specifically, the claim(s) recite(s) steps for converting text to speech output. The claim recites a series of steps and, therefore, is a process (Step 1). The claim(s) recite(s) a mental process (Step 2A, Prong One). The processor in all steps is recited at a high level of generality, i.e., as a generic processor performing a generic computer function of processing data. These generic processor limitations are no more than mere instructions to apply the exception using a generic computer component (Step 2A, Prong Two). As discussed previously with respect to Step 2A, Prong Two, the additional element in the claim amounts to no more than mere instructions to apply the exception using a generic computer component. The same analysis applies here in Step 2B, i.e., mere instructions to apply an exception using a generic computer component cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is ineligible.

Claim Rejections - 35 USC § 102
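The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless:
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.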

Claims 21-24, 27-34, and 37-40 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 20200152184, hereinafter referred to as Henderson et al.

Regarding claim 21 (New), Henderson et al. discloses a computer-implemented method, comprising: 

receiving input audio data corresponding to a first utterance (“In S201, an automatic speech recognition (ASR) step is performed, to generate a text signal from the audio input,” Henderson et al., para [0132]. Also, “In use, the system receives an audio input from a human user. The audio input may be received through a microphone located remotely from the system, and the audio signal transmitted to the dialogue system for example. Each audio input corresponds to a turn in the dialogue. A dialogue turn is a conversational unit within turn-taking dialogues, which comprises the previous system utterance and the current user utterance,” Henderson et al., para [0131].); 

performing speech recognition using the input audio data to determine first data (“In S201, an automatic speech recognition (ASR) step is performed, to generate a text signal from the audio input. The generated text signal may be referred to as a text hypothesis. Any method of ASR may be used in this step,” Henderson et al., para [0132]. Here, the first data is the resulting text.); 

encoding the first data to generate first encoded data (“Generating features from the input signal may further comprise converting the input signal to vector representations and inputting the vector representations to the first trained model,” Henderson et al., para [0047]. The vector representation is encoded data. And, “Both the delexicalised versions of the utterance and system output, and the original utterances and system outputs are then transformed into vector representations,” Henderson et al., para [0152].); 

receiving second encoded data corresponding to a previous utterance (Henderson et al., para [0054]), the previous utterance being received prior to the first utterance (“Generating features from the previous dialogue system output or system dialogue act may comprise generating features for a slot and value combination, comprising,” Henderson et al., para [0050]. Also, as explained by Henderson et al., para [0106] and [0131], each dialogue turn contains a current user utterance. Thus, a previous turn contains a previous user utterance. The second data is encoded as second encoded data in the same manner as the first data (i.e., vectorized).); 

processing the first encoded data and the second encoded data using a model to determine model output data representing an understanding of the first utterance (“updating a belief state based on the outputs of the classifier models,” Henderson et al., para [0076]. Also, the previous model based on the second encoded data is updated using the first encoded data: “In S202, a spoken language understanding step is performed. This gives a “turn-level” prediction. Although the term “spoken” language is used, as explained above, the input signal may be a text signal. A dialogue state tracking stage is performed in S203, which generates an updated belief state,” Henderson et al., para [0134]. And, “In an embodiment, the belief state update depends on two factors: 1) the latest user utterance (i.e., the probability distributions over the slot, value and relation combinations generated in S303); and 2) previous belief state history (in this case the previous belief state, i.e. the probability distributions over the slot, value and relation combinations stored in the belief state after the previous dialogue turn). The most recent belief state reflects previous state history. The relative weight of the impact of the two components determines the accuracy of the current belief state,” Henderson et al., para [0182].); and 

generating output data using the model output data (“In S303, classifier models extract turn-level user goals, which are then incorporated into the belief state in S304,” Henderson et al., para [0141]. “The output of S303 comprises a probability value corresponding to a plurality of (slot, value, relation) combinations in the domain ontology. The classifier output for the example slot value combination is shown in 606 of FIG. 6(a). These probability values are then used to update the belief state in S304. The belief state stores the current belief in what the current user goals are,” Henderson et al., para [0180].).  
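
By way of illustration only, the belief state update quoted above from Henderson et al., para [0182], which depends on the latest turn-level probabilities and on the previous belief state with a relative weight between the two, might be sketched as follows. The linear interpolation, the function name update_belief_state, the parameter turn_weight, and the example slot, value and relation entries are assumptions made for this sketch and are not taken from Henderson et al.

```python
def update_belief_state(previous, turn_level, turn_weight=0.5):
    """Blend the previous belief state with the latest turn-level prediction.

    Both arguments map a (slot, value, relation) tuple to a probability.
    The linear interpolation used here is an assumed update rule, chosen
    only to show how a relative weight between the two factors could act.
    """
    if not 0.0 <= turn_weight <= 1.0:
        raise ValueError("turn_weight must lie in [0, 1]")
    keys = set(previous) | set(turn_level)
    return {
        key: (1.0 - turn_weight) * previous.get(key, 0.0)
        + turn_weight * turn_level.get(key, 0.0)
        for key in keys
    }


# Hypothetical example values, not drawn from the reference.
previous_belief = {("food", "indian", "="): 0.2}
turn_prediction = {("food", "indian", "="): 0.8, ("area", "north", "="): 0.6}
updated_belief = update_belief_state(previous_belief, turn_prediction, turn_weight=0.5)
```

In this sketch, a larger turn_weight makes the updated belief state follow the current utterance more closely, while a smaller value preserves more of the state carried over from the previous dialogue turn.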

As to claim 31, computer-implemented method claim 21 and system claim 31 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 31 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 21. Also, Henderson et al., para [0350], teaches a processor and memory.

Regarding claim 22 (New), Henderson et al. discloses the computer-implemented method of claim 21, further comprising: 

receiving second data corresponding to the previous utterance (Henderson et al., para [0054]. And, “Generating features from the previous dialogue system output or system dialogue act may comprise generating features for a slot and value combination, comprising,” Henderson et al., para [0050]. Also, as explained by Henderson et al., para [0106] and [0131], each dialogue turn contains a current user utterance. Thus, a previous turn contains a previous user utterance. The second data is encoded as second encoded data in the same manner as the first data (i.e., vectorized).); and 

encoding the second data to generate the second encoded data (Henderson et al., para [0054]).  

As to claim 32, computer-implemented method claim 22 and system claim 32 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 32 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 22. Also, Henderson et al., para [0350], teaches a processor and memory.

Regarding claim 23 (New), Henderson et al. discloses the computer-implemented method of claim 22, further comprising: 

determining third data representing content of a user input corresponding to the previous utterance (Henderson et al., para [0054]. And, “Generating features from the previous dialogue system output or system dialogue act may comprise generating features for a slot and value combination, comprising,” Henderson et al., para [0050]. Also, as explained by Henderson et al., para [0106] and [0131], each dialogue turn contains a current user utterance. Thus, a previous turn contains a previous user utterance. The third data is encoded as third encoded data in the same manner as the first data (i.e., vectorized).); and 

including the third data in the second data (The “second data” is defined as corresponding to the previous utterance. Thus, the “third data”, which represents content of a user input corresponding to the previous utterance, is necessarily included in the second data.).  

As to claim 33, computer-implemented method claim 23 and system claim 33 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 33 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 23. Also, Henderson et al., para [0350], teaches a processor and memory.

Regarding claim 24 (New), Henderson et al. discloses the computer-implemented method of claim 22, further comprising: 

determining third data representing content of a system response to the previous utterance (“In steps S302 and S303, a spoken language understanding (SLU) process is performed to extract turn-level user goals, which are then incorporated into a belief state in the subsequent step S304, in which dialogue state tracking is performed. The belief state is then used by the dialogue manager to choose an appropriate system response in S305,” Henderson et al., para [0140]. And, “The belief state thus dictates which system act is chosen. The role of the dialogue manager, or policy model, is to choose an appropriate system response following the latest user utterance,” Henderson et al., para [0200].); and 

including the third data in the second data (Henderson et al., fig. 11 shows a dialog exchange between a user and the system. The example shows that the phrase “The Gallery West Central” is included in both user utterances and system responses.).  

Regarding claim 27 (New), Henderson et al. discloses the computer-implemented method of claim 21, further comprising: 

determining weight data based at least in part on the second encoded data, wherein the model uses the weight data to determine the model output data (“wherein the method further comprises updating the vector representations and the first model parameters,” Henderson et al., [0088]. And, “The gradient for each parameter is then used to calculate the updated parameter from the previous values using an optimizer function (i.e. a gradient descent type optimiser function),” Henderson et al., para [0554]. Parameters are equivalent to weights.).  

Regarding claim 28 (New), Henderson et al. discloses the computer-implemented method of claim 21, further comprising: 

determining third encoded data representing a topic of at least one of the first utterance or the previous utterance, wherein the model further processes the third encoded data to determine the model output data (“According to an embodiment, there is provided a method of generating data for training a dialogue system, comprising: obtaining domain information comprising information identifying a plurality of dialogue slots and values, each dialogue slot corresponding to a subject that a speech or text signal may relate to,” Henderson et al., para [0110]-[0111]. The subject is interpreted as topic data.).  

As to claim 38, computer-implemented method claim 28 and system claim 38 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 38 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 28. Also, Henderson et al., para [0350], teaches a processor and memory.

Regarding claim 29 (New), Henderson et al. discloses the computer-implemented method of claim 21, wherein the first data represents a plurality of speech recognition hypotheses (“For example, a trained speech recognition algorithm based on a neural network or Hidden Markov Model may be used. ASR models may assign posterior probabilities to words in an utterance given the input acoustic signal. The ASR output may take the form of an N-best list, which approximates the full posterior distributions over the ASR hypotheses by returning the top N most probable hypotheses with their respective probabilities…In an embodiment, only the top scoring ASR hypothesis is output from S201 and used in the subsequent steps,” Henderson et al., para [0132]. The N most probable hypotheses are interpreted as at least first and second ASR hypotheses, each with a corresponding score.).  
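
By way of illustration only, the N-best list described in Henderson et al., para [0132], i.e., the top N most probable ASR hypotheses returned with their respective probabilities, might be sketched as follows. The function names n_best and top_hypothesis, the dictionary representation, and the example hypotheses and probabilities are assumptions made for this sketch and are not taken from Henderson et al.

```python
def n_best(hypotheses, n):
    """Return the n most probable hypotheses as (text, probability) pairs.

    `hypotheses` maps each candidate transcription to its probability.
    Pairs are returned in order of decreasing probability.
    """
    ranked = sorted(hypotheses.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def top_hypothesis(hypotheses):
    """Return only the top scoring transcription (the N = 1 case)."""
    return n_best(hypotheses, 1)[0][0]


# Hypothetical ASR output for one utterance, not drawn from the reference.
asr_output = {
    "book a table": 0.6,
    "look a table": 0.3,
    "book a cable": 0.1,
}
two_best = n_best(asr_output, 2)
best_text = top_hypothesis(asr_output)
```

Here two_best holds a plurality of hypotheses with their scores, while best_text corresponds to the embodiment in which only the top scoring hypothesis is passed to the subsequent steps.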


As to claim 39, computer-implemented method claim 29 and system claim 39 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 39 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 29. Also, Henderson et al., para [0350], teaches a processor and memory.

Regarding claim 30 (New), Henderson et al. discloses the computer-implemented method of claim 21, wherein: 

the first data indicates a first speech recognition hypothesis as a most likely hypothesis (“For example, a trained speech recognition algorithm based on a neural network or Hidden Markov Model may be used. ASR models may assign posterior probabilities to words in an utterance given the input acoustic signal. The ASR output may take the form of an N-best list, which approximates the full posterior distributions over the ASR hypotheses by returning the top N most probable hypotheses with their respective probabilities…In an embodiment, only the top scoring ASR hypothesis is output from S201 and used in the subsequent steps,” Henderson et al., para [0132]. The probabilities of the N most probable hypotheses are interpreted as at least first and second scores (i.e., hypothesis likelihoods) corresponding to first and second ASR hypotheses, respectively.); and 

the model output data indicates a second speech recognition hypothesis as a most likely hypothesis, the second speech recognition hypothesis being different from the first speech recognition hypothesis (“For example, a trained speech recognition algorithm based on a neural network or Hidden Markov Model may be used. ASR models may assign posterior probabilities to words in an utterance given the input acoustic signal. The ASR output may take the form of an N-best list, which approximates the full posterior distributions over the ASR hypotheses by returning the top N most probable hypotheses with their respective probabilities…In an embodiment, only the top scoring ASR hypothesis is output from S201 and used in the subsequent steps,” Henderson et al., para [0132]. The probabilities of the N most probable hypotheses are interpreted as at least first and second scores (i.e., hypothesis likelihoods) corresponding to first and second ASR hypotheses, respectively.).

As to claim 40, computer-implemented method claim 30 and system claim 40 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 40 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 30. Also, Henderson et al., para [0350], teaches a processor and memory.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 25 and 35 are rejected under pre-AIA 35 U.S.C. 103(a) as being obvious over US 20200152184, hereinafter referred to as Henderson et al., in view of US 20160055240, hereinafter referred to as Tur et al.

Regarding claim 25 (New), Henderson et al. discloses the computer-implemented method of claim 21, further comprising: 

wherein the model further processes the third encoded data to determine the model output data (Henderson et al., para [0141] and [0180].).  

Henderson et al., though, does not disclose determining third encoded data representing parts-of-speech of at least one of the first utterance or the previous utterance.

Tur et al. is cited to disclose determining third encoded data representing parts-of-speech of at least one of the first utterance or the previous utterance (“Because the orphan detector relies more on structure than content, syntactic features may also be used by the orphan classifier. The baseline syntactic feature for use in orphan determination is part-of-speech tag n-grams. Certain parts-of-speech appearing as the first word in an utterance provide a good indicator as to whether or not the utterance is an orphan. For example, the utterance is more likely to be an orphan when the part-of-speech of the first word is a modal (e.g., “could”) or a base form verb (e.g., play) than when the part-of-speech of the first word is a proper noun. Similarly, other parts-of-speech that are good indicators the utterance is likely to be an orphan include base personal pronouns (e.g., “I”) or genitive personal pronouns (e.g., “my”) appearing as the first word of the utterance,” Tur et al., para [0046].). Tur et al. benefits Henderson et al. by providing an orphan detector to distinguish orphans from web search queries and other out-of-domain utterances by focusing primarily on the structure of the utterance rather than the content, thereby improving user experiences with targeted language understanding dialog systems (Tur et al., Abstract). Therefore, it would be obvious for one skilled in the art to combine the teachings of Henderson et al. with those of Tur et al. to improve the dialog system of Henderson et al.

As to claim 35, computer-implemented method claim 25 and system claim 35 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 35 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 25. Also, Henderson et al., para [0350], teaches a processor and memory.

Claims 26 and 36 are rejected under pre-AIA 35 U.S.C. 103(a) as being obvious over US 20200152184, hereinafter referred to as Henderson et al., in view of US 20180329957, hereinafter referred to as Frazzingaro et al.

Regarding claim 26 (New), Henderson et al. discloses the computer-implemented method of claim 21, further comprising: 

wherein the model further processes the third encoded data to determine the model output data (Henderson et al., para [0141] and [0180].).  

Henderson et al., though, does not disclose determining third encoded data representing a device corresponding to the first utterance.

Frazzingaro et al. is cited to disclose determining third encoded data representing a device corresponding to the first utterance (“For example, with reference to FIG. 10A, a user provides a natural-language speech input 1012 “Hey Siri, give me directions to Starbucks” to a digital assistant of electronic device 1000. Accordingly, the electronic device 1000 can include, in the first set of data, the type of the input 1012 (e.g., speech), timing information of the input 1012 (e.g., time stamp when the utterance is detected by the electronic device), and/or the content of the input 1012 (e.g., an audio recording of the utterance),” Frazzingaro et al., para [0252]. Here, the encoded data includes the name of the device (i.e., Siri).). Frazzingaro et al. benefits Henderson et al. by having the user specify the name of the device to which the command is addressed, thereby helping to avoid command misinterpretations (Frazzingaro et al., para [0252]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Henderson et al. with those of Frazzingaro et al. to improve the dialog system of Henderson et al.  

As to claim 36, computer-implemented method claim 26 and system claim 36 are related as system and computer-implemented method of using the same, with each claimed element’s function corresponding to the computer-implemented step. Accordingly, claim 36 is similarly rejected under the same rationale as applied above with respect to computer-implemented method claim 26. Also, Henderson et al., para [0350], teaches a processor and memory.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached at (571)272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656