DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office action of 2/3/2022, Applicant submitted an amendment, filed 6/21/2022, amending claims 1 and 11 and traversing the prior art rejections. Applicant’s arguments have been fully considered but are moot in view of the new grounds of rejection, necessitated by the latest amendments, which further rely on Skobeltsyn et al. (US 2016/0063994).
Response to Arguments
Following a brief outline of the latest amendments (page 7, ¶ 1), Applicant concludes in the last four lines of the second paragraph of that page: “Nowhere does Behzadi show or contemplate a second query that is identified, based on a trigger word in the second query, as intended to correct a previous query. Accordingly, Behzadi does not anticipant [sic] Applicant’s independent claims”.
The newly cited Skobeltsyn et al. reference is relied upon to address the latest amendments. See the rejections set forth below for further details.
As regards the dependent claims, Applicant argues in the remainder of page 7 that, e.g., “Gadd discusses a voice interface, but also fails to teach the subject matter at issue” and that “Accordingly, independent claims 1 and 11, and all claims which dependent [sic] therefrom, are patentable over the art of record” (page 7, last ¶, through page 8, ¶ 2).
Because Applicant has not separately argued the merits of these dependent claims, but asserts their patentability solely on the basis of their dependence on the allegedly patentable parent claims, the dependent claims stand or fall with those parent claims, and no further response to Applicant’s arguments is necessary.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 7, 9-11, 14-15, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Behzadi et al. (US 2018/0012594) in view of Skobeltsyn et al. (US 2016/0063994).

Regarding claim 1, Behzadi et al. do teach a method for improving content discovery in response to a voice query (Title, Abstract, and ¶ 0030 last sentence: “commonly incorrect transcriptions may be associated with correctly identified follow-up queries” (improving content discovery in response to a voice query)), 
the method comprising: 
generating a first transcription of a first voice query (¶ 0022 sentence 2: “The system 100 includes an ASRM 110 that is capable of receiving audio data 104b encoding utterance of a voice query of the user 102” (a first voice query received) “and context data 104c associated with the audio data” “and generating a transcription 104d of the audio data 104b” (generating its transcription), e.g., ¶ 0044 lines 2+: “initial user query” “transcribed” “MUSEUMS IN PARIS”, or see ¶ 0030 lines 6+ “initial voice query is” “OPENING HOURS OF LUFRE” (initial voice query)); 
identifying, based on the first transcription, a context of the first voice query and a first plurality of candidate entities to which the first voice query refers (¶ 0022 sentence 2: “The system 100 includes an ASRM 110 that is capable of receiving audio data 104b encoding utterance of a voice query of the user 102” (a first voice query received) “and context data 104c associated with the audio data” (a context of the first query identified) “and generating a transcription 104d of the audio data 104b”; ¶ 0029 lines 1+: “In some implementations, instead of specifying follow-up queries” “for a particular initial voice query, the query mappings within the table 120 may instead specify one or more terms or entities” (a plurality of candidate entities) “pre-associated with the” “initial voice query” (to which the first voice query refers)); 
performing a first search based on the context of the first voice query and the first plurality of candidate entities (¶ 0031 lines 5+: “the user 102 may submit the initial voice query” (for the first voice query) “104a as a form of input to a search engine to perform search for terms” (performing a first search based on the “terms” (plurality of candidate entities)); ¶ 0033 sentence 2: “search results” depend on “location”, where “location”, according to ¶ 0022, is part of the “context data”, i.e., “context data 104c associated with” “e.g., user location” (search results also depend on the context data of the first voice query)); 
generating for output a first search result of the first search (¶ 0031 lines 8+: “The search results data are returned” (generating an output of the first search result for the first voice query of the first search) “for use” “in processing a subsequent voice query”); 
generating a second transcription of a second voice query (¶ 0044 lines 2+: “after transmitting an initial user query” “e.g.” “MUSEUMS IN PARIS” “the user then transmits a subsequent voice query” (a second voice query) “may be transcribed as” (transcription), or ¶ 0030 lines 6+: “OPENING HOURS OF LOUVRE” is a “follow up” (second voice query and transcription) of the “initial voice query” “OPENING HOURS OF LUFRE”); 
determining whether the second transcription includes a trigger term (¶ 0044 lines 2+: “after transmitting an initial query” “e.g.” “MUSEUMS IN PARIS” “the user then transmits a subsequent voice query” (a second voice query) “may be transcribed as” “HOURS OF LOO” “HOURS OF LOUIE’S” “or” “HOURS OF LOUVRE” (the second transcription includes the trigger term “HOURS”); likewise, in ¶ 0030 lines 6+, “HOURS” is a trigger term in “OPENING HOURS OF LOUVRE”); 
and in response to determining that the second transcription includes a trigger term: retrieving the context of the first voice query (¶ 0044 last 7 lines: “HOURS OF LOUVRE” is “pre-associated” (retrieving) “with the” “PARIS” (the “location” (context) of the first voice query) “term included in the prior voice query” (based on the trigger term “HOURS” which is associated with “LOUVRE” which is a location in “PARIS”)); 
identifying, based on the second transcription, a second term of the second voice query that is similar to a term of the first voice query (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” (identifying from terms in the “follow-up” (second transcription)) “as such because of their phonetic similarity” (that are similar to) “to top transcription hypothesis for the initial voice query 104a” (with, e.g., a first term and/or terms of the “initial” (first) voice query); e.g., ¶ 0030 lines 6+: “initial voice query is” “OPENING HOURS OF LUFRE” with “OPENING HOURS OF LOUVRE” “as a follow-up query based on the phonetic similarity between the terms” “LUFRE” “and” “LOUVRE” (“LOUVRE” is a second term in the second voice query that is similar to “LUFRE”, which is a term in the first voice query; see also ¶ 0044 last 7 lines regarding the term “LOUVRE”: “HOURS OF LOUVRE” is “pre-associated” “with the” “PARIS” “in the prior voice query”));
identifying a second plurality of candidate entities to which the second term refers (¶ 0044 lines 6+: “utterance” (the second voice query) “may be transcribed either as” “HOURS OF LOO” “HOURS OF LOUIE’S” “or” “HOURS OF LOUVRE” (a second plurality of candidate entities corresponding to “LOUVRE” (the second term))); 
performing a second search based on the second plurality of candidate entities and the context (¶ 0031 last sentence: “The search results data returned by the search engine may then be logged and included with the table 120 for use by the ASRM 110 in processing” (performing a second search for) “a subsequent voice query” (the second query, e.g., for “HOURS OF LOUVRE” “HOURS OF LOO” “HOURS OF LOUIE’S” (based on the second plurality of candidate entities associated with the location context associated with “LOUVRE”))); 
and generating for output a second search result of the second search (¶ 0005 last sentence: “providing” “one or more search results” (generating for output associated with search results) “associated with the transcription of the subsequent” (for the second query or obtaining second search result) “utterance for output to the user” (e.g., the second search associated with “HOURS OF LOUVRE”)).
Behzadi et al. do not specifically disclose:
Wherein the trigger term indicates that the user intends to correct the first voice query.
Skobeltsyn et al. do teach:
Wherein the trigger term indicates that the user intends to correct the first voice query (¶ 0027: “System receives a first voice query from a user device 302”, “for example”, “who is the President of France?”; ¶ 0028 sentence 1: “recognition output” “transcription of the received voice query” (generating a transcription of the “query”); ¶ 0028 lines 11+: “The user can then examine the presented recognition output”, e.g., the transcription “may be recognized as”: “who is the President of friends”; ¶ 0030 sentence 1: “The system receives a second voice query” “[which] can be a correction query” “For example” “in response to the recognition output” “who is the President of friends” “may be” “no I meant France” (“meant” is a trigger term, uttered in the second user query, indicating that the user intends to correct the transcription of the first voice query; see also Abstract lines 6-7: “the second voice query triggers a correction request”)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “QUERY REWRITE CORRECTIONS” techniques of Skobeltsyn et al. into the “FOLLOW-UP VOICE QUERY PREDICTION” of Behzadi et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable the “system” of Behzadi et al. “[to] determine whether a correction request is triggered by the second voice query”, as disclosed in Skobeltsyn et al. ¶ 0031 sentence 1, thereby allowing the user to help the system determine corrections to its transcriptions.
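For illustration only, the trigger-term limitation, as read on the “no I meant France” example of Skobeltsyn et al., may be sketched in Python. This is a minimal, hypothetical sketch and is not taken from Behzadi et al., Skobeltsyn et al., or the present application: the trigger vocabulary (CORRECTION_TRIGGERS), the letter-run tokenization, and the function names are assumptions made solely for this example.

```python
import re
from typing import List, Optional

# Hypothetical trigger vocabulary chosen for illustration; neither reference
# nor the claims enumerate a specific list of trigger terms.
CORRECTION_TRIGGERS = frozenset({"no", "meant", "actually"})


def correction_triggers(transcription: str) -> List[str]:
    """Return the trigger terms found in a transcription, in spoken order."""
    tokens = re.findall(r"[a-z']+", transcription.lower())
    return [token for token in tokens if token in CORRECTION_TRIGGERS]


def context_for_second_query(transcription: str,
                             first_query_context: dict) -> Optional[dict]:
    """Retrieve the stored context of the first query only if a trigger term is present."""
    if correction_triggers(transcription):
        return first_query_context
    return None


if __name__ == "__main__":
    stored = {"location": "Paris"}  # hypothetical context of the first voice query
    print(correction_triggers("no I meant France"))  # ['no', 'meant']
    print(context_for_second_query("no I meant France", stored))  # {'location': 'Paris'}
    print(context_for_second_query("who is the President of France?", stored))  # None
```

A fixed keyword list is only the simplest reading of “trigger term”; a deployed system might instead classify correction intent, which this sketch does not attempt.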


Regarding claim 4, Behzadi et al. do teach the method of claim 1, wherein identifying a context of the first voice query further comprises identifying, based on the first transcription, a keyword associated with a type of content (the words “MUSEUMS” and “PARIS” in “MUSEUMS IN PARIS” (¶ 0044 line 3) are both keywords: “PARIS” is associated with a location context, and “MUSEUMS” is associated with a cultural or touristic type of content).

Regarding claim 5, Behzadi et al. do teach the method of claim 1, wherein the first transcription and the second transcription are generated using a voice transcription model (Fig. 1 “LANGUAGE MODEL” “130” comprises “130a” and “130b” (a voice transcription model); i.e., according to ¶ 0053 last sentence: “an initial language model 130a to generate the transcription 104d of the utterance of the initial voice query 104a” (the voice transcription model is used to generate “104d” (the first transcription)); ¶ 0057 sentence 2: “using the language model 130b to generate the transcription 106d of the utterance of the subsequent voice query 106d” (the second transcription is generated using the voice transcription model “130”)).

Regarding claim 7, Behzadi et al. do teach the method of claim 1, wherein the first voice query contains at least a first term and the second voice query contains at least a second term in addition to the trigger term (¶ 0030 lines 6-9: “initial voice query is” “OPENING HOURS OF LUFRE” (the first voice query) “OPENING HOURS OF LOUVRE” “as a follow-up question” (the second voice query; i.e., both queries contain at least one term in addition to “HOURS” (the trigger term))), 
the method further comprising:
comparing the first term with the second term (¶ 0046 sentence 2: “compare identified terms” (comparing, e.g., a first term) “within the initial voice query 104a” (in the first voice query, e.g., “LUFRE”) “and the subsequent voice query 106” (with the second query terms, e.g., a second term “LOUVRE”));
determining, based on the comparing, whether the second term is phonetically similar to the first term (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” “as such because of their phonetic similarity” (using a phonetic similarity comparison between “follow-up queries” (e.g., the second terms attributed to the second voice query)) “to top transcription hypothesis for the initial voice query 104a” (and, e.g., the first term in the initial voice query); i.e., here “LUFRE” (the first term) is compared “phonetically” with “LOUVRE”); and
in response to determining that the second term is phonetically similar to the first term:
modifying an entity recognition model (¶ 0021 sentence 1: “to improve voice recognition accuracy by identifying a set of follow-up voice queries that are likely to be subsequently provided by a user” (in response to follow-up queries (e.g., the “phonetically similar[]” second query in ¶ 0030 lines 6-9)) “adjusting” (modifying) “a language model” (an entity recognition model, i.e., from “130a” to “130b” in Fig. 1)); 
and identifying the second plurality of candidate entities to which the second term refers using the modified entity recognition model (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” (identifying, e.g., the second plurality of candidate entities) “as such because of their phonetic similarity” “to top transcription hypothesis for the initial voice query 104a” (this is achieved by using the “language model” “130b” (the modified entity recognition model))).
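For illustration only, the phonetic-similarity comparison relied on above may be sketched with American Soundex. Behzadi et al. do not state which phonetic measure is used; Soundex is substituted here purely as a familiar stand-in, and the function names are hypothetical. Under this measure, “LUFRE” and “LOUVRE” share the code L160, while “LOUIE’S” (L200) does not.

```python
# American Soundex letter groups; vowels, Y, H and W carry no code.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # H and W do not separate equal codes; vowels do (they reset prev).
        if ch not in "HW":
            prev = code
    return ("".join(result) + "000")[:4]


def phonetically_similar(first_term: str, second_term: str) -> bool:
    """Hypothetical similarity test: two terms match if their Soundex codes match."""
    return soundex(first_term) == soundex(second_term)


if __name__ == "__main__":
    print(soundex("LUFRE"), soundex("LOUVRE"))  # L160 L160
    print(phonetically_similar("LOUVRE", "LOUIE'S"))  # False
```

Soundex is coarse (it keeps only the first letter and three consonant codes), so a practical system would more likely score similarity over phoneme sequences; the sketch only shows the shape of the comparison.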

Regarding claim 9, Behzadi et al. do teach the method of claim 1, wherein identifying a first plurality of candidate entities further comprises:
identifying at least one phrase in the first transcription (¶ 0027 lines 1+: “in response to receiving the audio data 104b” “of the initial voice query 104a, the ASRM 110 transcribes the n-gram” “MUSEUMS IN PARIS” (a phrase) “as the initial transcription” (identified in the first transcription));
determining a plurality of variants of the at least one phrase (¶ 0028 lines 4+: “the ASRM 110 may identify follow up queries” (determining a plurality of variants of) “given the terms included within the voice query 104a” (of the at least one phrase associated with the first voice query)); and
mapping the at least one phrase to at least one entity based on the plurality of variants (¶ 0028 lines 4+: “the ASRM 110 may identify follow up queries” (based on the plurality of variants of) “given the terms included within the voice query 104a” “the ASRM 110 may identify follow-up queries for the user 102 based on accessing a set of query mappings” (mapping) “that specifies a follow up query” (at least one entity) “for an initial voice query” (to the phrase associated with the first voice query)).

Regarding claim 10, Behzadi et al. do teach the method of claim 1, further comprising:
determining whether the second voice query was received within a threshold amount of time from a time at which the first voice query was received (¶ 0034 sentence 3: “the value of association score may reflect the likelihood that terms included within particular initial voice query and the follow-up query” (the “initial” (first) and the “follow-up” (second) voice queries) “are repeatedly sent within a particular period of time” (are received within a threshold amount of time from each other));
wherein retrieving the context of the first voice query occurs in response to determining that the second voice query was received within the threshold amount of time from the time at which the first voice query was received (¶ 0035 sentence 2: “table 120 further specifies a higher associated score for the follow-up query "OPENING HOURS OF LOUVRE"” (the second query “OPENING HOURS OF LOUVRE”, which according to ¶ 0044 last 7 lines is “pre-associated with the ‘PARIS’ term” “included in the prior voice query” (i.e., it is associated with, and thereby retrieves, the location context of the first voice query), is selected based on a “score” that, according to ¶ 0034, “may reflect the likelihood” that the “initial voice query and the follow-up query” “are” “sent within a particular period of time” (in response to determining that the second voice query was received within the threshold amount of time from the first voice query))).
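For illustration only, the threshold-time determination may be sketched as a simple window check. The 30-second window and the function name are assumptions; Behzadi et al. refer only to “a particular period of time” and specify no value.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; no specific value is given in the cited art.
FOLLOW_UP_WINDOW = timedelta(seconds=30)


def received_within_threshold(first_received: datetime,
                              second_received: datetime,
                              window: timedelta = FOLLOW_UP_WINDOW) -> bool:
    """True if the second query arrived at or after the first, and no later than `window` after it."""
    elapsed = second_received - first_received
    return timedelta(0) <= elapsed <= window


if __name__ == "__main__":
    t0 = datetime(2022, 6, 21, 12, 0, 0)
    print(received_within_threshold(t0, t0 + timedelta(seconds=10)))  # True
    print(received_within_threshold(t0, t0 + timedelta(minutes=5)))   # False
```

Under this reading, retrieval of the first query’s context would simply be gated on the function returning True.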

Regarding claim 11, Behzadi et al. do teach a system for improving content discovery in response to a voice query (Title, Abstract, and ¶ 0030 last sentence: “commonly incorrect transcriptions may be associated with correctly identified follow-up queries” (improving content discovery in response to a voice query)), 
the system comprising: 
memory (¶ 0069 lines 1+: “The memory 364 stores information within the computing device 350”);
and control circuitry (¶ 0074 sentence 1: “Various implementations of the systems and methods described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations of such implementations”)
configured to:
generate a first transcription of a first voice query (¶ 0022 sentence 2: “The system 100 includes an ASRM 110 that is capable of receiving audio data 104b encoding utterance of a voice query of the user 102” (a first voice query received) “and context data 104c associated with the audio data” “and generating a transcription 104d of the audio data 104b” (generating its transcription), e.g., ¶ 0044 lines 2+: “initial user query” “transcribed” “MUSEUMS IN PARIS”, or see ¶ 0030 lines 6+ “initial voice query is” “OPENING HOURS OF LUFRE” (initial voice query)); 
identify, based on the first transcription, a context of the first voice query and a first plurality of candidate entities to which the first voice query refers (¶ 0022 sentence 2: “The system 100 includes an ASRM 110 that is capable of receiving audio data 104b encoding utterance of a voice query of the user 102” (a first voice query received) “and context data 104c associated with the audio data” (a context of the first query identified) “and generating a transcription 104d of the audio data 104b”; ¶ 0029 lines 1+: “In some implementations, instead of specifying follow-up queries” “for a particular initial voice query, the query mappings within the table 120 may instead specify one or more terms or entities” (a plurality of candidate entities) “pre-associated with the” “initial voice query” (to which the first voice query refers)); 
perform a first search based on the context of the first voice query and the first plurality of candidate entities (¶ 0031 lines 5+: “the user 102 may submit the initial voice query” (for the first voice query) “104a as a form of input to a search engine to perform search for terms” (performing a first search based on the “terms” (plurality of candidate entities)); ¶ 0033 sentence 2: “search results” depend on “location”, where “location”, according to ¶ 0022, is part of the “context data”, i.e., “context data 104c associated with” “e.g., user location” (search results also depend on the context data of the first voice query)); 
generate for output a first search result of the first search (¶ 0031 lines 8+: “The search results data are returned” (generating an output of the first search result for the first voice query of the first search) “for use” “in processing a subsequent voice query”); 
generate a second transcription of a second voice query (¶ 0044 lines 2+: “after transmitting an initial user query” “e.g.” “MUSEUMS IN PARIS” “the user then transmits a subsequent voice query” (a second voice query) “may be transcribed as” (transcription), or ¶ 0030 lines 6+: “OPENING HOURS OF LOUVRE” is a “follow up” (second voice query and transcription) of the “initial voice query” “OPENING HOURS OF LUFRE”); 
determine whether the second transcription includes a trigger term (¶ 0044 lines 2+: “after transmitting an initial query” “e.g.” “MUSEUMS IN PARIS” “the user then transmits a subsequent voice query” (a second voice query) “may be transcribed as” “HOURS OF LOO” “HOURS OF LOUIE’S” “or” “HOURS OF LOUVRE” (the second transcription includes the trigger term “HOURS”); likewise, in ¶ 0030 lines 6+, “HOURS” is a trigger term in “OPENING HOURS OF LOUVRE”); 
and in response to determining that the second transcription includes a trigger term: retrieve, from the memory, the context of the first voice query (¶ 0044 last 7 lines: “HOURS OF LOUVRE” is “pre-associated” (retrieving) “with the” “PARIS” (the “location” (context) of the first voice query, which is stored in memory) “term included in the prior voice query” (based on the trigger term “HOURS”, which is associated with “LOUVRE”, a location in “PARIS”)); 
identify, based on the second transcription, a second term of the second voice query that is similar to a term of the first voice query (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” (identifying from terms in the “follow-up” (second transcription)) “as such because of their phonetic similarity” (that are similar to) “to top transcription hypothesis for the initial voice query 104a” (with, e.g., a first term and/or terms of the “initial” (first) voice query); e.g., ¶ 0030 lines 6+: “initial voice query is” “OPENING HOURS OF LUFRE” with “OPENING HOURS OF LOUVRE” “as a follow-up query based on the phonetic similarity between the terms” “LUFRE” “and” “LOUVRE” (“LOUVRE” is a second term in the second voice query that is similar to “LUFRE”, which is a term in the first voice query; see also ¶ 0044 last 7 lines regarding the term “LOUVRE”: “HOURS OF LOUVRE” is “pre-associated” “with the” “PARIS” “in the prior voice query”));
identify a second plurality of candidate entities to which the second term refers (¶ 0044 lines 6+: “utterance” (the second voice query) “may be transcribed either as” “HOURS OF LOO” “HOURS OF LOUIE’S” “or” “HOURS OF LOUVRE” (a second plurality of candidate entities corresponding to “LOUVRE” (the second term))); 
perform a second search based on the second plurality of candidate entities and the context (¶ 0031 last sentence: “The search results data returned by the search engine may then be logged and included with the table 120 for use by the ASRM 110 in processing” (performing a second search for) “a subsequent voice query” (the second query, e.g., for “HOURS OF LOUVRE” “HOURS OF LOO” “HOURS OF LOUIE’S” (based on the second plurality of candidate entities associated with the location context associated with “LOUVRE”))); 
and generate for output a second search result of the second search (¶ 0005 last sentence: “providing” “one or more search results” (generating for output associated with search results) “associated with the transcription of the subsequent” (for the second query or obtaining second search result) “utterance for output to the user” (e.g., the second search associated with “HOURS OF LOUVRE”)).
Behzadi et al. do not specifically disclose:
Wherein the trigger term indicates that the user intends to correct the first voice query.
Skobeltsyn et al. do teach:
Wherein the trigger term indicates that the user intends to correct the first voice query (¶ 0027: “System receives a first voice query from a user device 302”, “for example”, “who is the President of France?”; ¶ 0028 sentence 1: “recognition output” “transcription of the received voice query” (generating a transcription of the “query”); ¶ 0028 lines 11+: “The user can then examine the presented recognition output”, e.g., the transcription “may be recognized as”: “who is the President of friends”; ¶ 0030 sentence 1: “The system receives a second voice query” “[which] can be a correction query” “For example” “in response to the recognition output” “who is the President of friends” “may be” “no I meant France” (“meant” is a trigger term, uttered in the second user query, indicating that the user intends to correct the transcription of the first voice query; see also Abstract lines 6-7: “the second voice query triggers a correction request”)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “QUERY REWRITE CORRECTIONS” techniques of Skobeltsyn et al. into the “FOLLOW-UP VOICE QUERY PREDICTION” of Behzadi et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable the “system” of Behzadi et al. “[to] determine whether a correction request is triggered by the second voice query”, as disclosed in Skobeltsyn et al. ¶ 0031 sentence 1, thereby allowing the user to help the system determine corrections to its transcriptions.


Regarding claim 14, Behzadi et al. do teach the system of claim 11, wherein the control circuitry configured to identify a context of the first voice query is further configured to identify, based on the first transcription, a keyword associated with a type of content (the words “MUSEUMS” and “PARIS” in “MUSEUMS IN PARIS” (¶ 0044 line 3) are both keywords: “PARIS” is associated with a location context, and “MUSEUMS” is associated with a cultural or touristic type of content).

Regarding claim 15, Behzadi et al. do teach the system of claim 11, wherein the control circuitry is configured to generate the first transcription and the second transcription using a voice transcription model (Fig. 1 “LANGUAGE MODEL” “130” comprises “130a” and “130b” (a voice transcription model); i.e., according to ¶ 0053 last sentence: “an initial language model 130a to generate the transcription 104d of the utterance of the initial voice query 104a” (the voice transcription model is used to generate “104d” (the first transcription)); ¶ 0057 sentence 2: “using the language model 130b to generate the transcription 106d of the utterance of the subsequent voice query 106d” (the second transcription is generated using the voice transcription model “130”)).

Regarding claim 17, Behzadi et al. do teach the system of claim 11, wherein the first voice query contains at least a first term and the second voice query contains at least a second term in addition to the trigger term (¶ 0030 lines 6-9: “initial voice query is” “OPENING HOURS OF LUFRE” (the first voice query) “OPENING HOURS OF LOUVRE” “as a follow-up question” (the second voice query; i.e., both queries contain at least one term in addition to “HOURS” (the trigger term))), 
and wherein the control circuitry is further configured to:
compare the first term with the second term (¶ 0046 sentence 2: “compare identified terms” (comparing, e.g., a first term) “within the initial voice query 104a” (in the first voice query, e.g., “LUFRE”) “and the subsequent voice query 106” (with the second query terms, e.g., a second term “LOUVRE”));
determine, based on the comparing, whether the second term is phonetically similar to the first term (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” “as such because of their phonetic similarity” (using a phonetic similarity comparison between “follow-up queries” (e.g., the second terms attributed to the second voice query)) “to top transcription hypothesis for the initial voice query 104a” (and, e.g., the first term in the initial voice query); i.e., here “LUFRE” (the first term) is compared “phonetically” with “LOUVRE”); and
in response to determining that the second term is phonetically similar to the first term:
modify an entity recognition model (¶ 0021 sentence 1: “to improve voice recognition accuracy by identifying a set of follow-up voice queries that are likely to be subsequently provided by a user” (in response to follow-up queries (e.g., the “phonetically similar[]” second query in ¶ 0030 lines 6-9)) “adjusting” (modifying) “a language model” (an entity recognition model, i.e., from “130a” to “130b” in Fig. 1)); 
and identify the second plurality of candidate entities to which the second term refers using the modified entity recognition model (¶ 0030 sentence 1: “query mappings included within the table 120 further include follow-up queries that are identified” (identifying, e.g., the second plurality of candidate entities) “as such because of their phonetic similarity” “to top transcription hypothesis for the initial voice query 104a” (this is achieved by using the “language model” “130b” (the modified entity recognition model))).

Regarding claim 19, Behzadi et al. do teach the system of claim 11, wherein the control circuitry configured to identify a first plurality of candidate entities is further configured to:
identify at least one phrase in the first transcription (¶ 0027 lines 1+: “in response to receiving the audio data 104b” “of the initial voice query 104a, the ASRM 110 transcribes the n-gram” “MUSEUMS IN PARIS” (a phrase) “as the initial transcription” (identified in the first transcription));
determine a plurality of variants of the at least one phrase (¶ 0028 lines 4+: “the ASRM 110 may identify follow up queries” (determining a plurality of variants of) “given the terms included within the voice query 104a” (of the at least one phrase associated with the first voice query)); and
map the at least one phrase to at least one entity based on the plurality of variants (¶ 0028 lines 4+: “the ASRM 110 may identify follow up queries” (based on the plurality of variants of) “given the terms included within the voice query 104a” “the ASRM 110 may identify follow-up queries for the user 102 based on accessing a set of query mappings” (mapping) “that specifies a follow up query” (at least one entity) “for an initial voice query” (to the phrase associated with the first voice query)).

Regarding claim 20, Behzadi et al. do teach the system of claim 11, wherein the control circuitry is further configured to:
determine whether the second voice query was received within a threshold amount of time from a time at which the first voice query was received (¶ 0034 sentence 3: “the value of association score may reflect the likelihood that terms included within particular initial voice query and the follow-up query” (the “initial” (first) and the “follow-up” (second) voice queries) “are repeatedly sent within a particular period of time” (are received within a threshold amount of time from each other));
wherein the control circuitry configured to retrieve the context of the first voice query is configured to do so in response to determining that the second voice query was received within the threshold amount of time from the time at which the first voice query was received (¶ 0035 sentence 2: “table 120 further specifies a higher associated score for the follow-up query "OPENING HOURS OF LOUVRE"” (the second query “OPENING HOURS OF LOUVRE”, which according to ¶ 0044 last 7 lines is “pre-associated with the ‘PARIS’ term” “included in the prior voice query” (i.e., it is associated with, and thereby retrieves, the location context of the first voice query), is selected based on a “score” that, according to ¶ 0034, “may reflect the likelihood” that the “initial voice query and the follow-up query” “are” “sent within a particular period of time” (in response to determining that the second voice query was received within the threshold amount of time from the first voice query))).

Claims 2-3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Behzadi et al. in view of Skobeltsyn et al., and further in view of Gadd et al. (US 2005/0033582).
Regarding claim 2, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the method of claim 1, wherein the trigger term is a politeness term.
Gadd et al. do teach the method of claim 1, wherein the trigger term is a politeness term (¶ 0255:  “System: (M4) Please” (a politeness trigger term) “repeat what you said once more” (in a second or higher transcription in an interactive user system dialog)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “prompt” of the dialog system of Gadd et al. into the “FOLLOW UP VOICE QUERY” system and method of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to carry out an error resolution cycle, wherein “The error resolution cycle involves presentation of a series of "I'm sorry, but I didn't understand. . ." messages.” as disclosed in Gadd et al. ¶ 0024 last six lines.

Regarding claim 3, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the method of claim 1, wherein the trigger term is a negative term.
Gadd et al. do teach the method of claim 1, wherein the trigger term is a negative term (¶ 0250: “User : No” (a negative trigger term in a second or higher transcription of an interactive user system dialog)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “Voice recognition engine” “22” (Fig. 4) of Gadd et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to handle negative responses, wherein “The confirmation class also defines two methods yes and no which define what should occur if either a ` yes` or a `no` response is received whilst the confirmation object is handling the inputs” as disclosed in Gadd et al. ¶ 0360.

Regarding claim 12, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the system of claim 11, wherein the trigger term is a politeness term.
Gadd et al. do teach the system of claim 11, wherein the trigger term is a politeness term (¶ 0255:  “System: (M4) Please” (a politeness trigger term) “repeat what you said once more” (in a second or higher transcription in an interactive user system dialog)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “prompt” of the dialog system of Gadd et al. into the “FOLLOW UP VOICE QUERY” system and method of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to carry out an error resolution cycle, wherein “The error resolution cycle involves presentation of a series of "I'm sorry, but I didn't understand. . ." messages.” as disclosed in Gadd et al. ¶ 0024 last six lines.

Regarding claim 13, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the system of claim 11, wherein the trigger term is a negative term.
Gadd et al. do teach the system of claim 11, wherein the trigger term is a negative term (¶ 0250: “User : No” (a negative trigger term in a second or higher transcription of an interactive user system dialog)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “Voice recognition engine” “22” (Fig. 4) of Gadd et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to handle negative responses, wherein “The confirmation class also defines two methods yes and no which define what should occur if either a ` yes` or a `no` response is received whilst the confirmation object is handling the inputs” as disclosed in Gadd et al. ¶ 0360.

Claims 6, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Behzadi et al. in view of Skobeltsyn et al., and further in view of FORMHALS et al. (US 2016/0203817).
Regarding claim 6, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the method of claim 5, further comprising:
storing an indication that the first transcription is incorrect; and
refining the transcription model based on the indication.
FORMHALS et al. do teach:
storing an indication that the first transcription is incorrect (¶ 0034 sentence 4+: “the human transcription service may simply operate to error check or validate the results of Transcription Module 247. For example, text or grammar checking could be performed in Transcription Module 247 and an error flag” (an indication that a transcription is incorrect) “could be set” (stored)); and
refining the transcription model based on the indication (¶ 0034 sentence 4+: “the human transcription service may simply operate to error check or validate the results of Transcription Module 247. For example, text or grammar checking could be performed in Transcription Module 247 and an error flag” “could be set” “which would send the error-tagged portion with the corresponding voice recording portion to the human transcription service” (so as to refine the transcription)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “Transcription Module 247” of FORMHALS et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to avoid having to send “the entirety” “of actual voice recording” to a “human transcription service” and to send only the “error-tagged portion” for correction, as disclosed in FORMHALS et al. ¶ 0034 last two sentences.

Regarding claim 16, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the system of claim 15, wherein the control circuitry is further configured to:
store an indication that the first transcription is incorrect; and
refine the transcription model based on the indication.
FORMHALS et al. do teach:
store an indication that the first transcription is incorrect (¶ 0034 sentence 4+: “the human transcription service may simply operate to error check or validate the results of Transcription Module 247. For example, text or grammar checking could be performed in Transcription Module 247 and an error flag” (an indication that a transcription is incorrect) “could be set” (stored)); and
refine the transcription model based on the indication (¶ 0034 sentence 4+: “the human transcription service may simply operate to error check or validate the results of Transcription Module 247. For example, text or grammar checking could be performed in Transcription Module 247 and an error flag” “could be set” “which would send the error-tagged portion with the corresponding voice recording portion to the human transcription service” (so as to refine the transcription)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “Transcription Module 247” of FORMHALS et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to avoid having to send “the entirety” “of actual voice recording” to a “human transcription service” and to send only the “error-tagged portion” for correction, as disclosed in FORMHALS et al. ¶ 0034 last two sentences.

Claims 8, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Behzadi et al. in view of Skobeltsyn et al., and further in view of Arakawa et al. (US 2010/0070277).
Regarding claim 8, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the method of claim 7, wherein modifying the entity recognition model comprises temporarily increasing a relaxation rate, wherein the number of interpretations considered for a particular term is based on the relaxation rate.
Arakawa et al. do teach the method of claim 7, wherein modifying the entity recognition model comprises temporarily increasing a relaxation rate, wherein the number of interpretations considered for a particular term is based on the relaxation rate (¶ 0121 sentence 2+: “when the detail level is low, the parameter setting unit 10 sets the pruning parameter so that the number of hypotheses” (relaxation rate) “is increased” (increasing) “because the reliability of voice information is low” “Conversely, when the detail level is high” “the number of hypotheses is decreased” (the change in “number of hypotheses” (the relaxation rate) is temporary as it depends on the quality of received “voice”; the “hypothesis” according to ¶ 0014 sentence 2 corresponds to “the candidate” “for the word string”, where the “candidate” according to ¶ 0010 last sentence corresponds to “recognition result” (interpretations for each term))).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “voice recognition” of Arakawa et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to set “the number of hypotheses” (recognition interpretations) according to the “reliability of voice” received for recognition, as disclosed in Arakawa et al. ¶ 0121.

Regarding claim 18, Behzadi et al. in view of Skobeltsyn et al. do not specifically disclose the system of claim 17, wherein the control circuitry configured to modify the entity recognition model is further configured to temporarily increase a relaxation rate, wherein the number of interpretations considered for a particular term is based on the relaxation rate.
Arakawa et al. do teach the system of claim 17, wherein the control circuitry configured to modify the entity recognition model is further configured to temporarily increase a relaxation rate, wherein the number of interpretations considered for a particular term is based on the relaxation rate (¶ 0121 sentence 2+: “when the detail level is low, the parameter setting unit 10 sets the pruning parameter so that the number of hypotheses” (relaxation rate) “is increased” (increasing) “because the reliability of voice information is low” “Conversely, when the detail level is high” “the number of hypotheses is decreased” (the change in “number of hypotheses” (the relaxation rate) is temporary as it depends on the quality of received “voice”; the “hypothesis” according to ¶ 0014 sentence 2 corresponds to “the candidate” “for the word string”, where the “candidate” according to ¶ 0010 last sentence corresponds to “recognition result” (interpretations for each term))).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “voice recognition” of Arakawa et al. into the “AUTOMATED SPEECH RECOGNIZER MODULE” of Behzadi et al. in Behzadi et al. in view of Skobeltsyn et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable Behzadi et al. in view of Skobeltsyn et al. to set “the number of hypotheses” (recognition interpretations) according to the “reliability of voice” received for recognition, as disclosed in Arakawa et al. ¶ 0121.
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Farzad Kazeminezhad/
Art Unit 2657
July 12th 2022.