DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on September 9, 2020 and April 21, 2021 are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 5 and 14 recite the phrase “a pre-configured correspondence between semantic and voice-actions and the voice-action description information…” However, the element “semantic” is unclear. The word “semantic” describes the intended structure and meaning of something or indicates that the subject is related to “semantics or the meaning of words”. It appears that the subject to which the word semantic applies in claims 5 and 14 is missing. Further, the intended subject of “semantic” or the meaning of the word “semantic” in context is unclear, as 
For examination purposes, the above phrase from claims 5 and 14 is being read as “a pre-configured correspondence between the voice-actions and the voice-action description information…”   
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 5-11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Patch (U.S. Pat. App. Pub. No. 2010/0169098, hereinafter Patch) in view of Bai (U.S. Pat. App. Pub. No. 2016/0132291, hereinafter Bai).

Regarding claim 1, Patch discloses A view-based voice interaction method, which is applied to a server, comprising (the method described with reference to the “speech recognition system”; Patch, ¶ [0070]): obtaining voice information of a user and (“receiving voice input from the user of the computer platform”; Patch, ¶ [0070]) voice-action description information of a voice-operable element in a currently displayed view on a terminal (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]).
However, Patch fails to expressly recite obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing.
Bai teaches systems and methods for interpretation of a voice input into a set of commands. (Bai, ¶ [0008]). Regarding claim 1, Bai teaches obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element (“Speech recognition system 138 generates a textual representation of the utterance, as indicated by block 382. Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 in the utterance 142, based upon the textual representation.” where the system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶ [0081]); locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions {locating...actions} to take based upon the intent {matched with the operational intention in the voice-action list} and it can also do this based on the context information and the arguments {according to the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}”; Bai, ¶¶ [0083], [0084]); and delivering the sequence of actions to the terminal for performing (“Once action generator 120 has identified the action to be taken, and has used search system 116 to identify the items of content needed to perform the action {the sequence of actions}, it illustratively uses one of the components in action generator 120 to perform the action... by performing one or more actions in one of the controlled systems 124” and where the system can be “Software or components of architecture 100 as well as the corresponding data... stored on servers at a remote location” and delivered to “any other computing component {to the terminal for performing}”; Bai, ¶¶ [0086], [0150]).
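It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).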
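For illustration only, the server-side limitations of claim 1 as read above can be sketched as follows. This is a minimal sketch under stated assumptions: every name in it (VoiceAction, ElementDescription, recognize_intent, locate_action_sequence) is hypothetical and is not drawn from Patch, Bai, or the application, and simple substring matching stands in for the recited semantic recognition.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceAction:
    # Hypothetical entry of the voice-action list.
    action_id: str
    config: dict = field(default_factory=dict)  # configuration information


@dataclass
class ElementDescription:
    # Hypothetical description of one voice-operable element in the view.
    label: str
    actions: list


def recognize_intent(query_text, elements):
    """Toy stand-in for semantic recognition: pick the element whose label
    appears in the recognized text (assumption, not the claimed method)."""
    text = query_text.lower()
    for element in elements:
        if element.label.lower() in text:
            return element
    return None


def locate_action_sequence(query_text, element):
    """Return (action_id, config) pairs from the element's voice-action list
    whose identifier appears in the recognized text."""
    if element is None:
        return []
    text = query_text.lower()
    return [(a.action_id, a.config) for a in element.actions if a.action_id in text]


view = [
    ElementDescription("play button", [VoiceAction("click", {"target": "play"})]),
    ElementDescription("volume", [VoiceAction("raise", {"step": 10}),
                                  VoiceAction("lower", {"step": 10})]),
]
query = "raise the volume"
# The resulting sequence is what the server would deliver to the terminal.
sequence = locate_action_sequence(query, recognize_intent(query, view))
```

In this sketch the list in `sequence` corresponds to the sequence of actions delivered to the terminal; an unmatched query yields an empty sequence.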

Regarding claim 2, the rejection of claim 1 is incorporated. Patch further discloses wherein the voice-action description information further including a voice label (discloses “a global voice command {a voice action of a voice action list} of the structured grammar 404 {voice-action description information}, wherein the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070]), the voice label being configured to describe information about the voice-operable element in the view (where the listed commands can be “Start...with the name of the program or, to call up a default program, the name of the type of program” as well rules call for “consistent, descriptive, noun-based menu items. {describe information about the voice operable element in the view}”; Patch, ¶¶ [0067], [0094]).

Regarding claim 5, the rejection of claim 1 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: locating the sequence of actions matched with the operational intention in the voice-action list according to a pre-configured correspondence between semantic and voice-actions and the voice-action description information, the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action.
The relevance of Bai is described above with relation to claim 1. Regarding claim 5, Bai teaches wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: locating the sequence of actions matched with the operational intention in the voice-action list according to a pre-configured correspondence between semantic and voice-actions and the voice-action description information (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions {locating...actions} to take based upon the intent {matched with the operational intention in the voice-action list} and it can also do this based on the context information and the arguments {according to the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure {a pre-configured correspondence between semantic and voice actions and the voice-action description information} in order to accomplish a task that requires multiple different commands or actions {locating a sequence of actions}”; Bai, ¶¶ [0083], [0084]), the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action (The system “identifies actions to take based upon the [identified] intent {including an identification (ID) of at least one voice-action}” which are part of the “nested or arranged [actions] in a hierarchal or dependency structure {sequence of actions}.” Further, the system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments {a key value in...},” where context information is included with the intent {...the configuration information of the voice-action}; Bai, ¶¶ [0083], [0084]).
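It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: locating the sequence of actions matched with the operational intention in the voice-action list according to a pre-configured correspondence between semantic and voice-actions and the voice-action description information, the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).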

Regarding claim 6, Patch discloses A view-based voice interaction method, which is applied to a terminal, comprising (the method described with reference to the “speech recognition system”; Patch, ¶ [0070]): transmitting voice information of a user that is heard and (“receiving voice input from the user of the computer platform”; Patch, ¶ [0070]) voice-action description information of a voice-operable element in a currently displayed view on the terminal to a server (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]). However, Patch fails to expressly recite receiving a sequence of actions determined according to the voice information and the voice-action description information from the server and performing action processing logics corresponding to the voice-actions in the sequence of actions.
The relevance of Bai is described above with relation to claim 1. Regarding claim 6, Bai teaches receiving a sequence of actions determined according to the voice information and the voice-action description information from the server (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions {receiving...actions} to take based upon the intent {determined according to the voice information} and it can also do this based on the context information and the arguments {and the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}” and where “The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware.”; Bai, ¶¶ [0083], [0084], [0150]), and performing action processing logics corresponding to the voice-actions in the sequence of actions (The system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶¶ [0081]).
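It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include receiving a sequence of actions determined according to the voice information and the voice-action description information from the server and performing action processing logics corresponding to the voice-actions in the sequence of actions. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).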

Regarding claim 7, the rejection of claim 6 is incorporated. Patch further discloses wherein the voice-action description information further including a voice label (discloses “a global voice command {a voice action of a voice action list} of the structured grammar 404 {voice-action description information}, wherein the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070]), the voice label being configured to describe information about the voice-operable element in the view (where the listed commands can be “Start...with the name of the program or, to call up a default program, the name of the type of program” as well rules call for “consistent, descriptive, noun-based menu items {describe information about the voice operable element in the view}.”; Patch, ¶¶ [0067], [0094]).

Regarding claim 8, the rejection of claim 6 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action, performing a corresponding action processing logic according to the ID and the key value; and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action.
The relevance of Bai is described above with relation to claim 1. Regarding claim 8, Bai teaches wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, (The system “identifies actions to take based upon the [identified] intent {including an identification (ID) of at least one voice-action}” which are part of the “nested or arranged [actions] in a hierarchal or dependency structure {sequence of actions}.” Further, the system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments {a key value in...},” where context information is included with the intent {...the configuration information of the voice-action}; Bai, ¶¶ [0083], [0084]) and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action (The system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments.”; Bai, ¶¶ [0083], [0084], [0063]), performing a corresponding action processing logic according to the ID and the key value (“The [identified] intent illustratively corresponds to an action that the user wishes to perform” where “actions are to be performed based upon the arguments and context information.”; Bai, ¶¶ [0085]); and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action.
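It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action, performing a corresponding action processing logic according to the ID and the key value; and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).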
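For illustration only, the two branches recited in claim 8 can be sketched as a dispatch on the length of the received sequence. This is a sketch under stated assumptions: the names perform, handlers, and choose_target are hypothetical and appear in neither reference nor the application; choose_target stands in for the recited interaction with the terminal; and the sketch treats any sequence with more than one entry as requiring that interaction, whereas the claim recites "more than two."

```python
def perform(sequence, handlers, choose_target):
    """Perform the action processing logic for a received sequence of actions.

    sequence:      list of (action_id, key_value) pairs.
    handlers:      hypothetical map from action_id to a callable taking the key value.
    choose_target: hypothetical callback that selects one pair when several are present.
    """
    if not sequence:
        return None
    if len(sequence) == 1:
        # Single voice-action: act directly on its ID and key value.
        action_id, key_value = sequence[0]
    else:
        # Several voice-actions: determine the target voice-action first.
        action_id, key_value = choose_target(sequence)
    return handlers[action_id](key_value)


handlers = {
    "raise": lambda step: f"volume +{step}",
    "lower": lambda step: f"volume -{step}",
}

single = perform([("raise", 10)], handlers, choose_target=lambda seq: seq[0])
multi = perform([("raise", 10), ("lower", 5), ("raise", 20)], handlers,
                choose_target=lambda seq: seq[-1])
```

Here `single` is produced without any selection step, while `multi` depends entirely on which entry the selection callback returns.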

Regarding claim 9, the rejection of claim 6 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, wherein the voice events are configured to define product logics to be processed during performing the voice-actions.
The relevance of Bai is described above with relation to claim 1. Regarding claim 9, Bai teaches wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions (“The [identified] intent illustratively corresponds to an action that the user wishes to perform” where “actions are to be performed based upon the arguments and context information,” where the actions can be “nested or arranged in a hierarchal or dependency structure”; Bai, ¶¶ [0085], [0084]), and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, (In one example, the system enters a dialog to correct a specific element on a view. The interaction begins by “The user then speaks ‘Make this part more visually appealing.’ A textual representation of the utterance, along with a possible interpretation, are generated, and the textual representation is displayed.” Thus, at this point the generation of the possible interpretation {action processing logics} corresponding to the user speech {voice-action in the sequence of actions} is paused. The system then asks for further input during the performing of the processing logics, which “the user can indicate... using a voice command”; Bai, ¶¶ [0135]) wherein the voice events are configured to define product logics to be processed during performing the voice-actions (The voice command {voice events} are configured to make certain that the actions are performed on the right portion of the view { configured to define product logics to be processed} during the performance of the correction {during performing the voice-actions}; Bai, ¶¶ [0135]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, wherein the voice events are configured to define product logics to be processed during performing the voice-actions. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 10, Patch discloses A view-based voice interaction apparatus, comprising: one or more processors (“The methods and systems described herein may be...”; Patch, ¶ [0148]), and a storage device, configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors (“The processor may access a storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere.”; Patch, ¶ [0148]), the one or more processors are configured to implement a view-based voice interaction method, which is applied to a server (“The processor may be part of a server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform.”; Patch, ¶ [0148]), comprising (the method described with reference to the “speech recognition system”; Patch, ¶ [0070]): obtaining voice information of a user and (“receiving voice input from the user of the computer platform”; Patch, ¶ [0070]) voice-action description information of a voice-operable element in a currently displayed view on a terminal (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]). However, Patch fails to expressly recite obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing.
The relevance of Bai is described above with relation to claim 1. Regarding claim 10, Bai teaches obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element (“Speech recognition system 138 generates a textual representation of the utterance, as indicated by block 382. Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 in the utterance 142, based upon the textual representation.” where the system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶ [0081]); locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions {locating...actions} to take based upon the intent {matched with the operational intention in the voice-action list} and it can also do this based on the context information and the arguments {according to the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}”; Bai, ¶¶ [0083], [0084]); and delivering the sequence of actions to the terminal for performing (“Once action generator 120 has identified the action to be taken, and has used search system 116 to identify the items of content needed to perform the action {the sequence of actions}, it illustratively uses one of the components in action generator 120 to perform the action... by performing one or more actions in one of the controlled systems 124” and where the system can be “Software or components of architecture 100 as well as the corresponding data... stored on servers at a remote location” and delivered to “any other computing component {to the terminal for performing}”; Bai, ¶¶ [0086], [0150]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 11, the rejection of claim 10 is incorporated. Claim 11 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 14, the rejection of claim 10 is incorporated. Claim 14 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Claims 3-4 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Patch and Bai as applied to claims 1 and 10 above, and further in view of Deoras (U.S. Pat. App. Pub. No. 2015/0066496, hereinafter Deoras).

Regarding claim 3, the rejection of claim 1 is incorporated. Patch and Bai disclose all of the elements of the current invention as stated above. Bai further discloses wherein obtaining the operational intention of the user by performing the semantic recognition on the voice information according to the view description information of the voice-operable element, includes: obtaining a corresponding query text by performing speech recognition on the voice information according to the view description information of the voice-operable element (“Speech recognition system 138 {performing speech recognition} generates a textual representation {corresponding query text} of the utterance {on the voice information}, as indicated by block 382.” where “context of the application” is used in determining the intent. As indicated in the example, the phrase “share this document with Joe” is interpreted by the “natural language understanding system 140” as “an action or command that the user wishes the system to perform” on the displayed document {according to the view description information}; Bai, ¶¶ [0081], [0082]); extracting a text label of the voice-operable element from the view description information of the voice-operable element (“Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 {text label} in the utterance 142, based upon the textual representation.”; Bai, ¶ [0081]), the text label including a type and attributes of the voice-operable element (“Identifying the user intent {text label}... [can include] the string and contextual data {attributes} can be sent to a classifier where they are classified into a class {type}”; Bai, ¶ [0042]); and obtaining...the operational intention of the user (“natural language understanding system 140 deciphers a user intent {text label}, and maps the intent {text label} to an action {operational intent}”; Bai, ¶ [0081]).
However, Patch and Bai fail to expressly recite obtaining a semantic-labeled result of the query text as the operational intention of the user by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model.
Deoras teaches systems and methods for “assignment of semantic labels to words in a natural language utterance.” (Deoras, ¶ [0004]). Regarding claim 3, Deoras teaches obtaining a semantic-labeled result of the query text as the operational intention of the user (The system includes a “labeler component 124 that receives semantic features output by the semantic feature identifier component 122 for words in the sequence of words, and assigns respective labels {obtaining a semantic-labeled result...} to words in the sequence of words {of the query text} based upon the semantic features.”; Deoras, ¶¶ [0029]) by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model (“the labeler component 124 can comprise at least one of a DNN 126 or a recurrent neural network (RNN) 128, wherein the at least one of the DNN 126 or the RNN 128 is used in connection with performing the labeling {by performing semantic labeling} of words in the sequence of words {on the query text...}” for “ assigning semantic labels to words in a sequence of words.” Further, “the at least one of the DNN 126 or the RNN 128 are trained to assign labels {by utilizing a pre-trained labeling model} pertaining to a particular domain and/or intent {...according to the extracted text label}.”; Deoras, ¶¶ [0030]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Deoras to include obtaining a semantic-labeled result of the query text as the operational intention of the user by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model. Semantic slot filling as performed using trained neural networks allows for automatic extraction of a semantic concept while overcoming the deficiencies in the prior art regarding poor generalization of models on “complex combinations of patterns,” as recognized by Deoras. (Deoras, ¶¶ [0001], [0002], [0004]).

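Regarding claim 4, the rejection of claim 3 is incorporated. Patch and Bai disclose all of the elements of the current invention as stated above. However, Patch and Bai fail to expressly recite wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model; and generating the corresponding query text by decoding the acoustic features dynamically based on the view description information of the voice-operable element by utilizing a pre-trained language model.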
The relevance of Bai is described above with relation to claim 1. Regarding claim 4, Deoras teaches wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model (“client computing device 102 may also include an acoustic feature extractor component 112 that receives the spoken utterance captured by the microphone 108 and extracts acoustic features {predicting acoustic features of an audio signal...} from such utterance.” where the “acoustic model 116 may be or include any suitable type of model, such as... a deep neural network (DNN) {pre-trained acoustic model}”; Deoras, ¶¶ [0025]); and generating the corresponding query text by decoding the acoustic features dynamically (“In combination, the acoustic model 116 and the language model 118 can recognize and output {thus, by decoding the acoustic features...} words, numbers, acronyms, etc. {generating the corresponding query text} in the spoken utterance set forth by the user 110.”; Deoras, ¶¶ [0026]) based on the view description information of the voice-operable element by utilizing a pre-trained language model (“The domain determiner component is configured to identify a general domain {view description information} to which the utterance set forth by the user 110 is directed.” where “a determined domain and/or intent can be provided as an input feature to a model that is trained to assign semantic labels to words in sequences of words across several domains/intents {...by utilizing a pre-trained language model}.”; Deoras, ¶¶ [0027]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Deoras to include wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model and generating the corresponding query text by decoding the acoustic features dynamically based on the view description information of the voice-operable element by utilizing a pre-trained language model. Semantic slot filling as performed using trained neural networks allows for automatic extraction of a semantic concept while overcoming the deficiencies in the prior art regarding poor generalization of models on “complex combinations of patterns,” as recognized by Deoras. (Deoras, ¶¶ [0001], [0002], [0004]).

Regarding claim 12, the rejection of claim 10 is incorporated. Claim 12 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 13, the rejection of claim 12 is incorporated. Claim 13 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hennecke et al. (U.S. Pat. App. Pub. No. 2004/0034527) discloses a speech recognition system which processes voice inputs from a user to select a list element from a list or group of list elements.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657