Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on April 21, 2022 has been entered. 
The amendment of claims 1-2, 4-7, 10-11, and 13-14 has been acknowledged and entered.
In view of the amendment to claim(s) 5 and 14, the rejection of claim(s) 5 and 14 under 35 U.S.C. §112 is withdrawn.
In view of the amendment to claim(s) 1, 4 - 6, 10, and 13 - 14, the rejection of claims 1, 3, 5-6, 8-10, 12, and 14-15 under 35 U.S.C. §103 is maintained, as amended, for the reasons discussed below.
In view of the amendment to claim(s) 2, 4, 7, 11, and 13, the rejection of claims 2, 4, 7, 11, and 13 under 35 U.S.C. §103 is withdrawn.
In light of the amendments to claims 2, 4, 7, 11, and 13, new grounds for rejection under 35 U.S.C. §103 are provided in the response below.

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102/103, see pages 10-14 of the Response to Non-Final Office Action dated January 25, 2022, which was received on April 21, 2022 (hereinafter Response and Office Action, respectively), have been fully considered.
With respect to the rejection(s) of claim(s) 1, 6, and 10 under 35 U.S.C. §103 in light of Patch (U.S. Pat. App. Pub. No. 2010/0169098, hereinafter Patch) in view of Bai (U.S. Pat. App. Pub. No. 2016/0132291, hereinafter Bai), applicant first asserts that Patch “does not disclose ‘obtaining voice-action description information of a voice-operable element which includes configuration information of each voice-action in the voice-action list, in which the configuration information of each voice-action is configured to indicate specific execution features corresponding to the voice-action’ as recited in amended claim 1.” Second, applicant asserts that Patch “does not teach or suggest obtaining operational intention of the user that fits the currently displayed view by matching the voice information with view description information of the voice-operable element, in which the view description information comprises an element name, a text label, and coordinate distribution of the element in the view.” Third, applicant asserts that Bai fails to cure the deficiencies of Patch regarding the above-presented limitations. Fourth, applicant asserts that Deoras (U.S. Pat. App. Pub. No. 2015/0066496, hereinafter Deoras) fails to cure the deficiencies of Patch regarding the above-presented limitations. However, these arguments are not persuasive.
Regarding the first and second arguments, Patch discloses the above-recited elements. Regarding the first argument, Patch teaches “defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” including “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}.” (Patch, [0070]). Further, Patch discloses that “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,” where the “end result may be a map {configuration information} of speech commands {...of each voice action} tailored to the user” and where “single speech commands {…corresponding to the voice actions} may enable performing several keystrokes worth of work,” the several keystrokes being the specific execution features. (Patch, [0070], [0131]). Regarding the second argument, as explained in Patch, the view description information can include “an x-y-z coordinate system associated with” the objects {coordinate distribution of the element in the view}, where “objects may also be manipulated as a group using a group name {element name}” and, referencing a specific example, “commands that call up dialog boxes may also be accessed using the first word of the dialog box label {text label}.” (Patch, [0070], [0094]-[0095]). The newly amended claims are more clearly mapped in the rejections presented below. Therefore, the rejection is maintained, as amended below.
Regarding the third argument, applicant’s amended elements in claim 1 are addressed by the disclosure of Patch, as presented herein. As such, any asserted deficiencies of Bai regarding the same amended elements are rendered moot.
Regarding the fourth argument, applicant’s amended elements in claim 1 are addressed by the disclosure of Patch, as presented herein. As such, any asserted deficiencies of Deoras regarding the same amended elements are rendered moot.
Applicant further argues that dependent claims 2-5, 7-9, and 11-14 are allowable for at least the same reasons as independent claims 1, 6, and 10. Applicant’s arguments in light of the amended claims are not persuasive. However, Patch, Bai, and Deoras fail to teach or suggest all elements of amended claims 2, 4, 7, 11, and 13. As such, the rejections of claims 3, 5, 8-9, 12, and 14 under 35 U.S.C. §103 are maintained as amended in light of the claim amendments, and the rejections of claims 2, 4, 7, 11, and 13 under 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection of claims 2, 4, 7, 11, and 13 under 35 U.S.C. §103 are made in light of combinations of Patch, Bai, Deoras, and newly cited reference Thangarathnam (U.S. Pat. App. Pub. No. 2019/0179607, hereinafter Thangarathnam).
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5-6, 8-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Patch in view of Bai.

Regarding claim 1, Patch discloses A view-based voice interaction method, which is applied to a server, comprising (the method described with reference to the “speech recognition system”; Patch, ¶¶ [0070]): obtaining voice information of a user and (“receiving voice input from the user of the computer platform”; Patch, ¶¶ [0070]) voice-action description information of a voice-operable element in a currently displayed view on a terminal (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. 
Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]), in which the configuration information of each voice- action is configured to indicate specific execution features corresponding to the voice-action (“The end result may be a map {configuration information} of speech commands {...of each voice action} tailored to the user” where “single speech commands {…corresponding to the voice actions} may enable performing several keystrokes worth of work {specific execution features}.”; Patch, ¶¶ [0131]); obtaining operational intention of the user… according to view description information of the voice-operable element (“method for enabling a user to interact with a computer platform using a voice command {obtaining an operational intent of the user...} may comprise the steps of defining a structured grammar for handling a global voice command 402... 
and mapping at least one function of the object to the global voice command 408” where “the voice command specifies moving or changing an object location with respect to at least one of an x-y and an x-y-z coordinate system associated with at least one of the object and a target location {according to view description information...}” and where “the global voice command enables building a custom list of objects and the function 414 may relate to a listed object. {...of the voice-operable element}”; Patch, ¶¶ [0070]), in which the view description information comprises an element name, a text label, and coordinate distribution of the element in the view (The view description information can include “an x-y-z coordinate system associated with at least one of the object, {coordinate distribution of the element in the view}” where “objects may also be manipulated as a group using a group name {element name}” and, referencing a specific example “commands that call up dialog boxes may also be accessed using the first word of the dialog box label {text label}”; Patch, ¶¶ [0070], [0094]-[0095]). However, Patch fails to expressly recite obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element…; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing.
Bai teaches systems and methods for interpretation of a voice input into a set of commands. (Bai, ¶ [0008]). Regarding claim 1, Bai teaches obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element… (“Speech recognition system 138 generates a textual representation of the utterance, as indicated by block 382. Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 in the utterance 142, based upon the textual representation.” where the system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶¶ [0081]); locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. 
Action identifier 169 in action generator 120 identifies actions {locating...actions} to take based upon the intent {matched with the operational intention in the voice-action list} and it can also do this based on the context information and the arguments {according to the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}”; Bai, ¶¶ [0083], [0084]); and delivering the sequence of actions to the terminal for performing (“Once action generator 120 has identified the action to be taken, and has used search system 116 to identify the items of content needed to perform the action {the sequence of actions}, it illustratively uses one of the components in action generator 120 to perform the action... by performing one or more actions in one of the controlled systems 124” and where the system can be “Software or components of architecture 100 as well as the corresponding data... stored on servers at a remote location” and delivered to “any other computing component {to the terminal for performing}”; Bai, ¶¶ [0086], [0150]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element…; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 5, the rejection of claim 1 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: determining a set of target voice-actions in the voice-action list based on a pre-configured correspondence between semantic intentions and voice-actions; and locating the sequence of actions matched with the operational intention in the set of target voice-actions based on the configuration information of each voice-action, the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action.
The relevance of Bai is described above with relation to claim 1. Regarding claim 5, Bai teaches wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: determining a set of target voice-actions in the voice-action list based on a pre-configured correspondence between semantic intentions and voice-actions; and locating the sequence of actions matched with the operational intention in the set of target voice-actions based on the configuration information of each voice-action (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions to take {determining a set of target voice actions...} based upon the intent” where the actions can be “nested or arranged in a hierarchal or dependency structure {in the voice-action list based on a pre-configured correspondence between semantic intent and voice actions} in order to accomplish a task {...matched with the operational intention} that requires multiple different commands or actions {locating a sequence of actions...in the set of target voice actions}” and where “All of these actions or commands can be identified through a sequence of rules {based on the configuration information of each voice action} that are active based upon the intent expressed in the utterance.”; Bai, ¶¶ [0083], [0084]), the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action (The system “identifies actions to take based upon the [identified] intent {including an identification (ID) of at least one voice-action}” which are part of the “nested or arranged [actions] in a hierarchal or dependency structure {sequence of actions}.” Further, the system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments {a key value in...},” where context information is included with the intent {...the configuration information of the voice-action}; Bai, ¶¶ [0083], [0084]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein locating the sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information, includes: determining a set of target voice-actions in the voice-action list based on a pre-configured correspondence between semantic intentions and voice-actions; and locating the sequence of actions matched with the operational intention in the set of target voice-actions based on the configuration information of each voice-action, the sequence of actions including an ID of at least one voice-action and a key value in the configuration information of the voice-action. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 6, Patch discloses A view-based voice interaction method, which is applied to a terminal, comprising (the method described with reference to the “speech recognition system”; Patch, ¶¶ [0070]): transmitting voice information of a user that is heard and (“receiving voice input from the user of the computer platform”; Patch, ¶¶ [0070]) voice-action description information of a voice-operable element in a currently displayed view on the terminal to a server (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. 
Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]) in which the configuration information of each voice- action is configured to indicate specific execution features corresponding to the voice-action (“The end result may be a map {configuration information} of speech commands {...of each voice action} tailored to the user” where “single speech commands {…corresponding to the voice actions} may enable performing several keystrokes worth of work {specific execution features}.”; Patch, ¶¶ [0131]), receiving a sequence of actions determined according to the voice information [and] … view description information of the voice-operable element… (“method for enabling a user to interact with a computer platform using a voice command {receiving….actions according to the voice information} may comprise the steps of defining a structured grammar for handling a global voice command 402... 
and mapping at least one function of the object to the global voice command 408” where “the voice command specifies moving or changing an object location with respect to at least one of an x-y and an x-y-z coordinate system associated with at least one of the object and a target location {according to view description information...}” and where “the global voice command enables building a custom list of objects and the function 414 may relate to a listed object. {...of the voice-operable element}” and where “the speech recognition command system 102 may enable… combinations of [actions] {receiving a sequence of actions…}” in response to speech commands; Patch, ¶¶ [0070], [0116]) in which the view description information comprises an element name, a text label, and coordinate distribution of the element in the view (The view description information can include “an x-y-z coordinate system associated with at least one of the object, {coordinate distribution of the element in the view}” where “objects may also be manipulated as a group using a group name {element name}” and, referencing a specific example “commands that call up dialog boxes may also be accessed using the first word of the dialog box label {text label}”; Patch, ¶¶ [0070], [0094]-[0095]). However, Patch fails to expressly recite receiving a sequence of actions determined according to the voice information… and the voice-action description information from the server and performing action processing logics corresponding to the voice-actions in the sequence of actions.
The relevance of Bai is described above with relation to claim 1. Regarding claim 6, Bai teaches receiving a sequence of actions determined according to the voice information… and the voice-action description information from the server (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. Action identifier 169 in action generator 120 identifies actions {receiving...actions} to take based upon the intent {determined according to the voice information} and it can also do this based on the context information and the arguments {and the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}” and where “The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware.”; Bai, ¶¶ [0083], [0084], [0150]), and performing action processing logics corresponding to the voice-actions in the sequence of actions (The system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶¶ [0081]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include receiving a sequence of actions determined according to the voice information and the voice-action description information from the server and performing action processing logics corresponding to the voice-actions in the sequence of actions. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 8, the rejection of claim 6 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action, performing a corresponding action processing logic according to the ID and the key value; and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action.
The relevance of Bai is described above with relation to claim 1. Regarding claim 8, Bai teaches wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, (The system “identifies actions to take based upon the [identified] intent {including an identification (ID) of at least one voice-action}” which are part of the “nested or arranged [actions] in a hierarchal or dependency structure {sequence of actions}.” Further, the system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments {a key value in...},” where context information is included with the intent {...the configuration information of the voice-action}; Bai, ¶¶ [0083], [0084]) and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action (The system “identifies actions to take based upon the intent and it can also do this based on the context information and the arguments.”; Bai, ¶¶ [0083], [0084], [0063]), performing a corresponding action processing logic according to the ID and the key value (“The [identified] intent illustratively corresponds to an action that the user wishes to perform” where “actions are to be performed based upon the arguments and context information.”; Bai, ¶¶ [0085]); and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action.
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein the sequence of actions includes an ID of at least one voice-action and a key value in the configuration information of the voice-action, and performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: when the sequence of actions includes an ID of a voice-action and a key value in the configuration information of the voice-action, performing a corresponding action processing logic according to the ID and the key value; and when the sequence of actions includes IDs of more than two voice-actions and key values in the configuration information of the voice-actions, determining a target voice-action in the sequence of actions through interactions with the terminal, and performing a corresponding action processing logic according to the ID and the key value of the target voice-action. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 9, the rejection of claim 6 is incorporated. Patch discloses all of the elements of the current invention as stated above. However, Patch fails to expressly recite wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, wherein the voice events are configured to define product logics to be processed during performing the voice-actions.
The relevance of Bai is described above with relation to claim 1. Regarding claim 9, Bai teaches wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions (“The [identified] intent illustratively corresponds to an action that the user wishes to perform” where “actions are to be performed based upon the arguments and context information,” where the actions can be “nested or arranged in a hierarchal or dependency structure”; Bai, ¶¶ [0085], [0084]), and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, (In one example, the system enters a dialog to correct a specific element on a view. The interaction begins by “The user then speaks ‘Make this part more visually appealing.’ A textual representation of the utterance, along with a possible interpretation, are generated, and the textual representation is displayed.” Thus, at this point the generation of the possible interpretation {action processing logics} corresponding to the user speech {voice-action in the sequence of actions} is paused. The system then asks for further input during the performing of the processing logics, which “the user can indicate... using a voice command”; Bai, ¶¶ [0135]) wherein the voice events are configured to define product logics to be processed during performing the voice-actions (The voice command {voice events} are configured to make certain that the actions are performed on the right portion of the view { configured to define product logics to be processed} during the performance of the correction {during performing the voice-actions}; Bai, ¶¶ [0135]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include wherein performing the action processing logics corresponding to the voice-actions in the sequence of actions, includes: performing the action processing logics corresponding to the voice-actions in the sequence of actions and obtaining voice events corresponding to the action processing logics and performing the voice events during performing the action processing logics, wherein the voice events are configured to define product logics to be processed during performing the voice-actions. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 10, Patch discloses A view-based voice interaction apparatus, comprising: one or more processors (“The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor.”; Patch, ¶ [0148]), and a storage device, configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors (“The processor may access a storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere.”; Patch, ¶ [0148]), the one or more processors are configured to implement a view-based voice interaction method, which is applied to a server (“The processor may be part of a server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform.”; Patch, ¶ [0148]), comprising (the method described with reference to the “speech recognition system”; Patch, ¶ [0070]): obtaining voice information of a user (“receiving voice input from the user of the computer platform”; Patch, ¶ [0070]) and voice-action description information of a voice-operable element in a currently displayed view on a terminal (“defining a structured grammar {voice-action description information} for handling a global voice command 402 {voice operable element in a currently displayed view on a terminal}”; Patch, ¶ [0070]), the voice-action description information including a voice-action list and configuration information of each voice-action in the voice-action list (“defining a global voice command {a voice label corresponding to the voice action} of the structured grammar 404 {voice-action description information},” where “combinations of these functions to discrete voice commands {thus, a voice action as part of a voice action list}”. 
Further, “the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶¶ [0070], [0067]), and the voice-action being configured to describe a voice operation to be performed on an element in the view (the global voice command 408 {a voice label corresponding to the voice action} is mapped to {configured to describe} “at least one function of the object” where a function {a voice operation} is performed on an object {an element}, and where “logical objects” such as “virtual objects and physical objects” are perceived by a “computer...as on-screen elements {performed on elements in the view}”; Patch, ¶¶ [0070], [0012]), in which the configuration information of each voice- action is configured to indicate specific execution features corresponding to the voice-action (“The end result may be a map {configuration information} of speech commands {...of each voice action} tailored to the user” where “single speech commands {…corresponding to the voice actions} may enable performing several keystrokes worth of work {specific execution features}.”; Patch, ¶¶ [0131]); obtaining operational intention of the user… according to view description information of the voice-operable element (“method for enabling a user to interact with a computer platform using a voice command {obtaining an operational intent of the user...} may comprise the steps of defining a structured grammar for handling a global voice command 402... 
and mapping at least one function of the object to the global voice command 408” where “the voice command specifies moving or changing an object location with respect to at least one of an x-y and an x-y-z coordinate system associated with at least one of the object and a target location {according to view description information...}” and where “the global voice command enables building a custom list of objects and the function 414 may relate to a listed object. {...of the voice-operable element}”; Patch, ¶¶ [0070]), in which the view description information comprises an element name, a text label, and coordinate distribution of the element in the view (The view description information can include “an x-y-z coordinate system associated with at least one of the object, {coordinate distribution of the element in the view}” where “objects may also be manipulated as a group using a group name {element name}” and, referencing a specific example “commands that call up dialog boxes may also be accessed using the first word of the dialog box label {text label}”; Patch, ¶¶ [0070], [0094]-[0095]). However, Patch fails to expressly recite obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element…; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing.
The relevance of Bai is described above with relation to claim 1. Regarding claim 10, Bai teaches obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element… (“Speech recognition system 138 generates a textual representation of the utterance, as indicated by block 382. Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 in the utterance 142, based upon the textual representation.” where The system “identifies actions to take {operational intention of the user} based upon the intent” where the “natural language understanding system 140 identifies an intent 386 in the utterance 142.” Further, the system identifies the actions to take “based on the context information {view description information of the voice-operable element}”; Bai, ¶¶ [0081]); locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information (“User interface component 130 then displays the textual representation to the user, as indicated by block 392. 
Action identifier 169 in action generator 120 identifies actions {locating...actions} to take based upon the intent {matched with the operational intention in the voice-action list} and it can also do this based on the context information and the arguments {according to the voice-action description information}” where the actions can be “nested or arranged in a hierarchal or dependency structure in order to accomplish a task that requires multiple different commands or actions.{locating a sequence of actions}”; Bai, ¶¶ [0083], [0084]); and delivering the sequence of actions to the terminal for performing (“Once action generator 120 has identified the action to be taken, and has used search system 116 to identify the items of content needed to perform the action {the sequence of actions}, it illustratively uses one of the components in action generator 120 to perform the action... by performing one or more actions in one of the controlled systems 124” and where the system can be “Software or components of architecture 100 as well as the corresponding data... stored on servers at a remote location” and delivered to “any other computing component {to the terminal for performing}”; Bai, ¶¶ [0086], [0150]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch to incorporate the teachings of Bai to include obtaining operational intention of the user by performing semantic recognition on the voice information according to view description information of the voice-operable element…; locating a sequence of actions matched with the operational intention in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for performing. The systems and methods described in Bai reduce the burden and time consumption of “loading and interacting with content” while avoiding error. (Bai, ¶ [0006]).

Regarding claim 14, the rejection of claim 10 is incorporated. Claim 14 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Claims 2, 7, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Patch and Bai as applied to claims 1, 6, and 10 above, and further in view of Thangarathnam.

Regarding claim 2, the rejection of claim 1 is incorporated. Patch and Bai disclose all of the elements of the current invention as stated above. Patch further discloses wherein the voice-action description information further including a voice label (“a global voice command {a voice action of a voice action list} of the structured grammar 404 {voice-action description information}, wherein the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶ [0070]), the voice label being configured to describe information about the voice-operable element in the view (where the listed commands can be “Start...with the name of the program or, to call up a default program, the name of the type of program” as well as rules that call for “consistent, descriptive, noun-based menu items. {describe information about the voice operable element in the view}”; Patch, ¶¶ [0067], [0094]). However, Patch and Bai fail to expressly recite distinguish different function operations of the same voice-actions in different views.
Thangarathnam teaches “systems and methods for voice control of computing devices.” (Thangarathnam, ¶¶ [0015]). Regarding claim 2, Thangarathnam teaches distinguish different function operations of the same voice-actions in different views (“historical use data may indicate that a given voice command, while corresponding to multiple directives, historically corresponds to a first directive more frequently than a second directive with respect to voice commands received via the user device 102...” where “data indicating the location of objects with respect to each other as displayed on the user device 102 may be utilized to rank directives. For example, directives to perform actions on objects that are displayed more prominently may be prioritized more than directives to perform actions on objects that are displayed less prominently.”; Thangarathnam, ¶¶ [0049]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Thangarathnam to include distinguish different function operations of the same voice-actions in different views. By “utilizing the context information indicating the objects displayed on the screen…[as] ‘hints’ for user interaction with the system,” the “confidence at which the system determines which action to perform from the voice command may be increased,” as recognized by Thangarathnam. (Thangarathnam, ¶¶ [0021]).

Regarding claim 7, the rejection of claim 6 is incorporated. Patch and Bai disclose all of the elements of the current invention as stated above. Patch further discloses wherein the voice-action description information further including a voice label (“a global voice command {a voice action of a voice action list} of the structured grammar 404 {voice-action description information}, wherein the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function {configuration information} of the object to the global voice command 408,”; Patch, ¶ [0070]), the voice label being configured to describe information about the voice-operable element in the view (where the listed commands can be “Start...with the name of the program or, to call up a default program, the name of the type of program” as well as rules that call for “consistent, descriptive, noun-based menu items. {describe information about the voice operable element in the view}”; Patch, ¶¶ [0067], [0094]). However, Patch and Bai fail to expressly recite distinguish different function operations of the same voice-actions in different views.
The relevance of Thangarathnam is described above with relation to claim 2. Regarding claim 7, Thangarathnam teaches distinguish different function operations of the same voice-actions in different views (“historical use data may indicate that a given voice command, while corresponding to multiple directives, historically corresponds to a first directive more frequently than a second directive with respect to voice commands received via the user device 102...” where “data indicating the location of objects with respect to each other as displayed on the user device 102 may be utilized to rank directives. For example, directives to perform actions on objects that are displayed more prominently may be prioritized more than directives to perform actions on objects that are displayed less prominently.”; Thangarathnam, ¶¶ [0049]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Thangarathnam to include distinguish different function operations of the same voice-actions in different views. By “utilizing the context information indicating the objects displayed on the screen…[as] ‘hints’ for user interaction with the system,” the “confidence at which the system determines which action to perform from the voice command may be increased,” as recognized by Thangarathnam. (Thangarathnam, ¶¶ [0021]).

Regarding claim 11, the rejection of claim 10 is incorporated. Claim 11 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Patch and Bai as applied to claims 1 and 10 above, and further in view of Deoras (U.S. Pat. App. Pub. No. 2015/0066496, hereinafter Deoras).

Regarding claim 3, the rejection of claim 1 is incorporated. Patch and Bai disclose all of the elements of the current invention as stated above. Bai further discloses wherein obtaining the operational intention of the user by performing the semantic recognition on the voice information according to the view description information of the voice-operable element, includes: obtaining a corresponding query text by performing speech recognition on the voice information according to the view description information of the voice-operable element (“Speech recognition system 138 {performing speech recognition} generates a textual representation {corresponding query text} of the utterance {on the voice information}, as indicated by block 382.” where “context of the application” is used in determining the intent. As indicated in the example, the phrase “share this document with Joe” is interpreted by the “natural language understanding system 140” as “an action or command that the user wishes the system to perform” on the displayed document {according to the view description information}; Bai, ¶¶ [0081], [0082]); extracting a text label of the voice-operable element from the view description information of the voice-operable element (“Once a textual representation is generated, natural language understanding system 140 identifies an intent 386 {text label} in the utterance 142, based upon the textual representation.”; Bai, ¶ [0081]), the text label including a type and attributes of the voice-operable element (“Identifying the user intent {text label}... [can include] the string and contextual data {attributes} can be sent to a classifier where they are classified into a class {type}”; Bai, ¶ [0042]); and obtaining...the operational intention of the user (“natural language understanding system 140 deciphers a user intent {text label}, and maps the intent {text label} to an action {operational intent}”; Bai, ¶ [0081]). 
However, Patch and Bai fail to expressly recite obtaining a semantic-labeled result of the query text as the operational intention of the user by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model.
Deoras teaches systems and methods for “assignment of semantic labels to words in a natural language utterance.” (Deoras, ¶ [0004]). Regarding claim 3, Deoras teaches obtaining a semantic-labeled result of the query text as the operational intention of the user (The system includes a “labeler component 124 that receives semantic features output by the semantic feature identifier component 122 for words in the sequence of words, and assigns respective labels {obtaining a semantic-labeled result...} to words in the sequence of words {of the query text} based upon the semantic features.”; Deoras, ¶ [0029]) by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model (“the labeler component 124 can comprise at least one of a DNN 126 or a recurrent neural network (RNN) 128, wherein the at least one of the DNN 126 or the RNN 128 is used in connection with performing the labeling {by performing semantic labeling} of words in the sequence of words {on the query text...}” for “assigning semantic labels to words in a sequence of words.” Further, “the at least one of the DNN 126 or the RNN 128 are trained to assign labels {by utilizing a pre-trained labeling model} pertaining to a particular domain and/or intent {...according to the extracted text label}.”; Deoras, ¶ [0030]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Deoras to include obtaining a semantic-labeled result of the query text as the operational intention of the user by performing semantic labeling on the query text according to the extracted text label by utilizing a pre-trained labeling model. Semantic slot filling as performed using trained neural networks allows for automatic extraction of a semantic concept while overcoming the deficiencies in the prior art regarding poor generalization of models on “complex combinations of patterns,” as recognized by Deoras. (Deoras, ¶¶ [0001], [0002], [0004]).

Regarding claim 12, the rejection of claim 10 is incorporated. Claim 12 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Patch, Bai, and Deoras as applied to claims 3 and 12 above, and further in view of Thangarathnam.

Regarding claim 4, the rejection of claim 3 is incorporated. Patch, Bai and Deoras disclose all of the elements of the current invention as stated above. However, Patch and Bai fail to expressly recite wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model and generating the corresponding query text by decoding the acoustic features dynamically based on the view description information of the voice-operable element by utilizing a pre-trained language model.
The relevance of Deoras is described above with relation to claim 3. Regarding claim 4, Deoras teaches wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model (“client computing device 102 may also include an acoustic feature extractor component 112 that receives the spoken utterance captured by the microphone 108 and extracts acoustic features {predicting acoustic features of an audio signal...} from such utterance.” where the “acoustic model 116 may be or include any suitable type of model, such as... a deep neural network (DNN) {pre-trained acoustic model}”; Deoras, ¶¶ [0025]); and generating the corresponding query text by decoding the acoustic features dynamically (“In combination, the acoustic model 116 and the language model 118 can recognize and output {thus, by decoding the acoustic features...} words, numbers, acronyms, etc. {generating the corresponding query text} in the spoken utterance set forth by the user 110.”; Deoras, ¶¶ [0026]) based on [the view description information]… by utilizing a pre-trained language model (“The domain determiner component is configured to identify a general domain {view description information} to which the utterance set forth by the user 110 is directed.” where “a determined domain and/or intent can be provided as an input feature to a model that is trained to assign semantic labels to words in sequences of words across several domains/intents {...by utilizing a pre-trained language model}.”; Deoras, ¶¶ [0027]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai to incorporate the teachings of Deoras to include wherein obtaining the corresponding query text by performing the speech recognition on the voice information according to the view description information of the voice-operable element, includes: predicting acoustic features of an audio signal of the voice information by utilizing a pre-trained acoustic model and generating the corresponding query text by decoding the acoustic features dynamically based on [the view description information]… by utilizing a pre-trained language model. Semantic slot filling as performed using trained neural networks allows for automatic extraction of a semantic concept while overcoming the deficiencies in the prior art regarding poor generalization of models on “complex combinations of patterns,” as recognized by Deoras. (Deoras, ¶¶ [0001], [0002], [0004]). However, Patch, Bai and Deoras fail to expressly recite wherein the view description information includes an architecture of the view and a relationship among respective voice-operable elements in the view.
The relevance of Thangarathnam is described above with relation to claim 2. Regarding claim 4, Thangarathnam teaches wherein the view description information is an architecture of the view and a relationship among respective voice-operable elements in the view (wherein “historical use data” can include contextual data related to the “view on the user device” {view description information}, and wherein said contextual data, according to an exemplary embodiment, can include “data indicating the location of objects with respect to each other as displayed on the user device 102 [which] may be utilized to rank directives {a relationship among respective voice-operable elements in the view}. For example, directives to perform actions on objects that are displayed more prominently may be prioritized more than directives to perform actions on objects that are displayed less prominently {an architecture of the view}.”; Thangarathnam, ¶¶ [0049]-[0050]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system of Patch, as modified by the systems and methods for interpretation of a voice input into a set of commands of Bai, and as modified by the semantic label assignment systems of Deoras, to incorporate the teachings of Thangarathnam to include wherein the view description information is an architecture of the view and a relationship among respective voice-operable elements in the view. By “utilizing the context information indicating the objects displayed on the screen…[as] ‘hints’ for user interaction with the system,” the “confidence at which the system determines which action to perform from the voice command may be increased,” as recognized by Thangarathnam. (Thangarathnam, ¶¶ [0021]).

Regarding claim 13, the rejection of claim 12 is incorporated. Claim 13 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ben-Dor et al. (U.S. Pat. App. Pub. No. 2019/0279615) discloses systems and methods for execution of commands by a digital assistant in a group device environment based on determined intent and the executing device’s presentation capabilities.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657