Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114 
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/11/2021 has been entered.
Status of the Claims
Claims 1, 4-6, and 8-10 are pending.  
Response to Applicant’s Argument
In response to “Therefore, Pitschel does not disclose or teach performing a semantic analysis, not to mention "determining whether control instruction information is comprised in the conversation voice by performing a semantic analysis on the conversation voice" (i.e., the above technical feature A)”.
Semantic analysis is simply a process for determining a meaning or user intention. For example, the specification US 2020/0219503 A1 at ¶41: The determining whether control instruction information is included in the conversation voice may further include performing a semantic analysis on the conversation voice with the wake-up word and determining whether control instruction information carrying an operational intention is included in content of the conversation voice at S220. In a particular example, a conversation voice of a user may be "where are you going tomorrow afternoon?". After a semantic analysis is performed on the conversation voice, it may be determined that the target intention of the conversation voice is to ask another person's schedule for tomorrow. US 2020/0219503 A1 at ¶43.
As noted by Pitschel, a digital assistant system’s ability to fulfill a user’s request is dependent on the system’s correct comprehension of the request or instructions (Col 1, Rows 23-26). Further, natural language processing enables users to interact with the system using natural language, where the system can interpret the user’s input to infer the user’s intent, translate the inferred intent into actionable tasks, and execute operations to perform the tasks (Col 1, Rows 28-33).
Therefore, Pitschel teaches determining whether control instruction information is comprised in the conversation voice by performing a semantic analysis on the conversation voice because it has a natural language processing module that takes sequences of words or tokens generated by speech to text processing of user input to attempt to associate the tokens or words with one or more actionable intents (Col 10, Rows 6-14).
In response to “Thus, the above disclosure in Baldwin fails to teach the technical feature "in response to a determination of no control instruction information in the conversation voice, sending the conversation voice to the opposite equipment of the conversation process" recited in amended claim 1 of the present application. Instead, the above disclosure in Baldwin gives an opposite teaching regarding "in response to a determination of control filtering out the conversation voice with the control instruction information and prohibiting from sending the conversation voice to an opposite equipment of the conversation process”.
Baldwin teaches active monitoring of communication session 110 between a set of user equipments for a command associated with a user equipment (¶29). If this active monitoring process detects such a command, the filtering process is performed (¶33) in order to ensure that participants in the communication session do not hear the detected command or a voice conversation that the command’s issuer does not wish another user to hear.
However, if the monitoring process does not detect such a command, then the communication session is not subject to the filtering process, and the conversation voice in the communication session is transmitted without filtering, consistent with the limitation “in response to a determination of no control instruction information in the conversation voice, sending the conversation voice to the opposite equipment of the conversation process”. This is simply the established function of an unfiltered communication session of Baldwin.
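For illustration only, the two branches discussed above (filter when a command is detected, forward when none is detected) can be sketched as follows. This is a hypothetical sketch and not code from Baldwin: the names `Session`, `contains_command`, and `route_utterance` are assumptions, text strings stand in for audio, and the detector simply checks for the action word "COMMAND" that Applicant quotes from Baldwin ¶31.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed action word, echoing the "COMMAND" example quoted from Baldwin ¶31.
ACTION_WORD = "command"


@dataclass
class Session:
    """Hypothetical stand-in for a communication session."""
    delivered: List[str] = field(default_factory=list)  # heard by the other participants
    executed: List[str] = field(default_factory=list)   # handled as commands, never delivered


def contains_command(utterance: str) -> bool:
    """Assumed detector: an utterance is a command if it begins with the action word."""
    words = utterance.strip().lower().split()
    return bool(words) and words[0] == ACTION_WORD


def route_utterance(session: Session, utterance: str) -> None:
    """Filter out a detected command; otherwise forward the utterance unchanged."""
    if contains_command(utterance):
        # Command detected: keep it away from the other participants.
        session.executed.append(utterance)
    else:
        # No command detected: the session carries the voice as usual.
        session.delivered.append(utterance)
```

In this sketch the forwarding branch is nothing more than the default behavior of the session when the detector reports no command, which mirrors the reasoning applied to the claim limitation.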
In response to “It can be seen that, paragraph [0031] of Baldwin merely discloses "voice commands can be identified by an action word associated with a user requesting a command" and "the word "COMMAND" can signify that a user is attempting to address the command recognition component and can be followed with a specific command request", but does not disclose "determine whether control instruction information carrying an operational intention is comprised in content of the conversation voice based on the conversation voice with the wake-up word and at least one subsequent conversation voice following the conversation voice with the wake-up word"”.

Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5-6, and 8-10 are rejected under 35 USC 103 as being unpatentable over Baldwin (US 2013/0141516 A1) in view of Pantel (US 2014/0214429 A1).
Regarding Claims 1, 6, and 10, Baldwin discloses an apparatus for filtering out a voice instruction (Figs. 4-6, in-call command control system 400, 500, 600), comprising: 
one or more processors (¶58, computer system with a processing unit); and 
a non-transitory memory for storing computer executable instructions, wherein the computer executable instructions are executed by the one or more processors to enable the one or more processors (¶16 and ¶61, computer storage media storing computer program to control a computer) to: 
receive a conversation voice in a conversation process (Fig. 4, communication session with devices 402, 404, and 406 maintained by in-call command system 400); 
determine whether control instruction information is comprised in the conversation voice by performing natural language analysis on the conversation voice (¶42, command recognition component 210 independently monitors each communication channel for commands; ¶45, perform natural language processing to infer a requested action); and 
in response to a determination of control instruction information in the conversation voice, filter out the conversation voice with the control instruction information (¶42, filter component 220 can dynamically filter commands based on the channel that gave the command and the command given; ¶25-26, in-call command control system 110 interacting with users 112, 114, and 116 where filter 122 can filter voice command of user 112 to isolate user 114 from being heard by users 114 and 116) and prohibit from sending the conversation voice to an opposite equipment of the conversation process (¶25, in response to user 112’s voice command to isolate user 114, filter 122 can work in harmony with filter 124 to isolate conversation between user 112 and user 114 that is not perceived by user 116), 
and in response to a determination of no control instruction information in the conversation voice, sending the conversation voice to the opposite equipment of the conversation process (¶29 in view of ¶33 and Fig. 8, in call command control system 200 includes a command recognition component 210 actively monitors communication session 110 between a set of user equipments for a command associated with a user equipment to recognize command and to dynamically filter the command prior to it being perceived within communication session 110; in a dynamic filtering process, if the active monitoring process does not recognize any voice command, then filter components do not perform the functions set forth in ¶25 and ¶33 and communication session 110 comprising audio and video between equipments of users 112, 114 and 116 continues).
Baldwin does not disclose determining whether control instruction information is comprised in the conversation voice by performing a semantic analysis on the conversation voice.
Pantel discloses a speech recognition device configured to recognize an inputted spoken utterance (Fig. 2, ¶24, digital assistant system terminal 1 with primary voice recognition process), wherein the speech recognition device (¶44 in view of ¶24, processor 27 running digital assistant software implementing the extractor, analyzer, determiner, and controller) receives a conversation voice in a conversation process (¶29, input audio data are buffered into audio buffer 6), determines whether control instruction information is comprised in the conversation voice by identifying a preset wake-up word in the conversation voice (¶31 in view of ¶10, switch from standby / low power mode to full operation mode based on a secondary recognition process of a keyword or a phrase from a defined keyword and phrase-catalog; e.g., ¶60, a conversation between people is monitored and a keyword 18 or a phrase from the keyword and phrase-catalog appears in the conversation (e.g. "soccer"), so that the primary voice recognition process 8 and the dialog system 9 is started or activated), performs a semantic analysis on the conversation voice with the wake-up word and at least one subsequent conversation voice following the conversation voice with the wake-up word (¶33-34, perform primary voice recognition process 8 on audio content comprising audio recording 21 (buffered content located in the audio buffer 6 including the keyword) and the subsequent live audio data 22 for conversion into text 13 and perform semantic analysis on text 13 to determine the extent of query to the digital assistant; ¶39, the entire content 21 of the audio buffer 6 can be converted to text 13 together with the subsequent live transmission 22 and be analyzed by the dialog system 9), and determines whether control instruction information carrying an operational intention is comprised in content of the conversation voice (¶35, if the dialog system 9 concludes that the question, message, or request contained in the audio buffer 6 is relevant, the terminal 1 remains in full operation and the dialog system 9 will interact with the user; ¶60, determine whether a question, message, or request was made to the personal assistant system; e.g., the question "Who won the match today?" can be replied with the soccer results of the current match day where “who” is the keyword / phrase that activated the dialog system).
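For illustration only, the buffering behavior relied upon above (analyzing the buffered content that contains the keyword together with the audio that follows it) can be sketched as follows. This is a hypothetical sketch and not code from Pantel: the function `collect_query`, its parameters, and the `KEYWORDS` set are assumptions, and text chunks stand in for audio segments.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional

# Assumed catalog entry, echoing the "soccer" example cited from Pantel ¶60.
KEYWORDS = {"soccer"}


def collect_query(
    chunks: Iterable[str], buffer_size: int = 3, follow_up: int = 2
) -> Optional[List[str]]:
    """Return the buffered chunks (ending with the keyword chunk) plus up to
    `follow_up` subsequent chunks, or None if no keyword is ever heard."""
    buffer: Deque[str] = deque(maxlen=buffer_size)  # rolling buffer of recent chunks
    stream = iter(chunks)
    for chunk in stream:
        buffer.append(chunk)
        # Simplified keyword spotting: exact word match, no punctuation handling.
        if KEYWORDS & set(chunk.lower().split()):
            collected = list(buffer)
            for _ in range(follow_up):
                nxt = next(stream, None)
                if nxt is None:
                    break
                collected.append(nxt)  # subsequent "live" content after the keyword
            return collected
    return None
```

The returned list is what a downstream semantic analysis step would receive in this sketch: the keyword-bearing content and the content that follows it, rather than the keyword chunk alone.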
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Baldwin’s natural language processing to infer a requested action by performing a semantic analysis on the conversation voice with the wake-up word and determining whether control instruction information carrying an operational intention is comprised in content of the conversation voice in order to implement intent inference (Baldwin, ¶31, natural language processing to infer a requested action; Pantel, ¶35, when the dialog system 9 concludes that the question / message or request is relevant, remain in full operation and interact with the user).
Further regarding claim 10, Baldwin discloses a non-transitory computer-readable storage medium, having computer executable instructions stored thereon, that when executed by a processor (¶58, computer system with a processing unit), causes the processor to perform the functions of claims 1 and 6 (¶16 and ¶61, computer storage media storing computer program to control a computer).
Regarding Claim 5, Baldwin discloses wherein after filtering out the conversation voice with the control instruction information and prohibiting from sending the conversation voice to an opposite equipment of the conversation process (¶42, command recognition component 210 independently monitors each communication channel for commands and filter component 220 can dynamically filter commands based on the channel that gave the command and the command given), the method further comprises: 
performing an operation associated with the control instruction information, according to the control instruction information (¶38, command performance component 230 for performing specific command).
Regarding Claim 8, Baldwin discloses wherein the computer executable instructions are executed by the one or more processors to enable the one or more processors to: 
receive the conversation voice from the identification module and send the conversation voice to the opposite equipment of the conversation process; or receive the conversation voice from the receiving module and send the conversation voice to the opposite equipment of the conversation process (¶25, send the filtered audio / video to users 114 and 116; ¶29, communication session between a set of user equipments).
Regarding Claim 9, Baldwin discloses wherein the one or more programs are executed by the one or more processors to enable the one or more processors to: filter out the conversation voice; or notify the conversation module to filter out the conversation voice received from the receiving module (¶42, filter component 220 can dynamically filter commands based on the channel that gave the command and the command given).
Claim 4 is rejected under 35 USC 103 as being unpatentable over Baldwin (US 2013/0141516 A1) in view of Pantel (US 2014/0214429 A1) as applied to claim 1, in further view of Pitschel et al. (US 9922642 B2).
Regarding Claim 4, Baldwin does not disclose performing a semantic analysis on the conversation voice and determine a target intention of the content of the conversation voice.
Pitschel teaches an AI digital assistant (Col 7, Rows 60-65) capable of understanding (i.e., performing semantic analysis) and acting upon user speech commands in multiple domains (Col 9, Rows 11-24 and Col 11, Rows 45-50) to determine whether a control instruction information / speech commands carries an operational intention is comprised in content of a conversation voice (Col 9, Rows 25-47, obtain user speech input and perform speech to text; Col 9, Rows 65-67, pass speech recognition results to natural language processing module 332 for intent inference; Col 10, Rows 5-24, natural language processing attempts to associate token sequences with one or more actionable intents recognized by the digital assistant where an actionable intent represents a task that can be performed by the digital assistant; in one particular example, Col 13, Rows 39-51, when user inputs speech “Make me a dinner reservation at a sushi place at 7”, the NLP 332 generates a partial structured query for restaurant reservation domain with the parameters {Cuisine = “sushi”} and {Time = “7 pm”}).
In particular, Pitschel teaches wherein the determining whether control instruction information is comprised in the conversation voice comprises: 
performing a semantic analysis on the conversation voice (Col 9, Rows 65-67 and Col 11, Rows 45-51); 
determining a target intention of content of the conversation voice (Col 10, Rows 10-24, attempt to associate token sequence resulted from speech to text processing with one or more actionable intents); 
Col 10, Rows 39-58, map token sequence to an actionable intent node in ontology 360); and 
determining whether the control instruction information is comprised in the conversation voice according to a result of the matching (Col 12, Rows 21-42, natural language processor 332 determines what nodes are implicated by words in the token sequence by determining if a word / phrase is found to be associated with one or more nodes in ontology 360 and select one of the actionable intents as the task that the user intended the digital assistant to perform).
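For illustration only, the intent-matching steps mapped above can be sketched as follows. This is a hypothetical sketch and not Pitschel's implementation: the `ONTOLOGY` contents, the function names, and the simple word-overlap scoring are all assumptions chosen to show how a token sequence might be associated with an actionable intent, and how the absence of any match indicates that no control instruction is present.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ontology: each actionable intent maps to words associated with it.
ONTOLOGY: Dict[str, Set[str]] = {
    "restaurant_reservation": {"reservation", "dinner", "sushi", "table"},
    "set_reminder": {"remind", "reminder"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!\"'").lower() for w in text.split()]


def infer_intent(text: str) -> Optional[str]:
    """Return the actionable intent sharing the most words with the input, or None."""
    tokens = set(tokenize(text))
    scores = {intent: len(vocab & tokens) for intent, vocab in ONTOLOGY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def has_control_instruction(text: str) -> bool:
    """Treat the input as a control instruction only if it matches some intent."""
    return infer_intent(text) is not None
```

Under these assumptions, the input “Make me a dinner reservation at a sushi place at 7” matches the hypothetical restaurant reservation intent, while “Where are you going tomorrow afternoon?” matches no intent and would be treated as ordinary conversation.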
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Baldwin’s natural language processing to infer a requested action by performing a semantic analysis on the conversation voice with the wake-up word and determining whether control instruction information carrying an operational intention is comprised in content of the conversation voice in order to implement intent inference (Baldwin, ¶31, natural language processing to infer a requested action; Pitschel, Col 9, Rows 65-67, natural language processing for intent inference associated with an “actionable intent”). 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or to the examiner’s supervisor, King Poon, whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published 
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
10/22/2021