DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Regarding Korean Patent Application No. 10-2019-0150984, receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: Figure 6, reference characters 115 and 115a-d.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
Para. 0003, line 1: “Machine learning(e.g., deep learning)” should read “Machine learning (e.g., deep learning)”
Para. 0060, line 2: “BS” should read “base station”
Para. 00234, line 1: “chuimsae” appears to be a word that was not properly translated from Korean into English.
Appropriate correction is required.
Claim Objections
Claim 11 is objected to because of the following informalities:  
Claim 11 recites “wherein some of the at least one node has different weights….”  The examiner suggests amending claim 11 to recite “wherein some of the at least one nodes in each of the input layer, hidden layer, and output layer have [[has]] different weights…”  to clarify that “some” refers to more than one total node.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 recites “preset temporary pause keyword as a nonverbal element” and is indefinite because there is an apparent contradiction between a “keyword” and “nonverbal element”.  It is unclear how a “keyword” can be “nonverbal” or how a “voice signal” could be “nonverbal”. For purposes of examination “preset temporary pause keyword as a nonverbal element” is being interpreted to mean an utterance that indicates a temporary pause (e.g., “Wait a minute!”) and that such utterance is not considered to be part of the claimed “first utterance” or “second utterance”, e.g., a nonverbal element.  See instant specification, para. 00231-00233 and Fig. 11.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-13 and 17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1 recites a voice processing method. Under the broadest reasonable interpretation, the limitations cover performance in the human mind with the assistance of physical aids (e.g., pen and paper).  For example, two humans conversing could:
if a stop signal is detected during a reception of a first utterance, temporarily pausing the reception of the first utterance; (e.g., Human A could begin asking a question, and Human B could say, “hey, wait a minute” to pause Human A, where for example Human B is writing down the question and needs Human A to pause so that Human B can catch up when writing)
receiving a second utterance after a termination of a temporary pause state based on the stop signal; and (e.g., Human B, could indicate that Human A should resume, e.g., “hey, sorry, please continue” and Human A could continue asking the question and Human B can continue writing down the question)
applying a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances to a pre-trained learning model to generate an output from which at least one word having an overlapping meaning is removed. (Human B can piece together Human A’s question by considering the two parts and mentally removing any overlapping words, or crossing-out overlapping words in the transcribed question (which may be transcribed on graph paper, where each character is in a cell in vector format), where Human B uses their brain, e.g., a pre-trained learning model, to determine a cohesive merged question)
The judicial exception is not integrated into a practical application. While the claim recites a “pre-trained learning model”, the claim only recites the model at a high level of generality and the claim does not recite a computer-specific algorithm for training or using the model.  Therefore, the pre-trained learning model is a simple computer automation of the process for determining overlapping words that could be performed by a human.  Accordingly, these elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. With respect to the claimed “pre-trained learning model,” use of a pre-trained neural network model, including a recurrent neural network or convolutional neural network, is well-known, routine, and conventional activity, as evidenced by at least:
US 20180276532 A1 (Kim et al. – filing date 03/23/2018) at Paras. 0005-0006: convolutional neural networks (CNN) are “conventional” and “widely used for image recognition” and recurrent neural networks (RNN) are “widely used for voice recognition”

US 20180253648 A1 (Kaskari et al. – filing date 03/01/2018) at Para. 0022: “conventional recurrent neural networks (RNNs)” process sequences of inputs and are suitable for tasks such as speech recognition.

US 20180190280 A1 (Cui et al.  – filing date 06/09/2017) at Para. 0064: convolutional neural networks “are well-known technologies widely studied and applied at present”

US 20170140753 A1 (Jaitly et al. – filing date 11/11/2016) at Para. 0072: “conventional machine learning training technique to train the layers of the RNN system, e.g., a stochastic gradient descent with backpropagation through time training technique”

US 20170053646 A1 (Watanabe et al. – filing date 08/17/2015) at Paras. 0031 and 0032 and Figs. 6 and 7: flow diagrams for a prediction system and training system for a “conventional recurrent neural network”

The remaining limitations in claim 1 are not sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer automation amounts to no more than mere instructions to apply the exception using generic computer automation. Mere instructions to apply an exception using generic computer automation cannot provide an inventive concept. Claim 1 is not patent eligible. 
	Claims 2-7 and 12 depend from claim 1 and do not remedy any of the deficiencies recited in claim 1 and are therefore rejected under the same grounds as claim 1 above.  In general, these claims merely recite specific details regarding the claimed method, including the use of vectors and sub-vectors and performing mathematical calculations to determine if first and second utterances have overlapping words. None of the additional limitations recited in these claims amount to anything more than the same or a similar abstract idea as recited in claim 1.
	Claims 8-11 depend from claim 1 and recite specific limitations relating to the “pre-trained learning model”, where claim 8 recites using a convolutional neural network (CNN), claim 9 recites the various layers of an artificial neural network, claim 10 recites a recurrent neural network, and claim 11 recites that various nodes in the respective neural networks have different weights.  As discussed above with respect to claim 1, neural networks, including CNNs and RNNs, are well-known, conventional, and routine activity and do not render the claims subject matter eligible.  Further, the various layers in a neural network are routine, conventional, and well-known activity as evidenced at least by:
US 20130013543 A1 (Dull et al. – filing date 09/06/2011) at Paras. 0052-0053: “conventional recurrent neural network” has input layer, hidden states, and output layer.

US 20170046616 A1 (Socher et al. – filing date 08/15/2016) at Para. 0026: “conventional CNN” has input layer, output layer, and hidden layers.

US 20190139540 A1 (Kanda et al. – filing date 06/02/2017) at Para. 0052: conventional deep neural networks and recurrent neural networks have an input layer, output layer, and a plurality of hidden layers, with each layer having a number of nodes.  See also, Fig. 2.
	
Therefore, the remaining limitations in claims 8-11 are not sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer automation amounts to no more than mere instructions to apply the exception using generic computer automation. Mere instructions to apply an exception using generic computer automation cannot provide an inventive concept. Claims 8-11 are not patent eligible.
Claim 13 recites a method that is substantially similar to claim 1, where claim 13 now recites that the first and second utterances are transmitted to a “server”.  A “server” is merely a general purpose computing component that is not sufficient to (a) integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception, because in both cases the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 13 is not patent eligible.
Claim 17 depends from claim 1, does not remedy any of the deficiencies recited in claim 1, and is therefore rejected under the same grounds as claim 1 above.  The added “computer readable recording medium” is merely a general purpose computing component that is not sufficient to (a) integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception, because in both cases the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components.  Claim 17 is not patent eligible.
Claim 17 is further rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim, which is directed to a “computer readable recording medium,” can be interpreted under its broadest reasonable interpretation to cover transitory forms of signal transmission (often referred to as “signals per se”), such as a propagating electrical or electromagnetic signal or carrier wave.  MPEP 2106.
The examiner notes that Paragraph 0273 of the instant specification states that “computer-readable recording medium may be realized in the form of a carrier wave (e.g., transmission over Internet).”  The examiner suggests amending the claim to be directed to “a non-transitory computer readable recording medium.”
	The examiner notes that claims 14-16 claim a specific method of transmitting and receiving data between various devices using specific data types, channels, and formats, and therefore claims 14-16 do not pertain to mental steps that can be performed entirely in the human mind and are therefore considered to be eligible subject matter.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 9, 11-13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Malhotra et al., U.S. Patent Application Publication US 20210103606 A1, hereinafter referenced as MALHOTRA, in view of Garcia et al., U.S. Patent Application Publication, US 20190295544 A1, hereinafter referenced as GARCIA.

Regarding claim 1, MALHOTRA discloses:
A voice processing method, comprising: (systems and methods for determining whether current and previous queries should be merged so that a human can interact with a computer system in a conversational manner; paras. 0001 and 0002; queries may be voice inputs that are converted to strings of characters using a voice-to-text algorithm; para. 0141)
a reception of a first utterance; (Fig. 8, step 804, control circuitry 604 receives from a user a first query, where the first query may be a voice input; para. 0141)
receiving a second utterance; and (Fig. 8, step 804, control circuitry 604 receives from a user a second query, where the second query may be a voice input; para. 0141)
applying a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances (Fig. 8, step 806, first and second queries are mapped to input nodes 304 and 308, respectively, of a neural network; para. 0142; Fig. 3, input nodes 304 and 308 are inputs to the neural network, where each node is associated with a word from the query, e.g., nodes 304 are a first sentence vector and nodes 308 are a second sentence vector and the combination, e.g., concatenation, of nodes 304 and 308 are applied to the layers of the neural network; paras. 0068-0072) to a pre-trained learning model to generate an output from which at least one word having an overlapping meaning is removed. (Fig. 8, steps 808-816, determine, using the neural network, whether the first and second queries should be merged, e.g., “Where is the nearest pizza shop” and “one with at least 4 stars” may be merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings; paras. 0143-0147; the neural network may be trained using a training data set; para. 0003, 0004, 0050).
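
For illustration only, the mapping above (first and second queries placed on input nodes 304 and 308 and applied together to the network layers) can be pictured as concatenating two fixed-length vectors and running a single forward pass. The Python sketch below is not taken from MALHOTRA or from the instant application; the toy vocabulary, the bag-of-words encoding, the untrained weights, and the 0.5 decision threshold are all hypothetical assumptions chosen only so the example runs.

```python
import math

# Hypothetical toy vocabulary; each word maps to one input position.
VOCAB = ["where", "is", "the", "nearest", "pizza", "shop", "one", "with", "stars"]


def sentence_vector(query):
    """Bag-of-words vector: 1.0 where a vocabulary word occurs in the query."""
    words = query.lower().split()
    return [1.0 if w in words else 0.0 for w in VOCAB]


def concatenate(first_vec, second_vec):
    """Concatenated vector: first-query entries followed by second-query entries."""
    return first_vec + second_vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, output_weights):
    """One hidden layer with sigmoid activations and a single output in (0, 1)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


first = sentence_vector("Where is the nearest pizza shop")
second = sentence_vector("one with at least 4 stars")
combined = concatenate(first, second)

# Assumed (untrained) weights: two hidden nodes, one output node.
hidden_weights = [[0.1] * len(combined), [-0.1] * len(combined)]
output_weights = [1.0, -1.0]

score = forward(combined, hidden_weights, output_weights)
should_merge = score > 0.5  # hypothetical decision threshold
```

In a trained network the weights would come from a training data set rather than being fixed by hand, and the output would drive the merge determination described for steps 808-816.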

However, MALHOTRA fails to explicitly teach:
if a stop signal is detected during a reception of a first utterance, temporarily pausing the reception of the first utterance;
receiving a second utterance after a termination of a temporary pause state based on the stop signal; 

However, in a related field of endeavor, GARCIA discloses that a virtual assistant 800, via a microphone, detects beginning points and end points of utterances in a particular audio stream via an absence of voice activity.  (paras. 0255-0257).  For example, if a pause or silence exceeds a threshold period of time (e.g., 3 seconds), the virtual assistant 800 determines that the first audio stream has been completed.  (para. 0255).  Later, a second audio stream may be received and the digital assistant can determine if the first audio stream and second audio stream belong to the same audio session.  (Figs. 12A-D, paras. 0287-0290).
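
For illustration only, endpoint detection of the kind summarized above (an utterance is treated as complete once the absence of voice activity lasts for a threshold period) can be sketched as follows. This Python sketch is not GARCIA's implementation; the function name, the per-frame boolean voice-activity input, and the 0.5-second frame length are hypothetical assumptions, and only the 3-second threshold comes from the example cited above.

```python
def find_endpoints(frames, frame_seconds=0.5, silence_threshold_seconds=3.0):
    """Return (start, end) frame-index pairs for each detected utterance.

    frames: sequence of booleans, True where voice activity is present.
    An utterance is closed once the silent run reaches the threshold;
    shorter silences are treated as part of the same utterance.
    """
    needed = int(silence_threshold_seconds / frame_seconds)
    segments = []
    start = None        # index of the first voiced frame of the open utterance
    last_voiced = None  # index of the most recent voiced frame
    silent_run = 0
    for i, voiced in enumerate(frames):
        if voiced:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= needed:
                segments.append((start, last_voiced))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended while an utterance was still open.
        segments.append((start, last_voiced))
    return segments


# Two seconds of speech, three seconds of silence, one second of speech.
long_pause = [True] * 4 + [False] * 6 + [True] * 2
# The same speech separated by only one second of silence.
short_pause = [True] * 2 + [False] * 2 + [True] * 2

two_streams = find_endpoints(long_pause)
one_stream = find_endpoints(short_pause)
```

With these inputs the long pause splits the audio into two streams, while the short pause leaves a single stream.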

The combination of MALHOTRA in view of GARCIA makes obvious:
if a stop signal is detected during a reception of a first utterance, (GARCIA discloses that a virtual assistant 800 detects, via a microphone, the end point of an utterance or audio stream, e.g., a stop signal at the end of the first utterance; GARCIA, para. 0255; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 perform endpoint detection as disclosed in GARCIA to determine if a first query/audio stream has reached an endpoint, e.g., stop signal at end of reception of a first utterance; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, para. 0255) temporarily pausing the reception of the first utterance; (GARCIA discloses that a pause or silence exceeding a threshold such as a 3-second pause or silence, can be detected by a digital assistant via a microphone to determine if the utterances in a first audio stream have been paused, where the digital assistant’s determination of the endpoint detection, e.g., stop signal, and pause/silence exceeding a 3-second threshold is an indication of temporary pausing of the reception of the first utterance; GARCIA, paras. 0255-0257; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 pause receiving and recording the first query if there is a pause or silence exceeding a certain threshold, such as 3 seconds; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0255-0257).
receiving a second utterance after a termination of a temporary pause state based on the stop signal; and (GARCIA, in Figs. 12A-B, discloses that a second audio stream is received after the 3-second pause, e.g., temporary pause state that is triggered by the endpoint detection of the first audio stream, e.g., the stop signal, and that the digital assistant can determine if the first audio stream and second audio stream belong to the same audio session; paras. 0255, 0287-0290; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 resume receiving and recording the second query after a threshold period of delay (e.g., 3 seconds of pause or silence) and GARCIA discloses the threshold period of delay between utterances (e.g., 3 seconds of pause or silence); MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0255-0257, 0287-0290).

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of GARCIA with MALHOTRA.  As disclosed in GARCIA, one of ordinary skill would be motivated to make such a combination that utilizes periods of pauses/silence to automatically end an audio stream/query so that the digital assistant does not over-fill audio stream buffers, which may be time-limited (e.g., 30 second buffers).  (GARCIA, para. 0252).  One of ordinary skill would further be motivated to utilize the teachings of GARCIA so that each speech command can be detected through pauses/silences instead of using a trigger word (e.g., “Hey Siri,”), which leads to a more natural form of communication that also is more efficient and reduces power usage and improves battery life.  (GARCIA, paras. 0010, 0011).

Regarding claim 2, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA in view of GARCIA makes obvious:
wherein the stop signal is a voice signal corresponding to one of a hesitation word, (MALHOTRA discloses that filler words such as “uh,” “like” do not meaningfully contribute to the understanding of an intent of a query using the neural network and may be ignored, and therefore treated as part of a silent delay; MALHOTRA, para. 0019) a silent delay, (GARCIA discloses a silent delay exceeding a threshold, e.g., 3 seconds, is a pause that indicates a first audio stream is finished; GARCIA, paras. 0255-0257) or a preset temporary pause keyword as a nonverbal element. (GARCIA discloses “stop” as a keyword for stopping the utterance or action; GARCIA, paras. 0288, 0301; the combination of MALHOTRA in view of GARCIA now utilizes a pause/delay exceeding a threshold (which includes use of hesitation words like “uh”) or an explicit keyword like “stop” to determine when an utterance/query has completed; MALHOTRA, para. 0019 with GARCIA, paras. 0255-0257, 0288, 0301)

Regarding claim 3, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  However, MALHOTRA fails to disclose:
if the reception of the first utterance is temporarily paused, waiting for an additional voice input for the first utterance that is input before the temporary pause state. (GARCIA discloses that audio streams are buffered in buffer 812, which is part of the digital assistant 800; paras. 0251, 0259; GARCIA further discloses that the digital assistant may be divided into client-server portions, or that the entire digital assistant implementation can be performed by a server; GARCIA, paras. 0203, 0249; MALHOTRA also discloses a client-server architecture where control circuitry 604 is implemented on a server and the user equipment device 600 is a thin client that just issues requests to the remote server; MALHOTRA, para. 0113; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 implemented in a client-server architecture, as disclosed in both MALHOTRA and GARCIA, where control circuitry 604 on the server is controlling user equipment 600 and has a digital assistant 800 as in GARCIA with a buffer 812, where control circuitry 604 now waits for additional packets/signals to be put into the buffer and determines if such packets/signals correspond to voice input that occurred before the pause/delay that exceeds a threshold; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0249, 0251).


Regarding claim 4, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA further discloses:
wherein the first sentence vector is a vector representing an overall content of the first utterance. (Fig. 8, step 806, first query is mapped to input nodes 304 of a neural network; para. 0142; Fig. 3, input nodes are inputs to the neural network, where each node is associated with a word from the query, e.g., nodes 304 collectively are a first sentence vector; paras. 0068-0072)

Regarding claim 5, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA further discloses:
wherein the second sentence vector is a vector concatenating a plurality of sub-vectors extracted from at least one word included in the second utterance.
(Fig. 8, step 806, second query is mapped to input nodes 308 of a neural network; para. 0142; Fig. 3, input nodes are inputs to the neural network, where each node is associated with a word from the query, e.g., nodes 308 are each a sub-vector that is merged together to collectively form the second sentence vector; paras. 0068-0072)
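
For illustration only, one reading of "a vector concatenating a plurality of sub-vectors extracted from at least one word" is that each word contributes its own short vector and the sentence vector is those sub-vectors laid end to end. The Python sketch below is not taken from MALHOTRA or from the instant application; the two-dimensional embedding values, the zero-vector fallback for unknown words, and the function names are hypothetical assumptions.

```python
# Hypothetical 2-dimensional word sub-vectors (made-up values).
EMBEDDINGS = {
    "one": [0.2, 0.7],
    "with": [0.5, 0.1],
    "stars": [0.9, 0.4],
}
UNKNOWN = [0.0, 0.0]  # assumed fallback for words outside the toy table


def word_sub_vectors(utterance):
    """One sub-vector per word, in utterance order."""
    return [EMBEDDINGS.get(w, UNKNOWN) for w in utterance.lower().split()]


def sentence_vector(utterance):
    """Sentence vector formed by concatenating the per-word sub-vectors."""
    vec = []
    for sub in word_sub_vectors(utterance):
        vec.extend(sub)
    return vec


second_sentence_vector = sentence_vector("one with stars")
```

Here the three-word utterance yields a six-element sentence vector, two elements per word.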

Regarding claim 9, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA further discloses:
wherein the learning model is a learning model based on an artificial neural network, wherein the artificial neural network includes an input layer, a hidden layer, and an output layer each having at least one node. (neural network model is used to predict a user’s intention and/or merge first and second queries; paras. 0007-0009, 0063, 0065; Fig. 3 depicts a neural network 300, with input layer having input nodes 304, hidden layers 312 with multiple nodes, and output layer with output node 316; MALHOTRA, paras. 0068, 0172, 0182).

Regarding claim 11, MALHOTRA in view of GARCIA discloses the voice processing method of claim 9.  MALHOTRA further discloses:
wherein some of the at least one node has different weights in order to generate the output. (Fig. 3, weights 310 between input layer (e.g., nodes 304 and 308) and artificial layer (e.g., nodes 312) and weights 314 connecting the nodes in hidden layer 312 and output node 316; para. 0072; weights are determined based on training and therefore may differ and may be updated; para. 0008, 0009, 0013, 0050; multiplying and summing the weights in the neural networks determines an output 318 that is used to determine whether to perform a merge operation; paras. 0072-0074).

Regarding claim 12, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA further discloses:
wherein the second utterance is an utterance belonging to the same dialog group as the first utterance. (MALHOTRA discloses that first and second queries are merged when the queries are determined to belong to the same context; paras. 0002, 0005, 0016, 0079; for example, “Where is the nearest pizza shop” and “one with at least 4 stars” are merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings and belong to the same context; paras. 0143-0147; the examiner notes that the broadest reasonable interpretation of “same dialog group” includes determining if the utterances belong in the same context or have a similar user intent, as disclosed in para. 00229 and Fig. 12 of the instant application).

Regarding claim 13, MALHOTRA discloses:
A voice processing method, comprising: (systems and methods for determining whether current and previous queries should be merged so that a human can interact with a computer system in a conversational manner; paras. 0001 and 0002; queries may be voice inputs that are converted to strings of characters using a voice-to-text algorithm; para. 0141)
a first utterance is transmitted to a server; (Fig. 8, step 804, control circuitry 604 receives from a user a first query, where the first query may be a voice input; para. 0141; Figs. 6 and 7, the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, contains a user input interface 610, and transmits queries to a remote server that contains control circuitry 604; paras. 0113, 0115, 0125, 0137)
transmitting a second utterance to the server; and (Fig. 8, step 804, control circuitry 604 receives from a user a second query, where the second query may be a voice input; para. 0141; Figs. 6 and 7, the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, contains a user input interface 610, and transmits queries to a remote server that contains control circuitry 604; paras. 0113, 0115, 0125, 0137)
applying a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances (Fig. 8, step 806, first and second queries are mapped to input nodes 304 and 308, respectively, of a neural network; para. 0142; Fig. 3, input nodes 304 and 308 are inputs to the neural network, where each node is associated with a word from the query, e.g., nodes 304 are a first sentence vector and nodes 308 are a second sentence vector and the combination, e.g., concatenation, of nodes 304 and 308 are applied to the layers of the neural network; paras. 0068-0072) to a pre-trained learning model and receiving, from the server, an output from which at least one word having an overlapping meaning is removed. (Fig. 8, steps 808-816, determine, using the neural network, whether the first and second queries should be merged, e.g., “Where is the nearest pizza shop” and “one with at least 4 stars” may be merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings; paras. 0143-0147; the neural network may be trained using a training data set; para. 0003, 0004, 0050).

However, MALHOTRA fails to explicitly teach:
if a stop signal is detected while a first utterance is transmitted to a server, temporarily pausing the transmission of the first utterance;
transmitting a second utterance to the server after a termination of a temporary pause state based on the stop signal; and

However, in a related field of endeavor, GARCIA discloses that a virtual assistant 800, via a microphone, detects beginning points and end points of utterances in a particular audio stream via an absence of voice activity.  (paras. 0255-0257).  For example, if a pause or silence exceeds a threshold period of time (e.g., 3 seconds), the virtual assistant 800 determines that the first audio stream has been completed.  (para. 0255). Later, a second audio stream may be received and the digital assistant can determine if the first audio stream and second audio stream belong to the same audio session.  (Figs. 12A-D, paras. 0287-0290).

The combination of MALHOTRA in view of GARCIA makes obvious:
if a stop signal is detected (GARCIA discloses that a virtual assistant 800 detects, via a microphone, the end point of an utterance or audio stream, e.g., a stop signal at the end of the first utterance; GARCIA, para. 0255; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 perform endpoint detection as disclosed in GARCIA to determine if a first query/audio stream has reached an endpoint, e.g., stop signal at end of transmission of a first utterance; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, para. 0255) while a first utterance is transmitted to a server, (GARCIA discloses that the digital assistant may be divided into client-server portions, or that the entire digital assistant implementation can be performed by a server; GARCIA, para. 0203; MALHOTRA also discloses a client-server architecture where control circuitry 604 is implemented on a server and the user equipment device 600 is a thin client that just issues requests to the remote server; MALHOTRA, para. 0113; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 implemented in a client-server architecture, as disclosed in both MALHOTRA and GARCIA, where control circuitry 604 on the server is controlling user equipment 600; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0255-0257) temporarily pausing the transmission of the first utterance; (GARCIA discloses that a pause or silence exceeding a threshold such as a 3-second pause or silence, can be detected by a digital assistant via a microphone to determine if the utterances in a first audio stream have been paused, where the digital assistant’s determination of the endpoint detection, e.g., stop signal, and pause/silence exceeding a 3-second threshold is an indication of temporary pausing of the reception of the first utterance; GARCIA, paras. 0255-0257; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 pause transmitting and recording the first query if there is a pause or silence exceeding a certain threshold, such as 3 seconds; MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0255-0257).
transmitting a second utterance to the server after a termination of a temporary pause state based on the stop signal; and (GARCIA, in Figs. 12A-B, discloses that a second audio stream is transmitted after the 3-second pause, e.g., temporary pause state that is triggered by the endpoint detection of the first audio stream, e.g., the stop signal, and that the digital assistant can determine if the first audio stream and second audio stream belong to the same audio session; paras. 0255, 0287-0290; the combination of MALHOTRA in view of GARCIA now has control circuitry 604 resume transmitting and recording the second query after a threshold period of delay (e.g., 3 seconds of pause or silence) and GARCIA discloses the threshold period of delay between utterances (e.g., 3 seconds of pause or silence); MALHOTRA, Fig. 8, step 804, para. 0141 with GARCIA, paras. 0255-0257, 0287-0290).
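For illustration only, the pause-based endpointing relied upon above (GARCIA, para. 0255) can be sketched as follows. The per-frame voice-activity flags, the frame length, and the function name split_on_pauses are assumptions made for this sketch; they are not taken from GARCIA or MALHOTRA, and the sketch is not represented to be the implementation of either reference.

```python
from typing import List, Tuple


def split_on_pauses(
    voiced: List[bool],
    frame_sec: float = 0.5,
    pause_threshold_sec: float = 3.0,
) -> List[Tuple[int, int]]:
    """Split per-frame voice-activity flags into audio streams.

    Returns (start, end) frame indices, with end exclusive. A stream is
    closed once silence strictly exceeds pause_threshold_sec (hypothetical
    parameter mirroring the 3-second example).
    """
    streams: List[Tuple[int, int]] = []
    start = None        # first frame of the current stream, if any
    last_voiced = None  # most recent voiced frame in the current stream
    silence = 0.0       # accumulated silence since last voiced frame

    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i  # beginning point of a new stream
            last_voiced = i
            silence = 0.0
        elif start is not None:
            silence += frame_sec
            if silence > pause_threshold_sec:
                # End point detected: the pause exceeded the threshold.
                streams.append((start, last_voiced + 1))
                start = None
                silence = 0.0

    if start is not None:
        # Input ended while a stream was still open.
        streams.append((start, last_voiced + 1))
    return streams
```

Under these assumptions, a pause longer than the threshold yields two separate streams (e.g., a first and a second utterance), while a shorter pause leaves the utterance in a single stream.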

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of GARCIA with MALHOTRA.  As disclosed in GARCIA, one of ordinary skill would be motivated to make such a combination, which uses periods of pause/silence to automatically end an audio stream/query, so that the digital assistant does not overfill audio stream buffers, which may be time-limited (e.g., 30-second buffers).  (GARCIA, para. 0252).  One of ordinary skill would further be motivated to utilize the teachings of GARCIA so that each speech command can be detected through pauses/silences instead of a trigger word (e.g., “Hey Siri”), which leads to a more natural form of communication that is also more efficient, reduces power usage, and improves battery life.  (GARCIA, paras. 0010, 0011).

Regarding claim 17, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  MALHOTRA in view of GARCIA further discloses:
A computer readable recording medium on which a program for implementing (MALHOTRA, computer readable media may include instructions used by media guidance application; para. 0055) a method according to claim 1 is recorded (the combination of MALHOTRA in view of GARCIA discloses the voice processing method of claim 1, which may be implemented as instructions on a computer readable medium, and therefore claim 17 is rejected on the same grounds as set forth above with respect to claim 1).

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over MALHOTRA in view of GARCIA and further in view of Colas et al., US 20200380030 A1, hereinafter referenced COLAS.

Regarding claim 6, MALHOTRA in view of GARCIA discloses the voice processing method of claim 5 (including the generating the output limitation recited in claim 1).  However, MALHOTRA in view of GARCIA fails to explicitly teach:
calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and 
if it is determined that the first sentence vector and the at least one of the plurality of sub-vectors have an overlapping meaning based on the similarity, generating an output from which at least one word having the overlapping meaning is removed.

However, in a related field of endeavor, COLAS discloses an in-application video navigation system where queries and video sentences may be encoded using a vector.  (paras. 0024, 0038, 0044-0046).  Sentences within the audio are encoded in a vector space with span embeddings, and a distance between vectors can be used to determine if a span within the vector can answer a user query.  (para. 0044).

The combination of MALHOTRA in view of GARCIA and COLAS makes obvious:
calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and (COLAS discloses that a distance between vector embeddings is calculated to determine the best span to answer the query and that the distance calculation can be compared to a threshold distance to determine the correctness of a predicted response to a query; COLAS paras. 0044, 0050, 0065; MALHOTRA discloses that a neural network is trained to determine similarities in first and second queries to determine if the queries should be merged; MALHOTRA, paras. 0004, 0012, 0050; the combination of MALHOTRA in view of GARCIA and COLAS now uses the vector distance calculation, as applied to the first and second queries represented as vectors and sub-vectors (discussed with respect to claim 5) to determine if queries should be merged by comparing the distance to a threshold as disclosed in COLAS; MALHOTRA, paras. 0004, 0012, 0050 with COLAS paras. 0044, 0050, 0065).
if it is determined that the first sentence vector and the at least one of the plurality of sub-vectors have an overlapping meaning based on the similarity, generating an output from which at least one word having the overlapping meaning is removed. (MALHOTRA discloses in Fig. 8, steps 808-816, that the first and second queries are input into a neural network to determine if the first and second queries should be merged, e.g., “Where is the nearest pizza shop” and “one with at least 4 stars” are merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings; paras. 0143-0147; the combination of MALHOTRA in view of COLAS now uses the vector distance calculation, as applied to the first and second queries represented as vectors and sub-vectors (discussed with respect to claim 5) to determine if queries should be merged and outputs a merged query in vector format; MALHOTRA, paras. 0004, 0012, 0050, 0143-0147 with COLAS paras. 0044, 0050, 0065).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the vector distance and distance threshold teachings of COLAS to MALHOTRA and GARCIA.  As disclosed in COLAS, one of ordinary skill would be motivated to make such a combination because representing sentences as vectors allows the sentences to be searched to find an answer to a user query.  (COLAS, paras. 0050, 0059).  One of ordinary skill would further be motivated to use vectors as in COLAS because COLAS explains that vectors may be embedded with additional information and such information can be used as part of a distance measurement to determine the best answer to a query.  (COLAS, para. 0044).
	The examiner notes that MALHOTRA similarly pertains to responding to user queries to perform searches. (MALHOTRA, paras. 0063 and 0064).

Regarding claim 7, MALHOTRA in view of GARCIA and COLAS discloses the voice processing method of claim 6.  MALHOTRA in view of GARCIA and COLAS further makes obvious:
wherein the at least one word having the overlapping meaning is a word corresponding to at least one of the plurality of sub-vectors that is calculated that the similarity is equal to or greater than a threshold. (MALHOTRA discloses in Fig. 8, steps 808-816, that redundant words are removed when first and second queries are merged, e.g., “Where is the nearest pizza shop” and “one with at least 4 stars” are merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings; paras. 0143-0147; COLAS discloses that a distance between vector embeddings is calculated to determine the best span to answer the query and that the distance calculation can be compared to a threshold distance to determine the correctness of a predicted response to a query; COLAS paras. 0044, 0050, 0065; the combination of MALHOTRA in view of COLAS now uses the vector distance calculation, as applied to the first and second queries represented as vectors and sub-vectors (discussed with respect to claim 5) to determine if queries should be merged, and determines if the distance (or the reciprocal of the distance) is equal to or greater than a distance threshold, as disclosed in COLAS; MALHOTRA, paras. 0004, 0012, 0050, 0143-0147 with COLAS paras. 0044, 0050, 0065).
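For illustration only, the threshold comparison discussed with respect to claims 6 and 7 can be sketched as follows. Cosine similarity is used here as a stand-in similarity measure; COLAS is cited above for a distance between vector embeddings, so the choice of cosine similarity, the toy two-dimensional vectors, the 0.8 threshold, and the function names are all assumptions made for this sketch rather than disclosures of COLAS or MALHOTRA.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlapping_words(
    first_sentence_vector: Sequence[float],
    sub_vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical threshold value
) -> List[str]:
    """Return words whose sub-vector similarity to the first sentence
    vector is equal to or greater than the threshold."""
    return [
        word
        for word, vec in sub_vectors.items()
        if cosine_similarity(first_sentence_vector, vec) >= threshold
    ]


# Toy example: "one" points in nearly the same direction as the first
# sentence vector and is flagged; "stars" is orthogonal and is kept.
first = [1.0, 0.0]
second_sub_vectors = {"one": [0.9, 0.1], "stars": [0.0, 1.0]}
flagged = overlapping_words(first, second_sub_vectors)
```

A word flagged in this manner corresponds to the word that would be removed from the generated output under the claim language.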

Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over MALHOTRA in view of GARCIA and further in view of Basu, Saikat, et al. "Emotion recognition from speech using convolutional neural network with recurrent neural network architecture." 2017 2nd International Conference on Communication and Electronics Systems (ICCES). IEEE, 2017, pp. 333-336, hereinafter referenced as BASU.

Regarding claim 8, MALHOTRA in view of GARCIA discloses the voice processing method of claim 1.  However, MALHOTRA in view of GARCIA fails to explicitly teach:
wherein the first and second sentence vectors are extracted by a convolutional neural network (CNN).

	However, in a related field of endeavor, BASU pertains to using convolutional neural networks and recurrent neural networks to classify emotion in speech. (p. 333, section I).  The combination of MALHOTRA in view of GARCIA and BASU makes obvious:
wherein the first and second sentence vectors are extracted by a convolutional neural network (CNN). (BASU teaches that speech is analyzed and classified for emotion using a convolutional neural network, which sub-samples the input from a specified filter; BASU, pp. 334-335, section IV.A; MALHOTRA discloses that first and second queries are tokenized and the tokens are applied to the various input nodes in the input layer of an artificial neural network, e.g., input nodes 304 and 308; MALHOTRA, paras. 0017-0019, 0068-0072, 0080-0084; the combination of MALHOTRA in view of GARCIA and BASU now uses convolutional neural network layers of BASU in addition to the artificial neural network of MALHOTRA to sub-sample the input nodes 304 and 308 in MALHOTRA to generate first and second sentence vectors, e.g., to help remove redundant words from the input query; MALHOTRA, paras. 0017-0019, 0068-0072, 0080-0084; BASU, pp. 334-335, section IV.A).

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the convolutional neural network teachings of BASU with MALHOTRA and GARCIA.  As disclosed in BASU, one of ordinary skill would be motivated to make such a combination to take advantage of sub-sampling the input using various pooling filters as taught by BASU.  (BASU, pp. 334-335, section IV.A).  One of ordinary skill would further be motivated to use the convolutional network layers as disclosed in BASU to take advantage of the classification provided by the CNN to reduce the dimensionality of the input.  (BASU, p. 333, section I).

Regarding claim 10, MALHOTRA in view of GARCIA discloses the voice processing method of claim 9.  However, MALHOTRA in view of GARCIA fails to explicitly teach:
wherein the learning model is a learning model based on a recurrent neural network (RNN).

However, in a related field of endeavor, BASU pertains to using convolutional neural networks and recurrent neural networks to classify emotion in speech. (p. 333, section I).  The combination of MALHOTRA in view of GARCIA and BASU makes obvious:
wherein the learning model is a learning model based on a recurrent neural network (RNN). (BASU discloses that a recurrent neural network is very effective for speech signals because it exploits temporal relations in the sequence of data; BASU, p. 335, section IV.B; MALHOTRA discloses that first and second queries are tokenized and the tokens are applied to the various input nodes in the input layer of an artificial neural network; MALHOTRA, paras. 0017-0019, 0080-0084; the combination of MALHOTRA in view of GARCIA and BASU now uses recurrent neural network layers of BASU in addition to the artificial neural network of MALHOTRA to train the combined model on longer sequences of data; MALHOTRA, paras. 0017-0019, 0080-0084; BASU, p. 335, section IV.B).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the recurrent neural network teachings of BASU with MALHOTRA and GARCIA.  As disclosed in BASU, one of ordinary skill would be motivated to make such a combination to take advantage of the temporal relations present in sequential speech and language data that recurrent neural networks are effective at analyzing.  (BASU, p. 335, section IV.B).

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over MALHOTRA in view of GARCIA and further in view of Sengupta et al., U.S. Patent Publication 20220116247 A1, hereinafter referenced as SENGUPTA.

Regarding claim 14, MALHOTRA in view of GARCIA discloses the voice processing method of claim 13.  However, MALHOTRA in view of GARCIA fails to explicitly teach:
receiving, from a network, downlink control information (DCI) used to schedule the transmission of the first and second utterances; and
transmitting the first and second utterances to the network based on the DCI.

However, SENGUPTA discloses techniques for transmitting and receiving signals for cellular communication, including 5G communications.  (paras. 0004, 0006).  The combination of MALHOTRA in view of GARCIA and SENGUPTA makes obvious:
receiving, from a network, downlink control information (DCI) used to schedule the transmission of the first and second utterances; and (SENGUPTA discloses: Fig. 1, various transmission/reception points (TRPs), e.g., user equipment 101a/b and RAN nodes 111a/b, transmit and receive downlink control information (DCI) for scheduling channel transmissions by the TRPs; SENGUPTA, paras. 0004, 0005, 0007, 0014; MALHOTRA in Figs. 6 and 7 discloses transmissions using a communications network 714 for coupling user equipment devices to remote servers; MALHOTRA, paras. 0120-0123; MALHOTRA further discloses that the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, contains a user input interface 610, and transmits queries to a remote server that contains control circuitry 604; MALHOTRA, paras. 0113, 0115, 0125, 0137; the combination of MALHOTRA in view of GARCIA and SENGUPTA now has the communications network 714 use the architecture of SENGUPTA, where user equipment 702 receives DCI to schedule transmissions, including the first and second queries, to remote server having control circuitry 604; MALHOTRA, paras. 0113, 0115, 0120-0123, 0125, 0137 and Figs. 6-7, with SENGUPTA, paras. 0004, 0005, 0007, 0014).
transmitting the first and second utterances to the network based on the DCI. (SENGUPTA discloses: Fig. 1, various transmission/reception points (TRPs), e.g., user equipment 101a/b and RAN nodes 111a/b, transmit and receive downlink control information (DCI) for scheduling channel transmissions by the TRPs; SENGUPTA, paras. 0004, 0005, 0007, 0014; MALHOTRA in Figs. 6 and 7 discloses transmissions using a communications network 714 for coupling user equipment devices to remote servers; MALHOTRA, paras. 0120-0123; MALHOTRA further discloses that the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, contains a user input interface 610, and transmits queries to a remote server that contains control circuitry 604; MALHOTRA, paras. 0113, 0115, 0125, 0137; the combination of MALHOTRA in view of GARCIA and SENGUPTA now has the communications network 714 use the architecture of SENGUPTA, where user equipment 702 transmits the first and second queries to remote server having control circuitry 604 in view of the DCI scheduling; MALHOTRA, paras. 0113, 0115, 0120-0123, 0125, 0137 and Figs. 6-7, with SENGUPTA, paras. 0004, 0005, 0007, 0014).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the communication network teachings of SENGUPTA with MALHOTRA and GARCIA, for example, so that the communication network 714 in MALHOTRA is a 5G network (or other cellular network that provides Internet) as disclosed in SENGUPTA. (SENGUPTA, paras. 0006, 0008).  As disclosed in SENGUPTA, one of ordinary skill would be motivated to utilize the teachings of SENGUPTA because they apply to a wide variety of networks, such as 4G, LTE, 5G, 6G, IEEE 802.16 WMAN, and other networks. (SENGUPTA, para. 0006).  
The examiner notes that MALHOTRA discloses that the communication network 714 can be various types of networks, including the Internet, mobile phone networks, mobile voice or data networks (e.g., 4G or LTE networks), PSTN networks, or combinations thereof. (MALHOTRA, para. 0121).  Therefore, SENGUPTA explicitly benefits the same type of communication networks contemplated in MALHOTRA.

Regarding claim 15, MALHOTRA in view of GARCIA and SENGUPTA discloses the voice processing method of claim 14.
SENGUPTA discloses techniques for transmitting and receiving signals for cellular communication, including 5G communications.  (paras. 0004, 0006).  The combination of MALHOTRA in view of GARCIA and SENGUPTA makes obvious:
performing an initial access procedure with the network based on a synchronization signal block (SSB); and (SENGUPTA discloses that a user equipment 101 uses an initial access procedure with respect to QCL-typeA and QCL-typeD to assume that DM-RS ports of PDSCH of a serving cell are quasi co-located with the SS/PBCH block (synchronization signal / physical broadcast channel) determined in the initial access procedure; paras. 0018, 0020, 0047; the combination of MALHOTRA in view of GARCIA and SENGUPTA now has the communications network 714 use the architecture of SENGUPTA; MALHOTRA, paras. 0113, 0115, 0120-0123, 0125, 0137 and Figs. 6-7, with SENGUPTA, paras. 0004, 0005, 0007, 0014, 0018, 0020, 0047).
transmitting the first and second utterances to the network via a physical uplink shared channel (PUSCH), (SENGUPTA discloses that a physical uplink shared channel (PUSCH) is a network channel for transmissions; SENGUPTA, paras. 0003, 0020, 0027, 0029; the combination of MALHOTRA in view of GARCIA and SENGUPTA now has the communications network 714 use the architecture of SENGUPTA, where user equipment 702 transmits the first and second queries to remote server having control circuitry 604 over a PUSCH as disclosed in SENGUPTA; MALHOTRA, paras. 0113, 0115, 0120-0123, 0125, 0137 and Figs. 6-7, with SENGUPTA, paras. 0003, 0020, 0027, 0029).
wherein the SSB and a demodulation reference signal (DM-RS) of the PUSCH are QCLed for QCL (quasi co-located) type D. (SENGUPTA discloses that a PUSCH is paired with a PDSCH for reception; para. 0053; SENGUPTA further discloses that user equipment 101 assumes that DM-RS ports of PDSCH of a serving cell are quasi co-located with the SS/PBCH block (synchronization signal / physical broadcast channel) with respect to QCL-typeD; paras. 0018, 0020, 0047; the combination of MALHOTRA in view of GARCIA and SENGUPTA now has the communications network 714 use the architecture of SENGUPTA; MALHOTRA, paras. 0113, 0115, 0120-0123, 0125, 0137 and Figs. 6-7, with SENGUPTA, paras. 0014, 0018, 0020, 0047, 0053).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the communication network teachings of SENGUPTA with MALHOTRA and GARCIA, for example, so that the communication network 714 in MALHOTRA is a 5G network (or other cellular network that provides Internet) as disclosed in SENGUPTA. (SENGUPTA, paras. 0006, 0008).  As disclosed in SENGUPTA, one of ordinary skill would be motivated to utilize the teachings of SENGUPTA because they apply to a wide variety of networks, such as 4G, LTE, 5G, 6G, IEEE 802.16 WMAN, and other networks. (SENGUPTA, para. 0006).  
The examiner notes that MALHOTRA discloses that the communication network 714 can be various types of networks, including the Internet, mobile phone networks, mobile voice or data networks (e.g., 4G or LTE networks), PSTN networks, or combinations thereof. (MALHOTRA, para. 0121).  Therefore, SENGUPTA explicitly benefits the same type of communication networks contemplated in MALHOTRA.

Regarding claim 16, MALHOTRA in view of GARCIA and SENGUPTA discloses the voice processing method of claim 15.  MALHOTRA in view of GARCIA and SENGUPTA further makes obvious:
controlling a communication module to transmit the first and second utterances (MALHOTRA, Figs. 6 and 7, the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, has communications circuitry such as an Ethernet card, a wireless modem, a cable modem, DSL modem, or other similar circuitry for communicating over network 714, and transmits first and second queries to a remote server that contains control circuitry 604, where the control circuitry 604 controls the user interface 610 on the user equipment; MALHOTRA, paras. 0108, 0113, 0115, 0125, 0137) to an AI processor included in the network; (MALHOTRA, first and second queries are tokenized and the tokens are applied to the various input nodes in the input layer of an artificial neural network, where the neural network is implemented by a media guidance application running on control circuitry 604, which may be over a network on a remote server, e.g., an AI processor; MALHOTRA, paras. 0017-0019, 0080-0084, 0113, 0139, 0140; the examiner notes that the broadest reasonable interpretation of AI processor includes AI processing in a cloud or server environment as disclosed in paras. 00149-00150 in the instant specification) and 
controlling the communication module to receive AI processing information from the AI processor, (MALHOTRA, Figs. 6 and 7, the media guidance application may be a client-server based application, where a client device, e.g., user television equipment 702 or user equipment 704/706, has communications circuitry such as an Ethernet card, a wireless modem, a cable modem, DSL modem, or other similar circuitry for receiving data over network 714, including data from the neural network implemented by a media guidance application running on control circuitry 604, which may be over a network on a remote server, e.g., an AI processor, where the control circuitry 604 controls the user interface 610 on the user equipment; MALHOTRA, paras. 0017-0019, 0080-0084, 0108, 0113, 0115, 0125, 0137, 0139, 0140).
wherein the AI processing information is voice information synthesized (GARCIA discloses that a digital assistant may include a speech synthesis module configured to output synthesized speech; GARCIA, paras. 0203, 0246) based on the output from which the at least one word having the overlapping meaning is removed. (MALHOTRA, Fig. 8, steps 808-816, the first and second queries are input into a neural network to determine whether the first and second queries should be merged, e.g., “Where is the nearest pizza shop” and “one with at least 4 stars” may be merged, and “one” is removed because control circuitry 604 determines that “one” and “pizza shop” have redundant meanings; MALHOTRA, paras. 0143-0147; the combination of MALHOTRA in view of GARCIA and SENGUPTA now uses MALHOTRA control circuitry 604 to utilize speech synthesis modules to synthesize the merged output to speech, and then transmits the speech information over the network, as SENGUPTA discloses that voice connectivity is supported; SENGUPTA, paras. 0007, 0010).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings in GARCIA pertaining to speech synthesis with MALHOTRA and SENGUPTA.  As disclosed in GARCIA, one of ordinary skill in the art would be motivated to make such a combination because GARCIA discloses that speech synthesis is another form that the digital assistant can use to communicate with the user. (GARCIA, paras. 0246-0247).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20180242219 A1 (Deluca et al.) discloses functions of an assistant device during an active telephone call.
US 20190333501 A1 (Kurtz et al.) discloses controlling source tracking and delaying beamforming in a microphone array system. A source tracker may continuously determine a direction of an audio source. A source tracker controller may pause the source tracking of the source tracker if a user may continue to speak to the system. The source tracker controller may resume the source tracking of the source tracker if the user may cease to speak to the system, or when one or more pause durations have been reached.
US 10923122 B1 (Rozycki et al.) discloses pausing automatic speech recognition.  A speech interface device is configured to “pause” an automatic speech recognition (ASR) component executing on the speech interface device after the ASR component has ingested audio data representing user speech, and while other local components, including a natural language understanding (NLU) component, process data corresponding to the user speech.
US 11062700 B1 (Azimi et al.) discloses a method comprising receiving first data representative of a query received by an electronic device. Device characteristic data indicative of at least one characteristic of the electronic device is received. It is determined, using the device characteristic data, that the electronic device is authorized to access a first portion of at least one knowledge graph, which is an access-controlled portion. The at least one knowledge graph also includes a second portion which is a non-access-controlled portion. The first data is sent to at least the first portion of the at least one knowledge graph. Second data is received from the first portion of the at least one knowledge graph. The second data is representative of an answer to the query. Answer data representative of the answer to the query is generated using the second data.
US 20190362712 A1 (Karpukhin) discloses a method employing a neural network configured to determine the intent of the spoken user utterance by inputting into the NN the enhanced feature vectors. The NN has been trained to estimate a probability of the intent being of a given type. The method includes determining at least one speech unit where each speech unit has textual data representative of a word or pause and has a corresponding segment of the digital audio signal. 
US 7203643 B2 (Garudadri) discloses an advanced feature extraction (AFE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The combined results from the VAD module and feature extraction module are provided in an efficient manner to a remote device, such as a server, in the form of advanced front end features, thereby enabling the server to process speech segments free of silence regions. Various aspects of efficient speech segment transmission are disclosed.
US 10339918 B2 (Hofer et al.) discloses an adaptive speech endpoint detector.  Fig. 5 discloses detecting a pause duration and determining if such duration exceeds a threshold, and waiting for another utterance to start if the pause duration exceeds the threshold.  Col. 7, line 41 – col. 8, line 3.
US 20140222430 A1 (Rao) discloses a system and method for detecting one or more segments of desired speech utterances from an audio stream using timings of events from other modes that are correlated to the timings of the desired segments of speech. The redundant information from other modes results in a highly accurate and robust utterance detection.  Fig. 5 depicts a scenario where a user may activate a “speak” button to begin the reception and transmission of an utterance.  Paras. 0047-0050.
US 20090222265 A1 (Iwamiya et al.) discloses a voice recognition apparatus which recognizes an inputted voice and outputs recognition according to the recognition result.  In Fig. 1, the input timeout time control means 16 adjusts a timeout time to stop the receipt of a voice input according to the ambient noise. (para. 0025).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655