Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1.	This action is responsive to Application No. 16/906,525.  All claims have been examined and are currently pending.
Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Specification
3.	The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 112
4.	The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

5.	Claims 8-9 and 16 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
	Regarding claim 8, the limitation recites:

“receiving, by the client, the first information, the second information, and the third information that correspond to the second portion of audio information, wherein the first information, the second information, and the third information are information obtained based on the first portion of audio information that has been currently received”.
The claim recites that the first information, the second information, and the third information correspond to the second portion of audio information, yet are obtained based on the first portion of audio information. The claim therefore does not particularly point out the metes and bounds of the claimed subject matter.
For purposes of examination, the examiner interprets the claim language as follows: an audio input is received and is separated (spliced) into multiple segments (the second and third portions of audio information), and one of the segments corresponds to the first portion of audio information, for which the first information, the second information, and the third information are provided.

Claim Rejections - 35 USC § 102
6.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

7.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

8.	Claims 1, 6-13, 15-16, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Garg et al. (2018/0330730).

Regarding claim 1 Garg teaches A method for processing audio information (abstract; 8-10 methods, non-transitory computer readable media, devices), comprising: 
detecting, by a client, a segment of audio information being received, a first portion of audio information in the segment of audio information having been currently received on the client (abstract; fig 1, 8D; 8: at an electronic device receiving an utterance from a user; 35); 
obtaining, by the client, first information, second information, and third information based on the first portion of audio information that has been currently received (figures 8H-L – obtaining and displaying information associated with user input), 
the first information comprising text information corresponding to the first portion of audio information (212 STT processing, speech input processed, produces recognition results containing text string; 250), 
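the second information comprising information that meets a target condition and corresponds to the first information (212 multiple candidate text representations, n-best; 258 alternative text affordances), and 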
the third information comprising information to be pushed to the client, which is obtained based on a keyword in the first information (fig 8H; 256: alternative intent affordances); and 
displaying, by the client, the first information, the second information, and the third information (fig 8H-L, where Garg teaches receiving user speech input and obtaining multiple types of information related to the recognized input to be presented to the user).  


Regarding claim 6 Garg teaches The method according to claim 1, wherein after the displaying, by the client, the first information, the second information, and the third information, the method further comprises: 
requesting, by the client, a page corresponding to the third information from a first server corresponding to the third information in response to detecting a second operation performed on the third information (8J-M; 56: touch screen; 265 user device displays a result; result includes an image that is retrieved from a remote website, operation is based on detecting a tap input); 
receiving, by the client, the page returned by the first server (8K; 265); and 
displaying, by the client, the page (8K; 265 – obtaining and displaying additional data when user chooses an option/third information).  

Regarding claim 7 Garg teaches The method according to claim 6, wherein the third information comprises: 
content information and one or more pieces of prompt information, wherein the content information is used for indicating content of the information to be pushed to the client, and the requesting the page corresponding to the third information from the first server corresponding to the third information in response to detecting the second operation performed on the third information (fig 8F-N) comprises: 
in response to detecting a first sub-operation performed on the content information, requesting, by the client, a first page corresponding to the content information from the first server, the second operation comprising the first sub-operation (8K; 265); and 
in response to detecting a second sub-operation performed on target prompt information in the one or more pieces of prompt information, requesting, by the client, a second page corresponding to the target prompt information from the first server, the second operation comprising the second sub-operation (8F-N; 265); and 
wherein the second page and the first page are different pages belonging to a same field, or the second page and the first page belong to different fields (8F-N; 265, where the claim currently appears to recite presenting content information and prompt information, both allowing a user to select (using tap sub-operations) and access corresponding information; the figures and corresponding paragraphs of Garg teach presenting content and prompts (additional options for obtaining more data)).  

Regarding claim 8 Garg teaches The method according to claim 1, wherein the obtaining the first information, the second information, and the third information based on the first portion of audio information that has been currently received comprises: 
from a beginning of receiving the segment of audio information, transmitting, by the client, an information request corresponding to a current time interval every target time interval, wherein the information request carries a second portion of audio information in the segment of audio information, the information request is used for requesting the first information, the second information, and the third information that correspond to the second portion of audio information, and the second portion of audio information is audio information received within the current time interval (figs 8D-J; 35; 211; 249; 260); and 
receiving, by the client, the first information, the second information, and the third information that correspond to the second portion of audio information, wherein the first information, the second information, and the third information are information obtained based on the first portion of audio information that has been currently received, the first portion of audio information is audio information obtained by splicing the second portion of audio information and a third portion of audio information in chronological order, and the third portion of audio information is audio information, in the segment of audio information, received before the current time interval (see the 35 U.S.C. 112(b) rejection above; Figs 8D-J; 35: user in a continuous dialogue involving multiple exchanges over an extended period of time; 211; 249: user audio input (or a portion); performs speech recognition (e.g., using STT processing module 730) on the received user audio input 814 (or the portion thereof) to determine a plurality of (or at least one) candidate text representations of the user audio input; 260 term “sharks” – where Garg can receive speech input and process the stream or a portion, and obtain the first, second, and third information for the portion of the input that was processed).  
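
For illustration only, the interval-based requesting and chronological splicing recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Garg: the names `IntervalRequester`, `on_interval`, and `stub_recognize` are hypothetical, audio is modeled as ASCII bytes, and the recognizer is a stub that only produces placeholder values for the first, second, and third information.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def stub_recognize(audio: bytes) -> Dict[str, str]:
    """Hypothetical stand-in for the recognizer (no real speech-to-text)."""
    text = audio.decode("ascii")
    words = text.split()
    keyword = words[-1] if words else ""
    return {
        "first": text,                        # text for all audio received so far
        "second": text.strip().capitalize(),  # placeholder for the edited text
        "third": "push:" + keyword,           # placeholder for keyword-based content
    }


@dataclass
class IntervalRequester:
    """Hypothetical client-side model of one request per target time interval."""
    recognize: Callable[[bytes], Dict[str, str]]
    earlier: List[bytes] = field(default_factory=list)

    def on_interval(self, second_portion: bytes) -> Dict[str, str]:
        # Third portion: audio received before the current time interval.
        third_portion = b"".join(self.earlier)
        # First portion: third and second portions spliced in chronological order.
        first_portion = third_portion + second_portion
        self.earlier.append(second_portion)
        # The returned information is based on the first portion as a whole.
        return self.recognize(first_portion)


requester = IntervalRequester(recognize=stub_recognize)
after_first_interval = requester.on_interval(b"show me ")
after_second_interval = requester.on_interval(b"sharks")
print(after_second_interval["first"])
```

In this sketch each request carries only the audio from the current interval (the second portion), while the returned information reflects everything received so far (the first portion), which matches the interpretation stated in the 35 U.S.C. 112(b) rejection above.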

Regarding claim 9 Garg teaches The method according to claim 8, wherein the transmitting the information request corresponding to the current time every target time interval comprises: 
from the beginning of receiving the segment of audio information, transmitting, by the client, the information request corresponding to the current time interval every target time interval until the receiving the segment of audio information ends (Figs 8D-J; 35: user in a continuous dialogue involving multiple exchanges over an extended period of time; 211; 249 – processing currently (within a time interval) received audio); and 
in response to the receiving the segment of audio information ending within a last target time interval, transmitting an information request corresponding to the last target time interval within the last target time interval, wherein the last target time interval is a target time interval within which the receiving the segment of audio information ends (Figs 8D-J; 35: user in a continuous dialogue involving multiple exchanges over an extended period of time; 211; 249 – processing currently (within a time interval) received audio).  

Regarding claim 10 Garg teaches The method according to claim 1, wherein the segment of audio information comprises voice information, a song, or audio information in a video (79: voice input).  

Regarding claim 11 Garg teaches The method according to claim 1, wherein the detecting the segment of audio information being received comprises: determining that the segment of audio information is being received by detecting a continuous touch operation on the client (246-247; 253; 265 – touch/tap input/operation).  

Regarding claim 12 Garg teaches The method according to claim 1, the method further comprises: 
detecting an editing operation performed on the first information (253: correction); and 
in response the editing operation being completed, displaying fourth information corresponding to a target editing result of the editing operation (258: correction; displays a set of text affordances; 268; fig 8N-O).  

Regarding claim 13 Garg teaches The method according to claim 12, wherein the method further comprises: 
obtaining feedback information corresponding to the fourth information based on the displayed fourth information (figs 8N-O; 255); and 
displaying the feedback information (figs 8N-O; 255).  

Regarding claim 15 Garg teaches A method for processing audio information (abstract; 8-10 methods, non-transitory computer readable media, devices), comprising: 
receiving, by a server, a first portion of audio information transmitted by a client, the client detecting that a segment of audio information is being received, and the first portion of audio information being a portion of audio information in the segment of audio information that has been currently received on the client (abstract; fig 1, 8D; 8: at an electronic device receiving an utterance from a user; 35; 249 portion; fig 1, 7A; 199; 210 digital assistant module 726, the digital assistant can perform converting speech input into text; identifying a user's intent expressed in a natural language input received from the user; actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, games, intentions, etc.); determining the task flow for fulfilling the inferred intent; and executing the task flow to fulfill the inferred intent); 
converting, by the server, the first portion of audio information into first information, and transmitting the first information to the client, the first information comprising text information corresponding to the first portion of audio information (fig 1, 7A; 199; 210; 212 STT); 
editing, by the server, the first information to obtain second information, the second information comprising information that meets a target condition and corresponds to the first information (212; 258); 
fig 8H; 257); and 
transmitting, by the server, the second information and the third information to the client (fig 1; 7A; 199; 210; fig 8H).
Claim 15 is rejected for similar rationale and reasoning as claim 1, where claim 15 further recites the processing on the server, which is taught by Garg (fig 1; 7A; 199; 210).

Regarding claim 16 Garg teaches The method according to claim 15, wherein the converting the first portion of audio information into the first information comprises: 
obtaining, by the server, a second portion of audio information and a third portion of audio information, wherein the second portion of audio information is audio information transmitted by the client to a second server corresponding to the client within a current time interval, and the third portion of audio information is audio information received by the second server before the current time interval; 
splicing, by the server, the second portion of audio information and the third portion of audio information in chronological order, to obtain the first portion of audio information; and 
converting, by the server, the first portion of audio information into text to obtain the first information.  
Claim 16 recites limitations similar to claim 8 and is rejected for similar rationale and reasoning.


Regarding claim 20 Garg teaches A non-transitory machine-readable media, having instructions stored on the machine-readable media, the instructions configured to, when executed, cause a machine to: 
detect a segment of audio information being received, a first portion of audio information in the segment of audio information having been currently received on a client; 
obtain first information, second information, and third information based on the first portion of audio information that has been currently received, the first information comprising text information corresponding to the first portion of audio information, the second information comprising information that meets a target condition and corresponds to the first information, and the third information comprising information to be pushed to the client, which is obtained based on a keyword in the first information; and 
display the first information, the second information, and the third information.
Claim 20 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.



Claim Rejections - 35 USC § 103
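9.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
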
10.	Claims 2-5 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Garg et al. (2018/0330730) in view of Faisman et al. (2006/0161434).

Regarding claim 2 Garg teaches The method according to claim 1, wherein after the displaying, by the client, the first information, the second information, and the third information, the method further comprises: 
obtaining, by the client, response information corresponding to the second information in response to detecting a first operation performed on the second information (fig 8J, 8K, 8L; 265: user device displays a result; result includes an image that is retrieved from a remote website); and 
displaying, by the client, the response information (fig 8K), 
but does not specifically teach the following limitation, which Faisman teaches:
wherein the target condition comprises at least one of a target syntax rule or a target semantic rule (fig 1; paragraph 8: methods and systems for converting speech to text, while improving the language properties of the output by correcting speaker errors. Speaker errors may be semantic (i.e., relating to individual words and their meaning) or syntactic (i.e., relating to the grammatical arrangement of words in a sentence).; 9: speech may be improved by applying a set of correction rules);
Garg already teaches editing and correction of STT results and it would have been obvious to one of ordinary skill in the art before the effective filing date to 

Regarding claim 3 Garg does not specifically teach, but Faisman teaches, The method according to claim 2, wherein the information that meets the target condition and corresponds to the first information is obtained by: 
performing error correction on a syntax of the first information based on the target syntax rule (8-9); 
rewriting the first information based on an error correction result to obtain rewrite information (53 output is improved text); and 
supplementing the rewrite information based on the target semantic rule (fig 1; 8 semantic; 9; 48: semantic errors may be corrected).  
Claim 3 is rejected for similar rationale and reasoning as claim 2.

Regarding claim 4 Garg does not specifically teach, but Faisman teaches, The method according to claim 2, wherein the target syntax rule is obtained through a training process of deep learning or is a manually set rule (41; 46 – speaker profile, generated automatically or with speaker assistance, error model).  
Claim 4 is rejected for similar rationale and reasoning as claim 2.
	
Regarding claim 5 Garg teaches The method according to claim 2, wherein the first operation comprises a tap operation (fig 8J; 56 touch screen; 265 operation is based on detecting a tap input).  


Regarding claim 17 Garg does not specifically teach, but Faisman teaches, The method according to claim 16, wherein the editing the first information to obtain the second information comprises: 
performing, by the server, error correction on syntax of the first information based on a target syntax rule; 
rewriting, by the server, the first information based on an error correction result to obtain rewrite information; 
supplementing, by the server, the rewrite information based on a target semantic rule to obtain information that meets the target condition and corresponds to the first information; and 
determining the information that meets the target condition and corresponds to the first information as the second information.  
Claim 17 recites limitations similar to claim 3 and is rejected for similar rationale and reasoning, where Garg (as discussed in claim 15) teaches a digital assistant performing correction, which can be performed on a server.


Regarding claim 18 Garg does not specifically teach, but Faisman teaches, The method according to claim 17, wherein the supplementing the rewrite information to obtain the information that meets the target condition and corresponds to the first information comprises: 
abstract; 36; 39 distributed – identifying errors in speech input); 
obtaining, by the server, a first tag corresponding to the first keyword, wherein the first tag is used for indicating a field to which the first keyword belongs (41; 43 -categorizing input); 
obtaining, by the server, a second target word matching the first target word from a lexicon corresponding to the first tag, wherein the second target word is a word with complete semantics ([0046] Having retrieved an appropriate language error model, the language improver uses this model to identify and correct speech errors); and 
supplementing, by the server, the rewrite information with the second target word to obtain the information that meets the target condition and corresponds to the first information
([0046] Having retrieved an appropriate language error model, the language improver uses this model to identify and correct speech errors at a language improving step 76. As described in FIG. 1 hereinabove, this process may involve correction of semantic errors (phrases containing erroneous semantic content, e.g., "green sky" instead of "blue sky"), syntax errors (such as errors in grammar and word order typically made by non-native speakers of a language), as well as spelling errors, pronunciation errors and errors introduced by speech-to-text converter 24.).  
Claim 18 is rejected for similar rationale and reasoning as claims 2 and 3.


Regarding claim 19 Garg in view of Faisman teaches the following limitations:
extracting, by the server, a second keyword from the first information (server fig 1; 7A; 199; 210; 249: user audio input; 250 - word from text representation of input); 
obtaining, by the server, [historical] session information of a session to which the segment of audio information belongs (81: context; 228 ontology; 229: user data includes user specific information; 232; 235; 238); 
determining, by the server, a second tag corresponding to the session based on the second keyword and the [historical] session information, wherein the second tag is used for indicating a field to which the session belongs (81: context; 228 ontology; 229: user data includes user specific information; 232; 235; 238); 
extracting, by the server, information matching the second keyword from an information base corresponding to the second tag (figs 8H-M; 265; 267-268 – finding/extracting corresponding data from an additional database/information base); and 
determining, by the server, the information matching the second keyword as the third information, wherein the information base stores information of the field indicated by the second tag (Garg 81: context; 228 ontology; 229: user data includes user specific information; 232; 235; 238 – user query, domain, performing action/complete a task requested in user input or provide an informational answer; Faisman 41: speaker profile, data stored during past use of system by this speaker, user history model database).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate historical session information (with the already existing speaker/user information of Garg) for an improved system.  



11.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Garg et al (2018/0330730) in view of Canton et al (2017/0124045).

Regarding claim 14 Garg teaches The method according to claim 1, wherein the displaying the first information, the second information, and the third information comprises: 
displaying a first session box[bubble] corresponding to the first information, a second session box[bubble] corresponding to the second information, and a third session box[bubble] corresponding to the third information (fig 8H-N),
but does not specifically teach a bubble, which Canton teaches (162: a text transcription of the audio may be provided in a bubble).


Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: See PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on 571-272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic 

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655