DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1, 3-11 & 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hall et al. (US 20180114522 A1, hereinafter Hall ‘522) in combination with Lee (US 20200160838 A1, hereinafter Lee ‘838).
Regarding claim 11; Hall ‘522 discloses a speech processing apparatus (Fig. 1, Automated Assistant i.e. Automated Assistant that performs TTS. Paragraph 0008), 
comprising: 
one or more processors (Fig. 6, Processor 610) configured to: obtain context information from a speech signal of a user using a neural network-based encoder (Fig. 3, Context(s) Encoding 320, 325 & 330 i.e. The method may include receiving one or more streams of input by one or more decoders implemented on a computing device. A context vector can be generated by the one or more encoders. The context vector can be decoded by a decoding mechanism implemented on the computing device. The decoded context vectors can be fed into a neural network implemented on the computing device; and an audio file can be output by the neural network. Paragraph 0006); 
determine intent information of the speech signal based on the context information (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029);
determine, based on the context information, attention information corresponding to a segment included in the speech signal (i.e. Decoder 230 may decode an utterance into an equivalent training sentence, trading segments, or other content that may be easily parsed by parser 220. Attention vectors may then be computed at step 440.  The attention vector is generated during a decoding phase of the neural network operation. Generating the attention vector may include computing attention scores, attention distribution, and an attended context vector. Paragraphs 0027 & 0047);
and determine, based on the attention information, a segment value of the segment (i.e. Decoder 230 may decode received utterances into equivalent language that is easier for parser 220 to parse. For example, decoder 230 may decode an utterance into an equivalent training sentence, trading segments, or other content that may be easily parsed by parser 220. Paragraph 0027).
Examiner reasonably believes that Hall ‘522 discloses the limitation wherein a portion of the context information is identified as corresponding to the segment. Paragraph 0027 of Hall ‘522 teaches that Decoder 230 may decode received utterances into equivalent language that is easier for parser 220 to parse. For example, decoder 230 may decode an utterance into an equivalent training sentence, trading segments, or other content that may be easily parsed by parser 220. Although Examiner reasonably believes that Hall ‘522 discloses this limitation, Examiner cites Lee ‘838 to cure any deficiency of Hall ‘522.
Lee ‘838 discloses recognizing, using a decoder, a portion of the context information identified as corresponding to the segment (i.e. The determining of the first score may include extracting a feature value from the input speech using a neural network-based encoder, and determining a first score of each of the candidate texts from the extracted feature value using a neural network-based decoder. In operation 260, the speech recognition apparatus determines the target candidate text selected in operation 250 to be a target text corresponding to a portion of the speech input. In an example, the speech recognition apparatus may repetitively perform operations 220 through 260 to sequentially determine target texts respectively corresponding to a portion of the speech input, and determine a text corresponding to an entirety of the speech input by combining the determined target texts. See Abstract and Paragraphs 0013 & 0076).
Hall ‘522 and Lee ‘838 are combinable because they are from the same field of endeavor of speech systems (Lee ‘838 at “Background”). 
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Hall ‘522 by adding the recognizing, using a decoder, of a portion of the context information identified as corresponding to the segment, as taught by Lee ‘838. The modification would have been advantageous because speech recognition systems have progressed to the point where humans can interact with computing devices using speech. As a result, these systems are needed to identify the words spoken by a human user based on the various qualities of a received audio input. Therefore, it would have been obvious to combine Hall ‘522 with Lee ‘838 to obtain the invention as specified.
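
For illustration only, the attention computation cited above from Paragraph 0047 of Hall ‘522 (attention scores, attention distribution, and attended context vector) can be sketched as follows. This is a minimal sketch that assumes dot-product scoring and a softmax distribution, neither of which is specified in the cited passage; the function name attend and its parameters are hypothetical and are not taken from Hall ‘522.

```python
import math


def attend(decoder_state, encoder_states):
    """Return (scores, distribution, context) for one decoding step.

    Hypothetical helper: dot-product scoring is an assumption of this sketch.
    """
    # Attention scores: dot product of the decoder state with each encoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states
    ]
    # Attention distribution: softmax over the scores (shifted by the max
    # score for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    distribution = [x / total for x in exps]
    # Attended context vector: distribution-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[i] for w, enc in zip(distribution, encoder_states))
        for i in range(dim)
    ]
    return scores, distribution, context
```

In this sketch, the encoder state that best matches the decoder state receives the largest weight, so the attended context vector is dominated by the corresponding part of the input.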

Regarding claim 13; Hall ‘522 discloses wherein the one or more processors are further configured to: determine type information of the segment included in the speech signal based on the context information (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029);
(i.e. Decoder 230 may decode received utterances into equivalent language that is easier for parser 220 to parse. For example, decoder 230 may decode an utterance into an equivalent training sentence, trading segments, or other content that may be easily parsed by parser 220. The equivalent language is provided to parser 220 by decoder 230. Paragraph 0027).

Regarding claim 14; Hall ‘522 discloses wherein the one or more processors are further configured to determine, based on the intent information of the speech signal, the segment value of the segment by recognizing the portion (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029).

Regarding claim 15; Hall ‘522 discloses wherein type information of the segment is determined based on the context information and the intent information of the speech signal (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029).

Regarding claim 16; Hall ‘522 discloses wherein the one or more processors are further configured to determine the intent information of the speech signal based on type information of one or more segments included in the speech signal (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029).
Regarding claim 17; Hall ‘522 discloses wherein respective segments included in the speech signal are sequentially identified by a segment classifier provided in the form of a decoder 

Regarding claim 18; Hall ‘522 discloses wherein the one or more processors are further configured to sequentially determine segment values of the respective segments in response to the segments being sequentially identified by the segment classifier provided in the form of a decoder (i.e. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector representing the entire sentence. The decoder takes the encoding and outputs an audio file, which can include compressed audio frames. One representation of the speech signal is simply the value of a waveform at each time, where time is represented in steps of 1/8000 or 1/160000 of a second. Paragraphs 0005 & 0055).

Regarding claim 19; Hall ‘522 discloses wherein the one or more processors are further configured to perform an operation corresponding to the intent information based on the segment value of the segment and the type information (i.e. Parser 220 may interpret a user utterance into intentions. State manager 260 allows the system to infer what objects a user means when he or she uses a pronoun or generic noun phrase to refer to an entity. The state manager may track “salience”—that is, tracking focus, intent, and history of the interactions. Paragraphs 0026 & 0029).

Regarding claim 20; Hall ‘522 discloses a memory storing instructions that when executed by the one or more processors configures the one or more processors to perform the obtaining of the context information, the determining of the intent information, the determining of the attention information, and the determining of the segment value of the segment  (i.e. Mass storage device 630, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610.  Paragraph 0076).
Regarding claim 1; Claim 1 contains substantially the same subject matter as claim 11. Therefore, claim 1 is rejected on the same grounds as claim 11.
Regarding claim 3; Claim 3 contains substantially the same subject matter as claim 13. Therefore, claim 3 is rejected on the same grounds as claim 13.
Regarding claim 4; Claim 4 contains substantially the same subject matter as claim 14. Therefore, claim 4 is rejected on the same grounds as claim 14.
Regarding claim 5; Claim 5 contains substantially the same subject matter as claim 15. Therefore, claim 5 is rejected on the same grounds as claim 15.
Regarding claim 6; Claim 6 contains substantially the same subject matter as claim 16. Therefore, claim 6 is rejected on the same grounds as claim 16.
Regarding claim 7; Claim 7 contains substantially the same subject matter as claim 17. Therefore, claim 7 is rejected on the same grounds as claim 17.
Regarding claim 8; Claim 8 contains substantially the same subject matter as claim 18. Therefore, claim 8 is rejected on the same grounds as claim 18.
Regarding claim 9; Claim 9 contains substantially the same subject matter as claim 19. Therefore, claim 9 is rejected on the same grounds as claim 19.
Regarding claim 10; Hall ‘522 discloses a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speech processing method of claim 1 (i.e. Mass storage device 630, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610.  Paragraph 0076).


Allowable Subject Matter
Claims 2 & 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Examiner's Statement of Reasons for Allowance
The cited reference (Hall ‘522) teaches a system that eliminates alignment processing and performs TTS functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector representing the entire sentence. The decoder takes the encoding and outputs an audio file, which can include compressed audio frames.
The cited reference (Lee ‘838) teaches a speech recognition method and apparatus. The speech recognition method includes determining a first score of candidate texts based on an input speech, determining a weight for an output of a language model based on the input speech, applying the weight to a second score of the candidate texts output from the language model to obtain a weighted second score, selecting a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text, and determining the target candidate text to correspond to a portion of the input speech.
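
For illustration only, the candidate selection summarized above for Lee ‘838 can be sketched as follows. This is a minimal sketch under stated assumptions: the first score and the weighted second score are combined by simple addition (the summary states only that selection is "based on" both scores), and the weight is passed in as a fixed number rather than derived from the input speech. The function name select_target and the sample scores are hypothetical and are not taken from Lee ‘838.

```python
def select_target(first_scores, second_scores, weight):
    """Pick the candidate text with the best combined score.

    first_scores:  candidate text -> score derived from the input speech
    second_scores: candidate text -> score output by the language model
    weight:        weight applied to the language model score
    """
    # Weighted second score: the weight applied to each language model score.
    # Combining by addition is an assumption of this sketch.
    combined = {
        text: first_scores[text] + weight * second_scores[text]
        for text in first_scores
    }
    # Target candidate text: the candidate with the highest combined score.
    return max(combined, key=combined.get)


# Hypothetical log-probability-style scores for two candidate texts.
speech_scores = {"call mom": -1.2, "call tom": -1.0}
language_scores = {"call mom": -0.5, "call tom": -2.0}
```

With these sample scores, a weight of 0 ignores the language model and selects "call tom", while a weight of 0.5 lets the language model score change the selection to "call mom".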
The cited references fail to disclose wherein the one or more processors are further configured to determine whether the speech signal includes a plurality of segments, and determine segment values of the segments by recognizing, in parallel using a plurality of decoders, portions, as recited in claims 2 & 12. Accordingly, claims 2 & 12 are identified as allowable subject matter.

Relevant Prior Art References Not Relied Upon
1.	Hoffmeister (US 10,332,508 B1) - A speech recognition method and apparatus are disclosed. The speech recognition method includes determining a first score of candidate texts based on an input speech, determining a weight for an output of a language model based on the input speech, applying the weight to a second score of the candidate texts output from the language model to obtain a weighted second score, selecting a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text, and determining the target candidate text to correspond to a portion of the input speech.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ., whose telephone number is (571) 270-1581. The examiner can normally be reached 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Primary Examiner
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677