DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed May 20, 2022.  Claims 1, 3-9, 11-16, and 18-20 are pending in the application.  As such, Claims 1, 3-9, 11-16, and 18-20 have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
3. 	Applicant’s arguments and amendments in the Amendment and RCE filed May 20, 2022 (herein “Amendment”) with respect to rejections of independent Claims 1, 9, and 16 under 35 U.S.C. 103 have been fully considered, but do not overcome the rejections of record.  Claims 5-8, 13-15, and 18-20 are indicated as including allowable subject matter.
For clarity of the record, the following additional remarks are made: On page 20 of the Amendment, Applicant sets forth that “the amendments to the claim 1 clarify that the target audio events are determined as optimal audio data sequences and are arranged in consecutive moments,” and then that Tsai does not “perform any such function” and “does not arrange events into consecutive moments as recited in claim 1.” However, the limitations at issue of Claim 1, which include “such that types of first target audio events among a plurality of target audio events determined as a first optimal audio data sequence are arranged in first consecutive moments” (emphasis added), do not require an “arranging” step under a broadest reasonable interpretation. Rather, they merely recite an intended result. Accordingly, as detailed further below in the rejection rationale, Tsai teaches first and second moments in a series, thus teaching an arrangement of first and second consecutive moments.
	Therefore, in view of the above, while all of Applicant’s amendments and arguments have been fully considered, they are not persuasive.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1, 3, 9, 11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Parada San Martin et al. (U.S. Patent App. Pub. No. 2016/0180838, hereinafter “Parada”) in view of Du et al. (Chinese Patent App. Pub. No. 106157953, hereinafter “Du”), Kato et al. (U.S. Patent No. 6,185,526, hereinafter “Kato”), Zhang et al. (U.S. Patent App. Pub. No. 2019/0103124, hereinafter “Zhang”), and Tsai et al. (U.S. Patent App. Pub. No. 2016/0335499, hereinafter “Tsai”).
With regard to Claim 1, Parada teaches:
“An audio event detection method, the method comprising:
performing a framing processing on an audio to obtain audio data for each of a plurality of time periods in the audio and respectively extracting a specified feature vector from the audio data for each of the plurality of time periods; (Paragraph 81 of Parada describes that the audio data is broken into frames (step 604) and a feature vector is generated for each frame (step 606))
inputting the specified feature vector of the audio data of each of the plurality of time periods to a Recurrent Neural Network/Bidirectional Recurrent Neural Network (RNN/BI-RNN) model, to obtain a posterior probability of each pre-set audio event in the audio data for each of the plurality of time periods; (Paragraph 82 of Parada describes that the feature vectors are input into an LSTM, and paragraphs 61 and 62 of Parada describe that the output of the LSTM is the posterior probabilities of an audio event in the corresponding frame.)
obtaining, for each of the plurality of time periods, a target audio event of the audio data according to the posterior probability of each pre-set audio event in the audio data and a pre-set audio decoding algorithm.” (Paragraphs 61, 66, and 67 of Parada describe that the output value indicates the probability that the frame represented by the feature vector includes at least a portion of an audio signal of a corresponding word.  Paragraph 63 of Parada describes the pre-set audio decoding algorithm.  Paragraph 64 of Parada describes that “Okay” and “Google” are exemplary words that can be identified as target audio events.)
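For illustration only, and not as a characterization of the actual implementation of Parada, the framing and per-frame feature extraction discussed above can be sketched as follows. The function names, the frame length, the hop size, and the two toy features (log energy and zero-crossing rate) are assumptions chosen for the example; none of them is taken from the cited reference.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def feature_vector(frame):
    """Toy two-element feature (assumed): log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]

# Hypothetical input: 100 samples of a sine wave.
samples = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
frames = frame_signal(samples, frame_len=20, hop=10)
features = [feature_vector(f) for f in frames]
```

In such a sketch, each entry of features would be the input for one time period to a recurrent model that outputs one posterior probability per pre-set audio event.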
Parada does not teach “extracting, for the target audio event of any one of the plurality of time periods, an optimal audio data sequence of the target audio event from the audio data of each of the plurality of time periods,
wherein the obtaining, for each of the plurality of time periods, the target audio event of the audio data further comprises: 
respectively determining an audio event having a maximum posterior probability in the audio data for each of the plurality of time periods, and setting the audio event having the maximum posterior probability as an initial audio event at a corresponding moment; and 
performing an error correction processing on the initial audio event according to a preset Viterbi decoding algorithm to obtain the target audio event for each of the plurality of time periods, and wherein the extracting, for the target audio event of any one of the plurality of time periods, the optimal audio data sequence of the target audio event further comprises:
extracting the optimal audio data sequence of the target audio event from the audio data of each of the plurality of time periods based on a continuity of the target audio event of each of the plurality of time periods, such that types of first target audio events among a plurality of target audio events determined as a first optimal audio data sequence are arranged in first consecutive moments and types of second target audio events among the plurality of target audio events determined as a second optimal audio data sequence are arranged in second consecutive moments.”
Regarding the claimed “an optimal audio data sequence,” the broadest reasonable interpretation of this phrase based on the discussion in paragraph 60 of the present specification, is asserted to be “the audio decoding result of the method.”
Accordingly, Du describes (Du is only cited as describing the un-bracketed portion below):
extracting, for the target audio event of any one of the plurality of time periods, an optimal audio data sequence of the target audio event from the audio data of each of the plurality of time periods, (Paragraphs 147 and 148 of Du describe that Viterbi decoding is done on the posterior probabilities to get a decoding result.) 
wherein the obtaining, for each of the plurality of time periods, the target audio event of the audio data further comprises: 
performing [an error correction] processing on the initial audio event according to a preset Viterbi decoding algorithm to obtain the target audio event for each of the plurality of time periods, and (Paragraphs 147 and 148 of Du describe that preset Viterbi decoding is done on the posterior probabilities to get a decoding result.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the Viterbi decoding of Du into the method of Parada to provide greater speech detection accuracy, as described in paragraph 149 of Du.
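For illustration only, Viterbi decoding over per-frame posterior probabilities can be sketched as below. This is a generic textbook formulation, not the specific decoder of Du. The function name viterbi_decode, the stay_prob transition value, and the example posteriors are assumptions, and the sketch assumes at least two events.

```python
import math

def viterbi_decode(posteriors, stay_prob=0.9):
    """Return the most likely event index per frame.

    posteriors[t][k] is the posterior of event k in frame t.
    stay_prob is an assumed probability of keeping the same event
    between adjacent frames; the remainder is split over the others.
    """
    n_events = len(posteriors[0])
    switch_prob = (1.0 - stay_prob) / (n_events - 1)

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    score = [math.log(p + 1e-12) for p in posteriors[0]]
    back = []  # back[t][j] = best predecessor of event j at frame t + 1
    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_events):
            best_i = max(range(n_events), key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j)
                             + math.log(frame[j] + 1e-12))
        score = new_score
        back.append(pointers)

    # Trace back from the best final event.
    state = max(range(n_events), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical posteriors for two events over seven frames.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
              [0.2, 0.8], [0.1, 0.9], [0.2, 0.8]]
decoded = viterbi_decode(posteriors)  # [0, 0, 0, 0, 1, 1, 1]
```

In this example the per-frame maximum at the third frame is event 1, but the decoded sequence keeps event 0 there because a single-frame switch is penalized by the assumed transition probabilities.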
With regard to “performing an error correction processing on the initial audio event according to a preset Viterbi decoding algorithm to obtain the target audio event for each of the plurality of time periods,” column 2, lines 14-19 of Kato describe a Viterbi decoder that has both an error detection and error correction function.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the error correcting decoding described by Kato in the method of Parada in view of Du to error correct speech data as it is decoded, as described at column 2 of Kato.
However, Parada in view of Du and Kato does not describe “respectively determining an audio event having a maximum posterior probability in the audio data for each of the plurality of time periods, and setting the audio event having the maximum posterior probability as an initial audio event at a corresponding moment” or “extracting the optimal audio data sequence of the target audio event from the audio data of each of the plurality of time periods based on a continuity of the target audio event of each of the plurality of time periods, such that types of first target audio events among a plurality of target audio events determined as a first optimal audio data sequence are arranged in first consecutive moments and types of second target audio events among the plurality of target audio events determined as a second optimal audio data sequence are arranged in second consecutive moments” as recited in amended Claim 1.
With regard to “respectively determining an audio event having a maximum posterior probability in the audio data for each of the plurality of time periods, and setting the audio event having the maximum posterior probability as an initial audio event at a corresponding moment,” paragraph 65 of Zhang describes setting an audio event having the maximum posterior probability as a selected event and then performing smoothing processing on a series of these selected events.
With regard to “extracting the optimal audio data sequence of the target audio event from the audio data of each of the plurality of time periods based on a continuity of the target audio event of each of the plurality of time periods,” paragraphs 63-65 of Zhang describe smoothing the series of maximum posterior probabilities (the claimed “initial audio events”) as part of speech decoding processing. This smoothing processing of Zhang is cited as processing “based on a continuity” as recited in Claim 1.  Paragraph 65 of Zhang describes that this reduces the fluctuations in the decoding result.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the smoothing described by Zhang in the method of Parada in view of Du and Kato to smooth the decoded speech data, as described at paragraph 65 of Zhang.
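For illustration only, selecting the event having the maximum posterior probability in each time period and then smoothing the resulting series can be sketched as follows. The sliding-window majority vote is an assumed stand-in and is not asserted to be the smoothing processing of Zhang. The function names, the window size, and the example posteriors are likewise hypothetical.

```python
from collections import Counter

def initial_events(posteriors):
    """Pick the index of the maximum posterior in each frame."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]

def majority_smooth(labels, window=3):
    """Replace each label by the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for t in range(len(labels)):
        lo, hi = max(0, t - half), min(len(labels), t + half + 1)
        counts = Counter(labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# Hypothetical posteriors for two events over ten frames.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3], [0.9, 0.1],
              [0.3, 0.7], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
raw = initial_events(posteriors)      # [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
smoothed = majority_smooth(raw)       # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The isolated single-frame deviations in raw are removed in smoothed, which is one way fluctuations in a decoding result could be reduced.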
However, Zhang does not describe “such that types of first target audio events among a plurality of target audio events determined as a first optimal audio data sequence are arranged in first consecutive moments and types of second target audio events among the plurality of target audio events determined as a second optimal audio data sequence are arranged in second consecutive moments.”
Tsai describes a device that smooths data (paragraphs 30, 31) depicted as a Cartesian curve (Figure 5).  By using a window 540, the data is smoothed such that there are multiple sections having the same value.

[Image omitted: media_image1.png]

Thus, all of the maximum posterior values in the first consecutive moments would be smoothed to a single value, which would be the “first optimal audio data sequence.”  All of the maximum posterior values in the second consecutive moments would be smoothed to a single value (different than the value for the first consecutive moments), which would be the “second optimal audio data sequence.”  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the Cartesian smoothing described by Tsai, which creates first and second optimal audio data sequences, in the method of Parada in view of Du, Kato, and Zhang to remove spurious values, as described at paragraph 30 of Tsai.
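For illustration only, the result of a smoothed series in which a first event type occupies first consecutive moments and a second event type occupies second consecutive moments can be sketched by grouping a label sequence into runs. The function name consecutive_runs and the example labels are assumptions and do not appear in Tsai or in the claims.

```python
from itertools import groupby

def consecutive_runs(labels):
    """Return (label, first_index, last_index) for each run of equal labels."""
    runs = []
    index = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, index, index + length - 1))
        index += length
    return runs

# Hypothetical smoothed event types over five consecutive moments.
runs = consecutive_runs(["speech", "speech", "speech", "music", "music"])
# runs == [("speech", 0, 2), ("music", 3, 4)]
```

Each tuple identifies one event type together with the consecutive moments it spans.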
With regard to Claim 3, Parada teaches:
“The method according to claim 1, further comprising:
performing the framing processing on a training audio to obtain training audio data of each of the plurality of time periods in the training audio and respectively extracting a specified training feature vector from the training audio data of each of the plurality of time periods; (Paragraph 22 of Parada describes that the system 100 splits an audio signal into a plurality of frames of audio data.  Paragraph 81 of Parada describes that the audio data is broken into frames (step 604) and a feature vector is generated for each frame (step 606).  Paragraph 28 of Parada describes that the training may include predicting posterior probabilities from the feature values included in feature vectors corresponding to frames of audio data.) and
inputting the specified training feature vector of the training audio data of each of the plurality of time periods to a to-be-trained RNN/BI-RNN model and performing an iterative training based on each preset audio event, to obtain the RNN/BI-RNN model.” (Paragraph 28 of Parada describes that the training may include inputting the feature values into the model.  Paragraph 29 describes that training is an iterative process.)
With respect to Claim 9, apparatus Claim 9 and method Claim 1 are related as apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Parada teaches memory (paragraph 97) and a processor (paragraph 98).  Accordingly, Claim 9 is similarly rejected under the same rationale as applied above with respect to Claim 1.
With regard to Claim 11, Parada teaches:
“The apparatus according to claim 9, further comprising:
second framing and extraction code configured to cause the at least one processor to perform the framing processing on a training audio to obtain training audio data of each of the plurality of time periods in the training audio, and respectively extracting a specified training feature vector from the training audio data of each of the plurality of time periods; (Paragraph 22 of Parada describes that the system 100 splits an audio signal into a plurality of frames of audio data.  Paragraph 81 of Parada describes that the audio data is broken into frames (step 604) and a feature vector is generated for each frame (step 606).  Paragraph 28 of Parada describes that the training may include predicting posterior probabilities from the feature values included in feature vectors corresponding to frames of audio data.) and
training code configured to cause the at least one processor to input the specified training feature vector of the training audio data of each of the plurality of time periods to a to-be-trained RNN/BI-RNN model and perform an iterative training based on each preset audio event, to obtain the RNN/BI-RNN model.” (Paragraph 28 of Parada describes that the training may include inputting the feature values into the model.  Paragraph 29 describes that training is an iterative process.)
With respect to Claim 16, computer readable medium Claim 16 and method Claim 1 are related as a computer readable medium including code to perform the same method, with each claimed code element's function corresponding to the claimed method step.  Further, Parada teaches a computer readable medium (paragraph 97), memory (paragraph 97), and a processor (paragraph 98) (memory and processor together are cited as “a computer”).  Accordingly, Claim 16 is similarly rejected under the same rationale as applied above with respect to Claim 1. 
 
5.	Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Parada in view of Du, Kato, Zhang, and Tsai, and further in view of Lipton et al. (“A Critical Review of Recurrent Neural Networks for Sequence Learning”, hereinafter “Lipton”).
With regard to Claim 4, Parada describes:
“The method according to claim 3, wherein the inputting the specified training feature vector of the training audio data of each of the plurality of time periods to the to-be-trained RNN/BI-RNN model and performing the iterative training based on each preset audio event further comprises:
inputting the specified training feature vector of the training audio data of each of the plurality of time periods to a to-be-trained RNNm model.
Paragraphs 28 and 29 of Parada teach that the model is iteratively trained, and that during training, training probabilities for a plurality of time periods may be calculated to determine their accuracy.  Specifically, paragraph 28 describes that posterior probabilities are calculated based on input feature vectors, and these posterior probabilities correspond with entire words for the keywords or key phrases and represent the probability that the corresponding word is represented by a frame or multiple consecutive frames.
However, Parada does not explicitly describe:
performing a forward propagation training on a forward hidden layer to obtain a training probability of each audio event in the training audio data of each of the plurality of time periods output by the to-be-trained RNNm model, wherein m is a number of times that the to-be-trained RNNm model has been trained, and wherein an RNN0 model is an initial RNN model;
performing a backward propagation training on the forward hidden layer of the RNNm model based on the training probability of each audio event in the training audio data of each of the plurality of time periods and a preset reverse propagation BPTT algorithm, and updating a weight of each node in the RNNm model to obtain an updated RNNm+1 model, wherein a weight of each node in the RNN0 model is a preset value; and
setting the updated RNNm+1 model as a trained RNN model based on determining to stop the iterative training on the RNNm+1 model; and, or updating m=m+1 based on determining to continue the iterative training on the RNNm+1 model and returning to perform the inputting the specified training feature vector of the training audio data of each of the plurality of time periods to the to-be-trained RNNm model.”
However, Du describes “performing a forward propagation training on a forward hidden layer to obtain a training probability of each audio event in the training audio data of each of the plurality of time periods output by the to-be-trained RNNm model, wherein m is a number of times that the to-be-trained RNNm model has been trained, and wherein an RNN0 model is an initial to-be-trained model” as recited above.
In this regard, paragraph 79 of Du describes that an RNN may be used for speech recognition.  Further, paragraph 103 of Du describes both a forward (first) and a backward (second) training, creating a Bi-RNN.  The input, hidden, and output layers are shown in Figure 3 of Du, as are the bi-directional training directions. 
Du further describes (other than bracketed portion) “performing a backward propagation training on the forward hidden layer of the RNNm model [based on the training probability of each audio event in the training audio data of each of the plurality of time periods] and a preset reverse propagation BPTT algorithm, and updating a weight of each node in the RNNm model to obtain an updated RNNm+1 model, wherein a weight of each node in the RNN0 model is a preset value.”
The preset reverse backpropagation algorithm is shown in Figure 4 of Du and described at paragraphs 92-95.  Weight updating is described in paragraphs 99-101.  The weights are Wl in equation 3, paragraph 100 of Du.  (Explanation of equation terms is provided in paragraph 95 of Du.)  Paragraph 89 of Du describes that the initial parameters of the model may be preset.
Du finally describes “setting the updated RNNm+1 model as a trained RNN model based on determining to stop the iterative training on the RNNm+1 model; and, or updating m=m+1 based on determining to continue the iterative training on the RNNm+1 model and returning to perform the inputting the specified training feature vector of the training audio data of each of the plurality of time periods to the to-be-trained RNNm model” as recited above.
Paragraphs 97 and 98 of Du describe determining whether or not to end training and that the iteration counter is “i”.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the training process of Du into the method of Parada to provide speech de-noising and acoustic classification, as described in paragraph 105 of Du.  
	Although Du describes a backpropagation process as noted above, Parada in view of Du does not explicitly describe “performing a backward propagation training on the forward hidden layer of the RNNm model based on the training probability of each audio event in the training audio data of each of the plurality of time periods.”  However, this feature is described in section 2.3 of Lipton.  Specifically, the second half of page 9 describes that a feedforward pass is first done to get values for the hidden and output layers.  These results are then used in the described BPTT process. 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the BPTT process of Lipton into the method of Parada in view of Du to provide the “most successful method of training neural networks,” as described on page 8 of Lipton.   
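For illustration only, a forward pass followed by backpropagation through time (BPTT) and a single weight update can be sketched for a one-unit recurrent network. This is a generic formulation and is not the notation or procedure of Lipton or Du. The squared-error loss, the tanh activation, the function names, the learning rate, and all numeric values are assumptions.

```python
import math

def forward(xs, w_in, w_rec, w_out):
    """Run a one-unit tanh RNN; return hidden states and outputs."""
    hs, ys = [0.0], []  # hs[0] is the initial hidden state
    for x in xs:
        h = math.tanh(w_in * x + w_rec * hs[-1])
        hs.append(h)
        ys.append(w_out * h)
    return hs, ys

def loss(xs, targets, w_in, w_rec, w_out):
    """Half the summed squared error between outputs and targets."""
    _, ys = forward(xs, w_in, w_rec, w_out)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(ys, targets))

def bptt(xs, targets, w_in, w_rec, w_out):
    """Gradients of the loss with respect to the three weights."""
    hs, ys = forward(xs, w_in, w_rec, w_out)
    g_in = g_rec = g_out = 0.0
    dh_next = 0.0  # gradient flowing back from the following time step
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]
        g_out += dy * hs[t + 1]
        dh = dy * w_out + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)  # derivative through tanh
        g_in += da * xs[t]
        g_rec += da * hs[t]
        dh_next = da * w_rec
    return g_in, g_rec, g_out

# Hypothetical data and preset initial weights.
xs, targets = [0.5, -0.1, 0.3], [0.2, 0.0, -0.1]
weights = (0.4, -0.3, 0.7)
grads = bptt(xs, targets, *weights)

# One training iteration: update each weight against its gradient.
learning_rate = 0.1
updated = tuple(w - learning_rate * g for w, g in zip(weights, grads))
```

Repeating the forward pass, BPTT, and update until a stopping condition is met would correspond to the iterative training discussed above, with the preset initial weights playing the role of the initial model.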
  	 With respect to Claim 12, apparatus Claim 12 and method Claim 4 are related as apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, Claim 12 is similarly rejected under the same rationale as applied above with respect to Claim 4.  Furthermore, Parada teaches a computer readable medium (paragraph 97), memory (paragraph 97), and a processor (paragraph 98).

Allowable Subject Matter
6.	Claims 5 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  Claims 6-8, 14, 15, and 18-20 are allowable based on their dependence from Claim 5 or 13.
The following is a statement of reasons for the indication of allowable subject matter: The closest prior art, Parada and Du, do not teach or suggest:
“... respectively performing a forward propagation training on a forward hidden layer and the forward propagation training on a backward hidden layer, to obtain a first training probability and a second training probability of each audio event in the training audio data of each of the plurality of time periods output by the to-be-trained BI-RNNm model, wherein m is a number of times that the to-be-trained BI-RNNm model has been trained, and wherein a BI-RNN0 model is an initial BI-RNN model;
performing a backward propagation training on the BI-RNNm model based on a third training probability of each audio event in the training audio data of each of the plurality of time periods and a preset reverse propagation BPTT algorithm, and respectively updating a weight of each node in the forward hidden layer and a weight of each node in the backward hidden layer, to obtain an updated BI-RNNm+1 model, wherein the third training probability is a sum of the first training probability and the second training probability, and a weight of each node in the BI-RNN0 model is a preset value” as recited in Claim 5 and similarly recited in Claim 13.

Conclusion
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./ Examiner, Art Unit 2656

/MICHELLE M KOETH/ Primary Examiner, Art Unit 2656