DETAILED ACTION
Introduction
This office action is in response to applicant’s claims filed 5/14/2019. Claims 1-20 are currently pending and have been examined. Applicant’s IDS have been considered. There is no claim to foreign priority.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Salloum et al. (Salloum, US 2019/0043486) in view of Chen et al. (Chen, Self-Attention Based Network for Punctuation Restoration).
As per claim 1, Salloum teaches a non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computing device to (paragraph [0062]): 
generate, by each bi-directional recurrent neural network layer of a plurality of bi-directional recurrent neural network layers, a plurality of output states corresponding to words from a sequence of words (Figs. 5 and 7-his input sequence of words, a plurality of bi-directional RNN layers, plurality of outputs, paragraphs [0042-0047]); 
generate, utilizing one [or more neural] attention mechanism[s], a plurality of attention outputs based on the plurality of output states (ibid-Figs. 5, 7, paragraphs [0042-0047]-his attention layer, and corresponding outputs); 
determine punctuation label probabilities for the words from the sequence of words based on the plurality of output states and the plurality of attention outputs (ibid, paragraph [0042]-his prediction of tags, and prediction based on probabilistic RNN and softmax classification); and 
generate a punctuated transcript comprising punctuation before one or more of the words from the sequence of words based on the punctuation label probabilities (ibid, Fig. 4-his generated output transcript based on sequence of words punctuation label/tag probabilities from prediction classification, Fig. 7, abstract/paragraph [0002]-see transcription, punctuation restoring discussion).
Salloum does not explicitly teach, but Chen teaches: generate, utilizing one or more neural attention mechanisms (page 2805 columns 1 and 2, Fig. 1-his multi-head and scaled dot-product attention mechanisms), a plurality of attention outputs based on the plurality of output states (ibid, see also Chen output labels discussion).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, in view of the teachings of Salloum and Chen, to combine the prior art element of the attention mechanism as taught by Salloum with the plurality of attention mechanisms, including multi-head and scaled dot-product attention, as taught by Chen. All the claimed elements were known in the prior art, one skilled in the art could have combined the elements as claimed by known methods (computer-implemented techniques and algorithms combining processes and steps in natural language processing), each element performs the same function in combination as it does separately, and the combination would yield predictable results. KSR International Co. v. Teleflex Inc., 550 U.S. --, 82 USPQ2d 1385 (2007). The predictable result would be allowing the neural network to focus on capturing relevant contexts that support the punctuation restoration (ibid-Salloum/Chen). 
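For illustration only, the final limitation of claim 1 (placing punctuation before words based on punctuation label probabilities) may be sketched as follows. This sketch is not taken from Salloum or Chen; the label set `LABELS` and the function name `punctuate` are hypothetical, and selecting the single most probable label per word is an assumed decoding rule.

```python
# Hypothetical label set; the empty string means "no punctuation".
LABELS = ["", ",", ".", "?"]

def punctuate(words, label_probs):
    """Build a transcript, placing the most probable label before each word."""
    out = ""
    for word, probs in zip(words, label_probs):
        # Assumed decoding rule: pick the label with the highest probability.
        mark = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
        out += mark  # the punctuation attaches before the current word
        if out:
            out += " "
        out += word
    return out

words = ["hello", "how", "are", "you"]
label_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.90, 0.04, 0.03, 0.03],
]
transcript = punctuate(words, label_probs)
```

Here the second word carries the highest probability for the comma label, so the comma is emitted before "how".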
As per claims 2 and 12, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1, wherein: 
the one or more neural attention mechanisms comprise a neural attention mechanism for each bi-directional recurrent neural network layer (ibid-Chen, page 2805, Columns 1 and 2, Fig. 1-his attention weights, and corresponding attention mechanism for each bi-directional “BRNN”, page 2806 Column 2-his BRNN model); and the instructions, when executed by the at least one processor, cause the computing device to generate, by each neural attention mechanism, a layer-wise attention weight for each output state from the plurality of output states of a corresponding bi-directional recurrent neural network layer (ibid, Chen-Fig. 1-see his layer-wise attention weights for each output from the plurality of outputs of his BRNN). 
As per claim 3, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 2, wherein the instructions, when executed by the at least one processor, cause the computing device to generate the plurality of attention outputs by concatenating, for a given state, the layer-wise attention weight corresponding to the state from each neural attention mechanism (ibid-Chen, page 2805-see concatenation of layer-wise attention weights from each attention).
As per claims 4 and 13, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 3, wherein: 
each neural attention mechanism from the one or more neural attention mechanisms comprises a multi-head neural attention mechanism (ibid-Chen, see his multi-head neural attention mechanism discussion); 
the instructions, when executed by the at least one processor, cause the computing device to: utilize each multi-head neural attention mechanism to generate a plurality of layer-wise attention weights for each state (ibid, Chen each multi-head and corresponding layer-wise attention weights for each state); and 
generate the plurality of attention outputs by concatenating, for a given state, the plurality of layer-wise attention weights corresponding to the state from each neural attention mechanism (ibid, Chen see concatenation discussion).
As per claims 5 and 16, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1, wherein: the one or more neural attention mechanisms comprise a multi-head neural attention mechanism (ibid-Chen, see multi-head discussion); the instructions, when executed by the at least one processor, cause the computing device to: utilize the multi-head neural attention mechanism to generate a plurality of attention weights for each state (ibid, Chen, page 2805, Column 1, see attention weights and each corresponding output state discussion); and generate the plurality of attention outputs by concatenating, for a given state, the plurality of attention weights corresponding to the state (ibid-his attention output, based on concatenation of attention weights corresponding to the state).
As per claims 6 and 15, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1, wherein the one or more neural attention mechanisms comprise one or more scaled dot-product neural attention mechanisms (ibid-Chen, page 2805-column 1-his neural attention mechanism comprising one or more scaled dot-products). 
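For reference, scaled dot-product attention as generally understood in the art computes softmax(QK^T / sqrt(d_k))V. The following is a minimal sketch of that general formula only; it is not reproduced from Chen, and the function names and the list-of-lists representation are assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one attention output per query: softmax(q.K / sqrt(d_k)) . V"""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

attended = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
```

Because the query aligns with the first key, the first value vector receives the larger attention weight.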
As per claim 7, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1, further storing instructions that, when executed by the at least one processor, cause the computing device to generate a set of final states based on the plurality of output states (ibid-Salloum, Fig. 7 as including his set of final states based on the plurality of output states). 
As per claims 8 and 18, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 7, wherein the instructions, when executed by the at least one processor, cause the computing device to determine the punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs utilizing a fully connected layer with a SoftMax classifier to generate, for a given word of the sequence of words, a punctuation label probability for each of a plurality of punctuation marks (ibid-Salloum, Figs. 5 and 7-see his fully connected layers, attention outputs and softmax classifier, tags/labels based on probability, see Chen page 2805 Column 1-softmax, probability of label sequence discussion).
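For illustration only, a fully connected layer followed by a softmax classifier may be sketched as follows: the layer maps a per-word feature vector to one score per punctuation label, and softmax normalizes the scores into probabilities. The function name, the weights, and the three-label output are hypothetical and are not drawn from either reference.

```python
import math

def fully_connected_softmax(features, weights, biases):
    """Map a feature vector to a probability for each punctuation label."""
    # Fully connected layer: one logit per label (one weight row per label).
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    # Softmax: subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights for three labels over a two-dimensional feature vector.
label_probs = fully_connected_softmax(
    features=[1.0, 2.0],
    weights=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0, 0.0],
)
```

The resulting probabilities sum to one, and the label with the largest logit receives the largest probability.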
As per claim 9, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1, wherein the instructions, when executed by the at least one processor, cause the computing device to generate, by a given bi-directional recurrent neural network layer (ibid-Salloum-Figs. 5 and 7-his BRNN, Chen-BRNN), the plurality of output states by: 
generating a plurality of forward states by processing embeddings of the sequence of words in a forward direction utilizing a forward recurrent neural network layer of the given bi-directional recurrent neural network layer (ibid-Salloum-Figs. 5 and 7); 
generating a plurality of backward states by processing the embeddings of the sequence of words in a backward direction utilizing a backward recurrent neural network layer of the given bi-directional recurrent neural network layer (ibid); and 
combining, for each state, a forward state and a backward state corresponding to the state (ibid).
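For illustration only, the forward pass, backward pass, and per-state combination recited in claim 9 may be sketched as follows. The elementwise tanh cell, the scalar weights `w_in` and `w_rec`, and the use of concatenation as the combining step are assumptions made to keep the sketch short; they are not taken from Salloum or Chen.

```python
import math

def rnn_pass(embeddings, w_in, w_rec):
    """Run a simplified (assumed) elementwise tanh recurrent cell over a sequence."""
    state = [0.0] * len(embeddings[0])
    states = []
    for x in embeddings:
        state = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, state)]
        states.append(state)
    return states

def bidirectional_states(embeddings, w_in=0.5, w_rec=0.5):
    """Combine forward and backward states for each position by concatenation."""
    forward = rnn_pass(embeddings, w_in, w_rec)
    # Process the reversed sequence, then restore the original word order.
    backward = rnn_pass(embeddings[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]

states = bidirectional_states([[1.0], [0.0], [-1.0]])
```

Each output state is twice the embedding width: the first half reflects the words up to and including the current position, and the second half reflects the current position and the words after it.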
As per claims 10 and 17, Salloum with Chen make obvious the non-transitory computer-readable medium of claim 1. Salloum lacks that which Chen makes obvious: further storing instructions that, when executed by the at least one processor, cause the computing device to perform a language understanding task based on the punctuated transcript, the language understanding task comprising at least one of generating a translation, generating a transcript summary, determining an answer to a question, performing sentiment analysis, performing syntactic parsing, or extracting information (Chen, page 2803 Column 1, his punctuation result applied to downstream NLP tasks, including question answering, machine translation, sentiment analysis and information extraction). 
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, in view of the teachings of Salloum and Chen, to combine the prior art element of the attention mechanism as taught by Salloum with the multiple NLP tasks utilizing a punctuated transcript as taught by Chen. All the claimed elements were known in the prior art, one skilled in the art could have combined the elements as claimed by known methods (computer-implemented techniques and algorithms combining processes and steps in natural language processing), each element performs the same function in combination as it does separately, and the combination would yield predictable results. KSR International Co. v. Teleflex Inc., 550 U.S. --, 82 USPQ2d 1385 (2007). The predictable result would be utilizing the punctuation restored transcript in a downstream NLP task (ibid-Chen).
As per claim 11, claim 11 sets forth limitations similar to claims 1 and 9 and is thus rejected under similar reasons and rationale, wherein the system is deemed to embody the method, such that Salloum with Chen make obvious a system comprising: 
a memory comprising a punctuation restoration neural network trained to generate punctuation label probabilities, the punctuation restoration neural network comprising a plurality of bi-directional recurrent neural network layers and one or more neural attention mechanisms (Salloum, paragraphs [0062, 0063]-ibid-see claim 11, corresponding and similar limitation); 
at least one processor (ibid); and 
at least one non-transitory computer-readable medium storing instructions thereon that, when executed by the at least one processor, cause the system to (ibid): 
generate, by each bi-directional recurrent neural network layer of the punctuation restoration neural network, a plurality of output states by generating forward states and backward states and combining the forward states and backward states, wherein each state corresponds to words from a sequence of words (ibid-see claim 9, corresponding and similar limitation, Salloum, Figs. 5 and 7); 
generate a set of final states based on the plurality of output states utilizing a gated recurrent unit of the punctuation restoration neural network (ibid-Salloum, Figs. 5 and 7, paragraph [0046]); 
generate, utilizing one or more neural attention mechanisms, a plurality of attention outputs by combining the plurality of output states from each bi-directional layer and the set of final states (ibid-see claim 1, corresponding and similar limitation); 
determine punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs (ibid); and 
generate a punctuated transcript comprising punctuation before one or more of the words from the sequence of words based on the punctuation label probabilities (ibid).
As per claim 14, Salloum with Chen make obvious the system of claim 12, wherein the instructions, when executed by the at least one processor, cause the system to generate the plurality of attention outputs by: generating, by each neural attention mechanism, the layer-wise attention weight for each output state from the plurality of output states based on the output state and at least one final state from the set of final states (ibid-see claim 3, generate discussion, Figs. 5 and 7, Salloum-his attention based on the output state and one final state from a set of final states being input into the attention mechanism, Chen each attention mechanism, page 2805 columns 1 and 2, his attention weights, and layer-wise attention based on the final state from a set of states); and combining, for a given state, the layer-wise attention weight corresponding to the state from each neural attention mechanism (ibid-see Chen Fig. 1, page 2805-concatenation discussion). 
As per claim 19, claim 19 sets forth limitations similar to claim 1 and is thus rejected under similar reasons and rationale, such that Salloum with Chen make obvious, in a digital medium environment for using computer speech recognition technology to transcribe spoken language, a computer-implemented method comprising (Salloum, abstract, paragraph [0002]-his ASR discussion): identifying a transcript comprising a sequence of words (see claim 1, transcript discussion, Figs. 3, 4); performing a step for generating punctuation label probabilities for the sequence of words based on the transcript (ibid-see claim 1, corresponding and similar discussion, see Chen page 2805, Columns 1 and 2, probabilities discussion); and generating a punctuated transcript corresponding to the transcript based on the punctuation label probabilities (ibid). 
As per claim 20, Salloum with Chen make obvious the computer-implemented method of claim 19, wherein identifying the transcript comprises generating the transcript based on received audio data (ibid-see Salloum, paragraphs [0002-0005], abstract, his speech dictated and audio received).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure (See PTO-892). 
-Orife (Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yoruba Language Text), teaches stacks of attention layers used in a restoration process, modeling a word sequence input. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LAMONT M SPOONER, whose telephone number is (571)272-7613. The examiner can normally be reached 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LAMONT M SPOONER/
Primary Examiner, Art Unit 2657
lms
5/21/2022