DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 7/3/2020.   Claims 1-24 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings filed on 7/3/2020 have been accepted and considered by the Examiner.

Information Disclosure Statement
4.	The information disclosure statement (IDS) submitted on October 9, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
5.	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
In the present Application, each element of Claims 13 and 16-18 explicitly invokes “means plus function” language, and thus each element of Claims 13 and 16-18 will be interpreted in accordance with 35 U.S.C. 112(f).


Claim Rejections - 35 USC § 103
6.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


7.	Claims 1, 2, 7, 8, 13, 14, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent App. Pub. No. 20190287012 (Celikyilmaz et al., hereinafter “Cel”) in view of U.S. Patent No. 11,170,287 (Zhong et al., hereinafter “Zhong”).
	With regard to Claim 1, Cel describes:
A method for operating a neural network, the method comprising:
receiving an input sequence at an encoder; (Paragraph 26, encoders 104-106 are described such that input to the multiple encoder agents may be raw input, such as sequences of words.)
encoding the input sequence to produce hidden representations; (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
calculating attention weights in attention-heads of the neural network based on the hidden representations; (Paragraph 27 describes that an attention mechanism is applied over the messages to allow the encoder agents to apply different weights (including zero weights) to the messages)
calculating a context vector for each attention-head based on the attention weights and the hidden representations, each context vector corresponding to a portion of the input sequence; and (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
outputting [[an inference]] based on the context vectors.  (Paragraph 28 describes that the output is based on the context vectors.)
	Cel does not explicitly describe “an inference.”
However, Zhong describes a system 100 that generates an inference based on an input sequence.  (col. 2, lines 52-56).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
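For illustration only, the sequence of steps recited in Claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Cel, Zhong, or the application: the identity encoder, the dot-product scoring with one query vector per head, and the norm-threshold output rule are all hypothetical stand-ins chosen to keep the example short.

```python
import math


def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(input_sequence):
    # Hypothetical encoder: passes each input vector through unchanged.
    return [list(x) for x in input_sequence]


def attention_weights(hidden, query):
    # Assumed scoring: dot product between a per-head query and each hidden vector.
    return softmax([dot(query, h) for h in hidden])


def context_vector(weights, hidden):
    # Weighted sum of the hidden representations.
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def run_heads(input_sequence, head_queries):
    """Return one context vector per attention head."""
    hidden = encode(input_sequence)
    return [context_vector(attention_weights(hidden, q), hidden) for q in head_queries]


def infer(contexts, threshold=0.5):
    # Hypothetical output rule: one boolean per head, based on context norm.
    return [math.sqrt(dot(c, c)) > threshold for c in contexts]


contexts = run_heads([[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

In this toy input each head's query aligns with a different position, so each context vector corresponds mainly to a different portion of the input sequence.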
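With respect to Claim 2, Cel describes “input sequence comprises an acoustic feature.”  (Paragraph 2 describes that the device may be used for speech recognition, and para 5 describes audio input as an input modality.)  However, Cel does not explicitly describe “outputting an inference indicating whether a keyword is included in a corresponding portion of the input sequence.”  Zhong describes at col. 10, line 55 to col. 11, line 44, that a word overlap score F1 may be determined for inferences corresponding to a ground truth, which is cited as “an indication of whether a keyword is included in a corresponding portion of the input sequence.”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.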
With regard to Claim 7, Cel describes:
An apparatus for operating a neural network, the apparatus comprising:
a memory; and (paragraph 77, memory 804)
at least one processor coupled to the memory, the at least one processor being configured: (paragraph 77, processor 1302)
to receive an input sequence at an encoder; (Paragraph 26, encoders 104-106 are described such that input to the multiple encoder agents may be raw input, such as sequences of words.)
to encode the input sequence to produce hidden representations; (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
to calculate attention weights in attention-heads of the neural network based on the hidden representations; (Paragraph 27 describes that an attention mechanism is applied over the messages to allow the encoder agents to apply different weights (including zero weights) to the messages)
to calculate a context vector for each attention-head based on the attention weights and the hidden representations, each context vector corresponding to a portion of the input sequence; and (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
to output [[an inference]] based on the context vectors. (Paragraph 28 describes that the output is based on the context vectors.)
	Cel does not explicitly describe “an inference.”
However, Zhong describes a system 100 that generates an inference based on an input sequence.  (col. 2, lines 52-56).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
With respect to Claim 8, Cel describes “the input sequence comprises an acoustic feature.”  (Paragraph 2 describes that the device may be used for speech recognition, and para 5 describes audio input as an input modality.)  However, Cel does not explicitly describe “to output an inference indicating whether a keyword is included in a corresponding portion of the input sequence.”  Zhong describes at col. 10, line 55 to col. 11, line 44, that a word overlap score F1 may be determined for inferences corresponding to a ground truth, which is cited as “an indication of whether a keyword is included in a corresponding portion of the input sequence.”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
With regard to Claim 13, Cel describes:
An apparatus for operating a neural network, the apparatus comprising:
means for receiving an input sequence at an encoder; (Paragraph 26, encoders 104-106 are described such that input to the multiple encoder agents may be raw input, such as sequences of words.)
means for encoding the input sequence to produce hidden representations; (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
means for calculating attention weights in attention-heads of the neural network based on the hidden representations; (Paragraph 27 describes that an attention mechanism is applied over the messages to allow the encoder agents to apply different weights (including zero weights) to the messages)
means for calculating a context vector for each attention-head based on the attention weights and the hidden representations, each context vector corresponding to a portion of the input sequence; and (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
means for outputting [[an inference]] based on the context vectors. (Paragraph 28 describes that the output is based on the context vectors.)

	Cel does not explicitly describe “an inference.”
However, Zhong describes a system 100 that generates an inference based on an input sequence.  (col. 2, lines 52-56).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
With respect to Claim 14, Cel describes “the input sequence comprises an acoustic feature.”  (Paragraph 2 describes that the device may be used for speech recognition, and para 5 describes audio input as an input modality.)  However, Cel does not explicitly describe “means for outputting an inference indicating whether a keyword is included in a corresponding portion of the input sequence.”  Zhong describes at col. 10, line 55 to col. 11, line 44, that a word overlap score F1 may be determined for inferences corresponding to a ground truth, which is cited as “an indication of whether a keyword is included in a corresponding portion of the input sequence.”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
With regard to Claim 19, Cel describes:
A non-transitory computer readable medium (paragraph 77, memory 804) having encoded thereon program code for operating a neural network, the program code being executed by a processor (paragraph 77, processor 1302) and comprising:
program code to receive an input sequence at an encoder; (Paragraph 26, encoders 104-106 are described such that input to the multiple encoder agents may be raw input, such as sequences of words.)
program code to separate the input sequence into sequence parts; (Paragraph 26, encoders 104-106 are described such that input to the multiple encoder agents may be raw input, such as sequences of words.)
program code to encode the sequence parts to produce hidden representations; (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
program code to calculate attention weights in attention-heads of the neural network based on the hidden representations; (Paragraph 27 describes that an attention mechanism is applied over the messages to allow the encoder agents to apply different weights (including zero weights) to the messages)
program code to calculate a context vector for each attention-head based on the attention weights and the hidden representations, each context vector corresponding to a portion of the input sequence; and (Paragraph 27 describes that vectors are computed from hidden-state output of one or more layers of the respective sending encoder agents)
program code to output [[an inference]] based on the context vectors. (Paragraph 28 describes that the output is based on the context vectors.)
	Cel does not explicitly describe “an inference.”
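However, Zhong describes a system 100 that generates an inference based on an input sequence.  (col. 2, lines 52-56).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.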
With respect to Claim 20, Cel describes “the input sequence comprises an acoustic feature.”  (Paragraph 2 describes that the device may be used for speech recognition, and para 5 describes audio input as an input modality).  However, Cel does not explicitly describe “to output an inference indicating whether a keyword is included in a corresponding portion of the input sequence.” Zhong describes at col. 10, line 55 to col. 11, line 44, that a word overlap score F1 may be determined for inferences corresponding to a ground truth, which is cited as “an indication of whether a keyword is included in a corresponding portion of the input sequence.”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the inference of Zhong into the system of Cel to create a system that generates results that a human might provide, as described in col. 1, lines 20-40 of Zhong.
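For illustration only, a word overlap F1 score between an inference and a ground truth can be sketched as below. This is a minimal sketch that assumes the common token-level definition (harmonic mean of precision and recall over shared words); the cited passage of Zhong is not reproduced here, so its exact computation may differ. The function name `word_overlap_f1` and the whitespace tokenization are hypothetical choices.

```python
from collections import Counter


def word_overlap_f1(prediction, ground_truth):
    """Token-level F1 between a predicted string and a reference string."""
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    if not pred or not gold:
        # Two empty strings match exactly; one empty string matches nothing.
        return float(pred == gold)
    # Multiset intersection counts each shared word at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

A score of this kind measures how many words of the reference appear in the output; it is a graded overlap measure, not a yes-or-no keyword flag.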

8.	Claims 3-6, 9-12, 15-18, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Cel in view of Zhong and further in view of “Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism” (Tao et al., hereinafter “Tao”).
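With regard to Claim 3, Cel in view of Zhong does not describe “each of the context vectors are orthogonal to other context vectors of other attention-heads.”  However, Tao describes at section 2.3, pages 4420-4421 that the attention head vectors may be orthogonal.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vectors of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.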

With regard to Claim 4, Cel in view of Zhong does not describe “the attention weight is calculated based on a score function for focus determination.”  However, Tao discusses the score function (et,i) which influences the attention mechanism weights (Tao, Equation 5).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the score function of Tao into the system of Cel in view of Zhong to allow the decoder to pay different attention to each part of input at every timestep, as described in section 2.1 of Tao.
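For illustration only, the usual relationship between a score function and attention weights can be written as below. This assumes the standard softmax attention formulation; Equation 5 of Tao is not reproduced in this action, so its exact form may differ. The symbols h_i (hidden representation of input position i) and c_t (context vector at timestep t) are assumed names.

```latex
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j} \exp(e_{t,j})},
\qquad
c_t = \sum_{i} \alpha_{t,i}\, h_i
```

Under this formulation a larger score e_{t,i} yields a larger weight on position i at timestep t, which is how the score function determines where attention is focused.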

With regard to Claim 5, Cel in view of Zhong does not describe “selectively regularizing the neural network based on orthogonality constraints of the attention weights and context vectors.”  However, Tao describes at section 2.3, pages 4420-4421 that the orthogonality constraints are put on the model.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vector constraints of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.
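For illustration only, an orthogonality constraint on per-head vectors can be expressed as a penalty term like the one sketched below. This is a minimal sketch under an assumed formulation (squared difference between the Gram matrix of the normalized head vectors and the identity matrix); the exact constraint in section 2.3 of Tao is not reproduced in this action and may differ. The function name `orthogonality_penalty` is hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def orthogonality_penalty(head_vectors):
    """Sum of squared entries of (A A^T - I), where the rows of A are the
    unit-normalized head vectors. The value is zero when the vectors are
    mutually orthogonal and grows as heads overlap."""
    rows = [normalize(v) for v in head_vectors]
    penalty = 0.0
    for i, u in enumerate(rows):
        for j, v in enumerate(rows):
            gram = sum(a * b for a, b in zip(u, v))
            target = 1.0 if i == j else 0.0
            penalty += (gram - target) ** 2
    return penalty
```

Adding such a term to a training loss pushes different heads toward different parts of the input, because overlapping heads increase the penalty.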

With regard to Claim 6, Cel in view of Zhong does not describe “the selectively regularizing further comprises calculating regularization terms only for positive samples in the input sequence.”    Equation 10 of Tao computes the average weight across all input words. Thus, zero samples (noise, empty space which are not words) would not be included.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the calculating only positive samples of Tao into the system of Cel in view of Zhong to create a system that performs a mean pooling across different decoding time (input words) and over different semantic spaces, as described in section 2.3 of Tao.
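For illustration only, the claimed limitation of calculating regularization terms only for positive samples can be sketched as a masked average. This is a minimal sketch under stated assumptions: a "positive sample" is taken to be one with label 1, and the averaging and the `weight` parameter are hypothetical. It is not drawn from Tao or from the application.

```python
def selective_regularization(penalties, labels, weight=0.1):
    """Average the per-sample penalties over positive samples only.

    penalties: one regularization value per sample
    labels:    1 for a positive sample, 0 otherwise (assumed convention)
    """
    positive = [p for p, y in zip(penalties, labels) if y == 1]
    if not positive:
        # No positive samples: the regularization term contributes nothing.
        return 0.0
    return weight * sum(positive) / len(positive)
```

In this sketch negative samples are excluded by their label, which is a different operation from averaging over every input word.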

With regard to Claim 9, Cel in view of Zhong does not describe “each of the context vectors are orthogonal to other context vectors of other attention-heads.”  However, Tao describes at section 2.3, pages 4420-4421 that the attention head vectors may be orthogonal.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vectors of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.
With regard to Claim 10, Cel in view of Zhong does not describe “to calculate the attention weight based on a score function for focus determination.”  However, Tao discusses the score function (et,i) which influences the attention mechanism weights (Tao, Equation 5).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the score function of Tao into the system of Cel in view of Zhong to allow the decoder to pay different attention to each part of input at every timestep, as described in section 2.1 of Tao.

With regard to Claim 12, Cel in view of Zhong does not describe “to calculate regularization terms only for positive samples in the input sequence.”    Equation 10 of Tao computes the average weight across all input words. Thus, zero samples (noise, empty space which are not words) would not be included.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the calculating only positive samples of Tao into the system of Cel in view of Zhong to create a system that performs a mean pooling across different decoding time (input words) and over different semantic spaces, as described in section 2.3 of Tao.
With regard to Claim 15, Cel in view of Zhong does not describe “each of the context vectors are orthogonal to other context vectors of other attention-heads.”  However, Tao describes at section 2.3, pages 4420-4421 that the attention head vectors may be orthogonal.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vectors of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.
With regard to Claim 16, Cel in view of Zhong does not describe “means for calculating the attention weight based on a score function for focus determination.”  However, Tao discusses the score function (et,i) which influences the attention mechanism weights (Tao, Equation 5).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the score function of Tao into the system of Cel in view of Zhong to allow the decoder to pay different attention to each part of input at every timestep, as described in section 2.1 of Tao.
With regard to Claim 17, Cel in view of Zhong does not describe “means for selectively regularizing the neural network based on orthogonality constraints of the attention weights and context vectors.”  However, Tao describes at section 2.3, pages 4420-4421 that the orthogonality constraints are put on the model.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vector constraints of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.
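With regard to Claim 18, Cel in view of Zhong does not describe “means for calculating regularization terms only for positive samples in the input sequence.”  Equation 10 of Tao computes the average weight across all input words. Thus, zero samples (noise, empty space which are not words) would not be included.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the calculating only positive samples of Tao into the system of Cel in view of Zhong to create a system that performs a mean pooling across different decoding time (input words) and over different semantic spaces, as described in section 2.3 of Tao.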
With regard to Claim 21, Cel in view of Zhong does not describe “each of the context vectors are orthogonal to other context vectors of other attention-heads.”  However, Tao describes at section 2.3, pages 4420-4421 that the attention head vectors may be orthogonal.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vectors of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.
With regard to Claim 22, Cel in view of Zhong does not describe “program code to calculate the attention weight based on a score function for focus determination.”  However, Tao discusses the score function (et,i) which influences the attention mechanism weights (Tao, Equation 5).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the score function of Tao into the system of Cel in view of Zhong to allow the decoder to pay different attention to each part of input at every timestep, as described in section 2.1 of Tao.
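With regard to Claim 23, Cel in view of Zhong does not describe “program code to selectively regularize the neural network based on orthogonality constraints of the attention weights and context vectors.”  However, Tao describes at section 2.3, pages 4420-4421 that the orthogonality constraints are put on the model.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the orthogonal attention head vector constraints of Tao into the system of Cel in view of Zhong to create a system where the attention vectors for each head all concentrate on a single word and different heads attend to different words, as described in section 2.3 of Tao.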
With regard to Claim 24, Cel in view of Zhong does not describe “program code to calculate regularization terms only for positive samples in the input sequence.”    Equation 10 of Tao computes the average weight across all input words. Thus, zero samples (noise, empty space which are not words) would not be included.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the calculating only positive samples of Tao into the system of Cel in view of Zhong to create a system that performs a mean pooling across different decoding time (input words) and over different semantic spaces, as described in section 2.3 of Tao.
	
Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent App. Pub. No. 20210182504 (Tu et al.) describes a device that includes an attention model and creates context vectors.
U.S. Patent App. Pub. No. 20200027444 (Prabhavalkar et al.) describes a device that includes a multi-head attention model and creates context vectors.
U.S. Patent App. Pub. No. 20190318725 (Le Roux et al.) describes a device that includes an attention model and creates context vectors.

10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.  Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For 

/EDWARD TRACY JR./
Examiner, Art Unit 2656

/MICHELLE M KOETH/
Primary Examiner, Art Unit 2656