Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Status
The instant application, No. 16/851361, has claims 1-18 pending.

Priority /Filing Date
Applicant claimed Foreign Priority from Korean Application No. KR10-2019-0161676. The priority filing date of this application is December 6, 2019.

Information Disclosure Statement
As required by M.P.E.P. 609(C), the Applicant’s submissions of the Information Disclosure Statements dated December 24, 2020 and April 16, 2021 are acknowledged by the Examiner and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609 C(2), a copy of each of the PTOL-1449s initialed and dated by the Examiner is attached to the instant Office action.

Claim Rejections - 35 USC § 112 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



Claims 1-18 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
i)	As per the last limitation of claims 1 and 6, it is uncertain what ‘a final feature map’ entails and how it is generated from ‘the intermediate feature maps’. Without additional details, these limitations and the overall meaning of the claims are indefinite. Appropriate correction is required.
ii)	As per the last limitation of claim 12, it is uncertain what ‘an output feature map’ entails and how it is generated by manipulating the intermediate feature maps. Without additional details, these limitations and the overall meaning of the claims are indefinite. Appropriate correction is required.
iii)	As per the second-to-last limitation of claim 12, the term “the input feature map” lacks antecedent basis in the claim. Also, as per claims 1, 6 and 12, it is not clear what the “input feature map” entails and how it is used in the matrix multiplication operation, making the limitation and the overall meaning of the claims indeterminate. Appropriate correction is required.
iv)	As per claims 1 and 6, the purpose of the reshape operation and the transpose operation is not clear. They appear to be redundant operations in view of the subsequent limitations. Appropriate correction/clarification is required.
Dependent claims are rejected by virtue of their dependency on the rejected independent claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


5.	Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. 
	Step 2A Prong One:
Independent claim 1 recites:
“determine whether to divide the initial weight in a column direction or a row direction according to whether a reshape operation and a transpose operation are performed before or after a matrix multiplication operation,
generate division weights by dividing the initial weight by a head count in the determined column direction or row direction,
generate intermediate feature maps by performing the matrix multiplication operation between the input feature map and the division weights;”
Independent claim 6 recites:
“determining whether to divide the initial weight in a column direction or a row direction according to whether a reshape operation and a transpose operation are performed before or after a matrix multiplication operation;
generating division weights by dividing the initial weight by a head count in the determined column direction or row direction;
generating intermediate feature maps by performing the matrix multiplication operation between the input feature map and the division weights;”
Independent claim 12 recites:
“dividing the initial weight into division weights;
performing a matrix multiplication operation between the input feature map and each of the division weights to generate intermediate feature maps;”
All of the aforesaid limitations of independent claims 1, 6 and 12 recite mathematical concepts such as mathematical calculations/relationships/expressions.  Said limitations in claims 1, 6 and 12 are a process that, under its broadest reasonable interpretation, covers performance of the limitations constituting mathematical calculations/relationships/expressions, but for the recitation of generic computer components.  Other than reciting “a neural network apparatus”, “a memory having at least one program stored therein”, and “a processor configured to perform one or more operations by executing the at least one program”, nothing in the claim elements precludes the aforesaid steps from being conceived as mathematical concepts that could be performed mentally or using simple pen and paper.  If a claim limitation, under its broadest reasonable interpretation, covers performance of limitations that are mathematical calculations but for the recitation of generic computer components, then it falls within the “mathematical concepts” grouping of abstract ideas.  As such, claims 1, 6 and 12 recite an abstract idea.
	Step 2A Prong Two:
	This judicial exception is not integrated into a practical application.  The claims recite the additional elements of “a neural network apparatus”, “a memory having at least one program stored therein”, and “a processor configured to perform one or more operations by executing the at least one program” to perform the method steps at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.  These additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
	The additional element of “acquire(ing) an input feature map and an initial weight from the memory” is a data gathering step and an insignificant pre-solution activity, and “generate a final feature map based on the intermediate feature maps” is an insignificant post-solution activity because the claims lack any details of what a final feature map entails and how it is generated from the intermediate feature maps. As such, these additional elements also do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
	Step 2B:
	Finally, the pre-processing step of acquiring input values is categorized as insignificant extra-solution activity under MPEP 2106.05(g).  Claims 1, 6 and 12 only recite “a neural network apparatus”, “a memory having at least one program stored therein”, and “a processor configured to perform one or more operations by executing the at least one program” to perform the method steps and therefore only recite a general purpose computer rather than a specific machine under MPEP 2106.05(b), are directed to mere instructions to apply the exception under MPEP 2106.05(f), and do not result in anything significantly more than the judicial exception.  The additional elements have been considered both individually and as an ordered combination in the significantly more consideration.  The inclusion of the computer or memory and controller to perform the selecting and generating steps amounts to no more than mere instructions to apply the exception using generic computer components.  Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.  Claims 1, 6, and 12 are not patent eligible.
	The dependent claims 2-5, 7-11 and 13-18 recite additional steps of generating the division weights, matrix multiplication, etc., which can be termed a further extension of the mathematical concepts, including mathematical calculations/relationships/expressions.  Because the dependent claims recite additional steps which are all directed to judicial exceptions, claims 2-5, 7-11 and 13-18 are also not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
 
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.



6.	Claims 1-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shazeer et al., hereafter Shazeer (Pub. No.: US 2019/0130213 A1).

Regarding Claim 1, Shazeer discloses a neural network apparatus (Shazeer: Figure 1), comprising:
a memory having at least one program stored therein (Shazeer: [0095]-[0096]); and
a processor configured to perform one or more operations by executing the at least one program (Shazeer: [0095]-[0096]), wherein the processor is configured to:
acquire an input feature map and an initial weight from the memory (Shazeer: [0015]-[0019], [0024]: conditioning input 102; [0049]: weight assigned to each value),
determine whether to divide the initial weight in a column direction or a row direction according to whether a reshape operation and a transpose operation are performed before or after a matrix multiplication operation (Shazeer: [0050]: the attention sublayer computes the dot products of the query with all of the keys, divides each of the dot products by a scaling factor; [0051]: the attention sub-layer can generate a matrix that includes the vectors as the columns of the matrix; [0052]: The attention sub-layer then performs a matrix multiply (MatMul) between the matrix Q and the transpose of the matrix K to generate a matrix of compatibility function outputs; [0066]: self-attention sub-layer implements multi-head attention; [0028]: d/2 of the dimensions of each embedding encode the row number of the channel-pixel pair and the other d/2 of the dimensions encode the column and the specific color channel of the channel-pixel pair),
generate division weights by dividing the initial weight by a head count in the determined column direction or row direction (Shazeer: [0053]: dividing each element of the matrix by the scaling factor; [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; [0066]: self-attention sub-layer implements multi-head attention; [0028]: d/2 of the dimensions of each embedding encode the row number of the channel-pixel pair and the other d/2 of the dimensions encode the column and the specific color channel of the channel-pixel pair),
generate intermediate feature maps by performing the matrix multiplication operation between the input feature map and the division weights (Shazeer: [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; Figure 2A, [0059]: The attention layer then applies the attention mechanism described above using these layer-specific queries, keys, and values to generate initial outputs for the attention layer), and
generate a final feature map based on the intermediate feature maps (Shazeer: Figure 2A, [0060]: The attention sub-layer then combines the initial outputs of the attention layers to generate the final output of the attention sub-layer; [0066]: multi-head attention).
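
For illustration only, the attention computation described in the cited passages of Shazeer ([0050]-[0054]) can be sketched as follows. The sketch is not taken from Shazeer or from the instant application. The function names, the layout with one vector per matrix row, and the use of the square root of the key dimension as the scaling factor are assumptions made for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for queries q, keys k and values v."""
    scale = math.sqrt(len(k[0]))          # assumed scaling factor
    scores = matmul(q, transpose(k))      # MatMul of Q and the transpose of K
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights    # MatMul of the weights and V


# Hypothetical 2x2 inputs chosen only to exercise the functions.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(q, k, v)
```

In this sketch each row of `weights` sums to one, so each row of `output` is a weighted sum of the rows of `v`.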

Regarding claim 6, the claim recites the same substantive limitations as claim 1 and is rejected using the same teachings.

Regarding Claim 2, Shazeer further discloses the neural network apparatus of claim 1, wherein the processor is configured to:
generate the division weights by dividing the initial weight by a head count in the column direction of the initial weight when the reshape operation and the transpose operation are performed after the matrix multiplication operation (Shazeer: [0052]: The attention sub-layer then performs a matrix multiply (MatMul) between the matrix Q and the transpose of the matrix K to generate a matrix of compatibility function outputs; [0053]: dividing each element of the matrix by the scaling factor; [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; [0066]: self-attention sub-layer implements multi-head attention), and
generate the final feature map by concatenating the intermediate feature maps (Shazeer: Figure 2A, [0060]: the attention sub-layer concatenates (concat) the outputs of the attention layers and applies a learned linear transformation to the concatenated output to generate the output of the attention sub-layer; [0066]: multi-head attention).
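
For illustration only, one possible reading of the column-direction division recited in claim 2 is sketched below. It assumes that the initial weight is a two-dimensional matrix whose column count is evenly divisible by the head count. The helper names are hypothetical and do not come from the application or from Shazeer. Under these assumptions, concatenating the intermediate results reproduces the undivided matrix multiplication.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, head_count):
    """Divide w into head_count blocks of adjacent columns (assumes even division)."""
    width = len(w[0]) // head_count
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(head_count)]


def concat_columns(maps):
    """Concatenate matrices side by side, row by row."""
    result = []
    for r in range(len(maps[0])):
        row = []
        for m in maps:
            row.extend(m[r])
        result.append(row)
    return result


# Hypothetical 3x2 input feature map and 2x4 initial weight.
x = [[1, 2], [3, 4], [5, 6]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

division_weights = split_columns(w, head_count=2)
intermediate_maps = [matmul(x, part) for part in division_weights]
final_map = concat_columns(intermediate_maps)
```

Here `final_map` equals `matmul(x, w)`, which is why no further reshaping is needed in this reading.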

Regarding claim 7, the claim recites the same substantive limitations as claim 2 and is rejected using the same teachings.

Regarding Claim 3, Shazeer further discloses the neural network apparatus of claim 1, wherein the processor is configured to:
generate the division weights by dividing the initial weight by a head count in the row direction of the initial weight when the reshape operation and the transpose operation are performed before the matrix multiplication operation (Shazeer: [0052]: The attention sub-layer then performs a matrix multiply (MatMul) between the matrix Q and the transpose of the matrix K to generate a matrix of compatibility function outputs; [0025]: For an image of width w and height h, the system then combines the width and channel dimensions yielding a 3-dimensional input representation tensor with shape [h, w·3, d]; [0053]: dividing each element of the matrix by the scaling factor; [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; [0066]: self-attention sub-layer implements multi-head attention), and
generate the final feature map through an element-wise sum of the intermediate feature maps (Shazeer: [0053]: The attention sub-layer then scales the compatibility function output matrix, i.e., by dividing each element of the matrix by the scaling factor; [0059]: The attention layer then applies the attention mechanism described above using these layer-specific queries, keys, and values to generate initial outputs for the attention layer; [0060]: The attention sub-layer then combines the initial outputs of the attention layers to generate the final output of the attention sub-layer).
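
For illustration only, one possible reading of the row-direction division recited in claim 3 is sketched below. It assumes that the initial weight is a two-dimensional matrix whose row count is evenly divisible by the head count, and that the input feature map is partitioned along its columns to match each row block. The claims do not state that partitioning, so it is an assumption of the example, and the helper names are hypothetical. Under these assumptions, the element-wise sum of the intermediate results reproduces the undivided matrix multiplication.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_rows(w, head_count):
    """Divide w into head_count blocks of adjacent rows (assumes even division)."""
    height = len(w) // head_count
    return [w[i * height:(i + 1) * height] for i in range(head_count)]


def split_columns(x, head_count):
    """Divide x into head_count blocks of adjacent columns (assumes even division)."""
    width = len(x[0]) // head_count
    return [[row[i * width:(i + 1) * width] for row in x] for i in range(head_count)]


def elementwise_sum(maps):
    """Add equally shaped matrices entry by entry."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]


# Hypothetical 2x4 input feature map and 4x2 initial weight.
x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0], [0, 1], [2, 1], [1, 3]]

division_weights = split_rows(w, head_count=2)
input_blocks = split_columns(x, head_count=2)
intermediate_maps = [matmul(xb, wb) for xb, wb in zip(input_blocks, division_weights)]
final_map = elementwise_sum(intermediate_maps)
```

Here `final_map` equals `matmul(x, w)`, and each intermediate map already has the shape of the final result.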

Regarding claim 8, the claim recites the same substantive limitations as claim 3 and is rejected using the same teachings.

Regarding Claim 4, Shazeer further discloses the neural network apparatus of claim 1, wherein the matrix multiplication operation between the input feature map and the division weights is one of a one-dimensional convolution operation and a two-dimensional convolution operation (Shazeer: [0026]: the system 100 generates the representation 104 of the output image (with placeholder values for intensity values that have not already been generated) by applying a 1x3 window size, 1x3 strided convolution over the output image to combine the 3 channels per pixel to form an input representation tensor with shape [h, w, d]; [0052]-[0054]: matrix multiply (MatMul)).
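
For illustration only, the relationship between a matrix multiplication and a one-dimensional convolution can be sketched as follows. The sketch assumes a convolution with a kernel of width one, stride one and no padding, applied over the rows of the input feature map. Neither the claim nor Shazeer is asserted to implement the operation this way, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv1d(x, kernel):
    """1-D convolution over the rows of x with stride one and no padding.

    x has shape [length][in_channels]; kernel has shape
    [kernel_width][in_channels][out_channels].
    """
    kernel_width = len(kernel)
    out_channels = len(kernel[0][0])
    output = []
    for t in range(len(x) - kernel_width + 1):
        row = [0] * out_channels
        for j in range(kernel_width):
            for c, value in enumerate(x[t + j]):
                for o in range(out_channels):
                    row[o] += value * kernel[j][c][o]
        output.append(row)
    return output


# Hypothetical 3x2 input feature map and 2x4 weight.
x = [[1, 2], [3, 4], [5, 6]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

# A kernel of width one holds exactly one copy of the weight matrix.
conv_result = conv1d(x, [w])
```

With a kernel of width one, `conv_result` equals `matmul(x, w)`; wider kernels would additionally mix neighbouring rows.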

Regarding claim 9, the claim recites the same substantive limitations as claim 4 and is rejected using the same teachings.

Regarding Claim 5, Shazeer further discloses the neural network apparatus of claim 1, wherein the processor comprises a weight divider, and the weight divider is configured to divide the initial weight by the head count in the column direction and the row direction (Shazeer: [0028]: d/2 of the dimensions of each embedding encode the row number of the channel-pixel pair and the other d/2 of the dimensions encode the column and the specific color channel of the channel-pixel pair; [0049]: The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key; [0052]-[0054]: matrix multiply (MatMul)).

Regarding claim 10, the claim recites the same substantive limitations as claim 5 and is rejected using the same teachings.

	Regarding Claim 11, Shazeer discloses a non-transitory computer-readable recording medium that stores a program that, when executed by a computer, performs the method of claim 6 (Shazeer: [0089]).

Regarding Claim 12, Shazeer discloses a method comprising:
receiving an initial feature map and an initial weight (Shazeer: [0015]-[0019], [0024]: conditioning input 102; [0049]: weight assigned to each value);
dividing the initial weight into division weights (Shazeer: [0053]: dividing each element of the matrix by the scaling factor; [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; [0066]: self-attention sub-layer implements multi-head attention; [0028]: d/2 of the dimensions of each embedding encode the row number of the channel-pixel pair and the other d/2 of the dimensions encode the column and the specific color channel of the channel-pixel pair);
performing a matrix multiplication operation between the input feature map and each of the division weights to generate intermediate feature maps (Shazeer: [0054]: The attention sub-layer then applies a softmax over the scaled output matrix to generate a matrix of weights and performs a matrix multiply (MatMul) between the weight matrix and the matrix V to generate an output matrix that includes the output of the attention mechanism for each of the values; Figure 2A, [0059]: The attention layer then applies the attention mechanism described above using these layer-specific queries, keys, and values to generate initial outputs for the attention layer); and
manipulating the intermediate feature maps to generate an output feature map (Shazeer: Figure 2A, [0060]: The attention sub-layer then combines the initial outputs of the attention layers to generate the final output of the attention sub-layer; [0066]: multi-head attention).

Regarding Claim 13, Shazeer further discloses the method of claim 12, further comprising determining whether the input feature map has been subjected to a reshape operation and a transpose operation (Shazeer: [0050]: the attention sublayer computes the dot products of the query with all of the keys, divides each of the dot products by a scaling factor; [0051]: the attention sub-layer can generate a matrix that includes the vectors as the columns of the matrix; [0052]: The attention sub-layer then performs a matrix multiply (MatMul) between the matrix Q and the transpose of the matrix K to generate a matrix of compatibility function outputs; [0066]: self-attention sub-layer implements multi-head attention; [0025]: For an image of width w and height h, the system then combines the width and channel dimensions yielding a 3-dimensional input representation tensor with shape [h, w·3, d]).

Regarding Claim 14, Shazeer further discloses the method of claim 13, wherein, in a case in which the input feature map has been subjected to the reshape operation and the transpose operation, the initial weight is divided into the division weights based on a head count of the initial weight in a row direction (Shazeer: [0028]: d/2 of the dimensions of each embedding encode the row number of the channel-pixel pair and the other d/2 of the dimensions encode the column and the specific color channel of the channel-pixel pair).

Regarding Claim 15, Shazeer further discloses the method of claim 14, further comprising generating the output feature map as an element-wise sum of the intermediate feature maps (Shazeer: [0053]: The attention sub-layer then scales the compatibility function output matrix, i.e., by dividing each element of the matrix by the scaling factor; [0059]: The attention layer then applies the attention mechanism described above using these layer-specific queries, keys, and values to generate initial outputs for the attention layer; [0060]: The attention sub-layer then combines the initial outputs of the attention layers to generate the final output of the attention sub-layer).

Regarding Claim 16, Shazeer further discloses the method of claim 13, wherein, in a case in which the input feature map has not been subjected to the reshape operation and the transpose operation, the initial weight is divided into the division weights based on a head count of the initial weight in a column direction (Shazeer: [0051]: the attention sub-layer can generate a matrix that includes the vectors as the columns of the matrix; [0052]: The attention sub-layer then performs a matrix multiply (MatMul) between the matrix Q and the transpose of the matrix K to generate a matrix of compatibility function outputs; [0066]: self-attention sub-layer implements multi-head attention).

Regarding Claim 17, Shazeer further discloses the method of claim 16, further comprising generating the output feature map by concatenating the intermediate feature maps (Shazeer: Figure 2A, [0060]: the attention sub-layer concatenates (concat) the outputs of the attention layers and applies a learned linear transformation to the concatenated output to generate the output of the attention sub-layer; [0066]: multi-head attention).

Regarding Claim 18, the claim recites the same substantive limitations as claim 11 and is rejected using the same teachings.

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ali Al-Shamma (Patent No.: US 10,692,570 B2) teaches accelerating multiplication operations, which can be employed in neural network operations, among other applications.
Xiao et al. (CNN–MHSA: A Convolutional Neural Network and multi-head self-attention combined approach for detecting phishing websites, Neural Networks 125 (2020) 303–312) teaches CNN–MHSA, a combined Convolutional Neural Network (CNN) and multi-head self-attention (MHSA) approach for highly precise phishing website detection. To achieve this goal, CNN–MHSA first takes a URL string as the input data and feeds it into a mature CNN model so as to extract its features. Meanwhile, MHSA is applied to exploit the relationships among characters in the URL so as to calculate the corresponding weights for the CNN-learned features.
Chrzanowski et al. (Pub. No.: US 2019/0354858 A1) conceptually presents a memory-based neural network that is configured to, at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprises applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
Bai et al. (A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting, INTERSPEECH 2019, pp 2190-2194) teaches a time delay neural network with shared weight self-attention for small-footprint keyword spotting. By sharing weights, the parameters of self-attention are reduced without a reduction in performance. The publicly available Google Speech Commands dataset is used to evaluate the models.

8.	Examiner’s Remarks: Examiner has cited particular columns and line numbers in the references applied to the claims above for the convenience of the applicant.  Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well.  It is respectfully requested from the applicant in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.

9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to IFTEKHAR KHAN, whose telephone number is (571)272-5699.  The examiner can normally be reached Monday through Friday, 7:30 AM-5:00 PM (EST). If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamini Shah, can be reached at (571)272-2279.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IFTEKHAR A KHAN/
Primary Examiner, Art Unit 2146