DETAILED ACTION

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 3 – 9, 11 – 17, 19, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
As to claims 1, 9, and 17, 
Step 2A, Prong One
The claim recites in part:
identifying a two-dimensional weight matrix corresponding to the two-dimensional input matrix, the two-dimensional weight matrix including a plurality of weight values;
identifying a first block of elements of the two-dimensional input matrix, the first block of elements comprising a plurality of columns of the two-dimensional input matrix;
loading a first weight block of the two-dimensional weight matrix, the first weight block comprising a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements;
calculating a first partial output for the first block of elements by performing a first dot product operation using a first row of elements of the first block of elements and the first weight block, wherein the first row of elements of the first block of elements corresponds to a first batch of elements;
Under the broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

Accordingly, at Step 2A, Prong One, the claim is directed to an abstract idea.


Step 2A, Prong Two
The judicial exception is not integrated into a practical application.  In particular, the claim recites the additional element of “receiving a two-dimensional input matrix that includes a plurality of elements, wherein each row of the two-dimensional input matrix corresponds to a batch of elements” which amounts to extra-solution activity of gathering data for use in the claimed process.  As described in MPEP 2106.05(g), limitations that amount to merely adding insignificant extra-solution activity to a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.  

The claim also recites the additional element of “storing the first partial output,” which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

The claim further recites “generating a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements.” This element is recited at a high level of generality and amounts to no more than adding the words “apply it” to the judicial exception.  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).  This limitation also amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).

Accordingly, at Step 2A, Prong Two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.




Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “receiving a two-dimensional input matrix that includes a plurality of elements, wherein each row of the two-dimensional input matrix corresponds to a batch of elements” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As also discussed above, the additional element of “storing the first partial output” is recited at a high level of generality and amounts to extra-solution activity of storing data.  The courts have found limitations directed to storing information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
The limitation “generating a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements” amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).  The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics", “determining an estimated outcome and setting a price”).

Accordingly, at Step 2B the additional elements individually or in combination do not amount to significantly more than the judicial exception.

As to claims 3, 11, and 19, under the broadest reasonable interpretation, the limitation “wherein the two-dimensional weight matrix is arranged in column-major order” is a process step that covers mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.



As to claims 4 and 12, the judicial exception is not integrated into a practical application.  In particular, the claim recites the additional element of “wherein the method is implemented by at least one processor of the computing device, and the at least one processor includes a vector processing unit,” which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

As to claims 5, 13, and 20, the judicial exception is not integrated into a practical application.  In particular, the claim recites the additional element of “further comprising, in response to storing the first partial output for the first block of elements: reloading the first weight block,” which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

As to claims 6 and 14, under the broadest reasonable interpretation, the limitation “calculating a second partial output for the first block of elements by performing a second dot product operation using a second row of elements of the first block of elements and the first weight block, wherein the second row of elements of the first block of elements corresponds to a second batch of elements” is a process step that covers mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claims 7 and 15, the claim recites “generating a second output element using the second partial output for the first block of elements and at least one other partial output corresponding to the second batch of elements.” This element is recited at a high level of generality and amounts to no more than adding the words “apply it” to the judicial exception.  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).  This limitation also amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).

As to claims 8 and 16, under the broadest reasonable interpretation, the limitations “further comprising: identifying a second block of elements of the two-dimensional input matrix; loading a second weight block of the two-dimensional weight matrix; and calculating a first partial output for the second block of elements by performing a third dot product operation using a first row of elements of the second block of elements and the second weight block, wherein the first row of elements of the second block of elements corresponds to a first batch of elements” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3 – 9, 11 – 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over DEISHER et al (US 2018/0121796) in view of Gu et al (US 2019/0187898).
As to claim 1, DEISHER et al teaches a method for classifying information using a fully-connected layer of a convolutional neural network (paragraph [0044]...spoken utterance classification may be based on deep neural networks (DNNs) as described herein. Such neural networks may be, or may have layers of, convolutional neural networks (CNNs)), the method comprising, at a computing device: 
receiving (paragraph [0046]...NN Accelerator (NNA) 202)  a two-dimensional input matrix that includes a plurality of elements (paragraph [0066]...the input elements in the input vector), wherein each row of the two-dimensional input matrix (paragraph [0053]... a two dimensional matrix with the input per iteration being a column in the matrix and in sequential order in memory, and the rows being one element per input, and changed to a structure arranged so that a set of inputs of the same iteration can be executed at once which is practically using a column of the matrix) corresponds to a batch of elements (paragraph [0163]... neural network layer input can be viewed as a 2D matrix. One of the dimensions is the input vector length and the other dimension is the grouping factor (i.e., batch size) where each group forms a different output of a layer); 
identifying a two-dimensional weight matrix (paragraph [0034]...a weight matrix) corresponding to the two-dimensional input matrix, the two-dimensional weight matrix including a plurality of weight values (paragraph [0048]...external memory 248 also may have one or more pre-allocated NN buffers (or application buffers) 256 including buffers for a matrix of input values, weights, scale factors, bias values, and other constants. These NN buffers 256 initially hold the data for the neural network before running the neural network or at least before a layer associated with the data is being processed. Eventually, the data in the NN buffers 256 are read by the NNA 202 to be placed into the internal buffers 238 to be used to compute NN outputs as explained below. The data for each layer in the NN buffers 256, such as the input values, scale factors, weights, and other data, also may be pre-ordered in the NN buffers 256, such as in pre-ordered single or two dimensional arrays);
identifying a first block of elements (paragraph [0044]...the components (logic elements) of individual or each logic block are arranged to give a programmer the option to use weights with different bit lengths and a scale factor may be applied to the weights depending on the bit length of the weights as explained herein) of the two-dimensional input matrix; 
loading a first weight block of the two-dimensional weight matrix (paragraph [0061]...the input array may be provided in a de-interleaved form or an interleaved form. In most cases, the input array will be provided in an interleaved form. When a neural network has an RNN layer, in this case, the de-interleaved form may be provided. In the de-interleaved form, and when the memory uses row-major storage, the input elements are divided into groups along rows, and as shown in FIG. 16, where input array 1600 is shown in de-interleaved form. In this case, the memory stores the groups group after group. Thus, when the input array is uploaded from external memory to the input buffer at internal memory 314, the data of a first group is loaded, or at least as much as will fit in the input buffer, and then the next group, and so on. Again, this may be used only in the case of an RNN layer where the order of the processing of the layers in the neural network is important, by one example); 
calculating a first partial output (paragraph [0032]...the NNA also may provide partial (subset) output computation-supporting active state lists processing such that a selected portion of a layer that provides outputs for less than all of the nodes on a neural network layer may be processed when processing of the entire layer is not desired) for the first block of elements by performing a first dot product operation (paragraph [0080]...an MAC 401 is shown to determine a dot product (or sum output) of the weights and input values for a single node or output of a layer of a neural network. The MAC 401 may have mathematically and/or logically parallel logic blocks 402-0 to 402-N (or generally referred to as logic blocks 402), and by one example, 48 logic blocks are provided but more or less may be provided. The logic blocks 402 are fixed function hardware logic blocks formed of well understood transistors or other semiconductor components. Fixed function here refers to the use of an MAC 401 with particular logic components or elements in an arrangement that does not change) using a first row of elements of the first block of elements and the first weight block (paragraph [0082]... neural network propagation, the input to the MAC 401 may include an input set or feature vector from the input buffer and from the input array as explained above, a weight vector from the weight buffer, and a scale factor. These are all used to compute a single output (for a single node)), wherein the first row of elements of the first block of elements corresponds to a first batch of elements (paragraph [0162]...one of the dimensions is the input vector length and the other dimension is the grouping factor (i.e., batch size) where each group forms a different output of a layer. Thus, the transpose layer groups input data from multiple groups into a single array so that this array can be fetched from memory together thereby reducing the number of memory transactions);
storing the first partial output (paragraph [0067]...the intermediate sum stored in the sum buffer 326 is saved to memory as an intermediate sum to allow handling of additional outputs); 
and generating a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements (paragraph [0076]...the weighted inputs are computed in parallel, and are provided to an accumulator section of the MAC 319 to provide a single weighted input sum (also referred to as a sum output (or more likely a partial sum when more than one input set is included in an input vector (group) or more than one iteration is provided))).
DEISHER et al fails to explicitly show/teach the first block of elements comprising a plurality of columns of the two-dimensional input matrix; and the first weight block comprising a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements.
Gu et al teaches the first block of elements comprising a plurality of columns of the two-dimensional input matrix (paragraph [0048]...plurality of input feature map 710 may be a two dimensional matrix of values having a width and height); and the first weight block comprising a plurality of rows of the two-dimensional weight matrix (paragraph [0048]... The kernel 720 may be a two dimensional array of weights having a width and a height), wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements (paragraph [0048]... The height of the input feature map 710 (and therefore the number of rows) may be equal to the width of the kernel 720 (and therefore the number of columns). To generate an output for a kernel 720, each weight in a column of the kernel 720 may be multiplied with a corresponding value in each row of each input feature map 710. For example, a weight of a kernel may be referred to as kj_wi, where j is the column of the weight and i is the position of the weight in the column, and a value of an input feature map may be referred to as Ix_Py_wz, where x identifies the input feature map, y is the row of the value, and z is the position of the value in the row. As shown in FIG. 7, in order to generate an output for each input feature map 710 in the batch, the weight k1_w1 is multiplied with each of the values I1_P1_w1, I1_P2_w1, I1_P3_w1, I2_P1_w1, I2_P2_w1, I3_P3_w1, I3_P2_w1, I3_P3_w1, and similar values in each position in each row of each input feature map 710).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify DEISHER et al such that the first block of elements comprises a plurality of columns of the two-dimensional input matrix and the first weight block comprises a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements, as in Gu et al, for the purpose of performing computations in memory blocks rather than in a CPU.
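
For illustration only, the blocked computation recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation and not code from DEISHER et al or Gu et al: the function names (partial_output, blocked_row_output), the parameter block_cols, and the sample values are all hypothetical. It shows why the number of rows in a weight block must match the number of columns in the corresponding block of input elements, and how partial outputs for one batch are combined into output elements.

```python
# Illustrative sketch only; all names and values are hypothetical.

def partial_output(row_block, weight_block):
    """Dot product of one row of an input block with a weight block.

    row_block:    the elements of one batch that fall in the block's columns.
    weight_block: the matching rows of the weight matrix (one row per column
                  of the input block, one value per output element).
    """
    if len(row_block) != len(weight_block):
        raise ValueError("weight-block rows must equal input-block columns")
    n_out = len(weight_block[0])
    return [
        sum(x * w_row[j] for x, w_row in zip(row_block, weight_block))
        for j in range(n_out)
    ]


def blocked_row_output(input_row, weights, block_cols):
    """Sum the partial outputs of every block to get one batch's outputs."""
    total = [0] * len(weights[0])
    for start in range(0, len(input_row), block_cols):
        row_block = input_row[start:start + block_cols]
        weight_block = weights[start:start + block_cols]
        partial = partial_output(row_block, weight_block)
        total = [t + p for t, p in zip(total, partial)]
    return total


# Two batches (rows) of four elements each, and a 4 x 2 weight matrix.
inputs = [[1, 2, 3, 4], [5, 6, 7, 8]]
weights = [[1, 0], [0, 1], [1, 1], [2, 0]]

first_partial = partial_output(inputs[0][0:2], weights[0:2])
first_outputs = blocked_row_output(inputs[0], weights, block_cols=2)
```

With these sample values, the first block contributes the partial output [1, 2], the second block contributes [11, 3], and their sum [12, 5] equals the unblocked product of the first row with the full weight matrix.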

As to claim 3, DEISHER et al teaches the method, wherein the two-dimensional weight matrix is arranged in column-major order (paragraph [0069]...the weight matrix held at the weight buffer 320 may represent slightly different values depending on the type of layer that is being processed. For affine and recurrent layers, the weight matrix may have one row for each input to a layer and column for each output (or node) of the layer that is to be obtained. This assumes row major organization of memory. It is possible to use the transverse with a column major organization instead. This may be kept consistent for any of the arrays that provide a row or column major option).
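
For illustration only, the difference between row-major and column-major organization can be sketched as follows. This is a minimal sketch under stated assumptions; the helper names flat_index and flatten and the sample matrix are hypothetical and are not taken from the application or from DEISHER et al.

```python
# Illustrative sketch only; all names and values are hypothetical.

def flat_index(row, col, n_rows, n_cols, order):
    """Position of element (row, col) in a flat buffer for a given order."""
    if order == "row":
        return row * n_cols + col   # rows are contiguous
    if order == "col":
        return col * n_rows + row   # columns are contiguous
    raise ValueError("order must be 'row' or 'col'")


def flatten(matrix, order):
    """Lay out a list-of-rows matrix in a flat list using the given order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    flat = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            flat[flat_index(r, c, n_rows, n_cols, order)] = matrix[r][c]
    return flat


weight_matrix = [[1, 2, 3],
                 [4, 5, 6]]

row_major = flatten(weight_matrix, "row")
col_major = flatten(weight_matrix, "col")
```

For the 2 x 3 sample matrix, the row-major layout is [1, 2, 3, 4, 5, 6] and the column-major layout is [1, 4, 2, 5, 3, 6], so the values of a single column sit next to each other only in the column-major layout.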

As to claim 4, DEISHER et al teaches the method, wherein the method is implemented by at least one processor (paragraph [0259]...a processor, may provide the functionality described) of the computing device, and the at least one processor includes a vector processing unit (paragraph [0037]...processing operations such as weight functions, feature vector stacking and transformations).

As to claim 5, DEISHER et al teaches the method, further comprising, in response to storing the first partial output (paragraph [0067]...the intermediate sum stored in the sum buffer 326 is saved to memory as an intermediate sum to allow handling of additional outputs) for the first block of elements: reloading the first weight block (paragraph [0145]...fully connected layers are operating on an interleaved array, where multiple groups of data (each group from a different output) are interleaved to improve efficiency of memory bandwidth via re-use of the weight matrix read for all groups).

As to claim 6, DEISHER et al teaches the method, further comprising, calculating a second partial output (paragraph [0032]...the NNA also may provide partial (subset) output computation-supporting active state lists processing such that a selected portion of a layer that provides outputs for less than all of the nodes on a neural network layer may be processed when processing of the entire layer is not desired) for the first block of elements (paragraph [0044]...the components (logic elements) of individual or each logic block are arranged to give a programmer the option to use weights with different bit lengths and a scale factor may be applied to the weights depending on the bit length of the weights as explained herein) by performing a second dot product operation (paragraph [0080]...an MAC 401 is shown to determine a dot product (or sum output) of the weights and input values for a single node or output of a layer of a neural network. The MAC 401 may have mathematically and/or logically parallel logic blocks 402-0 to 402-N (or generally referred to as logic blocks 402), and by one example, 48 logic blocks are provided but more or less may be provided. The logic blocks 402 are fixed function hardware logic blocks formed of well understood transistors or other semiconductor components. Fixed function here refers to the use of an MAC 401 with particular logic components or elements in an arrangement that does not change) using a second row of elements of the first block of elements and the first weight block, wherein the second row of elements of the first block of elements corresponds to a second batch of elements (paragraph [0223]... the process 1020 may include "accumulate weighted inputs with accumulator circuit to obtain sum for an output" 1038, and as explained above, by a tree structure of adders to obtain a single sum (or dot-product) of weighted inputs for a single output referred to as a sum output herein. When the input vector has a number of input values that is the same or less than the number of parallel logic blocks, the sum output is a final sum output).

As to claim 7, DEISHER et al teaches the method, further comprising, generating a second output element using the second partial output for the first block of elements and at least one other partial output corresponding to the second batch of elements (paragraph [0076]...the weighted inputs are computed in parallel, and are provided to an accumulator section of the MAC 319 to provide a single weighted input sum (also referred to as a sum output (or more likely a partial sum when more than one input set is included in an input vector (group) or more than one iteration is provided))).
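
For illustration only, the limitations of claims 6 and 7 (a second dot product that uses a second row of the same block of elements together with the same weight block) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's or either reference's implementation: the function names partial_outputs_for_block and fully_connected, the parameter block_cols, and the sample values are hypothetical.

```python
# Illustrative sketch only; all names and values are hypothetical.

def partial_outputs_for_block(input_matrix, weights, start, block_cols):
    """Apply one weight block to every batch (row) of the matching input block."""
    weight_block = weights[start:start + block_cols]   # taken once per block
    n_out = len(weight_block[0])
    partials = []
    for row in input_matrix:                           # one row per batch
        row_block = row[start:start + block_cols]
        partials.append([
            sum(x * w_row[j] for x, w_row in zip(row_block, weight_block))
            for j in range(n_out)
        ])
    return partials


def fully_connected(input_matrix, weights, block_cols):
    """Accumulate each batch's partial outputs across all blocks."""
    n_out = len(weights[0])
    outputs = [[0] * n_out for _ in input_matrix]
    for start in range(0, len(weights), block_cols):
        block_partials = partial_outputs_for_block(
            input_matrix, weights, start, block_cols)
        for b, partial in enumerate(block_partials):
            outputs[b] = [o + p for o, p in zip(outputs[b], partial)]
    return outputs


inputs = [[1, 2, 3, 4], [5, 6, 7, 8]]          # two batches, four elements each
weights = [[1, 0], [0, 1], [1, 1], [2, 0]]     # 4 x 2 weight matrix

first_block_partials = partial_outputs_for_block(inputs, weights, 0, 2)
outputs = fully_connected(inputs, weights, block_cols=2)
```

With these sample values, the first weight block yields the partial output [1, 2] for the first batch and [5, 6] for the second batch, and accumulating over both blocks gives the output elements [12, 5] and [28, 13].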

Claim 9 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 


Claim 11 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 12 has similar limitations as claim 4. Therefore, the claim is rejected for the same reasons as above. 

Claim 13 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 

Claim 14 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 

Claim 15 has similar limitations as claim 7. Therefore, the claim is rejected for the same reasons as above. 

As to claim 17, DEISHER et al teaches a computing device configured to classify information using a fully-connected layer of a convolutional neural network (paragraph [0044]...spoken utterance classification may be based on deep neural networks (DNNs) as described herein. Such neural networks may be, or may have layers of, convolutional neural networks (CNNs)), the computing device comprising: 
at least one memory (paragraph [0067]... memory 248), storing:
a two-dimensional input matrix that includes a plurality of elements (paragraph [0066]...the input elements in the input vector), wherein each row of the two-dimensional input matrix (paragraph [0053]... a two dimensional matrix with the input per iteration being a column in the matrix and in sequential order in memory, and the rows being one element per input, and changed to a structure arranged so that a set of inputs of the same iteration can be executed at once which is practically using a column of the matrix) corresponds to a batch of elements (paragraph [0163]... neural network layer input can be viewed as a 2D matrix. One of the dimensions is the input vector length and the other dimension is the grouping factor (i.e., batch size) where each group forms a different output of a layer), and
a two-dimensional weight matrix (paragraph [0034]...a weight matrix)  corresponding to the two-dimensional input matrix, the two-dimensional weight matrix including a plurality of weight values (paragraph [0048]...external memory 248 also may have one or more pre-allocated NN buffers (or application buffers) 256 including buffers for a matrix of input values, weights, scale factors, bias values, and other constants. These NN buffers 256 initially hold the data for the neural network before running the neural network or at least before a layer associated with the data is being processed. Eventually, the data in the NN buffers 256 are read by the NNA 202 to be placed into the internal buffers 238 to be used to compute NN outputs as explained below. The data for each layer in the NN buffers 256, such as the input values, scale factors, weights, and other data, also may be pre-ordered in the NN buffers 256, such as in pre-ordered single or two dimensional arrays), and
a vector processor (paragraph [0037]...processing operations such as weight functions, feature vector stacking and transformations) coupled to the at least one memory and configured to cause the computing device to:
identify a first block of elements (paragraph [0044]...the components (logic elements) of individual or each logic block are arranged to give a programmer the option to use weights with different bit lengths and a scale factor may be applied to the weights depending on the bit length of the weights as explained herein) of the two-dimensional input matrix,
load a first weight block of the two-dimensional weight matrix (paragraph [0061]...the input array may be provided in a de-interleaved form or an interleaved form. In most cases, the input array will be provided in an interleaved form. When a neural network has an RNN layer, in this case, the de-interleaved form may be provided. In the de-interleaved form, and when the memory uses row-major storage, the input elements are divided into groups along rows, and as shown in FIG. 16, where input array 1600 is shown in de-interleaved form. In this case, the memory stores the groups group after group. Thus, when the input array is uploaded from external memory to the input buffer at internal memory 314, the data of a first group is loaded, or at least as much as will fit in the input buffer, and then the next group, and so on. Again, this may be used only in the case of an RNN layer where the order of the processing of the layers in the neural network is important, by one example),
calculate a first partial output (paragraph [0032]...the NNA also may provide partial (subset) output computation-supporting active state lists processing such that a selected portion of a layer that provides outputs for less than all of the nodes on a neural network layer may be processed when processing of the entire layer is not desired) for the first block of elements by performing a dot product operation (paragraph [0080]...an MAC 401 is shown to determine a dot product (or sum output) of the weights and input values for a single node or output of a layer of a neural network. The MAC 401 may have mathematically and/or logically parallel logic blocks 402-0 to 402-N (or generally referred to as logic blocks 402), and by one example, 48 logic blocks are provided but more or less may be provided. The logic blocks 402 are fixed function hardware logic blocks formed of well understood transistors or other semiconductor components. Fixed function here refers to the use of an MAC 401 with particular logic components or elements in an arrangement that does not change) using a first row of elements of the first block of elements and the first weight block (paragraph [0082]... neural network propagation, the input to the MAC 401 may include an input set or feature vector from the input buffer and from the input array as explained above, a weight vector from the weight buffer, and a scale factor. These are all used to compute a single output (for a single node)), wherein the first row of elements of the first block of elements corresponds to a first batch of elements (paragraph [0162]...one of the dimensions is the input vector length and the other dimension is the grouping factor (i.e., batch size) where each group forms a different output of a layer. Thus, the transpose layer groups input data from multiple groups into a single array so that this array can be fetched from memory together thereby reducing the number of memory transactions),
store the first partial output (paragraph [0067]...the intermediate sum stored in the sum buffer 326 is saved to memory as an intermediate sum to allow handling of additional outputs), and generate a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements (paragraph [0076]...the weighted inputs are computed in parallel, and are provided to an accumulator section of the MAC 319 to provide a single weighted input sum (also referred to as a sum output (or more likely a partial sum when more than one input set is included in an input vector (group) or more than one iteration is provided))).
DEISHER et al fails to explicitly show/teach the first block of elements comprising a plurality of columns of the two-dimensional input matrix; and the first weight block comprising a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements.
Gu et al teaches the first block of elements comprising a plurality of columns of the two-dimensional input matrix (paragraph [0048]...plurality of input feature map 710 may be a two dimensional matrix of values having a width and height); and the first weight block comprising a plurality of rows of the two-dimensional weight matrix (paragraph [0048]... The kernel 720 may be a two dimensional array of weights having a width and a height), wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements (paragraph [0048]... The height of the input feature map 710 (and therefore the number of rows) may be equal to the width of the kernel 720 (and therefore the number of columns). To generate an output for a kernel 720, each weight in a column of the kernel 720 may be multiplied with a corresponding value in each row of each input feature map 710. For example, a weight of a kernel may be referred to as kj_wi, where j is the column of the weight and i is the position of the weight in the column, and a value of an input feature map may be referred to as Ix_Py_wz, where x identifies the input feature map, y is the row of the value, and z is the position of the value in the row. As shown in FIG. 7, in order to generate an output for each input feature map 710 in the batch, the weight k1_w1 is multiplied with each of the values I1_P1_w1, I1_P2_w1, I1_P3_w1, I2_P1_w1, I2_P2_w1, I3_P3_w1, I3_P2_w1, I3_P3_w1, and similar values in each position in each row of each input feature map 710).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify DEISHER et al so that the first block of elements comprises a plurality of columns of the two-dimensional input matrix, and the first weight block comprises a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements, as in Gu et al, for the purpose of performing computations in memory blocks rather than in a CPU.
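For illustration only, the blocked computation recited in the claim limitations above can be sketched as follows. This is a minimal sketch and is not taken from the application, DEISHER et al, or Gu et al; the function name blocked_matmul, the parameter block_size, and the list-of-lists matrix layout are assumptions introduced here. Each row of the input matrix is treated as one batch of elements, each block spans a group of input columns and the same number of weight rows, and the partial outputs for a batch are accumulated into the output element.

```python
def blocked_matmul(inputs, weights, block_size):
    """Multiply inputs (batch x K) by weights (K x N), one block of columns at a time.

    Hypothetical helper for illustration; not from any cited reference.
    """
    batch, k, n = len(inputs), len(weights), len(weights[0])
    outputs = [[0.0] * n for _ in range(batch)]
    for start in range(0, k, block_size):
        end = min(start + block_size, k)
        # Weight block: as many weight rows as there are input columns in the block.
        weight_block = weights[start:end]
        for b in range(batch):
            # One row of the input block corresponds to one batch of elements.
            row = inputs[b][start:end]
            for j in range(n):
                # Partial output: dot product of the input row with weight column j.
                partial = sum(x * w[j] for x, w in zip(row, weight_block))
                # Accumulate partial outputs from all blocks into the output element.
                outputs[b][j] += partial
    return outputs
```

In this sketch the weight block is loaded once per block and reused for every batch row, so the result for any block size matches an ordinary matrix product.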


As to claim 19, DEISHER et al teaches a computing device, wherein the two-dimensional weight matrix is arranged in column-major order (paragraph [0069]...the weight matrix held at the weight buffer 320 may represent slightly different values depending on the type of layer that is being processed. For affine and recurrent layers, the weight matrix may have one row for each input to a layer and column for each output (or node) of the layer that is to be obtained. This assumes row major organization of memory. It is possible to use the transverse with a column major organization instead. This may be kept consistent for any of the arrays that provide a row or column major option).
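For illustration only, the difference between the row-major and column-major organization discussed in the cited paragraph can be sketched as follows. This is a minimal sketch under stated assumptions; the helper names flat_index and flatten are hypothetical and do not appear in DEISHER et al or in the application.

```python
def flat_index(row, col, n_rows, n_cols, column_major=False):
    """Return the position of element (row, col) in a flat memory layout."""
    if column_major:
        # Column-major: elements of one column are contiguous.
        return col * n_rows + row
    # Row-major: elements of one row are contiguous.
    return row * n_cols + col


def flatten(matrix, column_major=False):
    """Lay a list-of-lists matrix out as a flat list in the chosen order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    flat = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            flat[flat_index(r, c, n_rows, n_cols, column_major)] = matrix[r][c]
    return flat
```

Under these assumptions, a column-major weight matrix places all weights of one column next to each other in memory, whereas a row-major matrix places all weights of one row next to each other.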

As to claim 20, DEISHER et al teaches a computing device, wherein the vector processor (paragraph [0037]...processing operations such as weight functions, feature vector stacking and transformations) is further configured to cause the computing device to, in response to storing the first partial output (paragraph [0067]...the intermediate sum stored in the sum buffer 326 is saved to memory as an intermediate sum to allow handling of additional outputs) for the first block of elements: reload the first weight block (paragraph [0145]...fully connected layers are operating on an interleaved array, where multiple groups of data (each group from a different output) are interleaved to improve efficiency of memory bandwidth via re-use of the weight matrix read for all groups).

Response to Arguments
Applicant's arguments filed 5/31/2022 have been fully considered but they are not persuasive. 
DEISHER et al fails to explicitly show/teach the first block of elements comprising a plurality of columns of the two-dimensional input matrix; and the first weight block comprising a plurality of rows of the two-dimensional weight matrix, wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements.
Gu et al teaches the first block of elements comprising a plurality of columns of the two-dimensional input matrix (paragraph [0048]...plurality of input feature map 710 may be a two dimensional matrix of values having a width and height); and the first weight block comprising a plurality of rows of the two-dimensional weight matrix (paragraph [0048]... The kernel 720 may be a two dimensional array of weights having a width and a height), wherein a number of the plurality of rows in the first weight block corresponds to a number of the plurality of columns in the first block of elements (paragraph [0048]... The height of the input feature map 710 (and therefore the number of rows) may be equal to the width of the kernel 720 (and therefore the number of columns). To generate an output for a kernel 720, each weight in a column of the kernel 720 may be multiplied with a corresponding value in each row of each input feature map 710. For example, a weight of a kernel may be referred to as kj_wi, where j is the column of the weight and i is the position of the weight in the column, and a value of an input feature map may be referred to as Ix_Py_wz, where x identifies the input feature map, y is the row of the value, and z is the position of the value in the row. As shown in FIG. 7, in order to generate an output for each input feature map 710 in the batch, the weight k1_w1 is multiplied with each of the values I1_P1_w1, I1_P2_w1, I1_P3_w1, I2_P1_w1, I2_P2_w1, I3_P3_w1, I3_P2_w1, I3_P3_w1, and similar values in each position in each row of each input feature map 710).
Therefore, DEISHER et al in view of Gu et al clearly shows all the limitations as claimed. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRANDON S COLE/           Primary Examiner, Art Unit 2128