DETAILED ACTION
This Office Action is in response to the amendment submitted May 3rd, 2022 for application No. 16/158,660 filed on October 12th, 2018. Claims 1, 3-15, and 17-26 are presented for examination and are currently pending.
	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, filed May 3rd, 2022, with respect to the objection to the title have been fully considered and are persuasive. The objection to the title has been withdrawn.

Applicant's arguments regarding the objection to claim 15 have been fully considered but are not persuasive. The phrase “the apparatus comprising: a processor is configured to:…” is grammatically incorrect and still requires correction. An example of an acceptable amendment is: “the apparatus comprising: a processor that is configured to:…”. The objection to claim 15 is maintained.
Applicant’s arguments regarding the 112(b) rejection of claim 14 have been fully considered and are persuasive. The 112(b) rejection of claim 14 has been withdrawn.
Applicant’s arguments regarding the 103 rejection(s) of claims 1, 3-12, 14, 15, 17-24, and 26 have been fully considered but are not persuasive. On Pg. 2 of the “Remarks,” applicant argues that [ Woolley (0068) ] cannot be interpreted as equivalent to the claim language reciting “kernel weight value” and “pixel value”. The examiner respectfully disagrees. As applicant correctly states, Woolley teaches an image matrix and/or image features alongside a matrix of kernel weights. The fact that these values are arranged in matrix format neither disqualifies them as prior art nor renders them non-equivalent to the claim language. Applicant further argues that the citation is not equivalent because Woolley discusses these data elements in the context of matrix multiplication between the matrices. First, the examiner points to [ Woolley (0068) “As previously described herein, the convolution subsystem 180 leverages the optimized matrix multiplication capabilities of the SM 310 to efficiently perform the multi-convolution operation” ]. This shows that the cited passage was not taken out of context; the multiplication operations are performed in the context of convolution operations. Second, the examiner notes that nothing about the matrix format of the data would hinder the claimed operations, and applicant’s specification does not exclude performing the operations in matrix format. Lastly, the examiner notes that the fact that the data are held in matrices is irrelevant, because the claim language recites the individual pixel values and weight values, which are what the examiner relied upon in the rejection. The argument is not persuasive.
Applicant’s next argument, on Pg. 3 of the “Remarks,” states that the matrix structure cannot be interpreted as the individual data elements being multiplied. The examiner respectfully disagrees. Matrix multiplication, as taught by Woolley, necessarily involves multiplying the individual data elements in the conventional manner recognized by one of ordinary skill in the art before the effective filing date. Applicant’s argument that the matrix format of the data precludes multiplication of the individual data elements is not persuasive.
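For illustration only, the point that matrix multiplication decomposes into multiplications of individual data elements can be shown with a textbook matrix product. The following minimal Python sketch is generic; the function name matmul and the sample values are hypothetical and are not drawn from Woolley or the application.

```python
def matmul(a, b):
    """Multiply matrices a (r x k) and b (k x c), given as lists of rows.

    Every output entry is built from individual scalar products
    a[i][t] * b[t][j], which are then summed (accumulated).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for t in range(inner):
                acc += a[i][t] * b[t][j]  # one element-by-element multiplication
            out[i][j] = acc
    return out


# Two pixel values multiplied against a 2 x 1 column of kernel weights.
pixels = [[3, 5]]
weights = [[2], [4]]
print(matmul(pixels, weights))  # [[26]] because 3*2 + 5*4 = 26
```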
Applicant’s argument on Pg. 4-5 of the “Remarks” recites Fig. 5 of Woolley, discusses the relationships between the various matrices and tiles for the image data and weight data, quotes the abstract of Woolley stating that the pipeline performs matrix multiplication between the tiles to create the output matrix, and asserts that this matrix multiplication is not equivalent to the convolution operation. The examiner respectfully disagrees. Applicant is reminded that a prior art reference must be considered in its entirety, not only the cited sections [ MPEP 2141.02(VI) ]. The examiner points to [ Woolley (0068) “As previously described herein, the convolution subsystem 180 leverages the optimized matrix multiplication capabilities of the SM 310 to efficiently perform the multi-convolution operation” ]. Applicant cites this passage and asserts that breaking the convolution into sub-matrices to perform the convolution operation differs from the claim language, but does not explain why beyond reciting the reference. The applicant’s arguments are not persuasive.
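As a hedged illustration of how a convolution can be carried out as a matrix multiplication, the following Python sketch implements the well-known lowering technique commonly called im2col and checks it against a direct convolution. It assumes a single channel, stride 1, and no padding; the function names are hypothetical, and the sketch is not a reproduction of Woolley's tile-based engine.

```python
def conv2d_direct(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv2d_im2col(image, kernel):
    """Same result, computed by lowering the image into a patch matrix and
    multiplying it by the flattened kernel (a matrix-vector product)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Each row of the patch matrix is one receptive field, flattened row-major.
    patches = [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
               for r in range(oh) for c in range(ow)]
    flat_kernel = [w for row in kernel for w in row]
    products = [sum(p * w for p, w in zip(patch, flat_kernel)) for patch in patches]
    # Fold the flat result back into an oh x ow output feature map.
    return [products[r * ow:(r + 1) * ow] for r in range(oh)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_direct(image, kernel))  # [[-4, -4], [-4, -4]]
print(conv2d_im2col(image, kernel))  # [[-4, -4], [-4, -4]]
```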
Applicant’s argument on Pg. 5-6 of the “Remarks” recites passages and figures of Zhang and argues that Zhang does not cure the deficiencies of Woolley with respect to the limitation “determining m first-bit feature map operands and n second-bit weight operands from input feature maps and kernels”. These arguments are moot because Zhang was not relied upon to teach that limitation. Zhang was relied upon to teach “generating m x n outputs by performing addition and accumulation operations on results of multiplication operations performed by the decomposed sub-multipliers”. Applicant asserts that this limitation is not taught by Zhang, but merely recites multiple paragraphs of Zhang and states that performing the operations between decomposed matrices precludes equivalence. The examiner notes that the matrix format of Zhang does not preclude it from teaching the claim limitations. Because the matrix operations are still performed on each data element within the matrices, the claim limitation is still taught.
Applicant’s argument on Pg. 7-8 of the “Remarks” asserts that Zhang fails to teach “generating m x n outputs by performing addition and accumulation operations on results of multiplication operations performed by the decomposed sub-multipliers”. The examiner respectfully disagrees. Specifically, applicant states that the various decompositions in Zhang are not equivalent to the decomposed sub-multipliers of the claim language. The examiner points to [ Specification (0015) “The decomposed sub-multipliers may respectively correspond to sub-logics of a k-bit multiplier, in response to the convolution operator comprising the k-bit multiplier having full precision of k bits, the first-bit and the second-bit may be each smaller than the k-bit, and each of the decomposed sub-multipliers may correspond to a multiplier of the first-bit or a multiplier of the second-bit” ]. Applicant’s specification recites that the decomposed sub-multipliers are sub-logics of the k-bit multiplier. The virtual cores of Zhang are equivalent to the sub-logics of the k-bit multiplier, and the neural network basic unit of Zhang is equivalent to the k-bit multiplier [ Zhang (0181) ].
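For illustration only, the relationship described in Specification (0015), in which decomposed sub-multipliers correspond to sub-logics of a k-bit multiplier, can be sketched with the textbook split of a k-bit product into four (k/2)-bit partial products that are shifted and added. The sketch assumes unsigned operands and even k; the function names are hypothetical, and it is not a reproduction of applicant's circuit or of Zhang's virtual cores.

```python
def split(x, half_bits):
    """Split an unsigned integer into (high, low) halves of half_bits each."""
    mask = (1 << half_bits) - 1
    return x >> half_bits, x & mask


def decomposed_multiply(a, b, k=8):
    """Compute a * b for unsigned k-bit a and b using four (k/2)-bit sub-multiplications."""
    if k % 2 or not (0 <= a < (1 << k) and 0 <= b < (1 << k)):
        raise ValueError("operands must be unsigned k-bit values with even k")
    h = k // 2
    a_hi, a_lo = split(a, h)
    b_hi, b_lo = split(b, h)
    # Four independent (k/2) x (k/2)-bit partial products, one per sub-multiplier.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi
    # Shift and add (accumulate) the partial products to recover the full product.
    return (p_hh << (2 * h)) + ((p_lh + p_hl) << h) + p_ll


print(decomposed_multiply(200, 173))  # 34600, the same as 200 * 173
```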
	Examiner further points to [ Zhang (0181) “in a case where a size of the neural network basic unit exceeds the hardware constraint, it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation)” (Emphasis added) ] to show that these operations are indeed performed in the context of convolution operations and not as arbitrary matrix operations.



Applicant’s arguments regarding the 103 rejection of claims 13 and 25 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Park ("Zero and data Reuse-aware Fast Convolution for Deep Neural Networks on GPU"). Please see the 103 rejection section for the detailed claim analysis and mapping.


Claim Objections
Claim 15 is objected to because of the following informalities: the claim recites “the apparatus comprising: a processor is configured to”; the grammatically correct phrasing is “the apparatus comprising: a processor that is configured to”. Applicant is respectfully requested to correct all similar errors.
Claims 11 and 23 are objected to because of the following informalities: both claims recite “and a forth decomposed sub-multiplier”. The correct spelling is “fourth”. Appropriate correction is requested, and applicant is respectfully requested to correct all similar errors.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 6, 9, 10, 14, 15, 17, 19, 21, 22, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley (US 20160162402 A1) and further in view of Zhang (US 20200026992 A1).
Regarding claim 1, Woolley teaches the following:
A method of processing a convolution operation in a neural network, the method comprising: determining m first-bit feature map operands and n second-bit weight operands from input feature maps and kernels, respectively, on which the convolution operation is to be performed in parallel, 
[ (¶0068) and (Fig. 5)
	This paragraph of Woolley teaches a filter matrix and an image matrix that are used in the convolution operation. The filter is equivalent to the kernel, and the individual data elements within the filter matrix and/or tile, which are derived from the filter, are equivalent to the n second-bit weight operands. The image is equivalent to the feature map, and the individual data elements within the image matrix and/or tile, which are derived from the image, are equivalent to the m first-bit feature map operands. ]
[ (¶0078) “The convolution subsystem 180 may set the size of the image tile 542 in any technically feasible fashion that optimizes the capabilities of the SM 310. For example, the convolution subsystem 180 may set the size of the image tile 542 based on any number and combination of the size of the shared memory”
	This citation teaches the claimed determining step and shows that the size of the image tiles is set by the convolution subsystem itself (tiles are sub-matrices that can be of any size). These tile sizes correspond to m and n (m for the image tiles; n for the filter tiles). ]
[ (Abstract) “In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations”
	This citation teaches that the operations can be performed in parallel. ]
wherein m and n are each a natural number;  
[ (¶0078) “The convolution subsystem 180 sets the size of the filter tile 544 based on the size of the image tile” 
This citation also shows that the filter tile size can be adjusted to match the image tile size, yielding n operands (the m image operands are addressed below). ]
and where each first-bit feature map operands is a pixel value or portion of the pixel value and each second-bit weight operand is a kernel weight value or a portion of the kernel weight value;
[ (¶0065)
	This paragraph of Woolley discusses the various parameters chosen for the convolution operation, including the filter data and image data of the input (the n and m operands, respectively), which include pixel data. The examiner notes that although the data are structured as a matrix/tile, the individual data elements within the matrix/tile are what is selected and operated on, and the sizes m and n are determined by the variable tile sizes. ]
dispatching each of m x n operand pairs, each of a feature map operand and a weight operand, that are respectively combined from the m first-bit feature map operands and the n second-bit weight operands, respectively, to different decomposed sub-multipliers in a convolution operator;
[ (¶0011) “a tile-based convolution engine can be implemented that configures a parallel processing pipeline to independently expand and process individual tiles of the image matrix. In such an approach, the parallel processing pipeline performs address calculations to expand each tile of the image matrix in shared memory on an as-needed basis. The parallel processing pipeline then performs matrix multiplication operations between the image tile and the filter stack.” (Emphasis added)
	This citation teaches m tiles of the image matrix, which are equivalent to the operands. The examiner notes that nothing in the claim language precludes “m = n = 1”. Lastly, the citation shows the matrix multiplication operation, which is part of the convolution operation, being carried out between both sets of operands. As above, although the data are structured as a matrix/tile, the individual data elements within the matrix/tile are what is selected and operated on, and the sizes m and n are determined by the variable tile sizes.]
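For illustration only, the dispatching step can be sketched by forming every combination of m feature map operands and n weight operands (m x n pairs) and assigning each pair to its own sub-multiplier slot. The slot-numbering scheme and the function names in the following Python sketch are hypothetical and are not taken from Woolley, Zhang, or the application.

```python
from itertools import product


def dispatch_operand_pairs(feature_operands, weight_operands):
    """Map every (feature, weight) pair to its own sub-multiplier slot.

    Returns a dict: slot index -> (feature operand, weight operand).
    With m feature operands and n weight operands there are m * n slots.
    """
    pairs = product(feature_operands, weight_operands)
    return {slot: pair for slot, pair in enumerate(pairs)}


def run_sub_multipliers(assignment):
    """Each slot multiplies only the pair dispatched to it."""
    return {slot: f * w for slot, (f, w) in assignment.items()}


features = [7, 2]  # m = 2 pixel values
weights = [3, 5]   # n = 2 kernel weights
assignment = dispatch_operand_pairs(features, weights)
print(assignment)                       # {0: (7, 3), 1: (7, 5), 2: (2, 3), 3: (2, 5)}
print(run_sub_multipliers(assignment))  # {0: 21, 1: 35, 2: 6, 3: 10}
```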
and obtaining pixel values of output feature maps corresponding to results of the convolution operation based on the m x n outputs.
[ (Fig. 4) and (¶0062)
	This figure shows an output batch (reference 470) containing output feature maps generated from the image data by the convolution operations. The examiner notes that images inherently comprise pixel data; therefore, output feature maps containing processed image data also contain pixel values. ]
	What is not distinctly disclosed by Woolley and is instead taught by Zhang is the following:
generating m x n outputs by performing addition and accumulation operations on results of multiplication operations performed by the decomposed sub-multipliers; 
[ (¶0184) “then each virtual core in the computation group 23132 executes a matrix vector multiplication operation of the M-dimensional vector and the M*N-dimensional matrix, to obtain a result which is an N-dimensional vector, and divides the N-dimensional vector into two halves which are respectively output to two virtual cores that execute reduction; and the virtual core used for reduction accumulates output data of respective small matrices with respect to a same neuron, to obtain a final output,” (Emphasis added)
	This citation teaches the M*N small matrices (corresponding to the claimed m x n), which are split from larger matrices in an A x B arrangement (equivalent to the larger feature maps and kernels) and are composed of image data and weights. The matrix multiplication operation is performed on each small matrix, and the outputs of the small matrices are then accumulated to form the final output. ]
[ (¶0184) 
	This paragraph explains that the multiplication operations are performed by decomposed virtual cores, which are equivalent to the claimed decomposed sub-multipliers. ]
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the convolution acceleration processes taught by Woolley with the addition and accumulation operations taught by Zhang. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the operational speed of the neural network [ Zhang (¶0006) ], providing the benefit of improved efficiency and faster overall calculation.
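For illustration only, the decompose-then-reduce pattern quoted from Zhang (¶0184), in which small matrices are processed separately and their outputs for the same neuron are accumulated, resembles a generic block-partitioned matrix-vector product. The following Python sketch illustrates that generic pattern under the assumption that the matrix dimensions are multiples of the block size; it is not Zhang's implementation, and the function names are hypothetical. Splitting the 4 x 4 example into 2 x 2 blocks yields four small matrices, analogous to the four M*N matrices mentioned in the quoted passage.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product: one output per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def blocked_matvec(matrix, vector, block_rows, block_cols):
    """Compute matrix @ vector by splitting the matrix into small blocks.

    Each block's product stands in for the work of one small compute unit;
    partial results that belong to the same output element are then
    accumulated (the reduction step).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix dimensions must be multiples of the block size")
    output = [0] * n_rows
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            block = [row[c0:c0 + block_cols] for row in matrix[r0:r0 + block_rows]]
            partial = matvec(block, vector[c0:c0 + block_cols])  # one small unit
            for i, value in enumerate(partial):
                output[r0 + i] += value  # accumulate partial sums per output
    return output


matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
vector = [1, 0, -1, 2]
print(blocked_matvec(matrix, vector, 2, 2))  # [6, 14, 22, 30]
```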




Regarding claim 3, the method of claim 1 is taught by Woolley/Zhang as set forth in the rejection of claim 1 above. Woolley further teaches the following limitations of claim 3:
wherein the first-bit feature map operands are pixel values at different pixel locations in an input feature map.
[ (¶0010) – (¶0011)
These paragraphs describe the pixel values within the image matrix being divided into tiles for parallel processing, with pixel information not repeated across the subsets. Because the subsets of pixel data do not repeat, the image tiles (equivalent to the first-bit feature map operands) are based on pixel values at different pixel locations in the image matrix (equivalent to the input feature map). ]
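For illustration only, the reasoning that non-repeating tiles draw on different pixel locations can be checked with a generic tiling routine. The following Python sketch assumes a single two-dimensional feature map split into non-overlapping rectangular tiles; the tile sizes and function name are hypothetical and are not taken from Woolley.

```python
def tile_coordinates(height, width, tile_h, tile_w):
    """Partition a height x width feature map into non-overlapping tiles.

    Returns a list of tiles; each tile is a list of (row, col) pixel locations.
    Edge tiles are smaller when the map size is not a multiple of the tile size.
    """
    tiles = []
    for r0 in range(0, height, tile_h):
        for c0 in range(0, width, tile_w):
            tiles.append([(r, c)
                          for r in range(r0, min(r0 + tile_h, height))
                          for c in range(c0, min(c0 + tile_w, width))])
    return tiles


tiles = tile_coordinates(4, 6, 2, 3)
all_locations = [loc for tile in tiles for loc in tile]
# No pixel location appears in more than one tile, and every location is covered.
print(len(tiles), len(all_locations), len(set(all_locations)))  # 4 24 24
```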


Regarding claim 6, the method of claim 1 is taught by Woolley/Zhang as set forth in the rejection of claim 1 above. Woolley further teaches the following limitations:
wherein the first-bit feature map operands are pixel values at corresponding pixel locations in different input feature maps from among plural input feature maps,
[ (¶0070)
This paragraph explains that the value “D4” from the image matrix is used four times and that “D4” corresponds to the same pixel location (the center pixel) each time it is used. ]
wherein the different input feature maps correspond to different input channels.
[ (¶0070) “as depicted for the value ‘D4,’ the center of each of the three-by-three color planes 410 is used four times to compute each of four columns in the output matrix and, consequently, each of the center values (e.g., the “D4” values) is associated with four separate columns of the virtual image matrix”
	This citation shows that there are different input channels, which correspond to the color planes. ]
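For illustration only, taking the pixel value at one corresponding location from each of several input feature maps (one per input channel, such as the color planes discussed above) can be sketched as follows. The channel-first nested-list layout and the function name are assumptions made for this example.

```python
def operands_at_location(feature_maps, row, col):
    """Return one pixel value per input channel, all taken from (row, col).

    feature_maps is indexed as feature_maps[channel][row][col].
    """
    return [channel[row][col] for channel in feature_maps]


# Three 3 x 3 color planes (e.g. R, G, B); the center pixel is at (1, 1).
red = [[0, 0, 0], [0, 11, 0], [0, 0, 0]]
green = [[0, 0, 0], [0, 22, 0], [0, 0, 0]]
blue = [[0, 0, 0], [0, 33, 0], [0, 0, 0]]
print(operands_at_location([red, green, blue], 1, 1))  # [11, 22, 33]
```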
Regarding claim 9, the method of claim 1 is taught by Woolley/Zhang as set forth in the rejection of claim 1 above. Zhang further teaches the following limitations of the claim:
wherein: each of the decomposed sub-multipliers corresponds to sub-logics of a k-bit multiplier, in response to the convolution operator comprising the k-bit multiplier having full precision of k bits,
[ (¶0181) “the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores).”
	This citation teaches that the virtual cores are decomposed pieces of a neural network processing unit. ]
[ (Fig. 9) and (¶0183)-(¶0185)
	This figure and the corresponding paragraphs show the matrix being divided among the virtual cores (equivalent to the sub-multipliers), which together correspond to the same size as the original matrix. Paragraph 0185 further shows how the number of virtual cores is scaled to the basic unit and matrix size so that the virtual cores collectively are equivalent to the processing unit. ]
the first-bit and the second-bit are each smaller than the k-bit, and
[ (¶0185) “the fully expanded operation performed on the neural network basic unit is illustrated with the virtual core where M=N and the actual neural network basic unit where A=B=2;”
	This citation shows an example in which the processing unit is divided by two in each dimension (A = B = 2) to form the virtual cores (decomposed sub-multipliers), so that each virtual core is smaller than the original unit. ]
each of the decomposed sub-multipliers corresponds to a multiplier of the first-bit or a multiplier of the second-bit.
[ (¶0184) “small matrices (there are 4 M*N matrices in FIG. 9), which are distributed in the group of virtual cores for real computation; each virtual core is responsible for a small matrix operation”
	This citation from Zhang teaches that the sub-multipliers (the virtual cores) are each assigned specific operands on which to perform the convolution operation. ]


Regarding claim 10, the method of claim 9 is taught by Woolley/Zhang as set forth in the rejection of claim 9 above. Woolley further teaches the following limitations of the claim:
and the operand pairs, in which the first-bit feature map operands and the second-bit weight operands are mapped to each other, are respectively dispatched to different decomposed sub-multipliers.
[ (¶0011) “a tile-based convolution engine can be implemented that configures a parallel processing pipeline to independently expand and process individual tiles of the image matrix. In such an approach, the parallel processing pipeline performs address calculations to expand each tile of the image matrix in shared memory on an as-needed basis. The parallel processing pipeline then performs matrix multiplication operations between the image tile and the filter stack.” (Emphasis added)
The citation shows the matrix multiplication operation being carried out between both sets of operands. The examiner notes that Woolley teaches the operands being mapped to each other and, although it discloses the parallel processing pipeline, does not disclose the decomposed sub-multipliers, which are instead taught by Zhang as combined above. ]
What is not distinctly disclosed by Woolley and is instead taught by Zhang is the following:
wherein: the first-bit feature map operands and the second-bit weight operands correspond to k/2-bit operands, each of the decomposed sub-multipliers corresponds to a k/2-bit multiplier, 
[ (¶0185) “the fully expanded operation performed on the neural network basic unit is illustrated with the virtual core where M=N and the actual neural network basic unit where A=B=2;”
	This citation shows an example in which the processing unit is divided by two in each dimension (A = B = 2) to form the virtual cores (decomposed sub-multipliers). ]
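For illustration only, assuming k = 8, the following Python sketch models four unsigned 4-bit (k/2-bit) sub-multipliers that each receive one operand pair formed from two feature map operands and two weight operands (m = n = 2). The constants and function names are hypothetical; the sketch is not drawn from Zhang or the application and only shows that each k/2-bit pair fits a k/2-bit sub-multiplier whose product fits in k bits.

```python
K = 8                       # assumed full precision of the multiplier, in bits
HALF = K // 2               # width of each decomposed sub-multiplier
HALF_MAX = (1 << HALF) - 1  # largest unsigned k/2-bit operand (15 when K = 8)


def sub_multiplier(x, y):
    """Model of one k/2-bit x k/2-bit unsigned multiplier; its product fits in K bits."""
    if not (0 <= x <= HALF_MAX and 0 <= y <= HALF_MAX):
        raise ValueError("operands must be unsigned k/2-bit values")
    return x * y


def dispatch_k_half_pairs(features, weights):
    """Pair m k/2-bit feature operands with n k/2-bit weight operands and send
    each pair to its own sub-multiplier; returns the m * n products in pair order."""
    return [sub_multiplier(f, w) for f in features for w in weights]


products = dispatch_k_half_pairs([9, 4], [15, 3])  # m = n = 2, four sub-multipliers
print(products)                   # [135, 27, 60, 12]
print(max(products) < (1 << K))   # True: every product fits in K bits
```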

Regarding claim 14, the method of claim 1 is taught by Woolley/Zhang as set forth in the rejection of claim 1 above. Woolley further teaches the following limitations of the claim:
A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to
[ (¶0113) “These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor”
	This citation teaches the processor carrying out the instructions. ]
[ (¶0016) “Further embodiments provide, among other things, a non-transitory computer-readable medium” ]
perform the method of claim 1
[ Please see the rejection for claim 1 above. ]



Regarding claim 15, Woolley teaches the following:
An apparatus for processing a convolution operation in a neural network, the apparatus comprising: a processor is configured to:
[ (¶0113) “These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor”
	This citation teaches a processor capable of carrying out the instructions for a convolution operation. ]
determine m first-bit feature map operands and n second-bit weight operands from input feature maps and kernels, respectively, on which the convolution operation is to be performed in parallel, 
[ (¶0068) and (Fig. 5)
	This paragraph of Woolley teaches a filter matrix and an image matrix that are used in the convolution operation. The filter is equivalent to the kernel, and the individual data elements within the filter matrix and/or tile, which are derived from the filter, are equivalent to the n second-bit weight operands. The image is equivalent to the feature map, and the individual data elements within the image matrix and/or tile, which are derived from the image, are equivalent to the m first-bit feature map operands. ]
[ (¶0078) “The convolution subsystem 180 may set the size of the image tile 542 in any technically feasible fashion that optimizes the capabilities of the SM 310. For example, the convolution subsystem 180 may set the size of the image tile 542 based on any number and combination of the size of the shared memory”
	This citation teaches the claimed determining step and shows that the size of the image tiles is set by the convolution subsystem itself (tiles are sub-matrices that can be of any size). These tile sizes correspond to m and n (m for the image tiles; n for the filter tiles). ]
[ (Abstract) “In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations”
	This citation teaches that the operations can be performed in parallel. ]
wherein m and n are each a natural number;  
[ (¶0078) “The convolution subsystem 180 sets the size of the filter tile 544 based on the size of the image tile” 
This citation also shows that the filter tile size can be adjusted to match the image tile size, yielding n operands (the m image operands are addressed below). ]
and where each first-bit feature map operands is a pixel value or portion of the pixel value and each second-bit weight operand is a kernel weight value or a portion of the kernel weight value;
[ (¶0065)
	This paragraph of Woolley discusses the various parameters chosen for the convolution operation, including the filter data and image data of the input (the n and m operands, respectively), which include pixel data. The examiner notes that although the data are structured as a matrix/tile, the individual data elements within the matrix/tile are what is selected and operated on, and the sizes m and n are determined by the variable tile sizes. ]
dispatch each of m x n operand pairs, each of a feature map operand and a weight operand, that are respectively combined from the m first-bit feature map operands and the n second-bit weight operands, respectively, to different decomposed sub-multipliers in a convolution operator;
[ (¶0011) “a tile-based convolution engine can be implemented that configures a parallel processing pipeline to independently expand and process individual tiles of the image matrix. In such an approach, the parallel processing pipeline performs address calculations to expand each tile of the image matrix in shared memory on an as-needed basis. The parallel processing pipeline then performs matrix multiplication operations between the image tile and the filter stack.” (Emphasis added)
	This citation teaches m tiles of the image matrix, which are equivalent to the operands. The examiner notes that nothing in the claim language precludes “m = n = 1”. Lastly, the citation shows the matrix multiplication operation, which is part of the convolution operation, being carried out between both sets of operands. As above, although the data are structured as a matrix/tile, the individual data elements within the matrix/tile are what is selected and operated on, and the sizes m and n are determined by the variable tile sizes.]
and obtaining pixel values of output feature maps corresponding to results of the convolution operation based on the m x n outputs.
[ (Fig. 4) and (¶0062)
	This figure shows an output batch (reference 470) containing output feature maps generated from the image data by the convolution operations. The examiner notes that images inherently comprise pixel data; therefore, output feature maps containing processed image data also contain pixel values. ]
	What is not distinctly disclosed by Woolley and is instead taught by Zhang is the following:
generate m x n outputs by performing addition and accumulation operations on results of multiplication operations performed by the decomposed sub-multipliers; 
[ (¶0184) “then each virtual core in the computation group 23132 executes a matrix vector multiplication operation of the M-dimensional vector and the M*N-dimensional matrix, to obtain a result which is an N-dimensional vector, and divides the N-dimensional vector into two halves which are respectively output to two virtual cores that execute reduction; and the virtual core used for reduction accumulates output data of respective small matrices with respect to a same neuron, to obtain a final output,” (Emphasis added)
	This citation teaches the M*N small matrices (corresponding to the claimed m x n), which are split from larger matrices in an A x B arrangement (equivalent to the larger feature maps and kernels) and are composed of image data and weights. The matrix multiplication operation is performed on each small matrix, and the outputs of the small matrices are then accumulated to form the final output. ]
[ (¶0184) 
	This paragraph explains that the multiplication operations are performed by decomposed virtual cores, which are equivalent to the claimed decomposed sub-multipliers. ]
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the convolution acceleration apparatus taught by Woolley with the addition and accumulation operations taught by Zhang. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the operational speed of the neural network [ Zhang (¶0006) ], providing the benefit of improved efficiency and faster overall calculation.




Regarding claim 17, the apparatus of claim 15 is taught by Woolley/Zhang as set forth in the rejection of claim 15 above. Woolley further teaches the following limitations:
wherein the first-bit feature map operands are pixel values at different pixel locations in an input feature map.
[ (¶0010) – (¶0011)
These paragraphs describe the pixel values within the image matrix being divided into tiles for parallel processing, with pixel information not repeated across the subsets. Because the subsets of pixel data do not repeat, the image tiles (equivalent to the first-bit feature map operands) are based on pixel values at different pixel locations in the image matrix (equivalent to the input feature map). ]



Regarding claim 19, the apparatus of claim 15 is taught by Woolley/Zhang as set forth in the rejection of claim 15 above. Woolley further teaches the following limitations:
wherein the first-bit feature map operands are pixel values at corresponding pixel locations in different input feature maps,
[ (¶0070)
This paragraph explains that the value “D4” from the image matrix is used four times and that “D4” corresponds to the same pixel location (the center pixel) each time it is used. ]
wherein the different input feature maps correspond to different input channels.
[ (¶0070) “as depicted for the value ‘D4,’ the center of each of the three-by-three color planes 410 is used four times to compute each of four columns in the output matrix and, consequently, each of the center values (e.g., the “D4” values) is associated with four separate columns of the virtual image matrix”
	This citation shows that there are different input channels, which correspond to the color planes. ]




Regarding claim 21, the apparatus of claim 15 is taught by Woolley/Zhang as set forth in the rejection of claim 15 above. Zhang further teaches the following limitations of the claim:
each of the decomposed sub-multipliers corresponds to sub-logics of a k-bit multiplier, in response to the convolution operator comprising the k-bit multiplier having full precision of k bits,
[ (¶0181) “the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores).”
	This citation teaches that the virtual cores are decomposed pieces of a neural network processing unit. ]
[ (Fig. 9) and (¶0183)-(¶0185)
	This figure and the corresponding paragraphs show the matrix being divided among the virtual cores (equivalent to the sub-multipliers), which together correspond to the same bit size as the original matrix. Paragraph 0185 further shows how the number of virtual cores is scaled to the basic unit and matrix size so that the virtual cores collectively are equivalent to the processing unit. ]
the first-bit and the second-bit are each smaller than the k-bit,
[ (¶0185) “the fully expanded operation performed on the neural network basic unit is illustrated with the virtual core where M=N and the actual neural network basic unit where A=B=2;”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are broken into two, so that each one is smaller than the original. ]
each of the decomposed sub-multipliers corresponds to a multiplier of the first-bit or a multiplier of the second-bit.
[ (¶0184) “small matrices (there are 4 M*N matrices in FIG. 9), which are distributed in the group of virtual cores for real computation; each virtual core is responsible for a small matrix operation”
	This citation from Zhang teaches that the sub-multipliers (i.e., the virtual cores) are each assigned specific operands on which to perform the convolution operation. ]
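For illustration only, the decomposition of a large matrix operation into A x B small matrices, each handled by its own virtual core and then recombined, can be sketched as a block matrix-vector product. This is a minimal Python sketch under stated assumptions, not Zhang's implementation: the function names are hypothetical, each "virtual core" is modeled as one loop iteration, and the matrix dimensions are assumed to divide evenly by A and B (A = B = 2 in the usage, giving four sub-matrices as in Fig. 9).

```python
# Illustrative sketch only: function names and the block model are hypothetical.
def matvec(W, x):
    """Plain matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def split_blocks(W, A, B):
    """Split W into A x B equally sized sub-matrices, keyed by (block_row, block_col)."""
    M, N = len(W) // A, len(W[0]) // B
    return {
        (i, j): [row[j * N:(j + 1) * N] for row in W[i * M:(i + 1) * M]]
        for i in range(A)
        for j in range(B)
    }


def decomposed_matvec(W, x, A, B):
    """Compute W @ x by giving each sub-matrix to its own (hypothetical) virtual core.

    Core (i, j) multiplies block (i, j) by slice j of x; partial results
    for the same block row i are then summed to form output slice i.
    """
    M, N = len(W) // A, len(W[0]) // B
    y = [0] * len(W)
    for (i, j), block in split_blocks(W, A, B).items():
        partial = matvec(block, x[j * N:(j + 1) * N])  # one core's small operation
        for r, value in enumerate(partial):
            y[i * M + r] += value
    return y


# Usage: with A = B = 2, a 4x4 weight matrix yields four 2x2 blocks.
W = [[r * 4 + c for c in range(4)] for r in range(4)]
x = [1, 2, 3, 4]
print(decomposed_matvec(W, x, 2, 2) == matvec(W, x))  # True
```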



In regard to claim 22, the apparatus of claim 21 is taught by Woolley/Zhang as seen in the rejection of claim 21 above. Woolley continues teaching the following limitations of the claim as seen below:
and the operand pairs, in which the first-bit feature map operands and the second-bit weight operands are mapped to each other, are respectively dispatched to the decomposed sub-multipliers.
[ (¶0011) “a tile-based convolution engine can be implemented that configures a parallel processing pipeline to independently expand and process individual tiles of the image matrix. In such an approach, the parallel processing pipeline performs address calculations to expand each tile of the image matrix in shared memory on an as-needed basis. The parallel processing pipeline then performs matrix multiplication operations between the image tile and the filter stack.” (Emphasis added)
The citation shows the matrix multiplication operation being carried out between both sets of operands. Examiner notes that, although the reference teaches the operands being mapped to each other and discloses the parallel processing pipeline, it lacks the decomposed sub-multipliers, which are instead taught by the Zhang reference as combined previously. ]
What is not distinctly disclosed by Woolley and is subsequently taught by Zhang is seen below:
wherein: the first-bit feature map operands and the second-bit weight operands correspond to k/2-bit operands, each of the decomposed sub-multipliers corresponds to a k/2-bit multiplier, 
[ (¶0185) “the fully expanded operation performed on the neural network basic unit is illustrated with the virtual core where M=N and the actual neural network basic unit where A=B=2;”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two. ]



In regard to claim 26, the apparatus of claim 15 is taught by Woolley/Zhang as seen in the rejection of claim 15 above. Woolley continues teaching the following limitations of the claim as seen below:
further comprising a memory storing instructions that, when executed, configure the processor to determine the m first-bit feature map operands and the n second-bit weight operands
[ (¶0078) “The convolution subsystem 180 may set the size of the image tile 542 in any technically feasible fashion that optimizes the capabilities of the SM 310. For example, the convolution subsystem 180 may set the size of the image tile 542 based on any number and combination of the size of the shared memory”
	This citation teaches the claimed determining step and shows that the size of the image tiles is determined by the system itself (tiles are sub-matrices that can be of any size). This is equivalent to the m×n size (m for the image tiles; n for the filter tiles). ]
dispatch the m×n operand pairs,
[ (¶0011) “a tile-based convolution engine can be implemented that configures a parallel processing pipeline to independently expand and process individual tiles of the image matrix. In such an approach, the parallel processing pipeline performs address calculations to expand each tile of the image matrix in shared memory on an as-needed basis. The parallel processing pipeline then performs matrix multiplication operations between the image tile and the filter stack.” (Emphasis added)
	This citation teaches an “m” number of tiles of the image matrix, which are equivalent to the operands. Examiner further notes that nothing in the claim language precludes “m = n = 1”. Lastly, the citation also shows the matrix multiplication operation being carried out between both sets of operands. ]
and obtain the output feature maps.
[ (¶0022) and (Fig. 4)
	These citations from Woolley show an example of the operations above being carried out, with reference numeral 490 in Figure 4 being an output image feature map. ]
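For illustration only, forming the m×n operand pairs from m feature map operands and n weight operands, and dispatching each pair to its own sub-multiplier, can be sketched as below. This is a hypothetical Python sketch, not drawn from Woolley or Zhang; the function names and the "slot" model of a sub-multiplier are assumptions. The usage also shows the degenerate case m = n = 1 noted above.

```python
from itertools import product


# Illustrative sketch only: a "slot" stands in for one hypothetical sub-multiplier.
def dispatch_pairs(feature_operands, weight_operands):
    """Form all m x n (feature, weight) operand pairs and assign each to a slot."""
    return dict(enumerate(product(feature_operands, weight_operands)))


def run_sub_multipliers(dispatched):
    """Each slot multiplies its own operand pair independently."""
    return {slot: f * w for slot, (f, w) in dispatched.items()}


# Usage: m = 2 feature operands and n = 3 weight operands give 6 pairs.
pairs = dispatch_pairs([1, 2], [3, 4, 5])
print(pairs)                       # {0: (1, 3), 1: (1, 4), 2: (1, 5), 3: (2, 3), 4: (2, 4), 5: (2, 5)}
print(run_sub_multipliers(pairs))  # {0: 3, 1: 4, 2: 5, 3: 6, 4: 8, 5: 10}
# The degenerate case m = n = 1 yields a single pair.
print(dispatch_pairs([7], [9]))    # {0: (7, 9)}
```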



Claims 4 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley/Zhang as applied above, and further in view of Gokhale ("Snowflake: An Efficient Hardware Accelerator for Convolutional Neural Networks").

In regard to claim 4, the method of claim 3 is taught by Woolley/Zhang as seen in the rejection of claim 3 above. Gokhale teaches the following limitations as seen below:
wherein the second-bit weight operands are weight values at corresponding locations in different kernels from among plural kernels
[ (Pg. 2, Column 2, Paragraph 3) “These will be stored such that the first element in the weights array is the first pixel of the first red kernel’s first row. This is illustrated in figure 3b. The second element will be the first pixel of the second red kernel’s first row.”… “In this type of mapping, each MAC unit produces an output pixel.” (emphasis added)
	This citation from Gokhale teaches an embodiment in which the weights array (equivalent to the operands) contains values (equivalent to the weights) taken from the same location in each kernel of a plurality of kernels. ]
wherein the different kernels reference an input channel and different output channels of the input feature map.
[ (Pg. 1, Column 1, Paragraph 2) “CNNs use as input an image, with its red, green and blue channels split into three separate two dimensional input feature maps”
	In general, this paragraph, under the heading “Introduction,” describes the convolution process and explains the explicit difference between input and output channels. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration method as taught by Woolley/Zhang with the corresponding kernel layering as taught by Gokhale. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the operational efficiency of the neural network [ Gokhale (Abstract) ]. This provides the benefit of reduced power or resource consumption for the same amount of work while maintaining the overall processing time.
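For illustration only, the weights-array ordering quoted from Gokhale (the first element is the first pixel of the first red kernel's first row, the second element is the first pixel of the second red kernel's first row) can be sketched as an interleaved layout in which consecutive weights come from the same location in different kernels, with each MAC unit producing one output pixel per kernel. This is a hypothetical Python sketch, not Gokhale's implementation; the function names, the single input channel, and the 2x2 kernels are assumptions.

```python
# Illustrative sketch only: one input channel and 2x2 kernels are assumed.
def interleave_kernels(kernels):
    """Flatten K kernels (each R x S, one input channel) so that consecutive
    weights come from the same (row, col) location in different kernels."""
    K, R, S = len(kernels), len(kernels[0]), len(kernels[0][0])
    return [kernels[k][r][s] for r in range(R) for s in range(S) for k in range(K)]


def mac_outputs(patch, weights, K):
    """MAC unit j accumulates patch[p] * (kernel j's weight at location p).

    patch is the input window flattened row-major; each MAC unit produces one
    output pixel, i.e. one value per kernel (output channel).
    """
    acc = [0] * K
    for p, pixel in enumerate(patch):
        for j in range(K):
            acc[j] += pixel * weights[p * K + j]
    return acc


# Usage: two 2x2 kernels for a single (e.g. red) input channel.
k0 = [[1, 2], [3, 4]]
k1 = [[10, 20], [30, 40]]
weights = interleave_kernels([k0, k1])
print(weights)                                  # [1, 10, 2, 20, 3, 30, 4, 40]
print(mac_outputs([1, 0, 0, 2], weights, K=2))  # [9, 90]
```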

In regard to claim 7, the method of claim 6 is taught by Woolley/Zhang as seen in the rejection of claim 6 above. Gokhale teaches the following limitations as seen below:
wherein the second-bit weight operands are weight values at corresponding locations in different kernels from among plural kernels
[ (Pg. 2, Column 2, Paragraph 3) “These will be stored such that the first element in the weights array is the first pixel of the first red kernel’s first row. This is illustrated in figure 3b. The second element will be the first pixel of the second red kernel’s first row.”… “In this type of mapping, each MAC unit produces an output pixel.” (emphasis added)
	This citation from Gokhale teaches an embodiment in which the weights array (equivalent to the operands) contains values (equivalent to the weights) taken from the same location in each kernel of a plurality of kernels. ]
wherein the different kernels reference an input channel and different output channels of the input feature map.
[ (Pg. 1, Column 1, Paragraph 2) “CNNs use as input an image, with its red, green and blue channels split into three separate two dimensional input feature maps”
	In general, this paragraph, under the heading “Introduction,” describes the convolution process and explains the explicit difference between input and output channels. ]
	As this claim is similar to claim 4, it is rejected for the same reasons, motivation and art. Please refer to claim 4 to see the motivation to combine. 


Claims 5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley/Zhang as applied above, and further in view of Boesch (US 20180189642 A1).
In regard to claim 5, the method of claim 3 is taught by Woolley/Zhang as seen in the rejection of claim 3 above. Boesch teaches the following limitations as seen below:
wherein the second-bit weight operands are weight values at different locations in a kernel,
[ (Fig. 1E) and (¶0015) – (¶0018)
	The figure and paragraphs cited above teach a plurality of kernels in which the weights within each kernel are located at different rows and columns. ]
wherein the kernel references an input channel and any one output channel of the input feature map.
[ (¶0248) “To process a kernel of a convolution layer, each value (i.e., each pixel) of the input feature at a first position (e.g., upper right corner, upper left corner, or some other position) is multiplied with each corresponding value of the kernel, and the products are summed to generate one output result.”
	This citation from Boesch teaches the pixel and kernel convolution operation, with the added detail that the products are summed into one output result (equivalent to one output channel). ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang with the weight operations as taught by Boesch. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the speed and resource consumption of the neural network [ Boesch (¶0124) ]. This provides the benefit of improved power or resource usage, and the increase in speed yields faster results for the neural network overall.
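For illustration only, the operation quoted from Boesch ¶0248 (each input value at a position is multiplied by the corresponding kernel value and the products are summed into one output result) can be sketched as a single-kernel, single-channel convolution. This is a minimal Python sketch, not Boesch's implementation; the function name, stride 1, and no padding are assumptions, and, as is conventional for CNNs, the kernel is not flipped (strictly, cross-correlation).

```python
# Illustrative sketch only: stride 1, no padding, and the function name are assumptions.
def convolve_single_kernel(feature_map, kernel):
    """Slide one kernel over one input channel.

    At each position, every input value is multiplied by the kernel weight at
    the corresponding location and the products are summed into one output
    value, so this one kernel yields one output channel.
    """
    H, W = len(feature_map), len(feature_map[0])
    R, S = len(kernel), len(kernel[0])
    out = []
    for i in range(H - R + 1):
        row = []
        for j in range(W - S + 1):
            row.append(sum(feature_map[i + r][j + s] * kernel[r][s]
                           for r in range(R) for s in range(S)))
        out.append(row)
    return out


# Usage: a 3x3 input channel and a 2x2 kernel give a 2x2 output channel.
fm = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve_single_kernel(fm, [[1, 0], [0, 1]]))  # [[6, 8], [12, 14]]
```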


In regard to claim 8, the method of claim 6 is taught by Woolley/Zhang as seen in the rejection of claim 6 above. Boesch teaches the following limitations as seen below:
wherein the second-bit weight operands are weight values at different locations in a kernel from among plural kernels,
[ (Fig. 1E) and (¶0015) – (¶0018)
	The figure and paragraphs cited above teach a plurality of kernels in which the weights within each kernel are located at different rows and columns. ]
wherein the kernel references an input channel and any one output channel of the input feature map.
[ (¶0248) “To process a kernel of a convolution layer, each value (i.e., each pixel) of the input feature at a first position (e.g., upper right corner, upper left corner, or some other position) is multiplied with each corresponding value of the kernel, and the products are summed to generate one output result.”
	This citation from Boesch teaches the pixel and kernel convolution operation, with the added detail that the products are summed into one output result (equivalent to one output channel). ]
As this claim is similar to claim 5, it is rejected for the same reasons, motivation and art. Please refer to claim 5 to see the motivation to combine.





Claims 11, 12, 23, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley/Zhang as applied above, and further in view of Sankaran (US 7391915 B1).


In regard to claim 11, the method of claim 9 is taught by Woolley/Zhang as seen in the rejection of claim 9 above. Zhang continues teaching the following limitations of the claim as seen below:
are respectively dispatched to a first decomposed sub-multiplier and a second decomposed sub-multiplier, in response to the first-bit feature map operands being k/2-bit operands and the second-bit weight operands being k-bit operands, and
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
are respectively dispatched to a third decomposed sub-multiplier and a fourth decomposed sub-multiplier, in response to the first-bit feature map operands being k-bit operands and the second-bit weight operands being k/2-bit operands,
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
However, what is not distinctly disclosed by Zhang and is subsequently taught by Sankaran is the following:
wherein: operand pairs, in which the first-bit feature map operands and most significant bits of k/2 bits in the second-bit weight operands are mapped to each other, and operand pairs, in which the first-bit feature map operands and least significant bits of k/2 bits in the second-bit weight operands are mapped to each other,
[ (Col. 8, Lines 4-15) “1) least signification 16-bits of the first operand times the least significant 16-bits of the second operand; 2) least signification 16-bits of the first operand times the most significant 16-bits of the second operand”
	This citation from Sankaran, and the paragraph in general, teach the multiplication mapping in which the least and most significant bits are paired in an alternating fashion. ]
operand pairs, in which most significant bits of k/2 bits in the first-bit feature map operands and the second-bit weight operands are mapped to each other, and operand pairs, in which least significant bits of k/2 bits in the first-bit feature map operands and the second-bit weight operands are mapped to each other,
[ (Col. 8, Lines 4-15)  “3) most signification 16-bits of the first operand times the least significant 16-bits of the second operand; and 4) most signification 16-bits of the first operand times the most significant 16-bits of the second operand.”
	This citation from Sankaran, and the paragraph in general, teach the remaining multiplication mappings, in which the least and most significant bits are paired in an alternating fashion. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang with the least/most significant bit operations as taught by Sankaran. One of ordinary skill in the art would have been motivated to make this combination because it would provide more efficient use of memory bandwidth [ Sankaran (Abstract) ]. This provides the benefit of improved speed or reduced memory resource usage for the neural network overall.
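For illustration only, the four partial products listed in Sankaran (LSB x LSB, LSB x MSB, MSB x LSB, MSB x MSB of 16-bit halves) and the two-pair mapping used when one operand is only k/2 bits can be sketched as follows. This is a minimal Python sketch, not Sankaran's implementation; the function names are hypothetical and unsigned (non-negative) operands are assumed, since signed operands would need additional handling not shown here. The usage takes k = 32 to match the 16-bit halves in the cited passage.

```python
# Illustrative sketch only: unsigned operands assumed; function names are hypothetical.
def split_halves(x, k):
    """Split a non-negative k-bit integer into its (MSB half, LSB half), k/2 bits each."""
    half = k // 2
    return x >> half, x & ((1 << half) - 1)


def mul_k_by_k(a, b, k):
    """k-bit x k-bit product built from four k/2-bit x k/2-bit partial products."""
    half = k // 2
    a_hi, a_lo = split_halves(a, k)
    b_hi, b_lo = split_halves(b, k)
    ll = a_lo * b_lo  # LSB half x LSB half
    lh = a_lo * b_hi  # LSB half x MSB half
    hl = a_hi * b_lo  # MSB half x LSB half
    hh = a_hi * b_hi  # MSB half x MSB half
    return ll + ((lh + hl) << half) + (hh << k)


def mul_half_by_k(a, b, k):
    """k/2-bit x k-bit product built from two k/2-bit x k/2-bit partial products.

    a (k/2 bits) is paired once with the MSB half of b and once with its LSB half.
    """
    half = k // 2
    b_hi, b_lo = split_halves(b, k)
    return a * b_lo + ((a * b_hi) << half)


# Usage with k = 32 (16-bit halves).
a, b = 0xDEADBEEF, 0x12345678
print(mul_k_by_k(a, b, 32) == a * b)               # True
print(mul_half_by_k(0xBEEF, b, 32) == 0xBEEF * b)  # True
```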




In regard to claim 12, the method of claim 9 is taught by Woolley/Zhang as seen in the rejection of claim 9 above. Zhang continues teaching the following limitations of the claim as seen below:
is respectively dispatched to the decomposed sub-multipliers, in response to the first-bit feature map operands and the second-bit weight operands being k-bit operands.
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
However, what is not distinctly disclosed by Zhang and is subsequently taught by Sankaran is the following:
operand pairs, in which most significant bits and least significant bits of k/2 bits in the first-bit feature map operands and most significant bits and least significant bits of k/2 bits in the second-bit weight operands are mapped to each other
[ (Col. 8, Lines 4-15)  “1) least signification 16-bits of the first operand times the least significant 16-bits of the second operand;” … “4) most signification 16-bits of the first operand times the most significant 16-bits of the second operand.”
	This citation shows the most significant bits mapped to the most significant bits and the least significant bits mapped to the least significant bits as in the claim. ]
As this claim is similar to claim 11, it is rejected for the same reasons, motivation and art. Please refer to claim 11 to see the motivation to combine.


In regard to claim 23, the apparatus of claim 21 is taught by Woolley/Zhang as seen in the rejection of claim 21 above. Zhang continues teaching the following limitations of the claim as seen below:
are respectively dispatched to a first decomposed sub-multiplier and second decomposed sub-multiplier, in response to the first-bit feature map operands being k/2-bit operands and the second-bit weight operands being k-bit operands, and
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
are respectively dispatched to a third decomposed sub-multiplier and a forth decomposed sub-multiplier, in response to the first-bit feature map operands being k-bit operands and the second-bit weight operands being k/2-bit operands.
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
However, what is not distinctly disclosed by Zhang and is subsequently taught by Sankaran is the following:
Wherein: operand pairs, in which the first-bit feature map operands and most significant bits of k/2 bits in the second-bit weight operands are mapped to each other, and operand pairs, in which the first-bit feature map operands and least significant bits of k/2 bits in the second-bit weight operands are mapped to each other,
[ (Col. 8, Lines 4-15)  “1) least signification 16-bits of the first operand times the least significant 16-bits of the second operand; 2) least signification 16-bits of the first operand times the most significant 16-bits of the second operand”
	This citation from Sankaran, and the paragraph in general, teach the multiplication mapping in which the least and most significant bits are paired in an alternating fashion. ]
operand pairs, in which most significant bits of k/2 bits in the first-bit feature map operands and the second-bit weight operands are mapped to each other, and operand pairs, in which least significant bits of k/2 bits in the first-bit feature map operands and the second-bit weight operands are mapped to each other,
[ (Col. 8, Lines 4-15)  “3) most signification 16-bits of the first operand times the least significant 16-bits of the second operand; and 4) most signification 16-bits of the first operand times the most significant 16-bits of the second operand.”
	This citation from Sankaran, and the paragraph in general, teach the remaining multiplication mappings, in which the least and most significant bits are paired in an alternating fashion. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang with the least/most significant bit operations as taught by Sankaran. One of ordinary skill in the art would have been motivated to make this combination because it would provide more efficient use of memory bandwidth [ Sankaran (Abstract) ]. This provides the benefit of improved speed or reduced memory resource usage for the neural network overall.





In regard to claim 24, the apparatus of claim 21 is taught by Woolley/Zhang as seen in the rejection of claim 21 above. Zhang continues teaching the following limitations of the claim as seen below:
is respectively dispatched to the decomposed sub-multipliers, in response to the first-bit feature map operands and the second-bit weight operands being k-bit operands.
[ (¶0181) “it is necessary to decompose and combine a large-size matrix multiplication (optionally, together with a convolution operation) in the neural network basic unit, which is referred to herein as a fully expanded operation; and after fully expansion, the neural network basic unit is decomposed into interconnections between basic module virtual entities (or referred to as virtual cores)”
	This citation shows an example where the virtual cores (decomposed sub-multipliers) from the processing unit are originally broken into two (or whatever size is requested) and are dispatched to perform the multiplication/convolution operations. ]
However, what is not distinctly disclosed by Zhang and is subsequently taught by Sankaran is the following:
wherein operand pairs, in which most significant bits and least significant bits of k/2 bits in the first-bit feature map operands and most significant bits and least significant bits of k/2 bits in the second-bit weight operands are mapped to each other
[ (Col. 8, Lines 4-15)  “1) least signification 16-bits of the first operand times the least significant 16-bits of the second operand;” … “4) most signification 16-bits of the first operand times the most significant 16-bits of the second operand.”
	This citation shows the most significant bits mapped to the most significant bits and the least significant bits mapped to the least significant bits as in the claim. ]
As this claim is similar to claim 23, it is rejected for the same reasons, motivation and art. Please refer to claim 23 to see the motivation to combine.



Claims 13 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley/Zhang as applied above, and further in view of Park ("Zero and data Reuse-aware Fast Convolution for Deep Neural Networks on GPU").


In regard to claim 13, the method of claim 1 is taught by Woolley/Zhang as seen in the rejection of claim 1 above. Park teaches the following limitations of the claim as seen below:
further comprising clock-gating a multiplication operation of a sub-multiplier to which a zero operand is dispatched, for zero skipping, in response to the zero operand being present in the m×n operand pairs.
[ (Abstract) “In order to exploit abundant zero weights, we propose a low-overhead and efficient hardware mechanism that skips multiplications that will always give zero results regardless of input data (called ZeroSkip)” ][ (Pg. 1, Col. 2) “we propose to detect zero input data at runtime and skip the execution of associated load and multiply instructions. For this purpose, we add a small hardware unit inside the GPU architecture to perform zero check and instruction flow control (i.e., program counter control)”
	The above citations from Park teach the use of clock-gating (which, Examiner notes, is stopping or skipping computations by means of signals sent to the processor) when a zero is present in an operand.  ]
Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang with the clock-gating operations as taught by Park. One of ordinary skill in the art would have been motivated to make this combination because it would enhance the performance of convolution algorithms [ Park (Abstract) ]. This provides the benefit of improved speed or reduced memory resource usage for the neural network overall.
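For illustration only, the effect of zero skipping on the m×n operand pairs can be sketched in software: any pair containing a zero operand is not multiplied, and the accumulated result is unchanged because that product would have been zero. This is a hypothetical Python sketch, not Park's mechanism; skipping a loop iteration merely stands in for clock-gating a hardware sub-multiplier, and the function name and counter are assumptions.

```python
# Illustrative sketch only: skipping an iteration models clock-gating a sub-multiplier.
def mac_with_zero_skip(pairs):
    """Accumulate products of (feature, weight) pairs, skipping pairs with a zero operand.

    Returns (accumulated_sum, number_of_multiplications_performed).
    """
    total = 0
    performed = 0
    for f, w in pairs:
        if f == 0 or w == 0:
            continue  # "gated": no multiplication is issued for this pair
        total += f * w
        performed += 1
    return total, performed


# Usage: two of the four pairs contain a zero operand and are skipped.
pairs = [(1, 2), (0, 5), (3, 0), (4, 4)]
print(mac_with_zero_skip(pairs))  # (18, 2)
```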



In regard to claim 25, the apparatus of claim 15 is taught by Woolley/Zhang as seen in the rejection of claim 15 above. Park teaches the following limitations of the claim as seen below:
wherein the processor is further configured to, clock-gate a multiplication operation of a sub-multiplier to which a zero operand is dispatched, for zero skipping, in response to the zero operand being present in the m×n operand pairs.
[ (Abstract) “In order to exploit abundant zero weights, we propose a low-overhead and efficient hardware mechanism that skips multiplications that will always give zero results regardless of input data (called ZeroSkip)” ][ (Pg. 1, Col. 2) “we propose to detect zero input data at runtime and skip the execution of associated load and multiply instructions. For this purpose, we add a small hardware unit inside the GPU architecture to perform zero check and instruction flow control (i.e., program counter control)”
	The above citations from Park teach the use of clock-gating (which, Examiner notes, is stopping or skipping computations by means of signals sent to the processor) when a zero is present in an operand.  ]
Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang with the clock-gating operations as taught by Park. One of ordinary skill in the art would have been motivated to make this combination because it would enhance the performance of convolution algorithms [ Park (Abstract) ]. This provides the benefit of improved speed or reduced memory resource usage for the neural network overall.




Claims 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley/Zhang as applied above, and further in view of Gokhale ("Snowflake: An Efficient Hardware Accelerator for Convolutional Neural Networks") and Boesch (US 20180189642 A1).


In regard to claim 18, the apparatus of claim 17 is taught by Woolley/Zhang as seen in the rejection of claim 17 above. Gokhale teaches the following limitations as seen below:
wherein the second-bit weight operands are based on weights at corresponding locations in different kernels from among the kernels
[ (Pg. 2, Column 2, Paragraph 3) “These will be stored such that the first element in the weights array is the first pixel of the first red kernel’s first row. This is illustrated in figure 3b. The second element will be the first pixel of the second red kernel’s first row.”… “In this type of mapping, each MAC unit produces an output pixel.” (emphasis added)
	This citation from Gokhale teaches an embodiment in which the weights array (equivalent to the operands) contains values (equivalent to the weights) taken from the same location in each kernel of a plurality of kernels. ]
wherein the different kernels reference an input channel and different output channels of the input feature map.
[ (Pg. 1, Column 1, Paragraph 2) “CNNs use as input an image, with its red, green and blue channels split into three separate two dimensional input feature maps”
	In general, this paragraph, under the heading “Introduction,” describes the convolution process and explains the explicit difference between input and output channels. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration apparatus as taught by Woolley/Zhang with the corresponding kernel layering as taught by Gokhale. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the operational efficiency of the neural network [ Gokhale (Abstract) ]. This provides the benefit of reduced power or resource consumption for the same amount of work while maintaining the overall processing time.
What is not distinctly disclosed by Woolley/Zhang or Gokhale, however, is the following, which is taught by Boesch:
wherein the second-bit weight operands are based on weights at different locations in a kernel from among the kernels,
[ (Fig. 1E) and (¶0015) – (¶0018)
	The figure and paragraphs cited above teach a plurality of kernels in which the weights within each kernel are located at different rows and columns. ]
wherein the kernel references an input channel and any one output channel of the input feature map.
[ (¶0248) “To process a kernel of a convolution layer, each value (i.e., each pixel) of the input feature at a first position (e.g., upper right corner, upper left corner, or some other position) is multiplied with each corresponding value of the kernel, and the products are summed to generate one output result.”
	This citation from Boesch teaches the pixel and kernel convolution operation, with the added detail that the products are summed into one output result (equivalent to one output channel). ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang, the kernel layering as taught by Gokhale, and the weight operations as taught by Boesch. One of ordinary skill in the art would have been motivated to make this combination because it would greatly improve the speed and resource consumption of the neural network [ Boesch (¶0124) ]. This provides the benefit of improved power or resource usage, and the increase in speed yields faster results for the neural network overall.

In regard to claim 20, the apparatus of claim 19 is taught by Woolley/Zhang as seen in the rejection of claim 19 above. Gokhale teaches the following limitations as seen below:
wherein the second-bit weight operands are based on weights at corresponding locations in different kernels from among the kernels
[ (Pg. 2, Column 2, Paragraph 3) “These will be stored such that the first element in the weights array is the first pixel of the first red kernel’s first row. This is illustrated in figure 3b. The second element will be the first pixel of the second red kernel’s first row.”… “In this type of mapping, each MAC unit produces an output pixel.” (emphasis added)
	This citation from Gohkale teaches an embodiment of the weights array (equivalent to operands) containing values (equivalent to the weights) taken from the same location in each of a plurality of kernels. ]
the second-bit weight operands are based on weights at corresponding locations in different kernels from among the kernels,
[ (Pg. 2, Column 2, Paragraph 3) “These will be stored such that the first element in the weights array is the first pixel of the first red kernel’s first row. This is illustrated in figure 3b. The second element will be the first pixel of the second red kernel’s first row.”… “In this type of mapping, each MAC unit produces an output pixel.” (emphasis added)
	This citation from Gohkale teaches an embodiment of the weights array (equivalent to operands) containing values (equivalent to the weights) taken from the same location in each of a plurality of kernels. ]
wherein the different kernels reference an input channel and different output channels of the input feature map.
[ (Pg. 1, Column 1, Paragraph 2) “CNNs use as input an image, with its red, green and blue channels split into three separate two dimensional input feature maps”
	In general, this paragraph, under the heading “Introduction,” describes the convolution process and explains the explicit distinction between input channels and output channels. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration apparatus as taught by Woolley/Zhang with the corresponding kernel layering as taught by Gohkale. This combination would have been obvious to one of ordinary skill in the art, prior to the effective filing date, because it would greatly improve the operational efficiency of the neural network [ Gohkale (Abstract) ]. This provides the obvious benefit of reduced power or resource consumption for the same amount of work while maintaining the overall processing time.
	What is not distinctly disclosed by Woolley/Zhang or Gohkale, but is taught by Boesch, is seen below:
wherein the different kernels correspond to the different input channels and any one output channel, 
[ (¶0248) “To process a kernel of a convolution layer, each value (i.e., each pixel) of the input feature at a first position (e.g., upper right corner, upper left corner, or some other position) is multiplied with each corresponding value of the kernel, and the products are summed to generate one output result.”
	This citation from Boesch teaches the pixel-kernel convolution operation, with the added detail that the products are summed to generate one output result (equivalent to one output channel). ]
	Therefore, it would have been obvious to one of ordinary skill in the art, prior to the earliest effective filing date, to combine the convolution acceleration processes as taught by Woolley/Zhang, the kernel layering as taught by Gohkale, and the weight operations as taught by Boesch. This combination would have been obvious to one of ordinary skill in the art, prior to the effective filing date, because it would greatly improve the speed and reduce the resource consumption of the neural network [ Boesch (¶0124) ]. This provides the obvious benefit of reduced power or resource consumption, and the increase in speed provides faster results for the neural network overall.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.A.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123