Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on June 15, 2022, in which claims 1-3, 5-8, 13, and 15 are currently amended. Claims 4 and 14 are canceled. Claims 1-3, 5-13, and 15 are currently pending. 

Response to Arguments
The rejections of claims 1-3, 5-13, and 15 with respect to a “weight parameter” under 35 U.S.C. § 112(b) are hereby withdrawn in view of Applicant's amendments and remarks.
With respect to the rejections of claims 1-3, 5-13, and 15 concerning the column and row dimensions both comprising one or more channels, Examiner asserts that Applicant's arguments further reinforce the indefiniteness of the claim language.  Applicant asserts that the instant specification clearly states that each column (the column dimension) corresponds to exactly one channel; however, the claim language recites "respective operational parameters in each column of the operational parameter array being from different subsets of the set of kernels of the weight parameter respectively and having the same one or more channels", which appears to contradict the intended interpretation.  For this reason, Examiner asserts that it is appropriate to maintain the rejection.
Applicant’s arguments with respect to the rejection of claims 1-3, 5-13, and 15 under 35 U.S.C. 101, in view of the amendments, have been considered and are persuasive.  The rejection of claims 1-3, 5-13, and 15 under 35 U.S.C. 101 is hereby withdrawn in view of Applicant's amendments and remarks.
Applicant’s arguments with respect to the rejection of claims 1-3, 5-13, and 15 under 35 U.S.C. 103, in view of the amendments, have been considered but are not persuasive.  Applicant's arguments do not comply with 37 CFR 1.111(c) because they do not clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. Further, they do not show how the amendments avoid such references or objections.
With respect to Applicant's argument that Yang does not teach the amended limitations of claim 1, Examiner respectfully disagrees.  The arguments are almost entirely directed towards the specification and not the claim language.  With respect to Applicant's argument that Yang does not teach a dimension of depth and a dimension of a number of kernels, Examiner respectfully disagrees.  Yang explicitly teaches that the convolutional layers are three-dimensional representations with a depth/kernel dimension ([p. 2 §2.1] "both image and kernels have the same depth dimension").  With respect to the asserted differences between the partitioning of the convolutional layers in Yang and that of the claimed invention, Examiner respectfully disagrees.  Both Yang and the claimed invention are directed towards accelerating convolutional neural network calculations through parallelization, which is achieved by partitioning convolution kernels along a variety of dimensions. For these reasons, Examiner asserts that it is reasonable to maintain the rejection.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-3, 5-13, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claims 1, 13, and 15, the limitation "respective operational parameters in each column of the operational parameter array being from different subsets of the set of kernels of the weight parameter respectively and having the same one or more channels" is indefinite.  The preceding limitation recites "respective operational parameters in each row of the operational parameter array being from a same subset of a set of kernels of the weighted parameter and having different channels respectively", but it would be unclear to one of ordinary skill in the art how the column direction and the row direction can both represent multiple channels unless the column direction and the row direction were the same direction.  One of ordinary skill in the art would recognize that the channel/kernel direction in a convolutional neural network is the depth direction (z-axis) and that the corresponding elements in the X and Y directions correspond to the same channel.  In the interest of further examination, "respective operational parameters in each column of the operational parameter array being from different subsets of the set of kernels of the weight parameter respectively and having the same one or more channels" is interpreted as "respective operational parameters in each column of the operational parameter array being from different subsets of the set of kernels of the weight parameter respectively and having the same channel".

The remaining claims are rejected with respect to their dependence on the rejected claims. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

	Claims 1-3, 5-11, 13, and 15 are rejected under 35 U.S.C. § 103 as being unpatentable over Yang ("A Systematic Approach to Blocking Convolutional Neural Networks", 2016).

	 Regarding claim 1, Yang teaches A method for performing operations in a convolutional neural network, comprising:([Abstract] "This paper explores how to block CNN computations for memory locality by creating an analytical model for CNN-like loop nests. Using this model we automatically derive optimized blockings for common networks that improve the energy efficiency of custom hardware implementations by up to an order of magnitude")
	splitting a weight parameter of a selected layer in the convolutional neural network in at least one of dimension of depth and number of kernels to obtain an operational parameter array including a plurality of operational parameters,([p. 2 §2] "A convolutional layer (Conv) corresponds to a filter bank. In the standard case of 3D input and output, a convolutional layer maps a C×X×Y input to a K×X×Y output using K shift-invariant 3D stencils, where each stencil is of the size Fw×Fh×C (i.e., a set of K 3-dimensional convolutions). These K Fw×Fh×C stencil coefficients are the “weights” of the convolutional layer. Here, (X,Y) and (Fw,Fh) are the image and kernel width and height dimensions and both image and kernels have the same depth dimension, which we define as C, or the number of channels. Typically the dimensions of the kernels are much smaller than the image dimensions." [p. 4 §3.1] "The computation being performed by a convolutional layer can be easily expressed as a 6 layer loop nest as shown in Algorithm 1...blocking can be thought of as simply splitting a number of loops, and then exchanging the order in which these split loops are executed" See also Figure 1.  Splitting a kernel depthwise (splitting a weight parameter in a depth dimension) interpreted as synonymous with blocking along the channel dimension as described in Yang.)
	respective operational parameters in each row of the operational parameter array being from a same subset of a set of kernels of the weighted parameter and having different channels respectively, ([p. 5 §3.2] "Figure 1 demonstrates two levels of nested blocking for each dimension, and the associated buffers. The inner loop takes a small amount of input data with block size X0Y0C0 and convolves it with K0 kernels to create some partial outputs with block size X0Y0K0. A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated, which will happen only when all of the channels (C2 loop) finish." Yang explicitly teaches that each depthwise layer corresponds to a channel and shows depthwise blocking such that the depthwise direction can be considered the row axis.  Operational parameters interpreted as synonymous with block. Weight parameter interpreted as synonymous with filter bank.  With respect to Figure 1, a row is interpreted as being in the depthwise direction.)
	and respective operational parameters in each column of the operational parameter array being from different subsets of the set of kernels of the weight parameter respectively and having the same one or more channels;([p. 2 §2.1] "Pooling and LRN layers have no learned parameters (weights)." [p. 5 §3.2] "A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated, which will happen only when all of the channels (C2 loop) finish...Figure 2: Multicore partitioning. Top: kernel partitioning broadcasts a shared input to separate cores, each of which processes a disjoint subset of the kernels to produce a disjoint slab of the output (in the K dimension)." With respect to Figure 1 of Yang, the X or Y direction is interpreted as synonymous with the column direction from the same subset of a set of kernels of the weighted parameter corresponding to one or more channels.)
	performing, by using each operational parameter in the operational parameter array, operations of the selected layer on data of input data for the selected layer that are in the channel corresponding to the channel of the operational parameter that is in use, to obtain a partial operation result array including a plurality of partial operation results; and([p. 4] "Figure 1: Hierarchical blocking of a single convolutional layer. The six-dimensional overall problem domain (X,Y,C,Fw,Fh,K) depicted in Figure 1 is blocked to three levels in the input domain ({X,Y,C}{0,1,2}), and two levels in the set of kernels (K) which correspond to the third dimension of the output domain ({X,Y}{0,1,2},{K}{0,1}). Partial results for each output pixel are accumulated hierarchically across the three levels of blocking in C")
	generating one or more output data of the selected layer based on the partial operational result array.([p. 4 §3] "Partial results for each output pixel are accumulated hierarchically across the three levels of blocking in C")
	wherein splitting the weight parameter matrix comprises: splitting the weight parameter matrix in a case where the weight parameter matrix has a number of kernels greater than or equal to a first predetermined number, such that the operational parameter array obtained by the splitting has a number of rows equal to a multiple of the first predetermined number, ([p. 5 §3.3] "The first constraint is that we need to block the application such that the dimension being unrolled, e.g. Cp, is S times that of the previous level, Cp−1. The parallelism can be performed by partitioning the problem across the input XY, the kernels K, or the channels C" S interpreted as multiple of predetermined number (Cp-1).)
	wherein the first predetermined number is set according to the number of processors or processor cores used to process the operations in the convolutional neural network.([p. 5 §3.2] "Figure 2: Multicore partitioning. Top: kernel partitioning broadcasts a shared input to separate cores, each of which processes a disjoint subset of the kernels to produce a disjoint slab of the output (in the K dimension). Bottom: input partitioning broadcasts all kernels across cores which each process a different subset of the input to produce a disjoint subset of the output, shown here in the Y dimension.").
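
The mapping above can be pictured with a minimal sketch, offered for illustration only. It assumes a direct convolution written as a six-loop nest of the kind Yang describes, with stride 1 and no padding as a simplification. The function names (conv_layer, split_weights, blocked_conv, add_pointwise), the block sizes, and the data are hypothetical and appear in neither Yang nor the instant application. Rows of the array hold subsets of kernels and columns hold subsets of channels, following the interpretation adopted for examination.

```python
# Illustrative sketch only. All names, sizes, and data are hypothetical.

def conv_layer(inp, weights):
    """Direct convolution, stride 1, no padding.
    inp is [C][Y][X]; weights is [K][C][Fh][Fw]; result is [K][Y-Fh+1][X-Fw+1]."""
    C, Y, X = len(inp), len(inp[0]), len(inp[0][0])
    Fh, Fw = len(weights[0][0]), len(weights[0][0][0])
    out = []
    for kernel in weights:                      # K loop
        plane = [[0] * (X - Fw + 1) for _ in range(Y - Fh + 1)]
        for c in range(C):                      # C loop
            for y in range(Y - Fh + 1):         # Y loop
                for x in range(X - Fw + 1):     # X loop
                    for fh in range(Fh):        # Fh loop
                        for fw in range(Fw):    # Fw loop
                            plane[y][x] += inp[c][y + fh][x + fw] * kernel[c][fh][fw]
        out.append(plane)
    return out


def split_weights(weights, k_block, c_block):
    """Split weights into a 2-D array of blocks.
    Row i holds kernel subset i; column j holds channel subset j."""
    K, C = len(weights), len(weights[0])
    return [
        [[kernel[c0:c0 + c_block] for kernel in weights[k0:k0 + k_block]]
         for c0 in range(0, C, c_block)]
        for k0 in range(0, K, k_block)
    ]


def add_pointwise(a, b):
    """Element-by-element sum of two [K][Y][X] results."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]


def blocked_conv(inp, weights, k_block, c_block):
    """Convolve block by block, then combine the partial results."""
    out = []
    for row in split_weights(weights, k_block, c_block):
        row_sum = None
        for j, block in enumerate(row):
            # Each block only sees the input channels it was cut from.
            chans = inp[j * c_block:(j + 1) * c_block]
            partial = conv_layer(chans, block)
            row_sum = partial if row_sum is None else add_pointwise(row_sum, partial)
        out.extend(row_sum)  # stack kernel subsets along the depth of the output
    return out


# Small integer data so the comparison below is exact.
inp = [[[c + 3 * y + x for x in range(3)] for y in range(3)] for c in range(4)]
weights = [[[[k - c + 2 * fh + fw for fw in range(2)] for fh in range(2)]
            for c in range(4)] for k in range(4)]

array = split_weights(weights, k_block=2, c_block=2)
print(len(array), len(array[0]))
print(blocked_conv(inp, weights, 2, 2) == conv_layer(inp, weights))
```

With four kernels and four channels split in blocks of two, the sketch yields a two-by-two array, and the blocked result matches the unsplit convolution because summation over channels can be regrouped freely.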

	 Regarding claim 2, Yang teaches The method of claim 1 wherein splitting the weight parameter comprises: splitting the weight parameter in a case where a size of the weight parameter exceeds a first threshold, such that each operational parameter in the operational parameter array obtained by the splitting has a size less than or equal to the first threshold.([p. 5 §3.2] "When a new C loop Ci is added, a series of images and kernels are streamed and Ci channels reductions are being performed on the same set of outputs. Therefore those partial outputs are being reduced Ci/Ci-1 times, and should be stored in a new output buffer to prevent these fetches from going to a larger memory at a higher level in the memory hierarchy" First threshold interpreted as Ci-1 such that if Ci is equal to Ci-1 no splitting occurs.).

	 Regarding claim 3, Yang teaches The method of claim 1 wherein splitting the weight parameter comprises: splitting the weight parameter in a case where a number of kernels of the weight parameter exceeds a second threshold, such that each operational parameter in the operational parameter array obtained by the splitting has a number of kernels less than or equal to the second threshold.([p. 5 §3.2] "When a new C loop Ci is added, a series of images and kernels are streamed and Ci channels reductions are being performed on the same set of outputs. Therefore those partial outputs are being reduced Ci/Ci-1 times, and should be stored in a new output buffer to prevent these fetches from going to a larger memory at a higher level in the memory hierarchy" [p. 5 §3.2] "Suppose we apply parallelism for S cores at a given level p by unrolling that loop p across the processors. The first constraint is that we need to block the application such that the dimension being unrolled, e.g. Cp, is S times that of the previous level, Cp−1. The parallelism can be performed by partitioning the problem across the input XY, the kernels K, or the channels C" First threshold interpreted as Ci-1 such that if Ci is equal to Ci-1 no splitting occurs.  Yang explicitly teaches that the partitioning may occur as a function of kernels.  The second threshold is interpreted as Ki-1 such that if Ki=Ki-1 no splitting occurs.).

	 Regarding claim 4, claim 4 is canceled.  The limitations formerly recited in claim 4 now appear in amended claim 1 and are addressed in the rejection of claim 1 above.

	 Regarding claim 5, Yang teaches The method of claim 1 wherein splitting the weight parameter comprises: splitting the weight parameter in a case where the weight parameter has a number of channels exceeding a third threshold, such that each operational parameter in the operational parameter array obtained by the splitting has a number of channels less than or equal to the third threshold.([p. 5 §3.2] "When a new C loop Ci is added, a series of images and kernels are streamed and Ci channels reductions are being performed on the same set of outputs. Therefore those partial outputs are being reduced Ci/Ci-1 times, and should be stored in a new output buffer to prevent these fetches from going to a larger memory at a higher level in the memory hierarchy" [p. 5 §3.2] "Suppose we apply parallelism for S cores at a given level p by unrolling that loop p across the processors. The first constraint is that we need to block the application such that the dimension being unrolled, e.g. Cp, is S times that of the previous level, Cp−1. The parallelism can be performed by partitioning the problem across the input XY, the kernels K, or the channels C" Third threshold interpreted as Cp-1 such that if Cp is equal to Cp-1 no splitting occurs.  Yang explicitly teaches that the partitioning may occur as a function of channels (Cp = Cp-1).).

	 Regarding claim 6, Yang teaches The method of claim 1 wherein splitting the weight parameter comprises: splitting the weight parameter in a case where the weight parameter has a number of channels greater than or equal to a second predetermined number, such that the operational parameter array obtained by the splitting has a number of columns equal to a multiple of the second predetermined number.([p. 5 §3.3] "The first constraint is that we need to block the application such that the dimension being unrolled, e.g. Cp, is S times that of the previous level, Cp−1. The parallelism can be performed by partitioning the problem across the input XY, the kernels K, or the channels C" S interpreted as multiple of predetermined number (Cp-1).).


	 Regarding claim 7, Yang teaches The method of claim 1 wherein splitting the weight parameter comprises: when the selected layer receives a plurality of partial input data, any two of which do not have the same channel, and the plurality of partial input data collectively correspond to a complete input data of the selected layer, ([p. 5 §3.2] "The inner loop takes a small amount of input data with block size X0Y0C0 and convolves it with K0 kernels to create some partial outputs with block size X0Y0K0. A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated" Small amount of input data interpreted as synonymous with plurality of partial input data.  If block X0 and Y0 are 1 (which is explicitly taught in Algorithm 1) then any two of the partial input data would not have the same channel.  Therefore Yang explicitly teaches receiving a plurality of partial input data of which any two do not have the same channel.)
	then the weight parameter is split according to each partial input data such that the operational parameter array obtained by the splitting has a number of columns equal to the number of the received plurality of partial input data, and all the operational parameters in each column correspond to the same one or more channels as one of the plurality of partial input data.([p. 5 §3.2] "The inner loop takes a small amount of input data with block size X0Y0C0 and convolves it with K0 kernels to create some partial outputs with block size X0Y0K0. A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated").

	 Regarding claim 8, Yang teaches The method of claim 1 wherein splitting the weight parameter further comprises: subdividing at least a row and/or column of the operational parameter array in at least one of dimensions of depth and number of kernels when the row and/or column includes an operational parameter having a size exceeding a first threshold, such that each operational parameter in the operational parameter array obtained by the subdividing has a size less than or equal to the first threshold.([p. 5 §3.3] "The first constraint is that we need to block the application such that the dimension being unrolled, e.g. Cp, is S times that of the previous level, Cp−1. The parallelism can be performed by partitioning the problem across the input XY, the kernels K, or the channels C" S interpreted as multiple of predetermined number (Cp-1).  See also figure 1.  Cp interpreted as operational parameter exceeding the threshold.  Cp-1 interpreted as synonymous with operational parameter having a size less than Cp.).

	 Regarding claim 9, Yang teaches The method of claim 1 wherein each partial operation result in the partial operation result array corresponds to one output data of the selected layer.([p. 4] "Partial results for each output pixel are accumulated hierarchically across the three levels of blocking in C" Output pixel interpreted as synonymous with one output data of the selected layer.).

	 Regarding claim 10, Yang teaches The method of claim 1 where generating the output data comprises: compressing the partial operation result array into one column by adding up all the partial operation results in each row of the partial operation result array in a point-to-point manner when the partial operation result array includes a plurality of columns, each partial operation result in the compressed partial operation result array corresponding to an output data of the selected layer.([p. 4] "Partial results for each output pixel are accumulated hierarchically across the three levels of blocking in C" [p. 5 §3.2] "The inner loop takes a small amount of input data with block size X0Y0C0 and convolves it with K0 kernels to create some partial outputs with block size X0Y0K0. A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated, which will happen only when all of the channels (C2 loop) finish" For the iteration of X0=1 partial output is compressed into a single column.  Yang explicitly teaches that the partial results are accumulated (summed) from the channels (rows).).

	 Regarding claim 11, Yang teaches The method of claim 1 wherein generating the output data comprises: compressing the partial operation result array into one row by combining all the partial operation results in each column of the partial operation result array in the depth direction when the partial operation result array includes a plurality of rows, each partial operation result in the compressed partial operation result array corresponding to an output data of the selected layer.([p. 4] "Partial results for each output pixel are accumulated hierarchically across the three levels of blocking in C" [p. 5 §3.2] "The inner loop takes a small amount of input data with block size X0Y0C0 and convolves it with K0 kernels to create some partial outputs with block size X0Y0K0. A complete output cannot be generated until all the channels of the input are processed for that kernel and the output pixel is generated, which will happen only when all of the channels (C2 loop) finish" For the iteration of X0=1 partial output is compressed into a single row.  See Figure 1 of how the partial operation results correspond to the output.).
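
The two compression steps recited in claims 10 and 11 can be pictured with a short sketch, offered for illustration only. It assumes each partial operation result is a small stack of 2-D planes and that "combining in the depth direction" means stacking those planes. The helper names (add_pointwise, compress_to_one_column, compress_to_one_row) and the sample values are hypothetical and are not drawn from Yang or the instant application.

```python
# Illustrative sketch only. All names and data are hypothetical.

def add_pointwise(a, b):
    """Element-by-element sum of two partial results shaped [depth][Y][X]."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]


def compress_to_one_column(array):
    """Add up the partial results in each row, leaving one result per row."""
    compressed = []
    for row in array:
        total = row[0]
        for block in row[1:]:
            total = add_pointwise(total, block)
        compressed.append([total])
    return compressed


def compress_to_one_row(array):
    """Stack the partial results in each column along depth, leaving one row."""
    n_cols = len(array[0])
    return [[[plane for row in array for plane in row[j]] for j in range(n_cols)]]


# A 2 x 2 partial result array; each entry has depth 1 and a 1 x 2 plane.
partials = [
    [[[[1, 2]]], [[[10, 20]]]],   # row 0: same kernel subset, two channel subsets
    [[[[3, 4]]], [[[30, 40]]]],   # row 1: another kernel subset
]

sum_then_stack = compress_to_one_row(compress_to_one_column(partials))[0][0]
stack_then_sum = compress_to_one_column(compress_to_one_row(partials))[0][0]

print(sum_then_stack)                    # [[[11, 22]], [[33, 44]]]
print(sum_then_stack == stack_then_sum)  # True
```

Under these assumptions the two orders of compression give the same output, since the pointwise sum acts plane by plane and stacking does not alter any plane.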

	Regarding claims 13 and 15, claims 13 and 15 are directed towards an apparatus for performing the method of claim 1.  Therefore, the rejection applied to claim 1 also applies to claims 13 and 15.  Claims 13 and 15 also mention additional elements including a processor to perform the method ([p. 1 §1] "Early attempts [20, 1, 24, 2] to optimize CPU and GPU CNN implementations treated the convolutional layers as matrix multiplication and used an optimized BLAS matrix matrix-multiplication (GEMM) routine") as well as memory to store the instructions performed by the processor ([p. 1 §1] "the design of the memory hierarchy and how the data is choreographed has a dramatic effect on the energy required for the computation.").

	Claim 12 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Yang and Stanford (“CS231n Convolutional Neural Networks for Visual Recognition”, 2015).

	 Regarding claim 12, Yang teaches The method of claim 1.
	However, Yang does not explicitly teach generating the output data comprises: generating an output data of the selected layer by adding up all the partial operation results in each row of the partial operation result array in a point-to-point manner and then combining, in the depth direction, all the partial operation results in each column of the partial operation result array compressed by the adding up, or by combining all the partial operation results in each column of the partial operation result array in the depth direction and then adding up all the partial operation results in each row of the partial operation result array compressed by the combining in a point-to-point manner, when the partial operation result array includes a plurality of rows and a plurality of columns.

	Stanford, in the same field of endeavor, teaches generating the output data comprises: generating an output data of the selected layer by adding up all the partial operation results in each row of the partial operation result array in a point-to-point manner and then combining, in the depth direction, all the partial operation results in each column of the partial operation result array compressed by the adding up, or by combining all the partial operation results in each column of the partial operation result array in the depth direction and then adding up all the partial operation results in each row of the partial operation result array compressed by the combining in a point-to-point manner, when the partial operation result array includes a plurality of rows and a plurality of columns.([p. 11] "The visualization below iterates over the output activations (green), and shows that each element is computed by elementwise multiplying the highlighted input (blue) with the filter (red), summing it up, and then offsetting the result by the bias." See FIG. on p. 12.).

	Yang and Stanford are both directed towards accelerating convolutional neural networks. Therefore, Yang and Stanford are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Yang with the teachings of Stanford.  While the content on the Stanford convnet course website would be considered to be well-understood to one of ordinary skill in the art, a motivation for combination with regard to implementing convolutional layers as matrix multiplication has been provided ([p. 13 "Implementation as Matrix Multiplication"] "the benefit is that there are many very efficient implementations of Matrix Multiplication that we can take advantage of (for example, in the commonly used BLAS API). Moreover, the same im2col idea can be reused to perform the pooling operation").
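
The matrix-multiplication formulation quoted from Stanford can be pictured with a brief sketch, offered for illustration only. It assumes stride 1, no padding, and no bias term, and it uses plain Python lists in place of a BLAS routine. The function names (im2col, conv_as_matmul) and the sample data are hypothetical. Only the term im2col comes from the quoted passage.

```python
# Illustrative sketch only. All names and data are hypothetical.

def im2col(inp, Fh, Fw):
    """Flatten every Fh x Fw receptive field of inp[C][Y][X] into one vector."""
    C, Y, X = len(inp), len(inp[0]), len(inp[0][0])
    patches = []
    for y in range(Y - Fh + 1):
        for x in range(X - Fw + 1):
            patches.append([inp[c][y + fh][x + fw]
                            for c in range(C)
                            for fh in range(Fh)
                            for fw in range(Fw)])
    return patches  # one flattened patch per output position


def conv_as_matmul(inp, weights):
    """Convolution expressed as (flattened kernels) x (flattened patches)."""
    Fh, Fw = len(weights[0][0]), len(weights[0][0][0])
    patches = im2col(inp, Fh, Fw)
    # Flatten each kernel in the same channel, row, column order as the patches.
    kernel_rows = [[w for chan in kernel for frow in chan for w in frow]
                   for kernel in weights]
    return [[sum(w * p for w, p in zip(krow, patch)) for patch in patches]
            for krow in kernel_rows]


# One 3 x 3 input channel and two 2 x 2 kernels.
inp = [[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]]
weights = [
    [[[1, 0], [0, 1]]],   # sums the main diagonal of each patch
    [[[0, 1], [1, 0]]],   # sums the anti-diagonal of each patch
]

print(conv_as_matmul(inp, weights))  # [[6, 8, 12, 14], [6, 8, 12, 14]]
```

Each output row lists one kernel's responses over the four output positions in row-major order. A practical implementation would hand the two flattened matrices to an optimized matrix-multiplication routine.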

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124