Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities: 
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“A first computer configured to” in claim 1.
“A control device configured to” in claim 1.
“A second computer configured to” in claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3, 6, and 7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 3, “The second crossbar” lacks antecedent basis.

Claims 6 and 7 are rejected with respect to their dependence on claim 3.

The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


Claim 17 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, fourth paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.

Regarding claim 17, claiming the computation method of claim 9 does not further limit the claim. It is therefore an improper dependent claim.  	

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 4, 5, 8-11, 13, 14, 17, and 18 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer-implemented method of processing neural networks, which, under its broadest reasonable interpretation, is a series of mathematical calculations, which falls under the “Mathematical Concepts” grouping of abstract ideas.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: a first computer configured to perform a first operation on a first matrix M times to obtain a second matrix, wherein the first operation is performed by the Kth neural network layer, and wherein M is a positive integer not less than 1 (mathematical calculation), a second computer coupled to the first computer and configured to perform a second operation on the second matrix, wherein the second operation is performed by the (K+1)th neural network layer, and wherein K is a positive integer greater than or equal to 1, a control device, configured to: control the first computer to perform an ith first operation of the M first operations on the first matrix to obtain an ith data element of the second matrix, wherein 1≤i≤M (mathematical calculation).  Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “a computing device”, “a control device”, “neural network layer”, “storage unit”, and “data element”. However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 1 also recites additional elements “configured to: store the ith data element of the second matrix into a first storage unit” which amounts to gathering data which is insignificant extra-solution activity, “control the second computer to perform the second operation one time in response to data elements stored in the first storage unit being sufficient for performing the second operation on time” which is a generic function of a generic computer component, and “wherein the first operation is a convolution operation and the second operation is a convolution operation or a pooling operation” which amounts to selection of a data type which is insignificant extra-solution activity.  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 10 which recites corresponding features.  Therefore, claims 1 and 10 recite an abstract idea which is a judicial exception.

Regarding Claim 2:  Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 2 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: “the third matrix is a matrix that is obtained after zero adding is performed on the second matrix while performing the second operation on the second matrix, wherein N=(h−1)×(W+p)+w, wherein h represents a quantity of rows of a kernel corresponding to the second operation, w represents a quantity of columns of the kernel corresponding to the second operation, W represents a quantity of columns of the second matrix, p represents a quantity of rows or a quantity of columns of elements 0 that are to be added to the second matrix to perform the second operation on the second matrix, and wherein h, w, p, W, and N are all positive integers not less than 1” (observation, evaluation and judgement as can be performed with pen and paper).  Therefore, claim 2 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 2 recites the additional elements introduced in claim 1.  Claim 2 also recites the additional element “line buffer”. However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application. Claim 2 also recites additional elements, “wherein the N registers in the first line buffer sequentially store elements of a third matrix in row-major order or column-major order” which amounts to gathering data which is insignificant extra-solution activity.  Therefore, claim 2 is directed to a judicial exception.
Step 2B Analysis:  Claim 2 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 2 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 11 which recites corresponding features.  Therefore, claims 2 and 11 recite an abstract idea which is a judicial exception.


Regarding Claim 4:  Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 4 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: “perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, wherein the ith data element of the second matrix is located in a last column of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located;” (mathematical calculation), and “perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1” (mathematical calculation).  Therefore, claim 4 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 4 recites the additional elements introduced in claim 2.  However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application. Claim 4 also recites additional elements, “wherein the N registers in the first line buffer sequentially store elements of a third matrix in row-major order or column-major order” which amounts to gathering data which is insignificant extra-solution activity.  Claim 4 also recites additional elements “store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer” which amounts to gathering data which is insignificant extra-solution activity. Therefore, claim 4 is directed to a judicial exception.
Step 2B Analysis:  Claim 4 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 4 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 13 which recites corresponding features.  Therefore, claims 4 and 13 recite an abstract idea which is a judicial exception.
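For illustration only, the timing recited in claim 4 can be sketched as a schedule of clock cycles: the first operation that yields a last-column element runs in cycle n, and the next first operation runs in cycle n+t. The function name schedule_first_operations, the assumption that all other first operations run in consecutive cycles, and the example values are hypothetical; they are not taken from the claims or from the specification.

```python
def schedule_first_operations(num_rows: int, W: int, t: int, start_cycle: int = 1) -> list[int]:
    """Return the clock cycle assigned to each first operation (row-major order).

    Assumption (not in the claims): operations within a row run in
    consecutive cycles. Per claim 4, after the element in the last column
    of a row (cycle n), the next first operation runs in cycle n + t.
    """
    if t <= 1:
        raise ValueError("claim 4 recites t as a positive integer greater than 1")
    cycles = []
    cycle = start_cycle
    for i in range(1, num_rows * W + 1):      # i is 1-based, as in the claim
        cycles.append(cycle)
        is_last_column = (i % W == 0)
        cycle += t if is_last_column else 1   # gap of t cycles at a row boundary
    return cycles


# Hypothetical example: second matrix with 2 rows and 3 columns, t = 3.
print(schedule_first_operations(num_rows=2, W=3, t=3))
```

In this sketch the idle cycles between n and n+t are the cycles in which, under claim 4, an element 0 would be stored in the first line buffer.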

Regarding Claim 5:  Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 5 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: “perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, wherein the ith data element of the second matrix is located in a last row of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a column next to a column in which the ith data element is located;” (mathematical calculation), and “perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1” (mathematical calculation).  Therefore, claim 5 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 5 recites the additional elements introduced in claim 2.  However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application. Claim 5 also recites additional elements, “wherein the N registers in the first line buffer sequentially store elements of a third matrix in row-major order or column-major order” which amounts to gathering data which is insignificant extra-solution activity.  Claim 5 also recites additional elements “store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer” which amounts to gathering data which is insignificant extra-solution activity. Therefore, claim 5 is directed to a judicial exception.
Step 2B Analysis:  Claim 5 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 5 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 14 which recites corresponding features.  Therefore, claims 5 and 14 recite an abstract idea which is a judicial exception.

Regarding Claim 8:  Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 8 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 8 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 8 recites the additional elements introduced in claim 4.  However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 8 also recites additional elements “wherein t=(s−1)×(W+p)+(w−1), and the control device is configured to control, in the (n+1)th clock cycle to the (n+t)th clock cycle, the first line buffer to sequentially store (s−1)×(W+p)+(w−1) elements 0, wherein s represents a sliding step of the first operation.” which amounts to gathering data which is insignificant extra-solution activity. Therefore, claim 8 is directed to a judicial exception.
Step 2B Analysis:  Claim 8 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 8 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 17 which recites corresponding features.  Therefore, claims 8 and 17 recite an abstract idea which is a judicial exception.
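For illustration only, the relationship t=(s−1)×(W+p)+(w−1) quoted from claim 8 can be evaluated directly. The function name zero_fill_count and the example values below are assumptions chosen for the sketch and do not appear in the claims.

```python
def zero_fill_count(s: int, w: int, W: int, p: int) -> int:
    """Evaluate t = (s - 1) * (W + p) + (w - 1) as recited in claim 8.

    s: sliding step, w: kernel columns, W: columns of the second matrix,
    p: rows or columns of zeros to be added (symbols as defined in claims 2 and 8).
    """
    for name, value in (("s", s), ("w", w), ("W", W), ("p", p)):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    return (s - 1) * (W + p) + (w - 1)


# Hypothetical values: 3-column kernel, 5-column second matrix, p = 2.
print(zero_fill_count(s=1, w=3, W=5, p=2))  # step of 1
print(zero_fill_count(s=2, w=3, W=5, p=2))  # step of 2
```

With these example values, a sliding step of 1 gives t = 2 and a sliding step of 2 gives t = 9, which is the number of elements 0 the first line buffer would sequentially store between the (n+1)th and (n+t)th clock cycles.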

Regarding Claim 9:  Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a computing device, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 9 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 9 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 9 recites the additional elements introduced in claim 1.  Claim 9 also recites the additional element “crossbar”.  However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application. Therefore, claim 9 is directed to a judicial exception.
Step 2B Analysis:  Claim 9 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 9 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 18 which recites corresponding features.  Therefore, claims 9 and 18 recite an abstract idea which is a judicial exception.


Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1, 2, 4, 5, 8-11, 13, 14, 17, and 18 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5, 7-12, 14, and 16-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Woolley (US 2016/0162402 A1).

Regarding claim 1, Woolley teaches A computing device for neural network computation implemented in a neural network comprising a Kth neural network layer and a (K+1)th neural network layer, wherein the computing device comprises: ([¶0007] "In this regard, a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" ).
a first computer configured to perform a first operation on a first matrix M times to obtain a second matrix, wherein the first operation is performed by the Kth neural network layer, ([Abstract] "The pipeline then performs matrix multiplication operations between the image tile and a filter tile to generate a contribution of the image tile to an output matrix." [¶0009] "the convolution engine performs matrix multiplication operations between the image matrix and the filter stack. Notably, the dimensions of the image matrix" convolution layer interpreted as a Kth layer.).
and wherein M is a positive integer not less than 1; ([¶0015] "performing one or more matrix multiplication operations between the first image tile and a first filter tile." ).
a second computer coupled to the first computer and configured to perform a second operation on the second matrix, wherein the second operation is performed by the (K+1)th neural network layer, and wherein K is a positive integer greater than or equal to 1, (See FIG. 2 230 for cluster of general purpose computers. [¶0062] "In the context of FIG. 4, the streaming multiprocessor (SM) 310 is configured to perform a multi-convolution operation between the image batch 410 and the filter stack 440 to produce the output batch 470" See FIG. 3 streaming multiprocessor is on GPC. [¶0107] "The convolution engine divides the virtual image matrix into separate image tiles and then assigns the processing of each image tile to a different thread group." Image tile interpreted as synonymous with matrix. ).
a control device, configured to: control the first computer to perform an ith first operation of the M first operations on the first matrix to obtain an ith data element of the second matrix, wherein 1≤i≤M; and ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device).
a control device coupled to the first computer and configured to: store the ith data element of the second matrix into a first storage unit; and ([¶0015] "computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address" See also FIG. 2 202, 204.).
control the second computer to perform the second operation one time in response to data elements stored in the first storage unit being sufficient for performing the second operation on time, ([¶0043] " The task/work unit 207 receives tasks from the front end 212 and ensures that GPCs 208 are configured to a valid state before the processing task specified by each one of the TMDs is initiated. A priority may be specified for each TMD that is used to schedule the execution of the processing task." ).
wherein the first operation is a convolution operation and the second operation is a convolution operation or a pooling operation, or ([¶0017] "One advantage of the disclosed techniques is that applications may perform multi-convolution operations via an optimized matrix multiplication routine while optimizing parallel processing memory usage" [¶0062] "In the context of FIG. 4, the streaming multiprocessor (SM) 310 is configured to perform a multi-convolution operation between the image batch 410 and the filter stack 440 to produce the output batch 470").
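For illustration only, the control flow recited in claim 1 can be sketched in one dimension: each first operation produces one data element, the element is stored, and the second operation is triggered once as soon as the stored elements suffice. The names first_operation and run_pipeline, the use of a 1-D sliding dot product as the convolution, and the use of max pooling with a stride of 1 as the second operation are all assumptions made for this sketch; neither the claims nor Woolley is limited to or necessarily describes this arrangement.

```python
from collections import deque


def first_operation(first_matrix, kernel, i):
    """i-th (0-based) output of a 1-D sliding dot product (CNN-style convolution)."""
    return sum(first_matrix[i + j] * kernel[j] for j in range(len(kernel)))


def run_pipeline(first_matrix, kernel, pool_size):
    """Interleave M first operations with second operations.

    'storage' stands in for the claimed first storage unit (assumption).
    The second operation (max pooling here) runs once whenever the stored
    elements are sufficient, without waiting for all M first operations.
    """
    m = len(first_matrix) - len(kernel) + 1   # number of first operations, M
    storage = deque(maxlen=pool_size)         # keeps only the newest pool_size elements
    outputs = []
    for i in range(m):
        storage.append(first_operation(first_matrix, kernel, i))  # store the element
        if len(storage) == pool_size:         # data sufficient for one second operation
            outputs.append(max(storage))      # perform the second operation one time
    return outputs


# Hypothetical input: first operations yield 3, 5, 7, 9 in turn.
print(run_pipeline([1, 2, 3, 4, 5], kernel=[1, 1], pool_size=2))
```

In this example the first second operation runs after only two of the four first operations have completed, which is the overlap between layers that the claim language describes.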

Regarding claim 2, Woolley teaches The computing device according to claim 1, wherein the first storage unit comprises a first line buffer, wherein the first line buffer comprises N registers, wherein the N registers in the first line buffer sequentially store elements of a third matrix in row-major order or column-major order, ([¶0009] " the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix" [¶0098] "In general, components included in the computer system 100 may store any of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 in any type of memory structure included in the PP memory... any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Frame buffer is interpreted as synonymous with line buffer. ).
the third matrix is a matrix that is obtained after zero adding is performed on the second matrix while performing the second operation on the second matrix, wherein N=(h−1)×(W+p)+w, wherein h represents a quantity of rows of a kernel corresponding to the second operation, w represents a quantity of columns of the kernel corresponding to the second operation, W represents a quantity of columns of the second matrix, p represents a quantity of rows or a quantity of columns of elements 0 that are to be added to the second matrix to perform the second operation on the second matrix, and wherein h, w, p, W, and N are all positive integers not less than 1. ([¶0066] "For example, in some embodiments, the parameters 465 may include a padding height and a padding width. The padding height and the padding width append, respectively, rows of zeros and columns of zeros to output images" zero padding interpreted as synonymous with zero adding. ). 
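For illustration only, the register count N=(h−1)×(W+p)+w recited in claim 2 can be evaluated as follows. On one reading, N is the span, in a row-major flattening of the zero-added matrix, from the first to the last element covered by an h×w kernel: h−1 full rows of W+p elements plus w elements of the final row. The function name line_buffer_size and the example values are assumptions chosen for the sketch.

```python
def line_buffer_size(h: int, w: int, W: int, p: int) -> int:
    """Evaluate N = (h - 1) * (W + p) + w as recited in claim 2.

    h, w: kernel rows and columns; W: columns of the second matrix;
    p: rows or columns of zeros to be added (symbols as defined in claim 2).
    """
    for name, value in (("h", h), ("w", w), ("W", W), ("p", p)):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    return (h - 1) * (W + p) + w


# Hypothetical values: 3x3 kernel, 5-column second matrix, p = 2.
print(line_buffer_size(h=3, w=3, W=5, p=2))
```

With these example values the padded row length is 7, so the buffer holds two full rows plus three elements, for N = 17 registers.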

Regarding claim 3, Woolley teaches The computing device according to claim 2, wherein the second crossbar is a crossbar, wherein X target registers of the N registers are directly connected to X rows of the crossbar respectively, wherein the X target registers are a [1+k×(W+p)]th register to a [w+k×(W+p)]th register of the N registers, wherein a value of k is a positive integer ranging from 0 to h−1, wherein X=h×w, and wherein the control device is configured to: (See FIG. 2 210.  [¶0046] "A given GPCs 208 may process data to be written to any of the DRAMs 220 within PP memory 204. Crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to any other GPC 208 for further processing." FIG. 3 384 L1 cache interpreted as synonymous with register. The crossbar routing to any other GPC (which is shown to contain registers) is interpreted as synonymous with the crossbar directly connected to the X registers. ).
store the ith data element of the second matrix into the first line buffer; and ([¶0015] "computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address" See also FIG. 2 202, 204. ).
control the crossbar to operate and perform the second operation on data elements stored in the X target registers in response to the data elements currently stored in the X target registers being sufficient for performing the second operation. ([¶0050] "Operation of GPC 208 is controlled via a pipeline manager 305 that distributes processing tasks received from a work distribution unit (not shown) within task/work unit 207 to one or more streaming multiprocessors (SMs) 310. Pipeline manager 305 may also be configured to control a work distribution crossbar 330 by specifying destinations for processed data output by SMs 310." FIG. 3 384 L1 cache interpreted as synonymous with register. ). 
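For illustration only, and not as part of the claim language or of Woolley, the positions of the X target registers recited in claim 3 can be enumerated as in the following Python sketch. The function name target_registers and the example values are assumptions; indices are 1-based to match the claim's [1+k×(W+p)]th to [w+k×(W+p)]th notation.

```python
def target_registers(h: int, w: int, W: int, p: int) -> list[int]:
    """List the 1-based register indices 1+k*(W+p) .. w+k*(W+p) for k = 0 .. h-1."""
    stride = W + p  # length of one padded row in the line buffer
    return [j + k * stride for k in range(h) for j in range(1, w + 1)]


# Hypothetical example: a 3x3 kernel, a second matrix with 5 columns, padding of 1.
indices = target_registers(h=3, w=3, W=5, p=1)
print(indices)       # [1, 2, 3, 7, 8, 9, 13, 14, 15]
print(len(indices))  # 9, which equals X = h * w
```

With these assumed values the last target register is register 15, which matches N = (h−1)×(W+p)+w from claim 2.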

Regarding claim 5, Woolley teaches The computing device according to claim 2, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] " In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device).
wherein the ith data element of the second matrix is located in a last row of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a column next to a column in which the ith data element is located; and ([¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." Starting location of a next column interpreted as synonymous with first row of a matrix.  Limitation interpreted as synonymous with reading matrix in column-major order.).
perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix. ).
store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0081] "As outlined in conjunction with FIG. 5, while the serpentine pattern of each column is offset from the serpentine pattern of the other columns, the serpentine pattern represents a uniform sequence of offsets for every row of the virtual image matrix 510" [¶0082] "For example, the first column of the virtual image matrix 510 is associated with the source address sequence 0, 4, 12, 16, 26, 40, 48, 52, 72, 76, 84, and 88" [¶0098] " For example, any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer. ). 

Regarding claim 7, Woolley teaches The computing device according to claim 3, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device).
wherein the ith data element of the second matrix is located in a last row of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a column next to a column in which the ith data element is located; and ([¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." Starting location of a next column interpreted as synonymous with first row of a matrix.  Limitation interpreted as synonymous with reading matrix in column-major order. ).
perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix.).
store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0081] "As outlined in conjunction with FIG. 5, while the serpentine pattern of each column is offset from the serpentine pattern of the other columns, the serpentine pattern represents a uniform sequence of offsets for every row of the virtual image matrix 510" [¶0082] "For example, the first column of the virtual image matrix 510 is associated with the source address sequence 0, 4, 12, 16, 26, 40, 48, 52, 72, 76, 84, and 88" [¶0098] " For example, any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer. ). 

Regarding claim 8, the combination of Woolley and Clemons teaches The computing device according to claim 4, wherein t=(s−1)×(W+p)+(w−1), and the control device is configured to control, in the (n+1)th clock cycle to the (n+t)th clock cycle, the first line buffer to sequentially store (s−1)×(W+p)+(w−1) elements 0, wherein s represents a sliding step of the first operation. (Woolley [¶0066] "the parameters 465 may include a padding height and a padding width. The padding height and the padding width append, respectively, rows of zeros and columns of zeros to output images" Any integer value of t is interpreted as conforming to the given equation. Appending rows in row-major order or appending columns in column-major order for zero padding is interpreted as synonymous with sequentially storing 0 in a line buffer.).
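For illustration only, and not as part of the claim language or of the cited references, the value of t recited in claim 8 can be evaluated as in the following Python sketch. The function name zero_fill_cycles and the example values are assumptions; s, w, W, and p follow the definitions recited in claims 2 and 8.

```python
def zero_fill_cycles(s: int, w: int, W: int, p: int) -> int:
    """Return t = (s - 1) * (W + p) + (w - 1), as recited in claim 8."""
    return (s - 1) * (W + p) + (w - 1)


# Hypothetical example: kernel width 3, a second matrix with 5 columns, padding of 1.
print(zero_fill_cycles(s=1, w=3, W=5, p=1))  # 2
print(zero_fill_cycles(s=2, w=3, W=5, p=1))  # 8
```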

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 4, 6, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley in view of Clemons (US 2017/0004089 A1).

Regarding claim 4, Woolley teaches The computing device according to claim 2, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device).
perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix. ).
store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer. ). However, Woolley does not explicitly teach wherein the ith data element of the second matrix is located in a last column of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located;

Clemons, who teaches a related art of accessing segmented multi-dimensional matrices, teaches wherein the ith data element of the second matrix is located in a last column of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located; ([¶0047] "A patch may be specified by a data structure that identifies the patch relative to an origin of the digital image 300. The pixel data of the digital image 300 may be stored in row-major order in a contiguous group of memory addresses, either physical addresses or virtual addresses, and the patch data structure may include a first field that specifies an origin of the patch as a location of a particular pixel in the digital image 300" Starting location of a next row interpreted as synonymous with first column of a matrix.  Limitation interpreted as synonymous with reading matrix in row-major order.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the column-major CNN matrices in Woolley with the row-major matrix representations in Clemons. The substitution would have been obvious because Woolley teaches that column-major order is an appropriate storage type for the convolution operations ([¶0009] “the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix.”).  Furthermore, Clemons teaches that row-major order and column-major order are both conventional ways of storing a digital image in memory ([¶0003] “In conventional systems, a digital image is often stored in memory in row-major or col-major order.”).  Therefore, the substitution would have been obvious to one of ordinary skill in the art.
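For illustration only, and not as part of the claims or of the cited references, the difference between the row-major traversal relied upon from Clemons and the column-major traversal relied upon from Woolley can be sketched in Python as follows. The function names and the 0-based (row, column) indexing are assumptions made for the example.

```python
def next_row_major(r: int, c: int, rows: int, cols: int):
    """Successor of element (r, c) in row-major order, or None after the last element."""
    if c + 1 < cols:
        return (r, c + 1)   # continue along the current row
    if r + 1 < rows:
        return (r + 1, 0)   # last column reached: start of the next row
    return None


def next_column_major(r: int, c: int, rows: int, cols: int):
    """Successor of element (r, c) in column-major order, or None after the last element."""
    if r + 1 < rows:
        return (r + 1, c)   # continue down the current column
    if c + 1 < cols:
        return (0, c + 1)   # last row reached: start of the next column
    return None


# Hypothetical 3x4 matrix.
print(next_row_major(0, 3, rows=3, cols=4))     # (1, 0)
print(next_column_major(2, 0, rows=3, cols=4))  # (0, 1)
```

In the row-major case the element after one in the last column is the first element of the next row, which corresponds to the limitation addressed by Clemons; in the column-major case the element after one in the last row is the first element of the next column, which corresponds to the limitation addressed in claims 5 and 7.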

Regarding claim 6, Woolley teaches The computing device according to claim 3, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device).
perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix. ).
store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer.). However, Woolley does not explicitly teach wherein the ith data element of the second matrix is located in a last column of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located  

Clemons, who teaches a related art of accessing segmented multi-dimensional matrices, teaches wherein the ith data element of the second matrix is located in a last column of the second matrix, and an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located ([¶0047] "A patch may be specified by a data structure that identifies the patch relative to an origin of the digital image 300. The pixel data of the digital image 300 may be stored in row-major order in a contiguous group of memory addresses, either physical addresses or virtual addresses, and the patch data structure may include a first field that specifies an origin of the patch as a location of a particular pixel in the digital image 300" Starting location of a next row interpreted as synonymous with first column of a matrix.  Limitation interpreted as synonymous with reading matrix in row-major order.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the column-major CNN matrices in Woolley with the row-major matrix representations in Clemons. The substitution would have been obvious because Woolley teaches that column-major order is an appropriate storage type for the convolution operations ([¶0009] “the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix.”).  Furthermore, Clemons teaches that row-major order and column-major order are both conventional ways of storing a digital image in memory ([¶0003] “In conventional systems, a digital image is often stored in memory in row-major or col-major order.”).  Therefore, the substitution would have been obvious to one of ordinary skill in the art.

Regarding claim 13, claim 13 effectively mirrors claim 4 and is therefore rejected under a similar interpretation.

Regarding claim 15, claim 15 effectively mirrors claim 6 and is therefore rejected under a similar interpretation.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Robinson (“EFFICIENT GAUSSIAN FILTERING USING CASCADED PREFIX SUMS”, 2012).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	
/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124