Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on April 7, 2022, in which claims 1-18 are amended. Claims 19-20 have been added.  Claims 1-20 are currently pending.

Specification
Applicant's amendments made to the specification are acknowledged. Examiner’s objections to the specification are hereby withdrawn, as necessitated by Applicant’s amendments made to the specification.

Response to Arguments
The rejections of claims 1-18 under 35 U.S.C. § 112(b)/(d)/(f) are hereby withdrawn, as necessitated by Applicant's amendments and remarks made in response to the rejections.
Applicant’s arguments with respect to the rejection of claims 1-18 under 35 U.S.C. 101 based on amendment have been considered and are persuasive.  The rejections of claims 1-18 under 35 U.S.C. § 101 are hereby withdrawn, as necessitated by Applicant's amendments and remarks made in response to the rejections.
Applicant’s arguments with respect to the rejection of claims 1-18 under 35 U.S.C. 102/103 based on amendment have been considered but are not deemed persuasive.
Regarding Applicant's argument that Woolley does not teach a control device which controls computers to perform the neural network operations, Examiner respectfully disagrees.  Applicant admits that the processor controls operations of system components ([¶0040] "In operation, CPU 102 is the master processor of computer system 100, controlling and coordinating operations of other system components. In particular, CPU 102 issues commands that control the operation of PPU 202.") by writing TMD to a queue by way of the PPU front end 212, which controls the slave subsystems including a plurality of PPUs which contain a plurality of GPCs which perform the methods of the claimed invention.  Both the PPUs and GPCs are coupled to a plurality of memory units.  Woolley teaches ([¶0035] "the system memory 104 also includes any number of software applications that execute on the CPU 102, may issue commands that control the operation of the PPUs, and may leverage the convolution subsystem 180 to efficiently execute CNNs.").  As Woolley explicitly teaches that the software to execute the CNNs is executed on the CPU, which further controls the subsystems, Examiner asserts that Woolley fully anticipates a plurality of computer subsystems controlled by a controller and attached to a plurality of memories to perform the described convolution operations.  The hardware implementation can be seen by the bidirectional connection from the CPU to the memory bridge and then to the parallel processing subsystem containing the PPUs, and then the further connection on FIG. 3 of the memory bridge to the I/O unit followed by the host interface 206.  Examiner asserts that the lack of mention of metadata or an intermediary step between the control CPU and the slave computers/processors in the claimed invention should not be interpreted as a novel feature.
As Applicant admits, Woolley teaches that the CPU is relied upon to control the devices (computers) which perform the neural network methods.  Woolley further teaches ([¶0036] "the parallel processing subsystem 112 may be integrated with the CPU 102 and other connection circuitry on a single chip to form a system on chip (SoC).").  It would be obvious to one of ordinary skill in the art that a system on a chip by name represents a larger, distributed system incorporated into a single chip.  Examiner interprets each of the computers in the system of the claimed invention to be a processor broadly connected to a memory, such that a multiprocessor system on a chip where each processor is connected to memory as demonstrated in Woolley is fully synonymous.  This is further supported by the instant specification's description of a computer system ([¶0158] "computer device (which may be a personal computer, a server, a network device, or the like)").  Woolley further teaches that each of the processing elements (computers) of the claimed invention performs the neural network tensor operations as described in the claimed invention.  For these reasons, the rejection is maintained.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 7-12, 14, and 16-18 are rejected under 35 U.S.C. 102 as being unpatentable over Woolley (US 2016/0162402 A1). 

	Regarding claim 1, Woolley teaches A computing device for neural network computation implemented in a neural network comprising a Kth neural network layer and a (K+1)th neural network layer, wherein the computing device comprises: ([¶0007] "In this regard, a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack")
	a first computer comprising: a first memory configured to store first instructions; and a first processor coupled to the first memory (FIG. 2 DRAM 220(0) interpreted as synonymous with first memory.  FIG. 2)
	configured to execute the first instructions to cause the first computer to perform a first operation on a first matrix M times to obtain a second matrix, wherein the first operation is performed by the Kth neural network layer, ([Abstract] "The pipeline then performs matrix multiplication operations between the image tile and a filter tile to generate a contribution of the image tile to an output matrix." [¶0009] "the convolution engine performs matrix multiplication operations between the image matrix and the filter stack. Notably, the dimensions of the image matrix" Convolution layer interpreted as a Kth layer.)
	and wherein M is a positive integer not less than 1; ([¶0015] "performing one or more matrix multiplication operations between the first image tile and a first filter tile.").
	a second computer coupled to the first computer and comprising: a second memory configured to store second instructions; and a second processor coupled to the second memory and configured to execute the second instructions to cause the second computer to (FIG. 2 DRAM 220(1) interpreted as synonymous with second memory coupled to GPC 208(1) interpreted as synonymous with second computer storing second instructions)
	perform a second operation on the second matrix, wherein the second operation is performed by the (K+1)th neural network layer, and wherein K is a positive integer greater than or equal to 1, (See FIG. 2 230 for cluster of general purpose computers. [¶0062] "In the context of FIG. 4, the streaming multiprocessor (SM) 310 is configured to perform a multi-convolution operation between the image batch 410 and the filter stack 440 to produce the output batch 470" See FIG. 3 streaming multiprocessor is on GPC. [¶0107] "The convolution engine divides the virtual image matrix into separate image tiles and then assigns the processing of each image tile to a different thread group." Image tile interpreted as synonymous with matrix.)
	a control device coupled to the first computer and the second computer  and comprising: ([¶0015] "computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address" See also FIG. 2 202, 204.)
	a third memory configured to store third instructions; and a third processor coupled to the third memory and configured to execute the third instructions to cause the control device to: (FIG. 2 DRAM 220(D-1) interpreted as synonymous with third memory coupled to GPC 208(C-1) interpreted as synonymous with third computer storing third instructions)
	control the first computer to perform an ith first operation of M first operations on the first matrix to obtain an ith data element of the second matrix, wherein the ith first operation comprises only a portion of a total number of the M first operations, and wherein 1≤i≤M; ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device)
	store the ith data element of the second matrix; and control, in response to the ith data element being stored, the second computer to perform the second operation one time before all of the M first operations are performed, wherein the ith data element is sufficient for performing the second operation one time ([¶0015] "copying data from the first source address to the first destination address; and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile" Copying data from the first source address to the first destination address interpreted as synonymous with storing the data element of the second matrix.  Performing the second operation one time interpreted as synonymous with performing one matrix multiplication operation as explicitly taught by Woolley.  The M first operations are interpreted as being the matrix multiplication operations of the output image and subsequent filter tile in the neural network following the image and filter tile corresponding to the second matrix.  Woolley also teaches [¶0077] “Consequently, at any given point in time, the shared memory 382 includes the image tiles 542 that the SM 310 is currently processing, but does not necessarily include the image tiles 542 that the SM 310 has already processed or has not begun processing.”, which is interpreted as ensuring that the ith data element is sufficient for performing the second operation.).
	wherein either the first operation is a convolution operation and the second operation is a convolution operation or a pooling operation, or wherein the first operation is a pooling operation and the second operation is a convolution operation ([¶0017] "One advantage of the disclosed techniques is that applications may perform multi-convolution operations via an optimized matrix multiplication routine while optimizing parallel processing memory usage" [¶0062] "In the context of FIG. 4, the streaming multiprocessor (SM) 310 is configured to perform a multi-convolution operation between the image batch 410 and the filter stack 440 to produce the output batch 470" Woolley teaches performing multiple convolution operations in a row.).

	Regarding claim 2, Woolley teaches The computing device of claim 1, wherein the ith data element is stored in a first storage unit, wherein the first storage unit comprises a first line buffer, wherein the first line buffer comprises N registers, wherein the N registers in the first line buffer sequentially store elements of a third matrix in row-major order or column-major order, ([¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix" [¶0098] "In general, components included in the computer system 100 may store any of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 in any type of memory structure included in the PP memory... any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Frame buffer is interpreted as synonymous with line buffer.)
	the third matrix is a matrix that is obtained after zero adding is performed on the second matrix while performing the second operation on the second matrix, wherein N=(h−1)×(W+p)+w, wherein h represents a quantity of rows of a kernel corresponding to the second operation, w represents a quantity of columns of the kernel corresponding to the second operation, W represents a quantity of columns of the second matrix, p represents a quantity of rows or a quantity of columns of elements 0 that are to be added to the second matrix to perform the second operation on the second matrix, and wherein h, w, p, W, and N are all positive integers not less than 1. ([¶0066] " For example, in some embodiments, the parameters 465 may include a padding height and a padding width. The padding height and the padding width append, respectively, rows of zeros and columns of zeros to output images" zero padding interpreted as synonymous with zero adding. [¶0010] "suppose that the image width were W, the image height were H, the number of color planes per image were C, and the number of images in the image batch were N. Further, suppose that the dimensions of each of the output images were (P×Q). In such a scenario, the dimensions of the image matrix would be (N×P×Q)×(C×R×S)." With respect to Woolley NxPxQ=(h-1) and (CxRxS)=W. Woolley further explicitly teaches that a padding height and padding width are appended and that the operations are applied to a buffer in column-major form.). 
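
For reference, the arithmetic of the claimed relationship N=(h−1)×(W+p)+w can be sketched as follows. This is an illustrative sketch only and is not drawn from Woolley or from the instant specification; the function name line_buffer_size and the example dimensions are hypothetical, and the sketch assumes that W+p is the row width of the zero-added third matrix.

```python
def line_buffer_size(h: int, w: int, W: int, p: int) -> int:
    """Return N = (h - 1) * (W + p) + w, the register count recited in claim 2.

    h, w: rows and columns of the kernel of the second operation.
    W:    columns of the second matrix.
    p:    quantity of zero rows/columns added (assumed here to widen each row to W + p).
    """
    if min(h, w, W, p) < 1:
        raise ValueError("h, w, W, and p must all be positive integers")
    return (h - 1) * (W + p) + w


# Hypothetical example: 3x3 kernel, second matrix with 5 columns, p = 2.
print(line_buffer_size(h=3, w=3, W=5, p=2))  # 17
```

Under these assumed values the buffer holds two full padded rows (2×7 elements) plus the first w=3 elements of a third row, which is the minimum span covering one kernel window.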

	Regarding claim 3, Woolley teaches The computing device of claim 2, further comprising a crossbar, wherein X target registers of the N registers are directly connected to X rows of the crossbar respectively, wherein the X target registers are a [1+k×(W+p)]th register to a [w+k×(W+p)]th register of the N registers, wherein a value of k is a positive integer ranging from 0 to h−1, wherein X=h×w, and wherein the control device is configured to: (See FIG. 2 210.  [¶0046] "A given GPCs 208 may process data to be written to any of the DRAMs 220 within PP memory 204. Crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to any other GPC 208 for further processing." FIG. 3 384 L1 cache interpreted as synonymous with register. The crossbar routing to any other GPC (which is shown to contain registers) is interpreted as synonymous with the crossbar directly connected to the X registers.  Value of k seen as simply an iterative term, the substitution of k with (h-1) yields the exact equation given for N in the previous claim limitation.)
	store the ith data element of the second matrix into the first line buffer; and ([¶0015] "computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address" See also FIG. 2 202, 204.)
	control the crossbar to operate and perform the second operation on data elements stored in the X target registers in response to the data elements currently stored in the X target registers being sufficient for performing the second operation. ([¶0050] "Operation of GPC 208 is controlled via a pipeline manager 305 that distributes processing tasks received from a work distribution unit (not shown) within task/work unit 207 to one or more streaming multiprocessors (SMs) 310. Pipeline manager 305 may also be configured to control a work distribution crossbar 330 by specifying destinations for processed data output by SMs 310." FIG. 3 384 L1 cache interpreted as synonymous with register.). 
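
The register indexing recited in claim 3 can be enumerated with a short sketch. It is illustrative only; the function name target_registers and the example dimensions are hypothetical and do not appear in Woolley or in the instant specification. Indices are 1-based, as in the claim.

```python
def target_registers(h: int, w: int, W: int, p: int) -> list[int]:
    """List the 1-based indices [1 + k*(W+p)] .. [w + k*(W+p)] for k = 0 .. h-1."""
    stride = W + p  # assumed width of one row of the zero-added third matrix
    return [k * stride + j for k in range(h) for j in range(1, w + 1)]


# Hypothetical example: 3x3 kernel, W = 5, p = 2.
regs = target_registers(h=3, w=3, W=5, p=2)
print(regs)       # [1, 2, 3, 8, 9, 10, 15, 16, 17]
print(len(regs))  # 9, i.e. X = h * w
```

With k=h−1 the highest index is w+(h−1)×(W+p), which equals the value of N recited in claim 2 (17 for these assumed values).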

	Regarding claim 5, Woolley teaches The computing device of claim 2, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] " In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device)
	wherein the ith data element of the second matrix is located in a last row of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a column next to a column in which the ith data element is located; and ([¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." Starting location of a next column interpreted as synonymous with first row of a matrix.  Limitation interpreted as synonymous with reading matrix in column-major order.)
	perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix.)
	store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0081] "As outlined in conjunction with FIG. 5, while the serpentine pattern of each column is offset from the serpentine pattern of the other columns, the serpentine pattern represents a uniform sequence of offsets for every row of the virtual image matrix 510" [¶0082] "For example, the first column of the virtual image matrix 510 is associated with the source address sequence 0, 4, 12, 16, 26, 40, 48, 52, 72, 76, 84, and 88" [¶0098] " For example, any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer.). 

	Regarding claim 7, Woolley teaches The computing device of claim 3, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] " In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device)
	wherein the ith data element of the second matrix is located in a last row of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a column next to a column in which the ith data element is located; and ([¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." Starting location of a next column interpreted as synonymous with first row of a matrix.  Limitation interpreted as synonymous with reading matrix in column-major order.)
	perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix.)
	store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0081] "As outlined in conjunction with FIG. 5, while the serpentine pattern of each column is offset from the serpentine pattern of the other columns, the serpentine pattern represents a uniform sequence of offsets for every row of the virtual image matrix 510" [¶0082] "For example, the first column of the virtual image matrix 510 is associated with the source address sequence 0, 4, 12, 16, 26, 40, 48, 52, 72, 76, 84, and 88" [¶0098] " For example, any number, including zero, of the image batch 410, the filter stack 440, the offset sequence 640, and/or the output matrix 860 may be included in a frame buffer." Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer.). 

	Regarding claim 8, the combination of Woolley and Clemons teaches The computing device of claim 4, wherein t=(s−1)×(W+p)+(w−1), wherein the control device is configured to control, in the (n+1)th clock cycle to the (n+t)th clock cycle, the first line buffer to sequentially store (s−1)×(W+p)+(w−1) elements 0, and wherein s represents a sliding step of the first operation. (Woolley [¶0066] "the parameters 465 may include a padding height and a padding width. The padding height and the padding width append, respectively, rows of zeros and columns of zeros to output images" Any integer value of t is interpreted as conforming to the given equation. Appending rows in row-major order or appending columns in column-major order for zero padding is interpreted as synonymous with sequentially storing 0 in a line buffer.).
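
The value of t recited in claim 8 follows directly from the claimed equation, as the sketch below shows. It is illustrative only; the function name zero_fill_cycles and the example dimensions are hypothetical and are not taken from either reference or from the instant specification.

```python
def zero_fill_cycles(s: int, w: int, W: int, p: int) -> int:
    """Return t = (s - 1) * (W + p) + (w - 1), as recited in claim 8.

    s: sliding step (stride) of the first operation.
    w: columns of the kernel of the second operation.
    W: columns of the second matrix.
    p: quantity of zero rows/columns added.
    """
    return (s - 1) * (W + p) + (w - 1)


# Hypothetical example values: w = 3, W = 5, p = 2.
print(zero_fill_cycles(s=1, w=3, W=5, p=2))  # 2
print(zero_fill_cycles(s=2, w=3, W=5, p=2))  # 9
```

Under these assumed values, both results satisfy the condition of claim 4 that t is a positive integer greater than 1.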

	Regarding claim 9, Woolley teaches The computing device of claim 1, wherein the first computer or the second computer is a crossbar. ([¶0046] "GPCs 208 communicate with memory interface 214 via crossbar unit 210 to read from or write to various DRAMs 220" See also FIG. 2). 

Claims 10-12, 14, and 16-18 are substantially similar to claims 1-3, 5, and 7-9, respectively.  Therefore, the rejections applied to claims 1-3, 5, and 7-9, also apply to claims 10-12, 14, and 16-18.

Claims 19-20 are substantially similar to claims 1-2.  Therefore, the rejections applied to claims 1-2 also apply to claims 19-20.  

Claims 4, 6, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Woolley and in view of Clemons (US 2017/0004089 A1). 

	Regarding claim 4, Woolley teaches The computing device of claim 2, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix, ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] " In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device)
	perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix.)
	store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer.)
	However, Woolley does not explicitly teach wherein the ith data element of the second matrix is located in a last column of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located.  

Clemons teaches wherein the ith data element of the second matrix is located in a last column of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located; ([¶0047] "A patch may be specified by a data structure that identifies the patch relative to an origin of the digital image 300. The pixel data of the digital image 300 may be stored in row-major order in a contiguous group of memory addresses, either physical addresses or virtual addresses, and the patch data structure may include a first field that specifies an origin of the patch as a location of a particular pixel in the digital image 300" Starting location of a next row interpreted as synonymous with first column of a matrix.  Limitation interpreted as synonymous with reading matrix in row-major order.). 

	Woolley and Clemons are both directed towards accessing segmented multi-dimensional matrices in a distributed system.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the column-major CNN matrices in Woolley with the row-major matrix representations in Clemons. The substitution would have been obvious because Woolley teaches that column-major order is an appropriate storage type for the convolution operations ([¶0009] “the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix.”).  Furthermore, Clemons teaches ([¶0003] “In conventional systems, a digital image is often stored in memory in row-major or col-major order.”).  Therefore the substitution would be obvious to one of ordinary skill in the art.
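
The difference between the two storage orders discussed above can be illustrated with a short sketch. It is illustrative only; the helper names and the 3×4 example matrix are hypothetical and are not taken from Woolley or Clemons.

```python
def row_major_index(r: int, c: int, rows: int, cols: int) -> int:
    """Flat 0-based position of element (r, c) when rows are stored contiguously."""
    return r * cols + c


def col_major_index(r: int, c: int, rows: int, cols: int) -> int:
    """Flat 0-based position of element (r, c) when columns are stored contiguously."""
    return c * rows + r


rows, cols = 3, 4  # hypothetical matrix dimensions

# Row-major: the element after the last column of row 0 is the start of row 1.
last_in_row = row_major_index(0, cols - 1, rows, cols)
print(last_in_row + 1 == row_major_index(1, 0, rows, cols))  # True

# Column-major: the element after the last row of column 0 is the start of column 1.
last_in_col = col_major_index(rows - 1, 0, rows, cols)
print(last_in_col + 1 == col_major_index(0, 1, rows, cols))  # True
```

The first check corresponds to the "last column ... starting location of a row next" ordering recited in claims 4 and 6; the second corresponds to the "last row ... starting location of a column next" ordering recited in claims 5 and 7.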

	Regarding claim 6, Woolley teaches The computing device of claim 3, wherein the control device is configured to: perform, in an nth clock cycle, the ith first operation on the first matrix to obtain the ith data element of the second matrix; ([¶0015] "The method includes selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address...and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile." [¶0040] "In particular, CPU 102 issues commands that control the operation of PPU 202" See also FIG. 1, FIG. 2.  CPU 102 interpreted as synonymous with control device)
	perform, in an (n+t)th clock cycle, an (i+1)th first operation of the M first operations on the first matrix, wherein t is a positive integer greater than 1; and ([¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" n is not bounded by the limitations.  Variable t is interpreted as necessarily being an integer value since a fractional clock cycle would not be understood by one of ordinary skill in the art.  Therefore the limitation is interpreted as simply being performed after the first operation for the second matrix.)
	store, in at least one clock cycle of an (n+1)th clock cycle and the (n+t)th clock cycle, an element 0 in the first line buffer. ([¶0007] "a CNN typically includes multiple “convolution layers,” where each convolution layer performs convolution operations across multiple dimensions of a sample data batch and multiple dimensions of a filter stack" [¶0009] "the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix." [¶0012] "One drawback of tile-based convolution engines, however, is that calculating the address sequence needed to load the image data in the correct order to expand a tile of the expanded image matrix involves performing a sequence of dependent integer operations. This sequence of integer operations typically requires a relatively large number of clock cycles to execute. Oftentimes, the number of clock cycles required to perform the integer operations can exceed the number of clock cycles required to perform the matrix multiplication operations" Woolley explicitly teaches placing 0 into virtual image matrix offset and further that the offset may be included in the frame buffer.  Frame buffer is interpreted as synonymous with line buffer.).
	However, Woolley does not explicitly teach wherein the ith data element of the second matrix is located in a last column of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located.

Clemons teaches wherein the ith data element of the second matrix is located in a last column of the second matrix, and wherein an (i+1)th data element of the second matrix is located at a starting location of a row next to a row in which the ith data element is located ([¶0047] "A patch may be specified by a data structure that identifies the patch relative to an origin of the digital image 300. The pixel data of the digital image 300 may be stored in row-major order in a contiguous group of memory addresses, either physical addresses or virtual addresses, and the patch data structure may include a first field that specifies an origin of the patch as a location of a particular pixel in the digital image 300" Starting location of a next row interpreted as synonymous with first column of a matrix.  Limitation interpreted as synonymous with reading matrix in row-major order.). 
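For illustration only, the row-major reading of the limitation can be sketched as follows.  The sketch is not taken from Woolley, Clemons, or the claims; the function names and the 0-based indexing are assumptions introduced here.  It shows only that, in row-major order, the element following one in the last column is the first element of the next row.

```python
def row_major_position(i, num_cols):
    """Return (row, col) of the i-th element (0-based, hypothetical convention)
    of a matrix whose elements are stored in row-major order."""
    return divmod(i, num_cols)


def next_position(row, col, num_cols):
    """Return the (row, col) that follows (row, col) in row-major order."""
    if col == num_cols - 1:
        # Last column: wrap to the starting location of the next row.
        return row + 1, 0
    return row, col + 1


# Example with a matrix that has 4 columns: element i = 3 sits in the last
# column of row 0, so element i + 1 = 4 sits at the start of row 1.
last_in_row = row_major_position(3, 4)
first_in_next_row = next_position(*last_in_row, 4)
```

Under these assumptions, the position computed for element i + 1 by next_position agrees with row_major_position(i + 1, num_cols).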

	Woolley and Clemons are both directed towards accessing segmented multi-dimensional matrices in a distributed system.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the column-major CNN matrices in Woolley with the row-major matrix representations in Clemons.  The substitution would have been obvious because Woolley teaches that column-major order is an appropriate storage type for the convolution operations ([¶0009] “the convolution engine converts the image batch into a column-major image matrix and expresses the filter stack as a filter matrix.”).  Furthermore, Clemons teaches that row-major and column-major order are both conventional storage layouts ([¶0003] “In conventional systems, a digital image is often stored in memory in row-major or col-major order.”).  Therefore, the substitution would be obvious to one of ordinary skill in the art.

	Claims 13 and 15 are substantially similar to claims 4 and 6, respectively.  Therefore, the rejections applied to claims 4 and 6 also apply to claims 13 and 15. 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126