Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

OFFICE ACTION

Claim Objection
	In claim 10, “the processing circuitry” lacks antecedent basis.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-27 are rejected under 35 U.S.C. 101 because the claimed invention, considered both element by element and as a combination, is directed to an abstract idea, namely a mathematical algorithm and/or a mental process of data collection, without significantly more.
Independent claims 1, 11, 20, and 24 recite, in part, a device or method comprising:
a local cache memory;
a memory, the memory storing input matrix A, input matrix A having values to be used when processing instances of input data through a neural network; and
a processor, the processor configured to:
generate a compiled representation that includes values for acquiring data from input matrix A when processing instances of input data through the neural network, the values including a base address in input matrix A for each thread from among a number of threads and relative offsets, the relative offsets being distances between elements of input matrix A to be processed by the threads; and
store, in the local cache memory, the compiled representation including the base address for each thread and the relative offsets.

Independent claim 20 further includes, in part, the limitations:

acquire, from the local cache memory, the compiled representation; and 
process input matrix A by, for each thread from among the number of threads, using the compiled representation to identify locations in memory from which values in elements of input matrix A are to be acquired to be used, along with values from elements at corresponding locations from input matrix B, as inputs for one or more general matrix multiplication (GEMM) operations by that thread.


Similarly, independent claim 24 includes, in part, the limitation:

processing, by the processor, input matrix A by, for each thread from among the number of threads, using the compiled representation to identify locations in memory from which values in elements of input matrix A are to be acquired to be used, along with values from elements at corresponding locations from input matrix B, as inputs for one or more general matrix multiplication (GEMM) operations by that thread.
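For readers less familiar with the claimed technique, the following is a minimal, hypothetical sketch of how a per-thread base address and a shared set of relative offsets could be used to gather elements of input matrix A for a GEMM, in the manner of an implicit im2col convolution. It assumes an unpadded convolution, a row-major H x W x C layout for input matrix A with the channel index fastest, one thread per output pixel, and threads simulated sequentially. The function and parameter names are illustrative only and are not taken from the application, Huynh, or Kim.

```python
from typing import List, Sequence, Tuple


def compile_representation(
    H: int, W: int, C: int, R: int, S: int, stride: int = 1
) -> Tuple[List[int], List[int]]:
    """Return (base_addresses, relative_offsets) for an unpadded convolution.

    Hypothetical layout: input matrix A is an H x W x C tensor flattened
    row-major (channel fastest); one thread per output pixel; R x S x C filter.
    """
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    # One base address per thread: the flat index in A of the top-left
    # corner of that thread's receptive field.
    bases = []
    for tid in range(out_h * out_w):
        oh, ow = divmod(tid, out_w)
        bases.append((oh * stride * W + ow * stride) * C)
    # Relative offsets are the same for every thread: the distance from the
    # base address to each element of A covered by the filter window.
    offsets = [(r * W + s) * C + c for r in range(R) for s in range(S) for c in range(C)]
    return bases, offsets


def gemm_with_representation(
    a_flat: Sequence[float],
    b: Sequence[Sequence[float]],
    bases: Sequence[int],
    offsets: Sequence[int],
) -> List[List[float]]:
    """Each simulated thread gathers its row of A at base + offset, then takes
    the dot product with every column of B (given as a list of K filters)."""
    out = []
    for base in bases:
        row = [a_flat[base + off] for off in offsets]
        out.append([sum(x * w for x, w in zip(row, filt)) for filt in b])
    return out


# 3 x 3 single-channel input, one 2 x 2 all-ones filter, stride 1.
bases, offsets = compile_representation(H=3, W=3, C=1, R=2, S=2)
print(bases, offsets)  # [0, 1, 3, 4] [0, 1, 3, 4]
print(gemm_with_representation(list(range(1, 10)), [[1, 1, 1, 1]], bases, offsets))
# [[12], [16], [24], [28]]
```

In this sketch only the base addresses vary per thread, so the offset list can be computed once and shared, which is the property the claims attribute to storing the compiled representation in the local cache memory.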

Considering these claims both as a whole and element by element, the claimed limitations describe the concept of data collection based on mathematical operations or algorithms, which corresponds to concepts the courts have identified as abstract ideas, such as mathematical concepts and relationships. See Electric Power Group, LLC v. Alstom S.A., 119 U.S.P.Q.2d 1739, 1741 (Fed. Cir. 2016).
Although the claims recite additional elements such as a memory, a cache memory, and a processor, these elements amount to no more than generic computer implementation and do not integrate the abstract idea into a practical application.
This judicial exception is not integrated into a practical application because the claim(s) do not include additional elements that are sufficient to provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself. Therefore, the claim(s) are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
The dependent claims are also directed to non-statutory subject matter because they merely recite an abstract mathematical concept or algorithm for collecting and analyzing data or information; therefore, when combined with the independent claims, they do not cure the deficiency under 35 U.S.C. 101.
For further clarification, see MPEP 7.05 and 2106, and the USPTO Interim Guidance July 2015 Update Quick Reference Sheet (available from www.uspto.gov).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) The claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) The claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 8-14, 18-20, and 24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huynh (US 2021/0097375), prior art of record.
Regarding claim 1, the prior art discloses:
a local cache memory (par 124, 157);
a memory (see one or more of par 1, 21-22, 27, 58 and/or one or more of fig 3-4, 8-9, 11), the memory storing input matrix A, input matrix A having values to be used when processing instances of input data through a neural network (see one or more of abstract, fig 3-7, 9-10); and
a processor (see the processor in one or more of fig 3-4, 7-11 and/or par 8-21), the processor configured to:
generate a compiled representation (i.e., one or more of output, output array, pattern, destination, coordinates, pixel image/values, convolution results/output, multiplication results between each weight of filter, image index, reference location, region, information indicative of the destination addresses, information indicative of the subset of input data elements, sums, entries (disclosed in fig 2-7, 9-10)) that includes values for acquiring data from input matrix A when processing instances of input data through the neural network, the values including a base address (see one or more of fig 3-6, 9-10) in input matrix A for each thread from among a number of threads (par 157) and relative offsets (see one or more of fig 4-6, par 25, 72-77, 79-83, 89-91, 107-108, 112-113, 134-135, 144-146), the relative offsets being distances between elements of input matrix A to be processed by the threads; and
store, in the local cache memory, the compiled representation (i.e., one or more of output, output array/tile, pattern, destination, coordinates, pixel image/values, convolution results/array/output, multiplication results between each weight of filter, image index, reference location, region, information indicative of the destination addresses, information indicative of the subset of input data elements, sums (disclosed in fig 2-7, 9-10)) including the base address for each thread and the relative offsets.
(Claim 2) wherein, when generating the compiled representation, the processor is configured to: compute the base address for each thread in input matrix A as a function of some or all of a thread identifier (ID) for that thread, dimensions of input matrix A and/or an output matrix C, properties of elements of input matrix A and/or output matrix C, and convolutional filter properties (par 17-20, 36-53, 57-58, 79, 94-98, 108-110, 127-129).
(Claim 3) wherein, when generating the compiled representation, the processor is configured to: compute the relative offsets as a function of some or all of dimensions of input matrix A and/or an output matrix C (par 21-22, 36, 47, 50, 52, 91, 96, 104, 110), properties of elements of input matrix A and/or output matrix C, and filter properties.
(Claim 4) wherein: the memory stores input matrix B, input matrix B having values to be used when processing instances of input data through a neural network (par 2-3, 17-21, 27-47); and the processor is further configured to: process input matrix A using each of the threads, the processing including using the compiled representation in the local cache memory to identify locations in memory from which values in elements of input matrix A are to be acquired to be used, along with values from elements at corresponding locations from input matrix B, as inputs for one or more general matrix multiplication (GEMM) operations by that thread (see matrix multiplication in par 2, 36, 58; matrix multiplication optimizes memory access by splitting the computation into smaller tiles (fig 4-6)).
(Claim 8) wherein the values in input matrix A and input matrix B include input values and weights (par 2-3, 17-22, 27, 33, 36 and 47) respectively, associated with instances of input data to be used for processing the instances of input data through the neural network (see one or more of abstract, fig 3-7, 9-10).
(Claim 9) wherein the dimensions (see one or more of par 18, 21-22, 24-25, 36, 47, 99) of input matrix A and input matrix B are greater (in terms of one or more of size expansion, multi-dimensional, size of summation, stride of the transposed convolution operation is larger than the dimension of the weight data, maximum size output data) than dimensions used for the GEMM operations.

(Claim 10) wherein the local cache memory is coupled to the processing circuitry via a fast-access interface (accelerator, acceleration engine (abstract, fig 4, 8)) that enables faster accesses than accesses of the memory.
Claims 11-14, 18-20, and 24 recite similar subject matter and are rejected for the same reasons.

Claims 1-3 and 10-13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim (US 2020/0167943), prior art of record.
Regarding claim 1, the prior art discloses:
a local cache memory (par 57-58, fig 4-5);
a memory (fig 3-5), the memory storing input matrix A (par 88-89, 115, 139), input matrix A having values to be used when processing instances of input data through a neural network (see one or more of abstract, fig 1, 6, 9-10, par 49, 54, 68, 88, 110-118, 121-125, 134-137); and
a processor (see CPU/processor in par 20, 38-39, 54-55, 57-61, 65-66, 72, 77-78, 81, 95-102, 110, 163, 165), the processor configured to:
generate a compiled representation (i.e., one or more of coordinates, mask, map, value, 3D model, image, global information, coverage information, result generated by ROP, ROL, plane object parameters, thread generates results, anchor ID, vectors, index, dimension/pyramid output (disclosed in fig 1-4, 6, 9-11)) that includes values for acquiring data from input matrix A when processing instances of input data through the neural network, the values including a base address (par 66, 69, 73, 77-78, 91) in input matrix A for each thread from among a number of threads and relative offsets (fig 8, par 138-140), the relative offsets being distances between elements of input matrix A to be processed by the threads; and
store, in the local cache memory, the compiled representation (i.e., one or more of coordinates, mask, map, value, 3D model, image, global information, coverage information, result generated by ROP, ROL, plane object parameters, thread generates results, anchor ID, vectors, index, dimension/pyramid output (disclosed in fig 1-4, 6, 9-11)) including the base address (par 66, 69, 73, 77-78, 91) for each thread and the relative offsets (fig 8, par 138-140).
(Claims 2, 12) wherein, when generating the compiled representation, the processor is configured to: compute the base address for each thread in input matrix A as a function of some or all of a thread identifier (ID) for that thread, dimensions of input matrix A and/or an output matrix C (par 3, 5, 14, 52, 55, 72, 82, 84, 88-94, 113-137, 153-158), properties of elements of input matrix A and/or output matrix C, and convolutional filter properties (par 3, 5, 14, 52, 55, 72, 82, 84, 88-94, 113-137, 153-158).
(Claims 3, 13) wherein, when generating the compiled representation, the processor is configured to: compute the relative offsets as a function of some or all of dimensions of input matrix A and/or an output matrix C (par 5, 7, fig 8, par 88-90, 132, 134, 138-140, 155-158), properties of elements of input matrix A and/or output matrix C, and filter properties (par 5, 7, fig 8, par 88-90, 132, 134, 138-140, 155-158).
(Claim 10) wherein the local cache memory is coupled to the processing circuitry via a fast-access interface (high-speed NVLink interconnect (par 57, 99); memory interface 470 may implement 32-bit, 64-bit, 128-bit, 1024-bit data buses, or the like, for high-speed data transfer (par 74); high-bandwidth access (par 93)) that enables faster accesses than accesses of the memory.

Allowable Subject Matter
Claims 5-7, 15-17, 21-23, and 25-27 would be allowable if the rejection under 35 U.S.C. 101 set forth in this Office action is overcome and the claims are rewritten to include all of the limitations of the base claim and any intervening claims.
Claims 5-7, 15-17, 21-23, and 25-27 would be allowable because the prior art does not teach or suggest the limitations recited in claim 5 and similarly recited in claims 15, 21, and 23.

Correspondence Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL DINH, whose telephone number is 571-272-1890. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jack Chiang, can be reached at 571-272-7483. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.	

/PAUL DINH/            Primary Examiner, Art Unit 2851