Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action


1.	The Examiner acknowledges the applicant’s amendment filed April 22, 2022.  At this point claims 1-12, 14-22, 24-27 are pending in the instant application and ready for examination by the Examiner.

Abstract
2.	The abstract is objected to based on 37 C.F.R. 1.71 (f). The specification must commence on a separate sheet. Each sheet including part of the specification may not include other parts of the application or other information. The claim(s), abstract and sequence listing (if any) should not be included on a sheet including any other part of the application.

CLAIM INTERPRETATION


3. 	The following is a quotation of 35 U.S.C. 112(f):

(f) ELEMENT IN CLAIM FOR A COMBINATION – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

 	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP §2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)    the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)    the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Such claim limitation(s) is/are: ‘matrix processing unit (MPU)’ in claims 1, 4-6, 9-10, 12, 20, 22 and 24.
Such claim limitation(s) is/are: ‘master control central processor (MCC)’ in claims 1-4, 6, 20 and 24.
Such claim limitation(s) is/are: ‘host device’ in claims 1, 3 and 22.
Such claim limitation(s) is/are: ‘system’ in claims 24, 26-27.
Such claim limitation(s) is/are: ‘host processor’ in claims 3, 10 and 17.
Such claim limitation(s) is/are: ‘learning processor’ in claim 24.
Such claim limitation(s) is/are: ‘super memory block’ in claim 12.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112
4.	The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 1, 4-6, 9-10, 12, 20, 22 and 24 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
In reference to claims 1, 4-6, 9-10, 12, 20, 22 and 24, these claims recite matrix processing units (MPUs). The instant specification does not provide an explicit definition for these elements. Therefore, these elements were not described with sufficient structure.
In reference to claims 1-4, 6, 20 and 24, these claims recite a master control central processor (MCC). The instant specification does not provide an explicit definition for this element. Therefore, this element was not described with sufficient structure.

Claim Rejections - 35 USC § 112(b)
5.	The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 1, 4-6, 9-10, 12, 20, 22 and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claim limitations “MCC”, “MPU”, and “super memory block (SMB)” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function.


Claim Rejections - 35 USC § 101

6.	35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-12, 14-22, 24-27 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four categories of invention, i.e., process, machine, manufacture or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity).

Step 1
According to the first part of the analysis, in the instant application, claims 1-12, 14-21 are directed to an apparatus. Claim 22 is directed to a method. Claims 24-27 are directed to a system.  Thus each of the claims falls within one of the four statutory categories (i.e. process, machine, manufacture or composition of matter). These claims disclose a process. 

Claim 1
An apparatus comprising:
a network of matrix processing units (MPUs), wherein each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations;
a memory to store tensor data; and
a master control central processing unit (MCC) to:
obtain an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data;
partition the tensor data;
distribute the partitioned tensor data to one or more MPUs in the network of MPUs;
invoke a set of operations on the one or more MPUs based on the instruction, wherein the set of operations includes operations on the tensor operands; and
output a result of the set of operations, wherein the result includes a tensor value.

Claim 22
A method comprising: 
storing tensor data in memory, wherein the memory is accessible to a network of matrix processing units (MPUs); 
obtaining an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data; 
partitioning the tensor data; 
distributing the partitioned tensor data to one or more MPUs in the network of MPUs; 
causing a set of operations to be performed by the one or more of the MPUs based on the instruction, wherein the set of operations include operations on the tensor operands; and 
generating a result from performance of the set of operations, wherein the result includes a tensor value.

Claim 24
A system to implement a deep learning processor, the system comprising: 
a port to connect to a host processor; 
a plurality of interconnected matrix processing units (MPUs), wherein each MPU includes circuitry to perform tensor arithmetic operations; 
a memory to store tensor data; and 
a master control central processing unit (MCC) to: 
obtain an instruction from the host processor, the instruction including one or more tensor operands based on the tensor data;
partition the tensor data; 
distribute the partitioned tensor data to one or more MPUs in the network of MPUs; 
cause the one or more of the MPUs to perform a set of operations based on the instruction, wherein the set of operations include operations on the tensor operands; and 
return a result of the set of operations to the host processor, wherein the result includes a tensor value connected to the host.

Step 2A, Prong 1 
Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial exception as explained below. A mental process is recited in the claims. 

Claim(s) 1, 22 and 24 substantially recite
…partition the tensor data; distribute the partitioned tensor data to one or more MPUs in the network of MPUs; (Claim 1)
…partitioning the tensor data; distributing the partitioned tensor data to one or more MPUs in the network of MPUs; (Claim 22)
…partition the tensor data; distribute the partitioned tensor data to one or more MPUs in the network of MPUs; (Claim 24)
A human can segment data into smaller pieces and allocate the smaller pieces to different processors (MPUs). These limitations are directed to the abstract idea of a mental process. See MPEP 2106.04(a)(2).
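For illustration only, the partitioning and distribution steps recited above can be sketched in a few lines of Python. The sketch is not taken from the application: the function names, the row-wise split, the round-robin assignment, and the identifiers "mpu0" and "mpu1" are all assumptions made for the example.

```python
def partition_rows(tensor, num_parts):
    """Split a list of rows into num_parts contiguous blocks of near-equal size."""
    base, extra = divmod(len(tensor), num_parts)
    blocks, start = [], 0
    for i in range(num_parts):
        # The first `extra` blocks receive one additional row each.
        size = base + (1 if i < extra else 0)
        blocks.append(tensor[start:start + size])
        start += size
    return blocks


def distribute(blocks, mpu_ids):
    """Assign each block to a (hypothetical) MPU identifier in round-robin order."""
    assignment = {mpu: [] for mpu in mpu_ids}
    for i, block in enumerate(blocks):
        assignment[mpu_ids[i % len(mpu_ids)]].append(block)
    return assignment


# A 5x2 matrix standing in for the tensor data.
tensor = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
blocks = partition_rows(tensor, 2)
assignment = distribute(blocks, ["mpu0", "mpu1"])
```

In this example the first block holds three rows and the second holds two, and each block is assigned to one of the two hypothetical MPU identifiers.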

Claim(s) 16 further recite
…wherein the set of operations includes a max pooling operation. (Claim 16)
	Finding the highest value of a subset matrix is merely a mathematical algorithm and thus is abstract. 
This is a mathematical concept and is considered abstract. 
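As a minimal sketch of the operation characterized above, max pooling over non-overlapping windows may be written as follows. The window size of 2, the function name, and the sample matrix are assumptions chosen for the example and do not come from the application.

```python
def max_pool_2d(matrix, pool=2):
    """Return the maximum of each non-overlapping pool x pool window."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows - pool + 1, pool):
        out_row = []
        for c in range(0, cols - pool + 1, pool):
            # Gather the values of the current window and keep the largest.
            window = [matrix[r + i][c + j] for i in range(pool) for j in range(pool)]
            out_row.append(max(window))
        out.append(out_row)
    return out


pooled = max_pool_2d([
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 6, 5],
])
# pooled is [[6, 4], [7, 9]]
```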

Claim(s) 17 further recite
…wherein the set of operations includes performing a Winograd transformation on the operands and performing a matrix multiplication on the operands transformed by the Winograd transformation. (Claim 17)
	A Winograd transformation can be viewed as a filter which data is passed through. As such, this falls under the abstract domain of a mathematical concept. 
This is a mathematical concept and is considered abstract. 
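By way of example, the one-dimensional F(2,3) form of Winograd's minimal filtering algorithm computes two outputs of a three-tap filter with four multiplications instead of six. The sketch below is an assumption about what such a transformation may look like in its simplest case; it is not the two-dimensional, matrix-multiplication form recited in claim 17, and the function names and sample values are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d, using 4 multiplications."""
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference result: the same two outputs computed with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1, 2, 3, 4]
g = [2, 4, 6]
fast = winograd_f23(d, g)
slow = direct_f23(d, g)
```

For these inputs both functions yield the values 28 and 40, which illustrates that the transformation changes how the arithmetic is organized rather than the result.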

Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2). In this case, after considering all the claims individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below. 

Claim(s) 24 further recite
…a port to connect to a host processor; (Claim 24)

Having a portal to a host computer is not considered a practical application. This is understood to be generic computer equipment. See MPEP 2106.05(f).
This is an additional element and does not integrate into practical application.

Claim(s) 1 and 24 further recite
…a network of matrix processing units (MPUs), wherein each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations; (Claim 1)
…a plurality of interconnected matrix processing units (MPUs), wherein each MPU includes circuitry to perform tensor arithmetic operations; (Claim 24)
Arranging processing units or processors as a network is merely a design option and is not considered a practical application. This is an additional element and does not integrate into practical application.

Claim(s) 1, 22 and 24 further recite
…a memory to store tensor data; and (Claim 1)
…storing tensor data in memory, wherein the memory is accessible to a network of matrix processing units (MPUs); (Claim 22)
…a memory to store tensor data; and (Claim 24)
Storing data of a specific type is not considered a practical application. Storing information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example iv. 
This is an additional element and does not integrate into practical application.

Claim(s) 1 and 24 further recite
…a master control central processing unit (MCC) to: (Claim 1)
…a master control central processing unit (MCC) to: (Claim 24)
	Having a processing unit that controls other units is not considered a practical application but merely fulfilling a designed function. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
This is an additional element and does not integrate into practical application.

Claim(s) 1, 22 and 24 further recite
…obtain an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data; (Claim 1)
…obtaining an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data; (Claim 22)
…obtain an instruction from the host processor, the instruction including one or more tensor operands based on the tensor data; (Claim 24)
Receiving computer code to implement tensor operations on tensor data is merely a computer performing its designed purpose and is not a practical application. It is merely a generic computer component performing generic computer functions. MPEP 2106.07(b)
This is an additional element and does not integrate into practical application.

Claim(s) 1, 22 and 24 further recite
…invoke a set of operations on the one or more MPUs based on the instruction, wherein the set of operations includes operations on the tensor operands; and(Claim 1)
…causing a set of operations to be performed by the one or more of the MPUs based on the instruction, wherein the set of operations include operations on the tensor operands; and (Claim 22)
…cause the one or more of the MPUs to perform a set of operations based on the instruction, wherein the set of operations include operations on the tensor operands; and (Claim 24)
	Specifying a specific operation to be performed on data is not considered a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
This is an additional element and does not integrate into practical application.

Claim(s) 1, 22 and 24 further recite
…output a result of the set of operations, wherein the result includes a tensor value. (Claim 1)
…generating a result from performance of the set of operations, wherein the result includes a tensor value. (Claim 22)
…return a result of the set of operations to the host processor, wherein the result includes a tensor value connected to the host. (Claim 24)
Producing a result from input data and calculations is not considered a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 2 further recite
…wherein the MCC is further to send the result for storage in memory, wherein the result is stored as a tensor value in memory. (Claim 2)
Saving information is not considered a practical application. Storing information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example iv. 
This is an additional element and does not integrate into practical application.

Claim(s) 3 and 25 further recite
…wherein the MCC sends the result to the host device, and the host device includes a host processor connected to the MCC. (Claim 3)
…including the host processor. (Claim 25)
	Sending information to a specific device or component is not considered a practical application. Receiving or transmitting information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example i.
	This is an additional element and does not integrate into practical application.

Claim(s) 4 further recite
…wherein the network of MPUs includes a plurality of MPUs, and the MCC is to select a subset of the plurality of MPUs to perform the set of operations. (Claim 4)
Selecting a destination to send data or information to is not considered a practical application. Receiving or transmitting information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example i.
	This is an additional element and does not integrate into practical application.

Claim(s) 5 further recite
…wherein the subset of MPUs includes two or more of the MPUs. (Claim 5)
A plurality of processors is not considered a practical application. This is understood to be generic computer equipment. See MPEP 2106.05(f).
	This is an additional element and does not integrate into practical application.

Claim(s) 6 further recite
…wherein the instruction includes a stream of instructions and the MCC is to coordinate data flow and a sequence of operations to be performed by the network of MPUs based on the stream of operations. (Claim 6)
A specific type of instructions does not comprise a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 7 further recite
…wherein the sequence of operations includes a sequence of tensor arithmetic operations. (Claim 7)
A specific type of instructions does not comprise a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 8 further recite
…wherein the sequence of tensor operations includes matrix-matrix operations. (Claim 8)
A specific type of instructions does not comprise a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 9 further recite
…wherein the memory includes a memory resource block to be shared by two or more MPUs in the network of MPUs. (Claim 9)
A design feature of memory being shared by more than one processor is not considered a practical application. 
	This is an additional element and does not integrate into practical application.

Claim(s) 10 further recite
…wherein invoking the set of operations includes pointing one or more of the MPUs to the memory resource block to access the tensor data. (Claim 10)
	The use of pointers to access memory is not a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 11 further recite
…wherein the set of operations include at least one of a row/column broadcast, block shifting, matrix copy, matrix transpose, and matrix expansion. (Claim 11)
Matrix multiplication is not considered a practical application. A specific type of instructions does not comprise a practical application. It is merely a generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.

Claim(s) 12 further recite
…wherein the memory includes a super memory block (SMB) to group a plurality of memory resource blocks, and two or more MPUs in the network of MPUs have read/write access to the plurality of memory resource blocks in the SMB. (Claim 12)
Employing a specific type of memory remains memory and does not suggest a practical application. Storing information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example iv. 
	This is an additional element and does not integrate into practical application.

Claim(s) 14 further recite
…further including an on-chip router to route data multi-directionally between components of the apparatus. (Claim 14)
Sending information from one component toward another is not a practical application. Receiving or transmitting information is well-understood, routine, and conventional activity, as evidenced by MPEP 2106.05(d), section II, list 1, example i.
	This is an additional element and does not integrate into practical application.

Claim(s) 15 further recite
…wherein the memory includes one or more barrel shifters to shift a matrix described in memory to target a read or write to a particular row or column of the matrix. (Claim 15)
This can merely be part of incrementally reading data from a matrix and is not considered a practical application. Mere data gathering or selecting the input for an equation is insignificant extra-solution activity. See MPEP 2106.05(g).
This is an additional element and does not integrate into practical application.

Claim(s) 18 further recite
…wherein the tensor operand includes a matrix. (Claim 18)
	Tensor data relates to array, matrix, or vector data structures. Stating the definition of the concept of a tensor is not considered a practical application. 
This is an additional element and does not integrate into practical application.

Claim(s) 19 further recite
…wherein the tensor operands include a particular input matrix and the set of operations includes a matrix dimension shuffle operation to reorder a plurality of dimensions of the particular input matrix. (Claim 19)
	Matrix data input and computations and/or arrangement is not considered a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
	This is an additional element and does not integrate into practical application.
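For context, a dimension shuffle of the kind recited in claim 19 can be understood as a permutation of a tensor's axes. The following is a minimal sketch under that assumption; the nested-list representation, the helper names, and the convention that output axis i takes input axis order[i] are choices made for the example and are not drawn from the application.

```python
def _get(tensor, index):
    """Read one element of a nested-list tensor at a full index tuple."""
    for i in index:
        tensor = tensor[i]
    return tensor


def _build(shape, element_at, prefix=()):
    """Build a nested list of the given shape from a function of the index."""
    if not shape:
        return element_at(prefix)
    return [_build(shape[1:], element_at, prefix + (i,)) for i in range(shape[0])]


def shuffle_dims(tensor, shape, order):
    """Reorder axes so that output axis i corresponds to input axis order[i]."""
    new_shape = tuple(shape[axis] for axis in order)

    def source(new_index):
        # Map an index in the shuffled tensor back to the original tensor.
        old_index = [0] * len(shape)
        for new_axis, old_axis in enumerate(order):
            old_index[old_axis] = new_index[new_axis]
        return _get(tensor, old_index)

    return _build(new_shape, source)


matrix = [[1, 2, 3], [4, 5, 6]]               # shape (2, 3)
transposed = shuffle_dims(matrix, (2, 3), (1, 0))
cube = [[[1, 2]], [[3, 4]]]                   # shape (2, 1, 2)
shuffled = shuffle_dims(cube, (2, 1, 2), (2, 0, 1))
```

For a two-dimensional input with order (1, 0) the operation reduces to an ordinary transpose; the three-dimensional case shows the same index remapping applied to more than two axes.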

Claim(s) 20 further recite
wherein at least a particular MPU in the network of MPUs includes local memory to store a set of matrix subroutines, and the particular MPU is to:
translate an operation received from the MCC into a subset of the matrix subroutines; and
perform the operation through execution of the subset of the matrix subroutines.  (Claim 20)
Employing memory to save data, parsing the data into subsets and performing operations on the data is merely a computer and instructions performing in a generic manner and is not considered a practical application. It is merely a  generic computer component performing generic computer functions. MPEP 2106.07(b)
This is an additional element and does not integrate into practical application.

Claim(s) 21 further recite 
…wherein the set of operations are used to implement one of a set of deep learning models, and the set of deep learning models includes a multilayer perceptron model, a restricted Boltzmann machine model, a deep belief network model, an auto-encoder model, and a convolutional neural network. (Claim 21)
	Declaring a category of models is not considered a practical application. Field of use and technological environment MPEP 2106.05(h)
This is an additional element and does not integrate into practical application.

Claim(s) 26 further recite
…wherein the system is implemented using a system on a chip. (Claim 26)
Using specific hardware for implementation is not considered a practical application. This is understood to be generic computer equipment. See MPEP 2106.05(f).
This is an additional element and does not integrate into practical application.

Claim(s) 27 further recite
…wherein the system is implemented using a server blade. (Claim 27)
Using specific hardware for implementation is not considered a practical application. This is understood to be generic computer equipment. See MPEP 2106.05(f).
This is an additional element and does not integrate into practical application.

	The judicial exception is not integrated into a practical application. There is no claimed application in which the invention is to be employed. There is no specific application such as individual life expectancy or rock boring bit longevity, nor a general domain such as farming or business methods. The claimed system appears to be as general as possible and all-encompassing.

Step 2B: 
Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception (Step 2B). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reasons given above in Step 2A, Prong 2 analysis. Furthermore, each additional element identified above as being insignificant extra-solution activity is also well-known, routine, conventional as described below. 

Claim(s) 24 further recite
…a port to connect to a host processor; (Claim 24)
Having a portal to a host computer is not considered a practical application. The use of a computer or other machinery, to compile information to transform a programming language into a target language with physical connections between a file and an application, is in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not provide significantly more. MPEP2106.05(f)(II)
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1 and 24 further recite
…a network of matrix processing units (MPUs), wherein each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations; (Claim 1)
…a plurality of interconnected matrix processing units (MPUs), wherein each MPU includes circuitry to perform tensor arithmetic operations; (Claim 24)
Arranging processing units or processors as a network is merely a design option and is not considered significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1, 22 and 24 further recite
…a memory to store tensor data; and (Claim 1)
…storing tensor data in memory, wherein the memory is accessible to a network of matrix processing units (MPUs); (Claim 22)
…a memory to store tensor data; and (Claim 24)
Storing data of a specific type is not significantly more than the judicial exception. Storing and retrieving information in memory is well-understood, routine, and conventional activity. See Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; MPEP 2106.05(d)(II).
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1 and 24 further recite
…a master control central processing unit (MCC) to: (Claim 1)
…a master control central processing unit (MCC) to: (Claim 24)
	Having a processing unit that controls other units is not significantly more than the judicial exception but merely fulfilling a designed function. See Dietgoal Innovations, LLC v. Bravo Media, LLC, 599 Fed. Appx. 956 (Fed. Cir. Apr. 8, 2015): when the claim is directed to an abstract idea, and the additional elements do not amount to significantly more than the abstract idea but merely implement the idea using generic computer technology, the claims are not patent eligible.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1, 22 and 24 further recite
…obtain an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data; (Claim 1)
…obtaining an instruction from a host device, wherein the instruction includes one or more tensor operands based on the tensor data; (Claim 22)
…obtain an instruction from the host processor, the instruction including one or more tensor operands based on the tensor data; (Claim 24)
Receiving computer code to implement tensor operations on tensor data is merely a computer performing its designed purpose and is not significantly more than the judicial exception. Using a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data, or to compile information to transform a programming language into a target language with physical connections between a file and an application), or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not provide significantly more. MPEP 2106.05(f)(II).
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1, 22 and 24 further recite
…invoke a set of operations on the one or more MPUs based on the instruction, wherein the set of operations includes operations on the tensor operands; and (Claim 1)
…causing a set of operations to be performed by the one or more of the MPUs based on the instruction, wherein the set of operations include operations on the tensor operands; and (Claim 22)
…cause the one or more of the MPUs to perform a set of operations based on the instruction, wherein the set of operations include operations on the tensor operands; and (Claim 24)
Specifying a particular operation to be performed on data is not significantly more than the judicial exception. Using a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data, or to compile information to transform a programming language into a target language with physical connections between a file and an application), or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not provide significantly more. MPEP 2106.05(f)(II).
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 1, 22 and 24 further recite
…output a result of the set of operations, wherein the result includes a tensor value. (Claim 1)
…generating a result from performance of the set of operations, wherein the result includes a tensor value. (Claim 22)
…return a result of the set of operations to the host processor, wherein the result includes a tensor value connected to the host. (Claim 24)
Producing a result from input data and calculations is not significantly more than the judicial exception. See Electric Power Group, LLC v. Alstom, SA, 830 F.3d 1350, 119 USPQ2d 1739 (Fed. Cir. August 1, 2016) (collecting information, analyzing it, and displaying certain results of the collection and analysis is an abstract idea).
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 2 further recite
…wherein the MCC is further to send the result for storage in memory, wherein the result is stored as a tensor value in memory. (Claim 2)
Saving information is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 3 and 25 further recite
…wherein the MCC sends the result to the host device, and the host device includes a host processor connected to the MCC. (Claim 3)
…including the host processor. (Claim 25)
	Sending information to a specific device or component is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 4 further recite
…wherein the network of MPUs includes a plurality of MPUs, and the MCC is to select a subset of the plurality of MPUs to perform the set of operations. (Claim 4)
Selecting a destination to send data or information to is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 5 further recite
…wherein the subset of MPUs includes two or more of the MPUs. (Claim 5)
A plurality of processors is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 6 further recite
…wherein the instruction includes a stream of instructions and the MCC is to coordinate data flow and a sequence of operations to be performed by the network of MPUs based on the stream of operations. (Claim 6)
A specific type of instruction does not amount to significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 7 further recite
…wherein the sequence of operations includes a sequence of tensor arithmetic operations. (Claim 7)
A specific type of instruction does not amount to significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 8 further recite
…wherein the sequence of tensor operations includes matrix-matrix operations. (Claim 8)
A specific type of instruction does not amount to significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 9 further recite
…wherein the memory includes a memory resource block to be shared by two or more MPUs in the network of MPUs. (Claim 9)
A design feature of memory being shared by more than one processor is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 10 further recite
…wherein invoking the set of operations includes pointing one or more of the MPUs to the memory resource block to access the tensor data. (Claim 10)
	The use of pointers to access memory is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 11 further recite
…wherein the set of operations include at least one of a row/column broadcast, block shifting, matrix copy, matrix transpose, and matrix expansion. (Claim 11)
Matrix multiplication is not significantly more than the judicial exception. A specific type of instruction does not comprise a practical application.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 12 further recite
…wherein the memory includes a super memory block (SMB) to group a plurality of memory resource blocks, and two or more MPUs in the network of MPUs have read/write access to the plurality of memory resource blocks in the SMB. (Claim 12)
Employing a specific type of memory remains the use of memory and does not suggest significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 14 further recite
…further including an on-chip router to route data multi-directionally between components of the apparatus. (Claim 14)
Sending information from one component toward another is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 15 further recite
…wherein the memory includes one or more barrel shifters to shift a matrix described in memory to target a read or write to a particular row or column of the matrix. (Claim 15)
This can merely be part of incrementally reading data from a matrix and is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 18 further recite
…wherein the tensor operand includes a matrix. (Claim 18)
Tensor data relates to array, matrix, or vector data structures. Stating the definition of the concept of a tensor is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 19 further recite
…wherein the tensor operands include a particular input matrix and the set of operations includes a matrix dimension shuffle operation to reorder a plurality of dimensions of the particular input matrix. (Claim 19)
Inputting matrix data, computing on it, and/or rearranging it is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 20 further recite
wherein at least a particular MPU in the network of MPUs includes local memory to store a set of matrix subroutines, and the particular MPU is to:
translate an operation received from the MCC into a subset of the matrix subroutines; and
perform the operation through execution of the subset of the matrix subroutines.  (Claim 20)
Employing memory to save data, parsing the data into subsets and performing operations on the data is merely a computer and instructions performing in a generic manner and is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 21 further recite 
…wherein the set of operations are used to implement one of a set of deep learning models, and the set of deep learning models includes a multilayer perceptron model, a restricted Boltzmann machine model, a deep belief network models, an auto-encoder model, and a convolutional neural network. (Claim 21)
	Declaring a category of models is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 26 further recite
…wherein the system is implemented using a system on a chip. (Claim 26)
Using specific hardware for implementation is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim(s) 27 further recite
…wherein the system is implemented using a server blade. (Claim 27)
Using specific hardware for implementation is not significantly more than the judicial exception.
This is an additional element claim and does not amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 102
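7.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.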


Claim(s) 1-6, 9, 15, 20-22 and 24-27 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Herrero Abellanas (U.S. Patent Publication 20160179434, referred to as Herrero).

Claim 1
Herrero discloses an apparatus comprising: a network of matrix processing units (MPUs), wherein each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations (Herrero, 0075, 0119; ‘In one embodiment, one or more additional processor(s) 415, such as coprocessors, high-throughput MIC processors, GPGPU's, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processor, are coupled to first bus 416. In one embodiment, second bus 420 may be a low pin count (LPC) bus.’ and ‘FIG. 18B shows an image 1820 on which a filter will be applied.’ EC: Applying a filter to an image in a convolutional neural network involves a dot product.); a memory to store tensor data (Herrero, 0118; FIG. 18A illustrates one embodiment of the memory organization where memory banks 1801-1806 and interconnects 1811-1816 are shared among different types of data (e.g. the input image and partial results) executed within an execution cluster 1800 (e.g., comprising a plurality of processing units (PUs)).); and a master control central processing unit (MCC) (Herrero, fig 6; Master control central processing unit maps to the ‘application processor.’) to: obtain an instruction from a host device (Herrero, 0196; As described herein, instructions may refer to specific configurations of hardware such as application specific integrated circuits (ASICs) configured to perform certain operations or having a predetermined functionality or software instructions stored in memory embodied in a non-transitory computer readable medium.), wherein the instruction includes one or more tensor operands based on the tensor data (Herrero, 0027; FIG. 18B illustrates an exemplary image on which a filter may be applied in accordance with one embodiment; EC: The application of a filter on an image employs a dot product algorithm.); partition the tensor data (Herrero, 0020; The image data 202, the filter data 204, and the output data 206 can comprise any number of dimensions. For example, an image can comprise sets of two-dimensional (2D) data corresponding to different colors (e.g., data corresponding to red/green/blue (RGB) values); thus the image data 202 can comprise sets of 2D image data for each of said colors, and the filter data 204 can comprise sets of 2D filter data corresponding to each of said colors.); distribute the partitioned tensor data to one or more MPUs in the network of MPUs (Herrero, 0022; Thus, the described and illustrated implementations should be understood only as examples, and the illustrated processes may be performed in a different order, some actions may be performed in parallel, and some actions may be pipelined.); invoke a set of operations on the one or more MPUs based on the instruction, wherein the set of operations includes operations on the tensor operands; and output a result of the set of operations, wherein the result includes a tensor value. (Herrero, 0019; In this embodiment, during forward propagation a set of filters 204 are applied across image data 202 to generate outputs 206 based on any applicable offsets (i.e., strides). EC: Tensor value is disclosed in the two-dimensional outputs.)

Claim 2
Herrero discloses wherein the MCC is further to send the result for storage in memory, wherein the result is stored as a tensor value in memory. (Herrero, fig 6, 0093; The application processor has within it a ‘shared cache,’ and ‘FIG. 9 shows a neuromorphic accelerator architecture 900 where a single Processing Unit (“PU”) 901 is in charge of computing the dot-product operation for each logical neuron and accumulating the partial results until all input neurons have been traversed and the result is final. Inputs and weights are brought to the PU 901 from an Input/Output (IO) interface 902 using point-to-point buses, which connect each element from the unit with either internal memory or the external world.’)

Claim 3
Herrero discloses wherein the MCC sends the result to the host device, and the host device includes a host processor connected to the MCC. (Herrero, fig 5, 0066; FIGS. 3-6 are block diagrams of exemplary computer architectures. Other system designs and configurations known in the arts for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, micro controllers, cell phones, portable media players, hand held devices, and various other electronic devices, are also suitable. EC: Workstations and network servers reflect a client server design which maps to a host device and a master control central processor.)

Claim 4
Herrero discloses wherein the network of MPUs includes a plurality of MPUs, and the MCC is to select a subset of the plurality of MPUs to perform the set of operations. (Herrero, 0065; The cores 202A-N may be homogenous or heterogeneous in terms of architecture instruction set; that is, two or more of the cores 202A-N may be capable of execution the same instruction set, while others may be capable of executing only a subset of that instruction set or a different instruction set.)

Claim 5
Herrero discloses wherein the subset of MPUs includes two or more of the MPUs. (Herrero, 0065; The cores 202A-N may be homogenous or heterogeneous in terms of architecture instruction set; that is, two or more of the cores 202A-N may be capable of execution the same instruction set, while others may be capable of executing only a subset of that instruction set or a different instruction set.)

Claim 6
Herrero discloses wherein the instruction includes a stream of instructions and the MCC is to coordinate data flow and a sequence of operations to be performed by the network of MPUs based on the stream of operations. (Herrero, fig 26; As the filter moves across the image, a dot product is performed and the result is placed in the resulting image. Each shift of the filter to a new position for new data input maps to the stream of instructions and the sequence of operations.)

Claim 9
Herrero discloses wherein the memory includes a memory resource block to be shared by two or more MPUs in the network of MPUs. (Herrero, fig 6; Shared cache units are shared by cores 502a-n)

Claim 15
Herrero discloses wherein the memory includes one or more barrel shifters to shift a matrix described in memory to target a read or write to a particular row or column of the matrix. (Herrero, fig 26; As the filter moves across the image, a dot product is performed and the result is placed in the resulting image. Each shift of the filter to a new position for new data input maps to the stream of instructions and the sequence of operations. EC: The applicant calls this barrel shifting.)

Claim 20
Herrero discloses wherein at least a particular MPU in the network of MPUs includes local memory to store a set of matrix subroutines (Herrero, 0076; A shared cache (not shown) may be included in either processor or outside of both processors, yet connected with the processors via P-P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode.), and the particular MPU is to: translate an operation received from the MCC into a subset of the matrix subroutines; and perform the operation through execution of the subset of the matrix subroutines. (Herrero, 0054; ‘The front end unit 130 includes a branch prediction unit 132 coupled to an instruction cache unit 134, which is coupled to an instruction translation lookaside buffer (TLB) 136, which is coupled to an instruction fetch unit 138, which is coupled to a decode unit 140.’ and ‘In one embodiment, the core 190 includes a microcode ROM or other medium that stores microcode for certain macroinstructions (e.g., in decode unit 140 or otherwise within the front end unit 130). The decode unit 140 is coupled to a rename/allocator unit 152 in the execution engine unit 150.’)

Claim 21
Herrero discloses wherein the set of operations are used to implement one of a set of deep learning models, and the set of deep learning models includes a multilayer perceptron model, a restricted Boltzmann machine model, a deep belief network models, an auto-encoder model, and a convolutional neural network. (Herrero, 0111; ‘In fact, these embodiments may be implemented on any form of device to reduce the bandwidth requirements of machine-learning algorithms and improve energy-efficiency on novel computer paradigms like Artificial Neural Networks (e.g., Convolutional Neural Networks or Deep Belief Neural Networks).’)

Claim 22
Herrero discloses a method comprising: storing tensor data in memory, wherein the memory is accessible to a network of matrix processing units (MPUs) (Herrero, 0118, 0075; ‘FIG. 18A illustrates one embodiment of the memory organization where memory banks 1801-1806 and interconnects 1811-1816 are shared among different types of data (e.g. the input image and partial results) executed within an execution cluster 1800 (e.g., comprising a plurality of processing units (PUs)).’ and ‘In one embodiment, one or more additional processor(s) 415, such as coprocessors, high-throughput MIC processors, GPGPU's, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processor, are coupled to first bus 416. In one embodiment, second bus 420 may be a low pin count (LPC) bus.’); obtaining an instruction from a host device (Herrero, 0196; As described herein, instructions may refer to specific configurations of hardware such as application specific integrated circuits (ASICs) configured to perform certain operations or having a predetermined functionality or software instructions stored in memory embodied in a non-transitory computer readable medium.), wherein the instruction includes one or more tensor operands based on the tensor data (Herrero, 0027; FIG. 18B illustrates an exemplary image on which a filter may be applied in accordance with one embodiment; EC: The application of a filter on an image employs a dot product algorithm.); partitioning the tensor data (Herrero, 0020; The image data 202, the filter data 204, and the output data 206 can comprise any number of dimensions. For example, an image can comprise sets of two-dimensional (2D) data corresponding to different colors (e.g., data corresponding to red/green/blue (RGB) values); thus the image data 202 can comprise sets of 2D image data for each of said colors, and the filter data 204 can comprise sets of 2D filter data corresponding to each of said colors.); distributing the partitioned tensor data to one or more MPUs in the network of MPUs (Herrero, 0022; Thus, the described and illustrated implementations should be understood only as examples, and the illustrated processes may be performed in a different order, some actions may be performed in parallel, and some actions may be pipelined.); causing a set of operations to be performed by the one or more of the MPUs based on the instruction, wherein the set of operations include operations on the tensor operands; and generating a result from performance of the set of operations, wherein the result includes a tensor value. (Herrero, 0019; In this embodiment, during forward propagation a set of filters 204 are applied across image data 202 to generate outputs 206 based on any applicable offsets (i.e., strides). EC: Tensor value is disclosed in the two-dimensional outputs.)

Claim 24
Herrero discloses a system to implement a deep learning processor, the system comprising: a port to connect to a host processor (Herrero, fig 5, 0066; FIGS. 3-6 are block diagrams of exemplary computer architectures. Other system designs and configurations known in the arts for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, micro controllers, cell phones, portable media players, hand held devices, and various other electronic devices, are also suitable. EC: Workstations and network servers reflect a client server design which maps to a host device and a master control central processor.); a plurality of interconnected matrix processing units (MPUs), wherein each MPU includes circuitry to perform tensor arithmetic operations (Herrero, 0075, 0119; ‘In one embodiment, one or more additional processor(s) 415, such as coprocessors, high-throughput MIC processors, GPGPU's, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processor, are coupled to first bus 416. In one embodiment, second bus 420 may be a low pin count (LPC) bus.’ and ‘FIG. 18B shows an image 1820 on which a filter will be applied.’ EC: Applying a filter to an image in a convolutional neural network involves a dot product.); a memory to store tensor data (Herrero, 0118; FIG. 18A illustrates one embodiment of the memory organization where memory banks 1801-1806 and interconnects 1811-1816 are shared among different types of data (e.g. the input image and partial results) executed within an execution cluster 1800 (e.g., comprising a plurality of processing units (PUs)).); and a master control central processing unit (MCC) (Herrero, fig 6; Master control central processing unit maps to the ‘application processor.’) to: obtain an instruction from the host processor (Herrero, 0196; As described herein, instructions may refer to specific configurations of hardware such as application specific integrated circuits (ASICs) configured to perform certain operations or having a predetermined functionality or software instructions stored in memory embodied in a non-transitory computer readable medium.), the instruction including one or more tensor operands based on the tensor data (Herrero, 0027; FIG. 18B illustrates an exemplary image on which a filter may be applied in accordance with one embodiment; EC: The application of a filter on an image employs a dot product algorithm.); partition the tensor data (Herrero, 0020; The image data 202, the filter data 204, and the output data 206 can comprise any number of dimensions. For example, an image can comprise sets of two-dimensional (2D) data corresponding to different colors (e.g., data corresponding to red/green/blue (RGB) values); thus the image data 202 can comprise sets of 2D image data for each of said colors, and the filter data 204 can comprise sets of 2D filter data corresponding to each of said colors.); distribute the partitioned tensor data to one or more MPUs in the network of MPUs (Herrero, 0022; Thus, the described and illustrated implementations should be understood only as examples, and the illustrated processes may be performed in a different order, some actions may be performed in parallel, and some actions may be pipelined.); cause the one or more of the MPUs to perform a set of operations based on the instruction, wherein the set of operations include operations on the tensor operands; and return a result of the set of operations to the host processor, wherein the result includes a tensor value connected to the host. (Herrero, 0019; In this embodiment, during forward propagation a set of filters 204 are applied across image data 202 to generate outputs 206 based on any applicable offsets (i.e., strides). EC: Tensor value is disclosed in the two-dimensional outputs.)

Claim 25
Herrero discloses including the host processor. (Herrero, fig 5, 0066; FIGS. 3-6 are block diagrams of exemplary computer architectures. Other system designs and configurations known in the arts for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, micro controllers, cell phones, portable media players, hand held devices, and various other electronic devices, are also suitable. EC: Workstations and network servers reflect a client server design which maps to a host device and a master control central processor.)

Claim 26
Herrero discloses wherein the system is implemented using a system on a chip. (Herrero, 0013; FIG. 6 illustrates a block diagram of a system on a chip (SoC) in accordance with an embodiment of the present invention)

Claim 27
Herrero discloses wherein the system is implemented using a server blade. (Herrero, 0066; Other system designs and configurations known in the arts for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, micro controllers, cell phones, portable media players, hand held devices, and various other electronic devices, are also suitable.)

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Claim(s) 7-8, 10-12, 16 and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Herrero as applied to claims 1-6, 9, 15, 20-22 and 24-27 above, and further in view of Giusti (‘Fast image scanning with deep max pooling convolutional neural networks’, referred to as Giusti).

Claim 7
Herrero does not disclose expressly wherein the sequence of operations includes a sequence of tensor arithmetic operations.
Giusti discloses wherein the sequence of operations includes a sequence of tensor arithmetic operations. (Giusti, p4035 section 2.2.1, formulas (3) and (4)) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network and max pooling concepts of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of the resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 8
Herrero does not disclose expressly wherein the sequence of tensor operations includes matrix-matrix operations. 
Giusti discloses wherein the sequence of tensor operations includes matrix-matrix operations. (Giusti, fig 1; Input maps and kernels are matrices.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network and max pooling concepts of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of the resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 10
Herrero does not disclose expressly wherein invoking the set of operations includes pointing one or more of the MPUs to the memory resource block to access the tensor data.
Giusti discloses wherein invoking the set of operations includes pointing one or more of the MPUs to the memory resource block to access the tensor data. (Giusti, fig 1; The kernel is placed or pointed over the input maps to generate a pixel value on the output map. EC: The pixel data can be viewed as tensor data.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network and max pooling concepts of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of the resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 11
Herrero does not disclose expressly wherein the set of operations include at least one of a row/column broadcast, block shifting, matrix copy, matrix transpose, and matrix expansion.
Giusti discloses wherein the set of operations include at least one of a row/column broadcast, block shifting, matrix copy, matrix transpose, and matrix expansion. (Giusti, fig 1; The output map is generated by placing the kernel over the input map and taking a dot product to produce a value for the output map. The kernel is then shifted and the operations are repeated. This is why a 4x4 matrix reduces to a 3x3 matrix. This covers row/column broadcast and block shifting.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network concepts, including max pooling, of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of that resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification. 
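For illustration only, the kernel-shifting description above can be sketched in a few lines of Python. This is a minimal sketch and not code from Giusti or Herrero: the function name `valid_cross_correlate`, the 2x2 kernel size, the stride of 1, and the sample values are all assumptions chosen so that a 4x4 input map yields the 3x3 output map mentioned above.

```python
def valid_cross_correlate(input_map, kernel):
    """Slide the kernel over the input map (stride 1, no padding).

    Each output value is the dot product of the kernel with the patch of the
    input map it currently covers. Hypothetical helper for illustration.
    """
    n_rows, n_cols = len(input_map), len(input_map[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output_map = []
    for r in range(n_rows - k_rows + 1):          # shift the kernel down
        out_row = []
        for c in range(n_cols - k_cols + 1):      # shift the kernel right
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += input_map[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        output_map.append(out_row)
    return output_map


# Assumed sample data: a 4x4 input map and a 2x2 kernel.
input_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0],
    [0, 1],
]
output_map = valid_cross_correlate(input_map, kernel)
```

With a k x k kernel and stride 1, an n x n input produces an (n - k + 1) x (n - k + 1) output, so the assumed 2x2 kernel takes the 4x4 input to a 3x3 output.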

Claim 12
Herrero discloses wherein the memory includes a super memory block (SMB) to group a plurality of memory resource blocks (Herrero, 0115; A straightforward solution to this problem is to have a unified storage area 1710 with multiple read/write ports devoted to the different types of data as shown in FIG. 17B.), and two or more MPUs in the network of MPUs have read/write access to the plurality of memory resource blocks in the SMB. (Herrero, abstract; …a unified scratchpad memory comprising a plurality of memory banks communicatively coupled to the plurality of processing units through a plurality of read/write ports, each of the plurality of memory banks partitioned to store both the input data and partial results…)

Claim 16
Herrero does not disclose expressly wherein the set of operations includes a max pooling operation.
Giusti discloses wherein the set of operations includes a max pooling operation. (Giusti, fig 1; Max pooling operation maps to max pooling layer.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network concepts, including max pooling, of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of that resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 18
Herrero does not disclose expressly wherein the tensor operand includes a matrix.
Giusti discloses wherein the tensor operand includes a matrix. (Giusti, fig 1; Input maps and kernels are matrices.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network concepts, including max pooling, of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of that resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 19
Herrero does not disclose expressly wherein the tensor operands include a particular input matrix and the set of operations includes a matrix dimension shuffle operation to reorder a plurality of dimensions of the particular input matrix.  
Giusti discloses wherein the tensor operands include a particular input matrix and the set of operations includes a matrix dimension shuffle operation to reorder a plurality of dimensions of the particular input matrix.  (Giusti, fig 1; The output map is generated by placing the kernel over the input map and taking a dot product to produce a value for the output map. The kernel is then shifted and the operations are repeated. This is why a 4x4 matrix reduces to a 3x3 matrix. This covers row/column broadcast, block shifting.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Giusti before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the basic convolutional neural network concepts, including max pooling, of Giusti. Given the advantage of applying a filter or kernel to an input matrix to generate a resulting matrix, and of reducing the dimensions of that resulting matrix with another filter and stride, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Herrero as applied to claims 1-6, 9, 15, 20-22 and 24-27 above, and further in view of Florea. (U. S. Patent Publication 20180041434, referred to as Florea)

Claim 14
Herrero does not disclose expressly further including an on-chip router to route data multi-directionally between components of the apparatus.
Florea discloses further including an on-chip router to route data multi-directionally between components of the apparatus. (Florea, 0003; Background  In general, computer systems and routers are discrete, physically separate components. However, computer systems and one or more routers can be combined on a single integrated circuit (IC), or for brevity, chip. For example, such a chip can be referred to as a network on a chip (NoC). A NoC can include multiple processing cores that communicate using one or more on-chip routers. The on-chip routers can route packets to and from processing cores on the same chip or other chips.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Florea before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the on-chip routers of Florea. Given the advantage that each processor having an on-chip router reduces computation time, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Herrero as applied to claims 1-6, 9, 15, 20-22 and 24-27 above, and further in view of Aydonat. (U. S. Patent Publication 20180074787, referred to as Aydonat)

Claim 17
Herrero does not disclose expressly wherein the set of operations includes performing a Winograd transformation on the operands and performing a matrix multiplication on the operands transformed by the Winograd transformation.
Aydonat discloses wherein the set of operations includes performing a Winograd transformation on the operands and performing a matrix multiplication on the operands transformed by the Winograd transformation. (Aydonat, 0020; Fast filtering is a core operation in field-programmable gate array convolutional neural networks. Circuitry transforms filter and input data to intermediate filter and input data results using a transformation function. The transformation function may determine intermediate filter results based at least in part on a number of filter elements and output elements. For instance, the transformation function may include Winograd transformations.) It would have been obvious to one having ordinary skill in the art, having the teachings of Herrero and Aydonat before him before the effective filing date of the claimed invention, to modify Herrero to incorporate the Winograd transformation of Aydonat. Given the advantage of reducing dimensions by pruning shared data, and thus lowering computational cost, one having ordinary skill in the art would have been motivated to make this obvious modification. 
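For illustration only, the transform-then-multiply pattern described above can be sketched as follows. The cited passage names Winograd transformations without giving a formula, so this sketch assumes the commonly published one-dimensional F(2,3) form (two outputs of a three-tap filter computed with four multiplications). It is not taken from Aydonat, and the function names `winograd_f2_3` and `direct_filter` and the sample values are hypothetical.

```python
def winograd_f2_3(d, g):
    """Assumed 1-D Winograd F(2,3): 4 inputs, 3 filter taps, 2 outputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Transform the filter (in practice this can be precomputed once).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Transform the input data.
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3
    # Multiply in the transformed domain: four multiplications.
    m1, m2, m3, m4 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Transform back to the two outputs.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_filter(d, g):
    """Reference: the same two outputs computed directly (six multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]   # assumed sample input
g = [2.0, 4.0, 6.0]        # assumed sample filter taps
fast = winograd_f2_3(d, g)
reference = direct_filter(d, g)
```

In this one-dimensional sketch the multiplication stage is element-wise; how that stage would be arranged as a matrix multiplication across many tiles or channels depends on the implementation and is not shown here.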

Response to Arguments
9.	Applicant’s arguments filed on 4/22/2022 for claims 1-12, 14-22, 24-27 have been fully considered but are not persuasive.

10.	Applicant’s argument:
I. Objection to the Title

By way of the foregoing amendments, the Title is amended to: Methods and Apparatus to Perform Tensor. Reconsideration is requested. 

II. Objections to the Drawings

The Office Action objects to FIG. 7 as allegedly inconsistent with the description in the specification with regard to item 720. In the drawings, block 720 is illustrated as a layer with the label “Arbiter/Scheduler & Internal Chip Interface(s).” In the specification, Para. [0057] includes the only reference to block 820 and states: “a transaction layer 720 (e.g., to interface to ICC, on-chip network, and HBM, perform flit segmentation and re-assembly, flow control credit handling, virtual channel (VC) and priority arbiter, etc.).” Thus, the Arbiter/Scheduler & Internal Chip Interfaces are consistent with the description of a transaction layer. Reconsideration is requested.

Regarding the remaining objections to the drawings, reconsideration in view of the foregoing amendments to the specification is requested.

Examiner’s answer:
The examiner withdraws the objections.

11.	Applicant’s argument:
III. Objections to the Specification
The Office Action notes the proper language for an Abstract, but does not indicate any objection to the current Abstract. It is respectfully submitted that the current Abstract is proper. 

Examiner’s answer:
The examiner maintains the objection. 

12.	Applicant’s argument:
With the exception of the following notes, it is respectfully submitted that the foregoing amendments to the Specification obviate the objections to the specification.

Regarding the objection to Para. [0067], it is respectfully submitted that the specification describes what diagram 1000 refers to. In particular, Para. [0067] states: “For instance, FIG. 10 shows a representative diagram 1000 of inputs and outputs of an example SMB 1005, and the routing between the composite MRBs (e.g., 830a- n) and the ports of the SMB 1005.” (emphasis added). Thus, element 1000 refers to the described diagram. Reconsideration is requested.

Regarding the objection to Para. [00153], the Objection alleges that there is an inconsistency between the description indicating that the flowchart may restart at block 2002 and the reference to “continue writing.” This description is not inconsistent. The multiple blocks of FIG. 20 relate to the writing (e.g., the multiple steps that are part of the writing process). Thus, it is not inconsistent to note that returning to block 2002 continues the writing even if the final step of writing data does not occur until block 2006. Reconsideration is requested.

Examiner’s answer:
The examiner withdraws the objection. 

13.	Applicant’s argument:
IV. Claim Objections

The Office Action objected to claim 3, and requested the removal of the word “for” prior to “to.” The Applicant notes that this wording correction was already attended to in the preliminary amendment submitted on July 26, 2019. As such, no amendment should be required. Withdrawal of the objection to claim 3 is respectfully requested.

Examiner’s answer:
The examiner withdraws the objection. 


14.	Applicant’s argument:

V. Claim Interpretation
The Office Action alleges that the MPUs, the MCC, and the Super Memory Block referenced throughout multiple claims are generic placeholders. Claim terms that do not include the term “means” create a rebuttable presumption that the claim terms are not directed to means plus function structure. However, the Office Action does not rebut the presumption, but rather just asserts that the MPUs, the MCC, and the Super Memory Block lack structure. To the contrary, processing units, central processing units, and memory blocks are well-known structural components that may be implemented, for example, by a collection of circuits. Adding an additional description term to such structural terms (e.g., matrix or master control) does not remove the structural nature anymore than adding video to the term encoder does not remove the structural nature of a video encoder. It is respectfully submitted that these claims are not intended to invoke 112(6) interpretation. Reconsideration is requested. 

Examiner’s answer:
The interpretation stands. 


15.	Applicant’s argument:
VI. Rejections under 35 USC § 112a
The Office Action alleges that claims 1-21 are rejected under 112a because claim terms lack “an explicit definition.” No further explanation/argument is provided. The rejection fails to establish a prima facie rejection because 112a does not require “an explicit definition” of claim terms. In fact, the MPEP states: “The absence of definitions or details for well-established terms or procedures should not be the basis of a rejection under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph, for lack of adequate written description.” (MPEP 2163). Reconsideration is requested. 

VII. Rejections under 35 USC § 112b
The rejection is traversed. Nevertheless, as set forth above, none of the claims are intended to invoke 112(6). Accordingly, the rejection that the elements do not include structure in the specification is moot. 

Examiner’s answer:
The rejection stands. The claims recite 4 different processors or processing units. There needs to be specific information regarding structure so that one can determine the difference between an MPU and an MCC. 

16.	Applicant’s argument:
VIII. Rejections under 35 USC § 101 

The claims are rejected as allegedly directed to an abstract idea. In particular, the Office Action alleges that the claims are directed to a mental process that could be performed in the human mind or by using pen and paper. Reconsideration in view of the amendments based on the previous claim 18 is requested. For example, at least the operation of distributing partitioned data to matrix processing units cannot be performed in the human mind or with pen and paper. Accordingly, reconsideration is requested.

Examiner’s answer:
The rejection is maintained. The claimed invention is a combination of mathematical concepts, lacks a practical application, and does not amount to significantly more. 

17.	Applicant’s argument:
IX. Rejections under 35 USC § 102

Examiner’s answer:
The argument is moot because new art has been cited. 

18.	Claims 1-12, 14-22, 24-27 are rejected.
	
Conclusion	
19.	The prior art of record and not relied upon is considered pertinent to the applicant’s disclosure.
	-Search terms: Claim 1 and ip.com
	-U. S. Patent Publication 20170097884: Werner
	-U. S. Patent Publication 20160026912: Falcon

Correspondence Information
20.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Michael Huntley can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129