DETAILED ACTION
This Office Action is in response to Application No. 16/362,398 filed on March 22nd, 2019. Claims 1-22 are presented for examination and are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. KR 10-2018-0052920 filed on May 9th, 2018.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 22nd, 2019 was filed.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
Claims 1 and 7 of this application contain contingent limitations, meaning that any prior art meets the broadest reasonable interpretation of the claim when only one of the recited conditions is met. Therefore, for claim 1, the prior art only needs to read on “performing a row transformation on the weight matrix” OR “performing the row transformation and a column transformation on the weight matrix”. For claim 7, the prior art only needs to read on the row information including “row numbers corresponding to the rows associated with the non-zero values” OR “encoding information from which the row numbers are decoded”. There is only a single comparison for these features and thus only a single condition is required by the claim language. See MPEP 2111.04(II); see also Ex parte Schulhauser.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


Claims 14-22 of this application include one or more limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
The “input register” and “output register” of claims 14 and 20. The claim language recites that these registers store input vectors and output vectors respectively, but does not reference or recite any structure or algorithm in support of these registers. The registers are directed towards storing and transferring data throughout claims 14, 20 and their dependent claims. In light of the specification, the registers are located on the accelerator with reference numbers 440 (input register) and 450 (output register), but the examiner can find no further detail or structure that helps classify, describe or create a clear distinction for these units. As “input register” and “output register” are general terms for which one of ordinary skill in the art would not agree on one specific structure, the terms are being interpreted under 112(f). For the purposes of this office action, the terms are being interpreted as any memory or memories able to store the input vector and output vector.
The “decoder” of claim 15. The claim language recites a decoder that is configured to decode the row numbers corresponding to the non-zero values from the row information, but does not reference or recite any structure or algorithm in support of this decoder. In light of the specification, the decoder is located on the accelerator with reference number 430, but the examiner can find no further detail or structure that helps classify, describe or create a clear distinction for this unit. As “decoder” is a general term for which one of ordinary skill in the art would not agree on one specific structure or meaning, the term is being interpreted under 112(f). For the purposes of this office action, the term is being interpreted as any software or hardware that can perform the actions of the claim limitation.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14-18 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 14 is rejected for its use of the terms “input register” and “output register” which are vague terms that do not define a clear or definite device or software for the claim language. The terms were interpreted under 112(f) and re-evaluated in light of the specification which also fails to disclose any structure or algorithm aside from being located on the accelerator in Fig. 10. As one of ordinary skill in the art, at the time of filing, would fail to agree on one set structure or definition for the terms, the claim is rendered indefinite. For the purposes of this office action, the terms are being interpreted as a piece of software or hardware that can perform the function of storing said data. Claims 15-18 are rejected for their dependence on claim 14. Claim 20 is rejected for similar reasons as above. Applicant is kindly asked to fix this error and all similar errors.

Claim 15 is rejected for its use of the term “decoder”. The claim language states that a decoder is configured to decode row numbers corresponding to non-zero values but fails to state the structure or algorithm associated with said decoder. The term was interpreted under 112(f) and re-evaluated in light of the specification, which also fails to disclose any structure or algorithm aside from the decoder being located on the accelerator in Fig. 10. As one of ordinary skill in the art, at the time of filing, would fail to know what the “decoder” is or how it performs the recited functions, the claim is rendered indefinite. For the purposes of this office action, the term is being interpreted as a piece of software or hardware that can perform the function of decoding the row numbers. Applicant is kindly asked to fix this error and all similar errors.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.




Claims 1-22 are rejected under §101 as non-eligible subject matter.
In regards to claim 1, the claim is rejected because the claimed invention is directed to an abstract idea without significantly more. The claim recites determining row lengths within a matrix, rearranging the rows based on the number of non-zero elements, and distributing the formatted data to processing elements. 

2A Prong 1: The following limitations, under their broadest reasonable interpretation, cover performance of mathematical calculations and mental processes but for the recitation of generic computer components.

A method of formatting a weight matrix in a current layer included in a neural network, the method comprising:
determining a row length for each row of the weight matrix, the row length corresponding to a number of elements each having a non-zero value in each row, (mathematical calculation / mental process – determining the length of a row based on how many elements the row contains is just counting the elements within a row)
the weight matrix including a plurality of elements that are arranged in a plurality of rows and a plurality of columns; (mathematical calculation / mental process – data being in the form of a matrix containing rows and columns is an arrangement of data which can be a mental process and/or a mathematical calculation since the arrangement is in the form of a matrix)
obtaining rearrangement information including a result of sorting the rows in order of size of the determined row lengths; (mathematical calculation – calculating, based on which rows have the most non-zero elements, an order of the rows)
performing a row transformation on the weight matrix or performing the row transformation and a column transformation on the weight matrix, using the rearrangement information, thereby generating a transformed weight matrix; (mathematical calculation – performing the sorting of the rows based on the previous calculation)
and generating formatted data including one or more data groups each including non-zero values of elements of the transformed weight matrix that are processed in the PEs and column information of the non-zero values. (mathematical process – generating formatted data is equivalent to formatting the data which is a mathematical process)
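For illustration only, the limitations identified above can be sketched in a few lines of code. This is a minimal sketch under stated assumptions and not the applicant's implementation: the function name `format_weight_matrix`, the round-robin assignment of rows to PEs, and the (value, column) pair layout of each data group are hypothetical choices that do not come from the claim language.

```python
def format_weight_matrix(weights, num_pes):
    """Sketch of the claim 1 steps on a matrix given as a list of rows."""
    # Row length: the number of non-zero elements in each row.
    row_lengths = [sum(1 for v in row if v != 0) for row in weights]

    # Rearrangement information: row indices sorted by descending row length.
    order = sorted(range(len(weights)),
                   key=lambda r: row_lengths[r], reverse=True)

    # Row transformation: reorder the rows using the rearrangement information.
    transformed = [weights[r] for r in order]

    # Formatted data: one data group per PE (round-robin is an assumption),
    # holding the non-zero values together with their column information.
    formatted = [[] for _ in range(num_pes)]
    for i, row in enumerate(transformed):
        group = [(v, c) for c, v in enumerate(row) if v != 0]
        formatted[i % num_pes].append(group)

    return order, transformed, formatted


if __name__ == "__main__":
    w = [[0, 1, 0], [2, 3, 4], [0, 0, 0], [5, 0, 6]]
    order, transformed, formatted = format_weight_matrix(w, num_pes=2)
    print(order)
    print(formatted)
```

Each step in the sketch is a count, a sort, or a re-indexing of the matrix entries.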



2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “distributing rows of the transformed weight matrix to a plurality of processing elements (PEs)”, which is a claim limitation consisting of data transfer and generic computer components. The generic computer components (processing elements) are recited without specific instructions, except for the computer components to receive the above data. As the claim recites these devices at a high level of generality (any processor that can execute instructions would be able to perform the claim limitations), the claim amounts to instructions to apply the exception using said generic computer components. Further, the limitation also recites “distributing rows of the transformed weight matrix”, which is recited at a high level of generality and amounts to instructions to transfer data, which is a form of insignificant extra-solution activity. Further, the first limitation, which recites “formatting a weight matrix in a current layer included in a neural network”, is directed towards generally linking the judicial exception (mental process / mathematical calculation) to a particular field or technology such that the neural network is a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limitations. The claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform mathematical calculations amount to no more than mere instructions to apply the exception using a generic computer component (as stated previously, any generic processor can perform the actions of the claim limitation). Further, the step which recites transferring data in the form of the transformed weight matrix was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus is re-evaluated in Step 2B to determine whether it is more than well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed data transfer steps are well-understood, routine, conventional activities is supported under Berkheimer. The claim is not patent eligible.

In regards to claim 14, the claim is similar to claim 1 but directed towards an accelerator rather than a method, with the differences between the two claims consisting of additional generic computer components in the accelerator claim. Specifically, claim 14 contains the following distinct limitations: “An accelerator comprising: a processing element (PE) array including a plurality of PEs”, “an output register configured to store an output vector output from the PE array”, “an input register configured to store an input vector to be provided to the PE array”, and “and a control circuit configured to provide the PE array with formatted data”. The above limitations all consist of generic computer components, as all of the above pieces of hardware are recited without specific instructions on how to carry out their task and instead are directed to just performing a function. Further, items such as “processing element”, “output register” and “input register” are general terms that can utilize many different components and are used to apply the abstract idea. Applying the mental process to a generic computer process without specific instruction or direction (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) does not provide an inventive concept. The claim is reexamined under 2B to see if it contains any additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform mathematical calculations and mental processes amounts to no more than mere instructions to apply the exception using a generic computer component (as stated previously, any generic computer-readable medium and processor can perform the actions of the claim limitation). Mere instructions to apply an exception using a generic computer component (a generic processing element - Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) cannot provide an inventive concept. Additionally, the limitations regarding the input and output registers consist of data transfer which the court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). With the other claim limitations being similar to claim 1 and being covered under the analysis for claim 1 above, the claim contains no more additional elements. The claim is not patent eligible.

In regards to claim 19, the claim is similar to independent claims 14 and 1 but is directed towards a system rather than an accelerator or method, with the differences between the claims comprising some generic hardware components. Specifically, “A system comprising: an accelerator including a plurality of PEs” and “and a neural network application circuit configured to control an inference operation performed in the accelerator by providing the accelerator with an input vector generated from an input signal” are two distinct limitations. Regarding the first limitation, the PEs (processing elements) and accelerator are generic computer components similar to those of claims 1 and 14, with the generic computer components simply being recited to apply the abstract idea. In regards to the second limitation, the neural network application circuit is also a generic computer component that is utilized to transfer data to the accelerator. The claim limitations are reexamined under step 2B to see if they contain any additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform mathematical calculations and mental processes amount to no more than mere instructions to apply the exception using a generic computer component (as stated previously, any generic computer-readable medium and processor can perform the actions of the claim limitation). Mere instructions to apply an exception using a generic computer component (a generic processing element - Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) cannot provide an inventive concept.
Additionally, the limitation regarding the neural network application circuit consists of data transfer which the court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). With the other claim limitations being similar to claims 1 and 14 and being covered under the analysis for claims 1 and 14 above, this claim contains no more additional elements. The claim is not patent eligible.

In regards to claim 2, the claim is directed towards specifying how the distribution of rows is performed such that the row lengths distributed to each PE are minimized. Under step 2A Prong 1, the claim is directed towards an additional mental process. Rearranging how the rows would be distributed to processing elements is a limitation which can be performed in the human mind. The claim simply specifies that the distribution be minimized among the processing elements, which is equivalent to a human comparing the rows distributed to each processing element and determining which length of row should be distributed to each processing element. This claim, and claim 1 on which it relies, do not specify what is performing this action, and the rows being distributed to a processing element does not remove the claim from the domain of being an abstract idea. The claim is reevaluated under 2A Prong 2 and step 2B and, as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 3, the claim is directed towards specifying how the row transformation is carried out. Specifically, the claim states that the row transformation is carried out according to the rearrangement information, which is defined in claim 1 as sorting the rows in order of row length. Under step 2A Prong 1, the claim is directed towards a mental process. Sorting rows of a matrix according to how many elements they have is a limitation that can be performed by the human mind. As the claim limitation is still another mental process, it does not integrate the claim into a practical application. That the limitation states that the sorting be carried out according to the rearrangement information, rather than stating what the rearrangement information is (a sort in numerical order), should not obfuscate that the limitation can still be performed by a human and is merely applied to generic computer components. The claim is reevaluated under 2A Prong 2 and step 2B and, as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
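For illustration only, the comparison described for claim 2 can be sketched as a simple greedy assignment. This is a minimal sketch under stated assumptions: the function name `distribute_rows` and the longest-row-first heuristic are hypothetical and are not taken from the claim or the specification, and a greedy assignment of this kind is a heuristic that does not always find the smallest possible maximum load.

```python
import heapq


def distribute_rows(row_lengths, num_pes):
    """Assign each row to the PE with the smallest load so far (heuristic)."""
    # Min-heap of (current load, PE index); every PE starts empty.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]

    # Visit rows from longest to shortest, as in the sorted matrix.
    by_length = sorted(range(len(row_lengths)),
                       key=lambda r: row_lengths[r], reverse=True)
    for r in by_length:
        load, pe = heapq.heappop(heap)      # least-loaded PE
        assignment[pe].append(r)
        heapq.heappush(heap, (load + row_lengths[r], pe))

    loads = [sum(row_lengths[r] for r in rows) for rows in assignment]
    return assignment, loads


if __name__ == "__main__":
    print(distribute_rows([5, 3, 3, 2, 1], num_pes=2))
```

With row lengths 5, 3, 3, 2 and 1 and two PEs, the sketch gives each PE a total row length of 7.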
In regards to claim 4, the claim is directed towards specifying there is a previous layer of the neural network and that the rearrangement of the columns in the weight matrix be carried out according to the rearrangement of the previous layer. Under step 2A Prong 1, the claim is directed towards a mental process. As stated previously, the rearrangement information is still merely sorting by numerical value which can be performed by the human mind but recited for generic computer components. As the claim limitation is still another mental process, it does not integrate the claim into a practical application. The limitation specifying that the sort follows the arrangement of the previous layer does not impact whether a human would be able to perform the sort and fails to integrate the claim into a practical application. Specifying the inclusion of the previous layer and basing the sorting off of that layer is directed towards generally linking the judicial exception (mental process) to a particular field or technology such that the neural network (and subsequently, the previous layer) are generic computer components. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 5, the claim is directed towards sorting the columns within the weight matrix according to the rearrangement information. Under step 2A Prong 1, the claim is directed towards a mental process. As stated previously, the rearrangement information is still merely sorting by numerical value which can be performed by the human mind but recited for generic computer components. As the claim limitation is still another mental process, it does not integrate the claim into a practical application. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception.  The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 6, the claim is directed towards specifying the formatted data includes row information corresponding to the non-zero elements of the weight matrix. Under step 2A Prong 1, the claim is directed towards an additional element without significantly more. The claim limitation simply specifies what pieces of data are being included in the formatted row information which is a data type. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 7, the claim is directed towards further specifying that the row information includes row numbers for the rows associated with non-zero elements or includes encoding information. Under step 2A Prong 1, the claim is directed towards an additional element without significantly more. The claim limitation simply specifies what pieces of data are being included in the formatted row information which is a data type. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 8, the claim is directed towards further specifying the encoding information includes row lengths of the transformed weight matrix. Under step 2A Prong 1, the claim is directed towards an additional element without significantly more. The claim limitation simply specifies what pieces of data are being included in the formatted row information which is a data type. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception.  The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 9, the claim is directed towards specifying that the neural network is an LSTM (Long Short-Term Memory) network. Under step 2A Prong 1, the claim is directed towards generally linking the judicial exception (mathematical calculation / mental process) to a particular field or technology such that the LSTM network is a generic computer component. The claim limitation does not provide any specific implementation or directions that transform this limitation into more than a generic computer component in this context (a generic machine learning model can stand in for this limitation). The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 10, the claim is directed towards the weight matrix comprising two different matrices, with the first matrix being multiplied by an input vector and the second matrix being multiplied by a previous output vector, and then disposing both of the matrices in a row format. Under step 2A Prong 1, the claim is directed towards an additional mathematical calculation and mental process. First, the claim includes matrix multiplication between the first weight matrix and the input vector and between the second weight matrix and the previous output vector. This limitation falls under mathematical calculation. The second limitation, which specifies disposing the matrices into row format, is a mental process for the rearrangement of data into another format. Under Prong 2, both of the claim limitations, alone or together, fail to integrate the claim into a practical application as they both recite additional abstract ideas at a high level of generality. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. As the claim does not amount to significantly more than the judicial exception, the claim is not patent eligible. The claim is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 11, the claim is directed towards further specifying that the row transformations must be performed according to the rearrangement information. Under step 2A Prong 1, the claim is directed towards a mental process. As stated previously, the rearrangement information is still merely sorting by numerical value which can be performed by the human mind but recited for generic computer components. As the claim limitation is still another mental process, it does not integrate the claim into a practical application. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
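For illustration only, the arithmetic described for claim 10 can be sketched as follows. This is a minimal sketch under stated assumptions: the names `w_x`, `w_h`, `x` and `h_prev`, the example values, and the reading of “disposed in a row” as placing the two matrices side by side are hypothetical and are not taken from the claim or the specification.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def concat_rows(first, second):
    """Place two matrices with equal row counts side by side."""
    return [row_a + row_b for row_a, row_b in zip(first, second)]


# Hypothetical values: w_x multiplies the input vector x,
# w_h multiplies the previous output vector h_prev.
w_x = [[1, 0], [0, 2]]
w_h = [[3], [4]]
x = [5, 6]
h_prev = [7]

# Two separate products, summed element-wise.
separate = [a + b for a, b in zip(matvec(w_x, x), matvec(w_h, h_prev))]

# One product of the side-by-side matrix with the concatenated vector.
joint = matvec(concat_rows(w_x, w_h), x + h_prev)

print(separate, joint)
```

Both computations give the same vector, so under this reading the side-by-side arrangement changes only the layout of the data and not the result of the multiplication.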
In regards to claim 12, the claim is directed towards performing column transformations and specifying that the transformations include rearranging the order of the columns. Under step 2A Prong 1, the claim is directed towards a mental process. As stated previously, the rearrangement information is still merely sorting by numerical value which can be performed by the human mind but recited for generic computer components. As the claim limitation is still another mental process, it does not integrate the claim into a practical application. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 13, the claim is directed towards specifying there being a previous layer and that the column transformation of claim 12 is based off of the rearrangement information of the previous layer. Under step 2A Prong 1, the claim is directed towards an additional mental process. As stated previously, the rearrangement information is still merely sorting by numerical value which can be performed by the human mind but recited for generic computer components. As the claim limitation is another mental process, it does not integrate the claim into a practical application. The limitation specifying that the sort follows the arrangement of the previous layer does not impact whether a human would be able to perform the sort and fails to integrate the claim into a practical application. Specifying the inclusion of the previous layer and basing the sorting off of that layer is directed towards generally linking the judicial exception (mental process) to a particular field or technology such that the neural network (and subsequently, the previous layer) are generic computer components. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 1.
In regards to claim 15, the claim is directed towards the accelerator further comprising a decoder that can decode the row numbers corresponding to the non-zero values from the row information. Under step 2A Prong 1, the claim is directed towards an additional element. The decoder of the claim language is a generic computer component that does not disclose structure or algorithm. The generic computer component is recited at high-levels of generality and without specific instructions, except for the component to perform the action. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 14.
In regards to claim 16, the claim is directed towards the accelerator further comprising a state register to store an operation result and then provide the result back to the processing element. Under step 2A Prong 1, the claim is directed towards an additional element. The state register is a memory that is a generic computer component that has no specific instructions except for the component to perform the action. Further, storing the operation result and then providing the result back to the processing element is a form of data transfer which is a form of insignificant extra-solution activity. The claim is reevaluated to see if the additional elements amount to significantly more than the judicial exception. The steps which recite receiving data in the registers and transmitting the data to the processing elements are steps which were considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus are re-evaluated in Step 2B to determine if the steps are more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving steps are well-understood, routine, conventional activities is supported under Berkheimer. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 14.
In regards to claim 17, the claim is directed towards the formatted data further including the non-zero values and the column numbers corresponding to the non-zero values. Under step 2A Prong 1, the claim is directed towards an additional element. The claim limitation simply specifies what pieces of data are being included in the formatted row information which is a data type. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 14.
In regards to claim 18, the claim is directed towards the control circuit providing the data groups to the processing array in a sequential manner. Under step 2A Prong 1, the claim is directed towards insignificant extra-solution activity. The claim is directed towards the transfer of data from the control circuit to the PE. Passing the operation result to the processing element is a form of data transfer which is a form of insignificant extra-solution activity. The claim is reevaluated to see if the additional elements amount to significantly more than the judicial exception. The steps which recite transmitting the data to the processing elements are steps which were considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus are re-evaluated in Step 2B to determine if the steps are more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving steps are well-understood, routine, conventional activities is supported under Berkheimer. The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 14.
In regards to claim 20, the claim is rejected because the claimed invention is directed to an abstract idea without significantly more. The claim recites various generic hardware components and performing mathematical calculations and mental processes. 

2A Prong 1: The following limitations, under their broadest reasonable interpretation, cover performance of mathematical calculations and mental processes but for the recitation of generic computer components. 

…and a control circuit configured to provide the PE array with the formatted data of the weight matrix, wherein the formatted data includes non-zero values of elements allocated to each of the plurality of PEs, column numbers corresponding to the non-zero values, and row information from which row numbers corresponding to the non-zero values are decoded, a column number indicating a column to which a non-zero value belongs, a row number indicating a row to which the non-zero value belongs (mathematical calculation – generating and providing formatted data are synonymous with one another and the creation of formatted data is a mathematical calculation where the data is sorted according to mathematical properties or principles)

2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements of “wherein the accelerator comprises: a processing element (PE) array including the plurality of PEs” which is a generic set of processing elements recited with high-generality and no instructions aside from performing the operations. Additionally, there are claim limitations: “an output register configured to store an output vector output from the PE array” and “an input register configured to store the input vector to be provided to the PE array” both of which consist of generic computer components (any kind of memory could be substituted for the registers) being instructed to store outputs which is a form of data transfer and subsequently insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limitations. The claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform mathematical calculations amount to no more than mere instructions to apply the exception using a generic computer component (as stated previously, any generic processor can perform the actions of the claim limitation). Further, the step which recites transferring data from the registers is a step which was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus is re-evaluated in Step 2B to determine if the step is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving steps are well-understood, routine, conventional activities is supported under Berkheimer. The claim is not patent eligible.
In regards to claim 21, the claim is directed towards training a neural network to generate a weight matrix and prune said weight matrix via a neural network circuit and format the weight matrix via a formatting circuit. Under step 2A Prong 1, the claim is directed towards a mathematical calculation. The neural network generation circuit and the formatting circuit are generic computer components (any processor can be substituted in the claim language and perform the limitations) and are recited without any specific instructions in the claim aside for performing the action, in an “apply it” manner. Further, the training of the neural network to generate the weight matrix is applying the judicial exception to a particular field. Specifically, the claim generally links the judicial exception (mathematical calculation) to a particular field or technology such that the neural network is a generic computer component to generate a weight matrix. The claim limitation does not provide any specific implementation or directions that transform this limitation into more than a generic computer component in this context (any neural network or machine learning model can stand in for this limitation and no specific instructions on how the training is to be performed are given). Additionally, the pruning of the weight matrix and the generation of the formatted data are mathematical calculations that are recited for completion by said generic computer components. The generation of the formatted data including determining the length of rows, sorting the rows in an order, and grouping the rows such that there is an even length for each processing element are mathematical calculations being recited by the claim language. The claim is not integrated into a practical application. The claim is reexamined under 2B to see if there is significantly more than the judicial exception. 
The linking of the abstract idea to the particular technology is done with a high-level of generality and fails to add significantly more than the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limitations. The claim is directed to an abstract idea and is not patent eligible with similar reasoning and judgement as applied to claim 19.
In regards to claim 22, the claim is directed towards the formatting circuit generating the formatted data based on how many processing elements there are. Under step 2A Prong 1, the claim is directed towards an additional element without significantly more. The claim limitation simply specifies an additional piece of data being included in the formatted row information which is a data type. Further, the circuit, as stated above, is a generic computer component (such that any processor can be used in its place and be able to perform the claim limitations) that has no specific instructions or structure that can be considered significantly more than the judicial exception and is recited at a high-level of generality. The claim is reevaluated under 2A Prong 2 and step 2B and as it has no other meaningful limitations, the claim is not integrated into a practical application nor does it add significantly more than the judicial exception.  The claim is not patent eligible and is rejected for the same reasons and judgement as applied to claim 19.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-11 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rub (US 9830302 B1), and further in view of Zejda (US 10354733 B1).
In regards to claim 1, Rub teaches the following:
determining a row length for each row of the weight matrix, the row length corresponding to a number of elements each having a non-zero value in each row,
 [ (Abstract) “An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows”
	This teaches the method for sorting (equivalent to formatting) a matrix and doing so by sorting the rows by the number of non-zero elements within the row. ]
[ (Abstract) “The method allows for packing the sorted rows in each of the groups to generate packed rows. Each of the packed rows within the same group has the same length”
	This citation teaches matching the length of the rows which would show the system has determined the length of the row for the non-zero elements. ]
 the weight matrix including a plurality of elements that are arranged in a plurality of rows and a plurality of columns;
[ (Col. 3, Lines 3-8) “FIG. 1A, identified as prior art, is a block diagram showing standard full matrix vector multiplication. An example 4×4 matrix vector product is shown in block 100. A standard “row times column” approach typically used in scalar computations for the 4×4 matrix vector product is shown in block 102”
	This citation shows that the starting matrix is indeed in normal matrix format which consists of rows and columns. ]
obtaining rearrangement information including a result of sorting the rows in order of size of the determined row lengths;
[ (Col. 3, Lines 51-53) “FIG. 2 is flow chart showing a first section (a matrix preparation) of a method 200 for sparse matrix vector multiplication, according to various embodiments”
	This citation teaches the preparation for the sorting operation in which the rows will be rearranged. Examiner notes that this would be equivalent to obtaining the information for the processing step. ]
 performing a row transformation on the weight matrix or performing the row transformation and a column transformation on the weight matrix, using the rearrangement information, thereby generating a transformed weight matrix;
[ (Col. 3, Lines 53-59) “In block 202, the method 200 includes sorting the rows of the matrix by the number of the non-zero elements in the rows. In some embodiments, the rows are sorted in ascending order of the numbers of the non-zero elements. In other embodiments, the rows are sorted in descending order of the numbers of the non-zero elements”
	Continuing from the last citation, here the sorting operation takes place and the rows are organized by number of non-zero elements either in ascending or descending order. ]
 	distributing rows of the transformed weight matrix to a plurality of processing elements (PEs); 
[ (Col. 1, Lines 54-58) “The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units”
	This citation teaches that the then sorted rows are distributed to processing elements (computational units). ]
and generating formatted data including one or more data groups each including non-zero values of elements of the transformed weight matrix that are processed in the PEs and column information of the non-zero values.
[ (Table 1) and (Table 2)
	These two tables and the surrounding paragraphs within Column 4 of the reference disclose the full transformation of the original matrix (Table 1) and the end product (Table 2) with the reference disclosing that the element value is disclosed with the first digit representing the row number and second digit representing the column number. ]
	What is not distinctly disclosed by Rub and is instead taught by Zejda is seen below:
A method of formatting a weight matrix in a current layer included in a neural network, the method comprising:
[ (Col. 3, Lines 46-50) “Examples of the present disclosure provide techniques and apparatus for partitioning and reordering block-based matrix multiplications for fast massively parallel general matrix multiplication (GEMM)” ]
[ (Col. 15, Lines 14-16) “According to some examples, the compute array implements one or more layers of a convolutional neural network. In this case, the first matrix may be a weight matrix”
	These two citations from Zejda teach the environment of the method being performed in the context of a neural network with the operations being done on a weight matrix. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for sparse matrix vector multiplication as taught by Rub with the system for memory bandwidth reduction in neural networks as taught by Zejda. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would provide improvement in both computational speed as well as power savings. [ Zejda (Col. 9 Lines 62-64) ]. This would facilitate the recognized benefit of an increased efficiency in how the system runs both in delivering results to the user and the power drain the system exhibits.
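For illustration only, the row-length determination and row sorting mapped above for claim 1 can be sketched as follows. This is a minimal sketch and not code from Rub, Zejda, or the application; the function names `row_lengths` and `sort_rows_by_length` and the example matrix `W` are hypothetical.

```python
def row_lengths(matrix):
    """Count the non-zero elements in each row (the 'row length')."""
    return [sum(1 for v in row if v != 0) for row in matrix]


def sort_rows_by_length(matrix, descending=True):
    """Sort rows by row length.

    Returns (order, transformed), where order[i] is the original index
    of the row placed at position i (the 'rearrangement information')
    and transformed is the row-permuted matrix.
    """
    lengths = row_lengths(matrix)
    order = sorted(range(len(matrix)), key=lambda i: lengths[i],
                   reverse=descending)
    return order, [matrix[i] for i in order]


# Hypothetical 4x4 sparse matrix used only for this sketch.
W = [
    [0, 2, 0, 0],
    [1, 3, 0, 4],
    [0, 0, 0, 0],
    [5, 0, 6, 0],
]
order, W_sorted = sort_rows_by_length(W)
```

In this sketch the list `order` plays the role of the rearrangement information, and applying it to the rows corresponds to the row transformation.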

In regards to claim 2, The method of claim 1, is taught by Rub/Zejda as seen in the rejection for claim 1 above, the rest of the claim is taught by Rub as seen below:
wherein the distributing is performed such that a maximum value among sums of row lengths becomes minimized, each of the sums of row lengths representing a sum of row lengths of rows distributed to each of the PEs.
[ (Col. 1, Lines 50-58) “the sorted rows is equal to a number (R) of rows updated in parallel. In addition, the exemplary method includes packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a same length. The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units”
	This citation from Rub teaches that the sorted rows are performed in a manner where the length of the rows is calculated to fit the computational elements (equivalent to the processing elements) and are distributed in the same manner every cycle. ]
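For illustration only, one common way to keep the largest per-PE sum of row lengths small is a greedy longest-first assignment. The sketch below is an assumption made for explanatory purposes; it is not asserted to be the method of Rub or of the application, and a greedy heuristic does not guarantee the true minimum in general. The name `distribute_rows` and the example row lengths are hypothetical.

```python
import heapq


def distribute_rows(lengths, num_pes):
    """Assign rows to PEs, longest row first, always to the PE whose
    current sum of row lengths is smallest.

    Returns a list with one entry per PE, each holding the row indices
    assigned to that PE.
    """
    heap = [(0, pe) for pe in range(num_pes)]  # (current load, PE index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]
    by_length = sorted(range(len(lengths)), key=lambda i: lengths[i],
                       reverse=True)
    for row in by_length:
        load, pe = heapq.heappop(heap)
        assignment[pe].append(row)
        heapq.heappush(heap, (load + lengths[row], pe))
    return assignment


# Hypothetical row lengths for six rows, distributed over two PEs.
lengths = [5, 4, 3, 3, 2, 1]
assignment = distribute_rows(lengths, 2)
loads = [sum(lengths[r] for r in rows) for rows in assignment]
```

For these example lengths both PEs end with a sum of 9, so the maximum sum equals half of the total of 18, which is the lowest value possible for two PEs.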

In regards to claim 3, The method of claim 1, is taught by Rub/Zejda as seen in the rejection for claim 1 above, the rest of the claim is taught by Rub as seen below:
wherein performing the row transformation includes rearranging an order of rows of the weight matrix according to the rearrangement information.
[ (Col. 3, Lines 53-59) “In block 202, the method 200 includes sorting the rows of the matrix by the number of the non-zero elements in the rows. In some embodiments, the rows are sorted in ascending order of the numbers of the non-zero elements. In other embodiments, the rows are sorted in descending order of the numbers of the non-zero elements”
	This citation from Rub teaches the row transformation being done according to the arrangement information of ascending or descending number of non-zero elements per row. ]



In regards to claim 6, The method of claim 1, is taught by Rub/Zejda as seen in the rejection for claim 1 above, the rest of the claim is taught by Rub as seen below:
wherein the formatted data further includes row information corresponding to rows associated with the non-zero values of elements of the transformed weight matrix.
[ (Col. 1, Lines 47-54) “The exemplary method also includes splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel. In addition, the exemplary method includes packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a same length”
	This citation shows the end result of the sorting method previously cited concluding with packed rows, these packed rows contain the row information associated with the non-zero values of the elements of the transformed weight matrix. ]
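For illustration only, the splitting and packing described in the cited passage can be sketched as follows: the sorted rows are split into groups of R rows, the non-zero elements of each row are packed together, and shorter rows in a group are padded so that every packed row in the group has the same length. This is a minimal sketch under stated assumptions and not Rub's implementation; the name `pack_groups`, the `(column, value)` tuple layout, and the `(None, 0)` padding entry are hypothetical choices.

```python
def pack_groups(sorted_matrix, rows_per_group):
    """Split sorted rows into groups and pack each row's non-zeros.

    Each packed row is a list of (column, value) pairs. Rows in a group
    are padded with (None, 0) so that they share one length.
    """
    groups = []
    for start in range(0, len(sorted_matrix), rows_per_group):
        chunk = sorted_matrix[start:start + rows_per_group]
        packed = [[(col, v) for col, v in enumerate(row) if v != 0]
                  for row in chunk]
        width = max(len(p) for p in packed)
        groups.append([p + [(None, 0)] * (width - len(p)) for p in packed])
    return groups


# Hypothetical matrix whose rows are already sorted by row length.
W_sorted = [
    [1, 3, 0, 4],
    [5, 0, 6, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]
groups = pack_groups(W_sorted, 2)
```

Because the rows are sorted before grouping, rows of similar length land in the same group, which keeps the number of padding entries low.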

In regards to claim 7, The method of claim 6, is taught by Rub/Zejda as seen in the rejection for claim 6 above, the rest of the claim is taught by Rub as seen below:
wherein the row information includes row numbers corresponding to the rows associated with the non-zero values or encoding information from which the row numbers are decoded.
[ (Tables 1, 2, and 3) and (Col. 4 – Col. 5)
	These tables from the Rub reference teach an example of a sparse matrix (table 1) being converted to the new format (table 2). Table 3 then shows the row formation in a row by row manner which is equivalent to the claim limitation. ]



In regards to claim 8, The method of claim 7, is taught by Rub/Zejda as seen in the rejection for claim 7 above, the rest of the claim is taught by Rub as seen below:
wherein the encoding information includes row lengths of the rows of the transformed weight matrix.
[ (Tables 1, 2, and 3) and (Col. 4 – Col. 5)
	As shown in the tables above, the tables show the length of the rows in table 2 (Group one length is 5, whereas group 2 length is 2) and in table 3 which breaks down each row into its list of elements and forms chains of “equations” to create the row data format. ]
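For illustration only, the following sketch shows how row lengths can serve as encoding information from which row numbers are decoded, in the manner of a compressed row format. It is an assumption offered to explain the limitation and is not taken from Rub's tables or from the application; the names `encode` and `decode_row_numbers` and the example matrix are hypothetical.

```python
def encode(matrix):
    """Return (values, columns, lengths) for a matrix.

    values and columns list the non-zero elements row by row; lengths
    holds the number of non-zero elements in each row.
    """
    values, columns, lengths = [], [], []
    for row in matrix:
        nonzeros = [(c, v) for c, v in enumerate(row) if v != 0]
        columns.extend(c for c, _ in nonzeros)
        values.extend(v for _, v in nonzeros)
        lengths.append(len(nonzeros))
    return values, columns, lengths


def decode_row_numbers(lengths):
    """Recover the row number of every stored non-zero value."""
    rows = []
    for row_number, count in enumerate(lengths):
        rows.extend([row_number] * count)
    return rows


# Hypothetical transformed matrix used only for this sketch.
W_sorted = [
    [1, 3, 0, 4],
    [5, 0, 6, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]
values, columns, lengths = encode(W_sorted)
rows = decode_row_numbers(lengths)
```

Storing one length per row instead of one row number per non-zero value is the design choice that makes the row information "encoded" in this sketch.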

In regards to claim 9, The method of claim 1, is taught by Rub/Zejda as seen in the rejection for claim 1 above, the rest of the claim is taught by Zejda as seen below:
wherein the neural network is a Long Short-Term Memory (LSTM) network including one or more LSTM layers.
[ (Col. 9, Lines 46-52) “Many machine learning and engineering problems can be expressed as sets of linear equations, typically formulated as a set of matrix multiplications (e.g., general matrix multiplication (GEMM)). This includes the most compute-intensive parts of machine learning: CNN's fully connected layers, multi-layer perceptrons, or most of recurrent neural network (RNN)/LSTM (long short-term memory)” (emphasis added)
	This citation teaches that the methodologies in Zejda apply to LSTM networks. ]
	Please refer to the motivation to combine from claim 1. 

In regards to claim 10, The method of claim 9, is taught by Rub/Zejda as seen in the rejection for claim 9 above, the rest of the claim is taught by Rub as seen below:
wherein the weight matrix includes a first weight matrix and a second weight matrix,
[ (Fig. 1A) and (Col. 3, Lines 3-9)
	Reference number 100 in Fig. 1A shows the example matrix vector product. With the matrix containing As being considered the first weight matrix and the matrix containing the Bs being considered the second weight matrix. Examiner notes that the matrices being considered “weight” matrices are taught by Zejda as in the rejection for claim 1 above. ]
 	the first weight matrix being multiplied with an input vector, the second weight matrix being multiplied with a previous output vector, and wherein the method further includes disposing the first weight matrix and the second weight matrix in a row.
[ (Fig. 1A) and (Col. 3, Lines 3-9) “FIG. 1A, identified as prior art, is a block diagram showing standard full matrix vector multiplication. An example 4×4 matrix vector product is shown in block 100. A standard “row times column” approach typically used in scalar computations for the 4×4 matrix vector product is shown in block 102”
	This figure and the accompanying description teach the disposition where the normal (or typical) matrix multiplication format (as seen by reference number 100) is transformed into row format (reference number 102). Examiner notes that the row operations as presented in Fig. 1A are the expanded FOIL format of applicant’s Figs. 3A-3C in their specification. ]
	Please refer to the motivation to combine from claim 1.
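For illustration only, one possible reading of disposing the first and second weight matrices in a row is a side-by-side concatenation, so that a single product with the stacked input vector and previous output vector gives the same result as the two separate products added together. This reading is an assumption made for the sketch and is not a finding about either reference; all names and values below are hypothetical.

```python
def matvec(matrix, vector):
    """Standard 'row times column' matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical 2x2 weight matrices and vectors.
W_x = [[1, 2], [3, 4]]   # multiplied with the input vector
W_h = [[5, 6], [7, 8]]   # multiplied with the previous output vector
x_t = [1, 0]
h_prev = [0, 1]

# Two separate products, then an element-wise sum.
separate = [a + b for a, b in zip(matvec(W_x, x_t), matvec(W_h, h_prev))]

# Matrices placed side by side, vectors stacked, one product.
W_cat = [row_x + row_h for row_x, row_h in zip(W_x, W_h)]
combined = matvec(W_cat, x_t + h_prev)
```

Both computations give the same vector, which is why a concatenated matrix can be formatted once rather than handling the two matrices separately.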

In regards to claim 11, The method of claim 10, is taught by Rub/Zejda as seen in the rejection for claim 10 above, the rest of the claim is taught by Rub as seen below:
wherein performing the row transformation includes rearranging an order of the rows of the weight matrix according to the rearrangement information.
[ (Col. 3, Lines 53-59) “In block 202, the method 200 includes sorting the rows of the matrix by the number of the non-zero elements in the rows. In some embodiments, the rows are sorted in ascending order of the numbers of the non-zero elements. In other embodiments, the rows are sorted in descending order of the numbers of the non-zero elements”
	This citation and the surrounding paragraphs teach the row transformation operations. ]
	Please refer to the motivation to combine from claim 1.

In regards to claim 14, Rub teaches the following:
and a control circuit configured to provide the PE array with formatted data that is generated by formatting a weight matrix such that sums of row lengths of rows of the weight matrix allocated to the plurality of PEs become substantially even,
[ (Col. 1, Lines 54-58) “The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units” 
	This citation teaches that after the row sorting has occurred, the formatted data would be transferred to computational units (equivalent to PE array). Examiner notes that the control circuit is taught by the secondary reference as seen below but the limitation is kept together for clarity of the office action. ]
[ (Col. 4, Lines 51-53) “The overhead is required to make all rows within one group have the same length”
	This citation teaches that the row operations performed by Rub make all rows within one group the same length. ]
 the weight matrix including a plurality of elements that are arranged in a plurality of rows and a plurality of columns, each of the sums of row lengths representing a sum of row lengths of rows allocated to each of the PEs,
[ (Col. 3, Lines 3-8) “FIG. 1A, identified as prior art, is a block diagram showing standard full matrix vector multiplication. An example 4×4 matrix vector product is shown in block 100. A standard “row times column” approach typically used in scalar computations for the 4×4 matrix vector product is shown in block 102”
	This citation shows that the starting matrix is indeed in normal matrix format which consists of rows and columns. ]
 [ (Col. 4, Lines 51-53) “wherein zero elements in rows 3 and 4 and columns 4 and 5 represent overhead. The overhead is required to make all rows within one group have the same length”
	This citation teaches the row lengths and the overhead representing the extra data passed to the computational units (equivalent to the processing elements) to make sure that the rows are all of the same size length. ]
 	a row length corresponding to a number of elements each having a non-zero value in a row, wherein the formatted data includes non-zero values of elements allocated to each of the plurality of PEs,
[ (Table 2) and (Col. 4 Lines 34-65)
	This table from Rub teaches the method containing a row length (in this example it is a size 5 for group 1 and 2 for group 2) and also shows the distribution to each of the computation units (equivalent to PEs) by the groups. ]
 column numbers corresponding to the non-zero values, and row information from which row numbers corresponding to the non-zero values are decoded, a column number indicating a column to which a non-zero value belongs, a row number indicating a row to which the non-zero value belongs.
[ (Table 3) and (Col. 5 Lines 15-20)
	This table shows another representation that follows from Table 2. For each MAC operator, the table shows the location of each non-zero element by both row and column. Rows are designated vertically down the table, and columns are represented by the “b” number next to the non-zero element. For example, under MAC 1, “13b3” is in the first row and third column, which is confirmed by Table 1, which shows the original matrix. ]
	What Rub does not distinctly disclose and is instead taught by Zejda is seen below:
An accelerator comprising: a processing element (PE) array including a plurality of PEs;
[ (Col. 4, Lines 18-20) “For some examples, the hardware accelerator(s) 116 include programmable integrated circuits (ICs), such as field programmable gate arrays (FPGAs)”
	Teaches the accelerator. ]
[ (Col. 6, Line 67 – Col. 7 Line 4) “The processing circuits 341 may include an IM2COL circuit (“IM2COL 344”), a read control circuit (“read control 346”), a multiplexer 356, first-in-first-out circuits (“FIFOs 358”), a compute array 362”
	Teaches the compute array which is equivalent to the processing element array. ]
[ (Col. 10, Lines 15-20) “The compute array 600 may have a systolic array structure of compute cores 602 suitable for use in massively parallel GEMM, for example. A compute core 602 may also be referred to as a compute element, data processing unit (DPU), cell, or node”
	Teaches the compute array containing a plurality of compute cores (equivalent to PEs). ]
an output register configured to store an output vector output from the PE array;
[ (Col. 8, Lines 14-18) “The support circuits 31 include dedicated circuits, such as transceivers, input/output blocks, digital signal processors, memories, and the like. The logic cells and the support circuits 31 may be interconnected using the programmable interconnect 32”
	This teaches the logic being connected to memory cells (equivalent to an output register) which would hold the output of the computation units. ]
 an input register configured to store an input vector to be provided to the PE array;
[ (Col. 7, Lines 28-30) “An output of the multiplexer 356 is coupled to an input of the FIFOs 358. An output of the FIFOs 358 is coupled to a first input of the compute array 362”
	The FIFO (First-In-First-Out) is a memory structure that is equivalent to a register and the citation above shows that it inputs the data to the compute array. ]
and a control circuit configured to provide the PE array with formatted data that is generated
[ (Col. 5, Lines 4-11) “The microprocessor 212 is configured to execute program code that performs one or more operations described herein and which may be stored in the system memory 216 and/or the storage 218. The support circuits 214 include various devices that cooperate with the microprocessor 212 to manage data flow between the microprocessor 212, the system memory 216, the storage 218, the hardware accelerator 116”
	This citation shows the microprocessor (equivalent to the control circuit) running the instructions that make up the system. Further, the microprocessor is shown to manage the data flow between the other components of the system, including the hardware accelerator. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for sparse matrix vector multiplication as taught by Rub with the system for memory bandwidth reduction in neural networks as taught by Zejda. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would provide improvement in both computational speed as well as power savings. [ Zejda (Col. 9 Lines 62-64) ]. This would facilitate the recognized benefit of an increased efficiency in how the system runs both in delivering results to the user and the power drain the system exhibits.

In regards to claim 15, The accelerator of claim 14, is taught by Rub/Zejda as seen in the rejection for claim 14 above, the rest of the claim is taught by Rub as seen below:
further comprising: a decoder configured to decode the row numbers corresponding to the non-zero values from the row information.
[ (Tables 1, 2, and 3)
	These tables from the Rub reference teach an example of a sparse matrix (Table 1) being converted to the new format (Table 2). Table 3 then shows the row formation in a row-by-row manner, which is equivalent to the decoded row numbers. Examiner notes that applicant’s specification points to Fig. 10, reference number 430 for the decoder but does not provide any structure/algorithm for the decoder. As such, the decoder is interpreted to be a software function running on a generic circuit/processor. The citation for the generic processor can be seen below; the functionality of the decoder is already taught by the tables cited above. Please see the 112(f) section for more details. ]
[ (Col. 8, Lines 36-41) “The computer system 700 of FIG. 7 includes one or more processor unit(s) 710 and main memory 720. Main memory 720 stores, in part, instructions and data for execution by processor unit(s)” ]

In regards to claim 16, The accelerator of claim 14, is taught by Rub/Zejda as seen in the rejection for claim 14 above, the rest of the claim is taught by Zejda as seen below:
further comprising: a state register configured to temporarily store an operation result of the PE array and to provide the PE array with the stored operation result.
[ (Col. 7, Lines 28-35) “An output of the multiplexer 356 is coupled to an input of the FIFOs 358. An output of the FIFOs 358 is coupled to a first input of the compute array 362. An output of the cache 348 is coupled to an input of the read control circuit 350. An output of the read control circuit 350 is coupled to an input of the FIFOs 360. An output of the FIFOs 360 is coupled to a second input of the compute array”
	This citation teaches the FIFO (equivalent to the register) getting the output of the compute array and also sending it back to the compute array as in the claim limitation. ]
	Please see the motivation to combine from claim 14.

In regards to claim 17, The accelerator of claim 14, is taught by Rub/Zejda as seen in the rejection for claim 14 above, the rest of the claim is taught by Rub as seen below:
wherein the formatted data includes a plurality of data groups, and each of the plurality of data groups includes non-zero values to be provided to each of the plurality of PEs and column numbers corresponding to the non-zero values.
[ (Table 3) and (Col. 5 Lines 15-20)
	This table shows another representation that follows from Table 2. For each MAC operator (equivalent to the PEs), the table shows the location of each non-zero element by both row and column. Rows run vertically down the table, and the column is given by the “b” number next to the non-zero element. For example, under MAC 1, “13b3” is in the first row and 3rd column, which is confirmed by Table 1, which shows the original matrix. ]

In regards to claim 18, The accelerator of claim 17, is taught by Rub/Zejda as seen in the rejection for claim 17 above, the rest of the claim is taught by Rub as seen below:
wherein the control circuit sequentially provides the plurality of data groups to the PE array.
[ (Col. 1, Lines 54-58) “The exemplary method further includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units”
	This citation teaches the system providing the plurality of data groups (finished/updated data) to the processing units every clock cycle. ]
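For illustration only, the following is a minimal sketch of sequentially providing groups of C elements, one group per simulated clock cycle. It is not code from Rub, Zejda, or the application; the function name `groups_per_cycle`, the `(value, column)` element layout, and the sample values are assumptions made for this example.

```python
from typing import Iterator, List, Tuple

# Hypothetical element layout: (non-zero value, column number).
Element = Tuple[float, int]


def groups_per_cycle(packed: List[Element], c: int) -> Iterator[List[Element]]:
    """Yield one group of at most c elements per simulated clock cycle."""
    if c <= 0:
        raise ValueError("c must be positive")
    for start in range(0, len(packed), c):
        # Each slice stands in for the data handed to c computational units.
        yield packed[start:start + c]


# Assumed sample data: five packed elements, two computational units.
packed_rows = [(13.0, 3), (7.0, 1), (2.0, 4), (5.0, 2), (9.0, 5)]
cycles = list(groups_per_cycle(packed_rows, c=2))
```

With five elements and C = 2, the sketch produces three groups, the last of which is only partially filled.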

In regards to claim 19, Rub teaches the following:
formatted data which is generated by formatting a weight matrix such that sums of row lengths of rows of the weight matrix distributed to the plurality of PEs become substantially even,
[ (Col. 4, Lines 51-53) “wherein zero elements in rows 3 and 4 and columns 4 and 5 represent overhead. The overhead is required to make all rows within one group have the same length”
	This citation teaches the row lengths and the overhead representing the extra data passed to the computational units (equivalent to the processing elements) to make sure that the rows are all of the same length. ]
 the weight matrix including a plurality of elements that are arranged in a plurality of rows and a plurality of columns,
[ (Col. 3, Lines 3-8) “FIG. 1A, identified as prior art, is a block diagram showing standard full matrix vector multiplication. An example 4×4 matrix vector product is shown in block 100. A standard “row times column” approach typically used in scalar computations for the 4×4 matrix vector product is shown in block 102”
	This citation shows that the starting matrix is indeed in normal matrix format which consists of rows and columns. ]
 each of the sums of row lengths representing a sum of row lengths of rows allocated to each of the PEs, a row length corresponding to a number of elements each having a non-zero value in a row.
[ (Table 2)
	This table from Rub teaches the method containing a row length (in this example it is a size 5 for group 1 and 2 for group 2) and also shows the distribution to each of the computation units (equivalent to PEs) by the groups. ]
What Rub does not distinctly disclose and is instead taught by Zejda is seen below:
A system comprising: an accelerator including a plurality of PEs;
[ (Col. 4, Lines 18-20) “For some examples, the hardware accelerator(s) 116 include programmable integrated circuits (ICs), such as field programmable gate arrays (FPGAs)”
	Teaches the accelerator. ]
[ (Col. 6, Line 67 – Col. 7 Line 4) “The processing circuits 341 may include an IM2COL circuit (“IM2COL 344”), a read control circuit (“read control 346”), a multiplexer 356, first-in-first-out circuits (“FIFOs 358”), a compute array 362”
	Teaches the compute array which is equivalent to the processing element array. ]
[ (Col. 10, Lines 15-20) “The compute array 600 may have a systolic array structure of compute cores 602 suitable for use in massively parallel GEMM, for example. A compute core 602 may also be referred to as a compute element, data processing unit (DPU), cell, or node”
	Teaches the compute array containing a plurality of compute cores (equivalent to PEs). ]
and a neural network application circuit configured to control an inference operation performed in the accelerator by providing the accelerator with an input vector generated from an input signal and formatted data
[ (Col. 5, Lines 4-11) “The microprocessor 212 is configured to execute program code that performs one or more operations described herein and which may be stored in the system memory 216 and/or the storage 218. The support circuits 214 include various devices that cooperate with the microprocessor 212 to manage data flow between the microprocessor 212, the system memory 216, the storage 218, the hardware accelerator 116”
	This citation shows the microprocessor (equivalent to the circuit able to control the operations) running the instructions that make up the system. Further, the microprocessor is shown to manage the data flow between the other components of the system, including the hardware accelerator. Examiner notes that the formatted data of the claim limitation is taught by the primary reference above but is repeated here for clarity of the office action. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for sparse matrix vector multiplication as taught by Rub with the system for memory bandwidth reduction in neural networks as taught by Zejda. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would provide improvement in both computational speed as well as power savings. [ Zejda (Col. 9 Lines 62-64) ]. This would facilitate the recognized benefit of an increased efficiency in how the system runs both in delivering results to the user and the power drain the system exhibits.
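For illustration only, the notions of a row length (the number of non-zero elements in a row) and of a per-PE sum of row lengths, as recited in claim 19, can be sketched as follows. The greedy assignment used here (longest row first, to the least-loaded PE) is an assumption chosen for the example and is not asserted to be the method of Rub, Zejda, or the application. The function names and the sample matrix are likewise hypothetical.

```python
from typing import List


def row_lengths(matrix: List[List[float]]) -> List[int]:
    """Row length = number of non-zero elements in each row."""
    return [sum(1 for v in row if v != 0) for row in matrix]


def assign_rows(matrix: List[List[float]], num_pes: int) -> List[List[int]]:
    """Assumed greedy heuristic: longest rows first, each to the least-loaded PE."""
    lengths = row_lengths(matrix)
    order = sorted(range(len(matrix)), key=lambda r: lengths[r], reverse=True)
    assignment: List[List[int]] = [[] for _ in range(num_pes)]
    loads = [0] * num_pes
    for r in order:
        pe = loads.index(min(loads))  # first PE with the smallest running sum
        assignment[pe].append(r)
        loads[pe] += lengths[r]
    return assignment


def load_sums(matrix: List[List[float]], assignment: List[List[int]]) -> List[int]:
    """Sum of row lengths of the rows allocated to each PE."""
    lengths = row_lengths(matrix)
    return [sum(lengths[r] for r in rows) for rows in assignment]


# Assumed sample sparse weight matrix (6 rows, 4 columns).
weights = [
    [1, 0, 2, 0],
    [0, 0, 0, 3],
    [4, 5, 6, 0],
    [0, 7, 0, 0],
    [8, 0, 0, 9],
    [0, 0, 1, 0],
]
assignment = assign_rows(weights, num_pes=2)
sums = load_sums(weights, assignment)
```

For this sample matrix the row lengths are 2, 1, 3, 1, 2 and 1, and the sketch allocates rows so that each of the two PEs receives a sum of row lengths of 5.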

In regards to claim 20, The system of claim 19, is taught by Rub/Zejda as seen in the rejection for claim 19 above, the following pieces of the claim are taught by Rub as seen below:
and a control circuit configured to provide the PE array with the formatted data of the weight matrix, 
[ (Col. 1, Lines 54-58) “The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units” 
This citation teaches that after the row sorting has occurred, the formatted data would be transferred to computational units (equivalent to PE array). Examiner notes that the control circuit is taught by the secondary reference as seen below but the limitation is kept together for clarity of the office action. ]
wherein the formatted data includes non-zero values of elements allocated to each of the plurality of PEs, 
[ (Table 3)
	This table teaches the formatted data including the non-zero values being allocated to each of the MACs (equivalent to the PEs) ]
column numbers corresponding to the non-zero values, and row information from which row numbers corresponding to the non-zero values are decoded, a column number indicating a column to which a non-zero value belongs, a row number indicating a row to which the non-zero value belongs.
[ (Table 3)
	This table shows another representation that follows from Table 2. For each MAC operator, the table shows the location of each non-zero element by both row and column. Rows run vertically down the table, and the column is given by the “b” number next to the non-zero element. For example, under MAC 1, “13b3” is in the first row and 3rd column, which is confirmed by Table 1, which shows the original matrix. ]
	What Rub fails to distinctly disclose and is instead taught by Zejda is seen below:
wherein the accelerator comprises: a processing element (PE) array including the plurality of PEs;
[ (Col. 4, Lines 18-20) “For some examples, the hardware accelerator(s) 116 include programmable integrated circuits (ICs), such as field programmable gate arrays (FPGAs)”
	Teaches the accelerator. ]
[ (Col. 6, Line 67 – Col. 7 Line 4) “The processing circuits 341 may include an IM2COL circuit (“IM2COL 344”), a read control circuit (“read control 346”), a multiplexer 356, first-in-first-out circuits (“FIFOs 358”), a compute array 362”
	Teaches the compute array which is equivalent to the processing element array. ]
[ (Col. 10, Lines 15-20) “The compute array 600 may have a systolic array structure of compute cores 602 suitable for use in massively parallel GEMM, for example. A compute core 602 may also be referred to as a compute element, data processing unit (DPU), cell, or node”
	Teaches the compute array containing a plurality of compute cores (equivalent to PEs). ]
an output register configured to store an output vector output from the PE array;
[ (Col. 8, Lines 14-18) “The support circuits 31 include dedicated circuits, such as transceivers, input/output blocks, digital signal processors, memories, and the like. The logic cells and the support circuits 31 may be interconnected using the programmable interconnect 32”
	This teaches the logic being connected to memory cells (equivalent to an output register) which would hold the output of the computation units. ]
 an input register configured to store the input vector to be provided to the PE array;
[ (Col. 7, Lines 28-30) “An output of the multiplexer 356 is coupled to an input of the FIFOs 358. An output of the FIFOs 358 is coupled to a first input of the compute array 362”
	The FIFO (First-In-First-Out) is a memory structure that is equivalent to a register and the citation above shows that it inputs the data to the compute array. ]
	Please refer to the motivation to combine from claim 19.

Claim(s) 4, 5, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Rub/Zejda as applied above, and further in view of Mathworks (“Permutation and Reordering”).

In regards to claim 4, The method of claim 3, is taught by Rub/Zejda as seen in the rejection for claim 3 above, Zejda continues teaching the following:
wherein the neural network further includes a previous layer of the current layer
[ (Col. 1, Lines 27-29) “The deep learning algorithm may be implemented using layers of an artificial neural network (ANN) (referred to herein as a “neural network”)” 
This citation and the one below it show that there are multiple layers in the neural network as taught by Zejda and that they follow a linear order (previous, next and current layers). ]
[ (Col. 15, Lines 16-19) “In this case, the first matrix may be a weight matrix, and the second matrix may be an input data matrix, which may include an image matrix, voice samples, or channels of data from activation functions of a previous neural network layer” ]
according to the rearrangement information generated at the previous layer.
[ (Col. 15, Lines 16-19) “In this case, the first matrix may be a weight matrix, and the second matrix may be an input data matrix, which may include an image matrix, voice samples, or channels of data from activation functions of a previous neural network layer” 
This citation shows that the matrices of the previous layers within the neural network can be passed forward to following layers. ]
	What Rub/Zejda do not distinctly disclose and is instead taught by Mathworks is seen below:
 and wherein performing the column transformation includes rearranging an order of columns of the weight matrix 
[ (Pg. 2, “Reordering for Sparsity”) “Reordering the columns of a matrix can often make its LU or QR factors sparser. Reordering the rows and columns can often make its Cholesky factors sparser. The simplest such reordering is to sort the columns by nonzero count”
	This citation from Mathworks teaches reordering (or sorting) the columns of the matrix by the number of non-zero elements in each column. Examiner notes that the matrix being a weight matrix is taught by Zejda. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for sparse matrix vector multiplication as taught by Rub/Zejda with column operations as taught by Mathworks. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would provide increased efficiency with respect to certain matrix operations. [ Mathworks (“Reordering for Sparsity”) ]. This would facilitate the recognized benefit of an increased efficiency in completing the matrix operations which would provide an efficiency boost to the system overall. 
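For illustration only, the column reordering described in the Mathworks citation (sorting the columns by nonzero count) can be sketched as follows. This is plain Python written for this example, not MathWorks code; the function names, the ascending sort direction, and the sample matrix are assumptions. The returned permutation is one possible way to record how the columns were rearranged.

```python
from typing import List, Tuple


def column_nonzero_counts(matrix: List[List[float]]) -> List[int]:
    """Count the non-zero elements in each column."""
    if not matrix:
        return []
    return [sum(1 for row in matrix if row[c] != 0) for c in range(len(matrix[0]))]


def sort_columns_by_nonzero_count(
    matrix: List[List[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Reorder columns by ascending non-zero count (assumed direction).

    Returns the reordered matrix and the permutation, where perm[i] is the
    original index of the column now at position i.
    """
    counts = column_nonzero_counts(matrix)
    perm = sorted(range(len(counts)), key=lambda c: counts[c])  # stable sort
    reordered = [[row[c] for c in perm] for row in matrix]
    return reordered, perm


# Assumed sample matrix: column 0 has 3 non-zeros, columns 1 and 2 have 1 each.
sample = [
    [1, 0, 2],
    [3, 0, 0],
    [4, 5, 0],
]
reordered, perm = sort_columns_by_nonzero_count(sample)
```

For this sample the two single-entry columns move ahead of the dense column, and the permutation records the original column indices in their new order.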

In regards to claim 5, The method of claim 3, is taught by Rub/Zejda as seen in the rejection for claim 3 above, Mathworks continues teaching the following:
wherein performing the column transformation includes rearranging one or more columns of the weight matrix according to the rearrangement information.
[ (Pg. 2, “Reordering for Sparsity”) “Reordering the columns of a matrix can often make its LU or QR factors sparser. Reordering the rows and columns can often make its Cholesky factors sparser. The simplest such reordering is to sort the columns by nonzero count”
	This citation from Mathworks teaches reordering (or sorting) the columns of the matrix by the number of non-zero elements in each column. Examiner notes that the matrix being a weight matrix is taught by Zejda. ]
	Please refer to the motivation to combine from claim 4. 

In regards to claim 12, The method of claim 11, is taught by Rub/Zejda as seen in the rejection for claim 11 above, the rest of the claim is taught by Mathworks as seen below:
wherein performing the column transformation includes rearranging an order of columns of the second weight matrix.
[ (Pg. 2, “Reordering for Sparsity”) “Reordering the columns of a matrix can often make its LU or QR factors sparser. Reordering the rows and columns can often make its Cholesky factors sparser. The simplest such reordering is to sort the columns by nonzero count”
	This citation from Mathworks teaches reordering (or sorting) the columns of the matrix by the number of non-zero elements in each column. Examiner notes that the “second weight” matrix is taught by the previous claims which claim 12 relies upon and that Mathworks is relied upon to teach the column transformation including rearranging an order of columns. ]
	Please see the motivation to combine from claim 4. 

In regards to claim 13, The method of claim 12, is taught by Rub/Zejda/Mathworks as seen in the rejection for claim 12 above, the rest of the claim is taught by Zejda as seen below:
wherein the neural network further includes a previous layer of the current layer,
[ (Col. 1, Lines 27-29) “The deep learning algorithm may be implemented using layers of an artificial neural network (ANN) (referred to herein as a “neural network”)” 
This citation and the one below it show that there are multiple layers in the neural network as taught by Zejda and that they follow a linear order (previous, next and current layers). ]
[ (Col. 15, Lines 16-19) “In this case, the first matrix may be a weight matrix, and the second matrix may be an input data matrix, which may include an image matrix, voice samples, or channels of data from activation functions of a previous neural network layer” ]
according to the rearrangement information generated at the previous layer.
[ (Col. 15, Lines 16-19) “In this case, the first matrix may be a weight matrix, and the second matrix may be an input data matrix, which may include an image matrix, voice samples, or channels of data from activation functions of a previous neural network layer” 
This citation shows that the matrices of the previous layers within the neural network can be passed forward to following layers. ]
What is not distinctly disclosed by Rub/Zejda and is instead taught by Mathworks is seen below:
and wherein performing the column transformation includes rearranging an order of columns of the first weight matrix 
[ (Pg. 2, “Reordering for Sparsity”) “Reordering the columns of a matrix can often make its LU or QR factors sparser. Reordering the rows and columns can often make its Cholesky factors sparser. The simplest such reordering is to sort the columns by nonzero count”
	This citation from Mathworks teaches reordering (or sorting) the columns of the matrix by the number of non-zero elements in each column. Examiner notes that the matrix being a first weight matrix is taught in the previous claim rejection on which claim 13 relies. Mathworks is relied upon to teach the column transformation including rearranging an order of columns. ]
	Please see the motivation to combine from claim 4. 


Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Rub/Zejda as applied above, and further in view of Byun (US 20190087729 A1).

In regards to claim 21, The system of claim 19, is taught by Rub/Zejda as seen in the rejection for claim 19 above, the rest of the claim is taught by Rub as seen below:
 	and a neural network format circuit configured to generate the formatted data.
[ (Col. 8, lines 14-17) “A memory (e.g., non-transitory computer readable storage medium) may store, at least in part, instructions and data for execution by processor 620”
	This teaches the processor (equivalent to the circuit of the claim limitation) with the instructions on it to perform the operations detailed in the reference, including the formatting of data seen below. ]
[ (Col. 1, Lines 54-58) “The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units” 
This citation teaches that after the row sorting has occurred, formatted data would be transferred to computational units (equivalent to PE array) with the formatted data being the packed rows. ]
	What is not distinctly disclosed by Rub and is instead taught by Byun is seen below:
further comprising: a neural network generation circuit configured to train a neural network for generating the weight matrix and to prune the weight matrix;
[ (¶0019) “As shown in FIG. 1, the memory 104 includes a CNN 108. In some examples, the CNN 108 is built, trained, and utilized by the processor 102 to detect and classify content”
	Teaches the neural network being trained. ]
[ (¶0017) “The processor 102 includes various computing circuitry, such as a control unit, an arithmetic-logic unit, and register memory, that can execute instructions defined by an instruction set”
	Teaches the circuit, which is able to perform any of the actions taught within the reference via the instruction set. ]
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio”
	Teaches the pruning of the weight matrix. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for sparse matrix vector multiplication as taught by Rub/Zejda with the convolutional neural network tuning and pruning as taught by Byun. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the references would provide increased accuracy and computational efficiency. [ Byun (Abstract) ]. This would facilitate the recognized benefit of an increased efficiency in completing the matrix operations which would provide an efficiency boost to the system overall while maintaining or improving the accuracy which results in a more accurate model. 

In regards to claim 22, The system of claim 21, is taught by Rub/Zejda/Byun as seen in the rejection for claim 21 above, the rest of the claim is taught by Rub as seen below:
wherein the neural network format circuit generates the formatted data based on a number of the plurality of PEs.
[ (Col. 1, Lines 54-58) “The exemplary method also includes providing, per clock cycle, C elements of the packed rows to computational units in the SIMD architecture, wherein C is the number of computational units” 
This citation teaches the formatted data being generated and transferred based on how many computational units there are (C in the equation). ]
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 10127495 B1 – Reducing the size of a neural network through the reduction of the weight matrices which teaches removing the values from a matrix and other various matrix operations
US 10810484 B2 – Hardware accelerator for compressed GRU on FPGA which teaches matrix encoding and decoding, compressed row storage, different representations of a matrix and processing elements to perform the operations
US 20210286789 A1 – Area allocation device which teaches column reordering and row reordering and various other matrix operations

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.A.M./Examiner, Art Unit 2123
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145