DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix” in claims 3, 11, and 19;
“bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix” in claims 4, 12, and 20.
Examiner notes that the term “tensor permutation engine” is not interpreted under 35 U.S.C. 112(f) because the generic placeholder “engine” is modified by sufficient structure, material, or acts for performing the claimed function. Examiner further notes that the read address generation unit (AGU) and the write AGU are also not interpreted under 35 U.S.C. 112(f) because the word “unit” is modified by “address generation,” which denotes a type of structural device (i.e., an address generation unit) with a generally understood meaning in the computer arts.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
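For illustration only, the function recited in the limitations listed above (generating addresses from a counter value and a bit matrix) can be sketched in software. The sketch below assumes that the bit matrix is a square 0/1 matrix applied to the counter bits by exclusive-OR; that reading, the 4x4 example, and the names bit_shuffle, counter, bit_matrix, and TRANSPOSE_4X4 are assumptions made here and are not taken from the claims or the specification. The sketch is not a statement of the corresponding structure.

```python
def bit_shuffle(counter: int, bit_matrix: list[list[int]]) -> int:
    """Hypothetical sketch: address bit i is the XOR of the counter bits
    selected by the 1 entries in row i of bit_matrix."""
    address = 0
    for i, row in enumerate(bit_matrix):
        bit = 0
        for j, selected in enumerate(row):
            if selected:
                bit ^= (counter >> j) & 1
        address |= bit << i
    return address


# Assumed example: a 4x4 array stored row-major, so address bits 1..0 select
# the column and address bits 3..2 select the row. Swapping the two bit pairs
# makes a sequential counter walk the storage column by column.
TRANSPOSE_4X4 = [
    [0, 0, 1, 0],  # address bit 0 <- counter bit 2
    [0, 0, 0, 1],  # address bit 1 <- counter bit 3
    [1, 0, 0, 0],  # address bit 2 <- counter bit 0
    [0, 1, 0, 0],  # address bit 3 <- counter bit 1
]

# One address per counter value, as if the counter incremented on every read.
read_addresses = [bit_shuffle(c, TRANSPOSE_4X4) for c in range(16)]
```

Under these assumptions the first four counter values yield addresses 0, 4, 8, and 12, i.e., the first column of the row-major array, and every address in the range 0-15 is produced exactly once.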

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 and 16-22 of U.S. Patent No. 10,908,906. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-20 are anticipated by claims 1-10 and 16-22 of the ‘906 patent as indicated in the table below.

Instant Application (No. 17/131424) (listed first in each pair below)
U.S. Patent No. 10,908,906 (listed second in each pair below)
9. A method comprising: 

generating, by a read address generation unit (AGU), a plurality of read addresses for a plurality of tensor data elements in a first storage; generating, by a write AGU, a plurality of write addresses for the plurality of tensor data elements in the first storage; 

reading, by a parallel input register of a shuffle register bank (SRB), a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; 

receiving, by a first register bank of the shuffle register bank, the first subset of the plurality of tensor data elements; 

receiving, by a shift register of the shuffle register bank, a tensor data element from each bank in the first register bank; and writing each tensor data element in the shift register to a write address from the plurality of write addresses generated by the write AGU.
1. A method comprising: 

generating, by a read address generation unit (AGU), a plurality of read addresses for a plurality of tensor data elements in a first storage; generating, by a write AGU, a plurality of write addresses for the plurality of tensor data elements in the first storage; 

reading, by a parallel input register of a shuffle register bank (SRB), a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; 

receiving, by a first register bank of the shuffle register bank, the first subset of the plurality of tensor data elements; 

receiving, by a shift register of the shuffle register bank, a tensor data element from each bank in the first register bank; and writing each tensor data element in the shift register to a write address from the plurality of write addresses generated by the write AGU, 

wherein the SRB includes the first register bank and a second register bank, and wherein when the first register bank is filled with the first subset of the plurality of tensor data elements, the shift register connects to the first register bank and outputs the first subset of the plurality of tensor data elements while a second subset of the plurality of tensor data elements is read into a second register bank through the parallel input register.
10. The method of claim 8 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
2. The method of claim 1 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
11. The method of claim 9 wherein the read address generation unit further comprises: a read counter, a value of the read counter to increment each time a tensor data element is read; and a bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix.
3. The method of claim 1 wherein the read AGU further comprises: a read counter, a value of the read counter to increment each time a tensor data element is read; and a bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix.
12. The method of claim 9 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
4. The method of claim 1 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
13. The method of claim 9 wherein the SRB includes the first register bank and a second register bank, and wherein when the first register bank is filled with the first subset of the plurality of tensor data elements, the shift register connects to the first register bank and outputs the first subset of the plurality of tensor data elements while a second subset of the plurality of tensor data elements is read into a second register bank through the parallel input register.
1. A method comprising:  . . . wherein the SRB includes the first register bank and a second register bank, and wherein when the first register bank is filled with the first subset of the plurality of tensor data elements, the shift register connects to the first register bank and outputs the first subset of the plurality of tensor data elements while a second subset of the plurality of tensor data elements is read into a second register bank through the parallel input register.
14. The method of claim 9 wherein an output of the SRB overwrites the first subset of the plurality of tensor data elements in the first storage.
5. The method of claim 1 wherein an output of the SRB overwrites the first subset of the plurality of tensor data elements in the first storage.
15. The method of claim 9 wherein an output of the SRB is written to a second storage.
6. The method of claim 1 wherein an output of the SRB is written to a second storage.
16. The method of claim 9 wherein the first storage is a level 2 cache or a last level cache.
7. The method of claim 1 wherein the first storage is a level 2 cache or a last level cache.
17. A tensor permutation engine (TPE), comprising: 

a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage; 

a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage; and 

a shuffle register bank (SRB) comprising: a parallel input register to read a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; a first register bank to receive the first subset of the plurality of tensor data elements; and a shift register to receive a tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
8. A tensor permutation engine (TPE), comprising: 

a read address generation unit (AGU) to generate a plurality of read addresses for a plurality of tensor data elements in a first storage, 

wherein the read AGU comprises: a read counter, a value of the read counter to increment each time a tensor data element is read; and a first bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix; 

a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage; and 

a shuffle register bank (SRB) comprising: a parallel input register to read a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; a first register bank to receive the first subset of the plurality of tensor data elements; and a shift register to receive a tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
18. The TPE of claim 17 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
9. The TPE of claim 8 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
19. The TPE of claim 17 wherein the read address generation unit further comprises: a read counter, a value of the read counter to increment each time a tensor data element is read; and a bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix.
8. A tensor permutation engine (TPE), comprising: 
…
wherein the read AGU comprises: a read counter, a value of the read counter to increment each time a tensor data element is read; and a first bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix;
20. The TPE of claim 17 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
10. The TPE of claim 8 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a second bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
1. A processor comprising: 

a first storage to store one or more tensors, each tensor comprising a plurality of tensor data elements organized on a first dimension; 

a tensor permutation engine (TPE) to reorganize the plurality of tensor data elements on a second dimension, the first storage accessible to the TPE, the TPE comprising: 

a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage; a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage; and a shuffle register bank (SRB) comprising: a parallel input register to read a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; a first register bank to receive the first subset of the plurality of tensor data elements; and a shift register to receive a tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
16. A processor comprising: 

a first storage to store one or more tensors, each tensor comprising a plurality of tensor data elements organized on a first dimension; 

a tensor permutation engine (TPE) to reorganize the plurality of tensor data elements on a second dimension, the first storage accessible to the TPE, the TPE comprising: 

a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage, wherein the read AGU comprises: 

a read counter, a value of the read counter to increment each time a tensor data element is read; and a first bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix; a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage; and a shuffle register bank (SRB) comprising: a parallel input register to read a first subset of the plurality of tensor data elements from the plurality of read addresses generated by the read AGU; a first register bank to receive the first subset of the plurality of tensor data elements; and a shift register to receive a tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
2. The processor of claim 1 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
17. The processor of claim 16 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension.
4. The processor of claim 1 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
18. The processor of claim 16 wherein the write AGU further comprises: a write counter, a value of the write counter to increment each time a tensor data element is written; and a second bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix.
5. The processor of claim 1 wherein the SRB includes the first register bank and a second register bank, and wherein when the first register bank is filled with the first subset of the plurality of tensor data elements, the shift register connects to the first register bank and outputs the first subset of the plurality of tensor data elements while a second subset of the plurality of tensor data elements is read into a second register bank through the parallel input register.
19. The processor of claim 16 wherein the SRB includes the first register bank and a second register bank, and wherein when the first register bank is filled with the first subset of the plurality of tensor data elements, the shift register connects to the first register bank and outputs the first subset of the plurality of tensor data elements while a second subset of the plurality of tensor data elements is read into a second register bank through the parallel input register.
6. The processor of claim 1 wherein an output of the SRB overwrites the first subset of the plurality of tensor data elements in the first storage.
20. The processor of claim 16 wherein an output of the SRB overwrites the first subset of the plurality of tensor data elements in the first storage.
7. The processor of claim 1 wherein an output of the SRB is written to a second storage.
21. The processor of claim 16 wherein an output of the SRB is written to a second storage.
8. The processor of claim 1 wherein the first storage is a level 2 cache or a last level cache.
22. The processor of claim 16 wherein the first storage is a level 2 cache or a last level cache.



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3, 4, 10-12, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 3, 11, and 19 recite a “bit shuffle unit to generate the plurality of read addresses based on the read counter value and a read bit matrix” which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the specification at paragraphs 144-150 describes usage of bit shuffle units generally by reciting the function and providing examples. However, how the function is performed is not specified. Mere recitation of the function with potential examples is not disclosure of the corresponding structure, material, or acts for performing the entire claimed function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Claims 4, 12, and 20 recite a “bit shuffle unit to generate the plurality of read addresses based on the write counter value and a write bit matrix” which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the specification at paragraphs 144-150 describes usage of bit shuffle units generally by reciting the function and providing examples. However, how the function is performed is not specified. Mere recitation of the function with potential examples is not disclosure of the corresponding structure, material, or acts for performing the entire claimed function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim 10 recites “the method of claim 8”. However, claim 8 recites a processor, not a method, and, therefore, this statement renders the claim indefinite. Independent claim 9 recites a method and it appears that applicant intended claim 10 to recite “the method of claim 9”. Therefore, claim 10 will be understood to depend from claim 9 for the remainder of the examination.
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 3, 4, 11, 12, 19, and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. As described above regarding the rejection of claims 3, 4, 11, 12, 19, and 20 under 35 U.S.C. 112(b), the disclosure doesn’t provide adequate structure to perform the claimed functions of “generate the plurality of read addresses based on the read counter value and a read bit matrix” and “generate the plurality of read addresses based on the write counter value and a write bit matrix”. The specification doesn’t demonstrate that applicant has made an invention that achieves the claimed function because the invention isn’t described with sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor had possession of the claimed invention.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 6, 7, 9, 10, 14, 15, 17, 18, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Aly et al., U.S. Patent No. 10,664,241 (hereinafter Aly) in view of Won et al., U.S. Patent No. 7,797,362 (hereinafter Won).
Regarding claims 1, 9, and 17, taking claim 1 as exemplary, Aly teaches a processor comprising: 
a first storage to store one or more tensors, each tensor comprising a plurality of tensor data elements organized on a first dimension [Memory 105 stores vectors in a particular direction-major order wherein the vectors are n-dimensional arrays (i.e. tensors). Column 3, lines 54-61; Column 4, lines 47-54]; 
a tensor permutation engine (TPE) to reorganize the plurality of tensor data elements on a second dimension, the first storage accessible to the TPE [System 100. Column 3, line 54 – column 4, line 2; Column 4, lines 43-46; FIG.1], 
the TPE comprising: 
a shuffle register bank (SRB) [Transpose memory 110] comprising: 
a parallel input register to read a first subset of the plurality of tensor data elements from the plurality of read addresses [The registers of the transpose memory 110 into which the elements are shifted from the memory 105. Column 4, lines 17-26; Column 5, lines 4-28; FIG. 1]; 
a first register bank to receive the first subset of the plurality of tensor data elements [A particular row or column of memory cells, each comprising a register (FIG. 2), that receives the input elements. Column 5, lines 4-28; FIG. 1]; and 
a shift register to receive a tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses [The row or column memory cells function as shift registers that shift the output from the transpose memory. Column 5, lines 4-28].
Aly doesn’t explicitly teach that the TPE comprises a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage, wherein the first subset of the plurality of tensor data elements is read from the plurality of read addresses generated by the read AGU, and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage, wherein the tensor data elements in the shift register are to be written to a write address from the plurality of write addresses generated by the write AGU. That is, although Aly teaches reading and writing elements from the memory, Aly is silent on how this is performed and by what circuitry. However, in the same field of matrix transposition, Won teaches: a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage, wherein the first subset of the plurality of tensor data elements is read from the plurality of read addresses generated by the read AGU [Input memory control 110 generates the read addresses from which the elements are read from memory and into the input registers. Column 2, lines 36-55; FIG. 1], and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage, wherein the tensor data elements in the shift register are to be written to a write address from the plurality of write addresses generated by the write AGU [Output memory control 111 generates the write addresses at which the data elements of the transposition output are written into memory. Column 2, lines 36-55; FIG. 1]. 
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Aly’s transpose system to further include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in the first storage, wherein the first subset of the plurality of tensor data elements is read from the plurality of read addresses generated by the read AGU, and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage, wherein the tensor data elements in the shift register are to be written to a write address from the plurality of write addresses generated by the write AGU, as taught by Won, because doing so would provide the required addressing components to read from and write to Aly’s memory.
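
For illustration only, the data flow discussed above regarding claim 1 (generated read addresses, a register bank that receives the fetched elements, and generated write addresses for the shifted-out elements) can be sketched in software. This sketch is not taken from Aly, Won, or the application. The names read_agu, write_agu, and transpose_via_registers, and the flat list standing in for the first storage, are assumptions made for this example only, and the sketch makes no statement about how any cited circuit is actually implemented.

```python
def read_agu(rows, cols, col):
    """Hypothetical read AGU: addresses of one column of a row-major array."""
    return [r * cols + col for r in range(rows)]


def write_agu(rows, col):
    """Hypothetical write AGU: sequential addresses for that column
    in the transposed (column-major) layout."""
    return [col * rows + r for r in range(rows)]


def transpose_via_registers(storage, rows, cols):
    """Reorganize a row-major rows x cols array into column-major order."""
    out = [None] * (rows * cols)
    for c in range(cols):
        # Parallel input register: fetch the subset at the generated read addresses.
        input_reg = [storage[addr] for addr in read_agu(rows, cols, c)]
        # Register bank: receives the fetched subset.
        bank = list(input_reg)
        # Shift-out stage: each element goes to one generated write address.
        for elem, addr in zip(bank, write_agu(rows, c)):
            out[addr] = elem
    return out


# A 2 x 3 array stored row-major: [[0, 1, 2], [3, 4, 5]]
result = transpose_via_registers([0, 1, 2, 3, 4, 5], rows=2, cols=3)
```

In this sketch the two address generators are the only components that know the array shape, and the register stages simply move whatever elements they are handed.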

Regarding claims 2, 10, and 18, taking claim 2 as exemplary, Aly and Won teach the processor of claim 1 wherein the plurality of tensor data elements are read from sequential memory on a first dimension, and wherein the plurality of tensor data elements are to be written in sequential memory on a second dimension [The data is stored in row-major order (i.e. sequential in a first dimension) and is written in column-major order (i.e. sequential in a second dimension). Aly at column 4, lines 3-15 and 47-54].
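
As a general illustration of the terms row-major and column-major used above, the following sketch computes the linear address of element (r, c) under each ordering. The function names and the 2 x 3 example shape are assumptions for this example and are not drawn from Aly or the application.

```python
def row_major_address(r, c, rows, cols):
    """Linear address of element (r, c) when rows are stored contiguously."""
    return r * cols + c


def col_major_address(r, c, rows, cols):
    """Linear address of element (r, c) when columns are stored contiguously."""
    return c * rows + r


rows, cols = 2, 3
# Walking along a row is sequential (stride 1) in row-major order...
along_row_rm = [row_major_address(0, c, rows, cols) for c in range(cols)]
# ...but strided by `rows` in column-major order.
along_row_cm = [col_major_address(0, c, rows, cols) for c in range(cols)]
# Walking down a column is sequential (stride 1) in column-major order.
down_col_cm = [col_major_address(r, 0, rows, cols) for r in range(rows)]
```

Elements that are adjacent in memory under one ordering are separated by a fixed stride under the other, which is why a reordering step is needed to move between the two layouts.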

Regarding claims 6 and 14, taking claim 6 as exemplary, Aly and Won teach the processor of claim 1 wherein an output of the SRB overwrites the first subset of the plurality of tensor data elements in the first storage [The output of the transpose memory is stored back to the same memory 105, thereby overwriting the first subset. See Aly at column 4, lines 16-36].

Regarding claims 7 and 15, taking claim 7 as exemplary, Aly and Won teach the processor of claim 1. Aly doesn’t teach that an output of the SRB is written to a second storage. However, Won further teaches that the output of transposition is written into a different storage than the storage used for the input (i.e. a second storage) [Transposition output is written to output memory bank 309 (i.e. second storage), which is different from the input memory bank 301. Column 2, lines 36-55; FIG. 3]. Writing the output of transposition to a different storage than the storage used for the input (i.e. a second storage) enables reading from and writing to memory without increasing the required memory bandwidth, as would have been recognized by a person of ordinary skill in the art. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Aly’s system so that an output of the SRB (i.e. transposition output) is written to a second storage, as taught by Won, because doing so would enable reading from and writing to memory without increasing the required memory bandwidth.


Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Aly in view of Won and, further, in view of Blomgren et al., U.S. Patent No. 6,898,691 (hereinafter Blomgren).

Regarding claims 8 and 16, taking claim 8 as exemplary, Aly and Won teach the processor of claim 1. Aly doesn’t teach that the first storage is a level 2 cache or a last level cache. However, in the same field of matrix transposition, Blomgren teaches performing matrix operations using a matrix processor that receives input from a level 2 cache [Blomgren at column 4, lines 58-62]. Coupling a matrix unit to storage that is a level 2 cache or last level cache enables the provision of data at the high bandwidth required by the matrix unit [Blomgren at column 4, lines 58-62]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Aly’s transpose system so that the first storage is a level 2 cache, as taught by Blomgren, because doing so would facilitate the high bandwidth necessary for matrix operations [Blomgren at column 4, lines 58-62].


Allowable Subject Matter
Claims 3-4, 11-12, 19, and 20 are rejected on grounds of nonstatutory double patenting, under 35 U.S.C. 112(a), and under 35 U.S.C. 112(b), and depend upon a rejected base claim. However, the claims would be allowable if rewritten or amended to overcome these rejections and rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 5 and 13 are rejected on grounds of nonstatutory double patenting and depend upon a rejected base claim. However, the claims would be allowable if rewritten or amended to overcome these rejections and rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628. The examiner can normally be reached Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BENJAMIN P GEIB/
Primary Examiner, Art Unit 2123