Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This Office Action is in response to Application No. 17/336701 filed on June 2, 2021. Claims 1-20 are presented for examination and are currently pending.

Election/Restrictions
Restriction to one of the following inventions is required under 35 U.S.C. 121:
I. Claims 1-16 and 18, drawn to data buffering, classified in G06F 3/0656.
II. Claims 17 and 19-20, drawn to integrated circuit manufacturing, classified in H05K (PRINTED CIRCUITS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS).
The inventions are independent or distinct, each from the other because:
Inventions I and II are related as process of making and product made. The inventions are distinct if either or both of the following can be shown: (1) that the process as claimed can be used to make another and materially different product or (2) that the product as claimed can be made by another and materially different process (MPEP § 806.05(f)). In the instant case, the process as claimed can be used to make a materially different product, such as any other integrated circuit.
During a telephone conversation with Vincent DeLuca (Reg. No. 32408) on August 18, 2022, a provisional election was made with traverse to prosecute the invention of Group I, claims 1-16 and 18. Affirmation of this election must be made by applicant in replying to this Office action. Claims 17 and 19-20 are withdrawn from further consideration by the examiner, 37 CFR 1.142(b), as being drawn to a non-elected invention.
Restriction for examination purposes as indicated is proper because all the inventions listed in this action are independent or distinct for the reasons given above and there would be a serious search and/or examination burden if restriction were not required because one or more of the following reasons apply:
the inventions have acquired a separate status in the art in view of their different classification; and 
the inventions have acquired a separate status in the art due to their recognized divergent subject matter.
Applicant is advised that the reply to this requirement to be complete must include (i) an election of an invention to be examined even though the requirement may be traversed (37 CFR 1.143) and (ii) identification of the claims encompassing the elected invention. 
The election of an invention may be made with or without traverse. To reserve a right to petition, the election must be made with traverse. If the reply does not distinctly and specifically point out supposed errors in the restriction requirement, the election shall be treated as an election without traverse. Traversal must be presented at the time of election in order to be considered timely. Failure to timely traverse the requirement will result in the loss of right to petition under 37 CFR 1.144. If claims are added after the election, applicant must indicate which of these claims are readable upon the elected invention.
Should applicant traverse on the ground that the inventions are not patentably distinct, applicant should submit evidence or identify such evidence now of record showing the inventions to be obvious variants or clearly admit on the record that this is the case. In either instance, if the examiner finds one of the inventions unpatentable over the prior art, the evidence or admission may be used in a rejection under 35 U.S.C. 103 or pre-AIA  35 U.S.C. 103(a) of the other invention.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740).
Regarding claim 1, Yang teaches a hardware unit for manipulating data stored in a memory, the hardware unit comprising:
an internal buffer [internal memory 140; FIG. 1 and ¶0026];
a memory reading block, configured to read the data from the memory and write the data to the internal buffer [the data moving controller 120 may move the kernel data and the input data (for example, image data) stored in the external memory 190 into the internal memory 140; ¶0026 and First controller 123 on FIG. 3]; 
a memory writing block, configured to read the data from the internal buffer and write the data to the memory [the data moving controller 120 may move the output data, which is output as a result of the operation performed by the operator 130, from the internal memory 140 to the external memory 190; ¶0026 and Second Controller 125 on FIG. 3].
Yang, however, does not explicitly teach a control channel between the memory reading block and the memory writing block, wherein the memory reading block and the memory writing block are configured to communicate via the control channel to maintain synchronization between them when writing the data to the internal buffer and reading the data from the internal buffer, respectively.
On the one hand, Yang discloses an address generator circuit in which, to read the output data of the second multidimensional array from the internal memory 140, the second address generator 121b may execute multiple nested loops, equal to or greater in number than the number of dimensions of the matrix, to generate the read address; and in which, to write the output data of the second multidimensional array read from the internal memory 140 into the external memory 190, the first address generator 121a may generate the write address for reordering the output data of the second multidimensional array into the data of the first multidimensional array [¶0042 and FIG. 3].
On the other hand, Baeckler, when addressing issues regarding buffers and controlling access to buffers, discloses a synchronization mechanism for a buffer wherein the FIFO circuit 100 is a dual clock FIFO circuit that provides data from a write clock domain to a read clock domain; wherein timing circuits in the write clock domain are clocked by a write clock signal WCK; wherein timing circuits in the read clock domain are clocked by a read clock signal RCK; and wherein FIFO circuit 100 can provide data from the write clock domain to the read clock domain without generating errors in the data [c3 L45-55]. Baeckler further discloses that the read pointer output by the second synchronizer circuit is compared to the write pointer within the first clock domain to generate an indication of when the FIFO circuit is full; wherein, when the FIFO circuit is full, additional data cannot be stored in the FIFO circuit without overwriting data that has not yet been read from the FIFO circuit [c2 L30-35]. Thus, Baeckler teaches wherein the memory reading block and the memory writing block are configured to maintain synchronization between them when writing the data to the internal buffer and reading the data from the internal buffer, respectively [c2 L30-35 and c3 L45-55].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to synchronize the reading and writing of the buffer (i.e. internal memory 140 on Yang), for example through the address generator 121 on Yang, by comparing the generated addresses to determine when the buffer is full, as disclosed in Baeckler. The combination would have been obvious because a person of ordinary skill in the art would understand that synchronization is necessary: when the buffer is full, additional data cannot be stored in the buffer without overwriting data that has not yet been read from it [c2 L30-35 on Baeckler].
Regarding claim 10, Yang/Baeckler teach the hardware unit of claim 1, wherein each of the memory reading block and the memory writing block has a respective synchronization counter, the blocks being configured to communicate their synchronization counters with each other via the control channel, wherein the hardware unit is configured to maintain synchronization between the blocks by comparing the synchronization counters [write pointer generator circuit 102 may include a counter circuit that generates numerically consecutive write addresses that increase by 1 (or decrease by 1) in each period of the write clock signal WCK (c4 L25-30 on Baeckler); and wherein read pointer generator circuit 103 may include a counter circuit that generates numerically consecutive values for the read addresses in response to read clock signal RCK (c5 L55-60 on Baeckler)].
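For illustration only, the counter-based synchronization relied on for claims 1 and 10 can be sketched as a bounded buffer in which the writer (corresponding to the memory reading block) and the reader (corresponding to the memory writing block) each keep a running count of units transferred, and progress is gated by comparing the two counts, analogous to Baeckler's comparison of read and write pointers. The class and method names (`SyncBuffer`, `try_write`, `try_read`) are hypothetical and are not drawn from Yang or Baeckler; the sketch is single-threaded and does not model the clock-domain crossing that Baeckler's synchronizer circuits address.

```python
class SyncBuffer:
    """Bounded buffer gated by comparing a write counter and a read counter (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_count = 0  # units written into the buffer so far
        self.read_count = 0   # units read out of the buffer so far

    def is_full(self):
        # Full when the writer is a whole buffer ahead of the reader.
        return self.write_count - self.read_count == self.capacity

    def is_empty(self):
        # Empty when both counters agree.
        return self.write_count == self.read_count

    def try_write(self, value):
        if self.is_full():
            return False  # writer stalls rather than overwrite unread data
        self.slots[self.write_count % self.capacity] = value
        self.write_count += 1
        return True

    def try_read(self):
        if self.is_empty():
            return None  # reader stalls until data is available
        value = self.slots[self.read_count % self.capacity]
        self.read_count += 1
        return value
```

With a capacity of 2, a third write is refused until the reader consumes one unit, which is the overwrite-prevention condition Baeckler describes.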
Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740); and further in view of Schmidt (US 8,095,745).
Regarding claim 2, Yang/Baeckler teach the hardware unit of claim 1, wherein the data comprises a multidimensional array comprising a plurality of data elements, wherein at least one of the memory reading block and the memory writing block is configured to traverse the multidimensional array using a plurality of nested loops [to read the input data of the form of the multidimensional matrix from the external memory 190, the first address generator 121a may generate a read address of the external memory 190 by executing multiple nested loops of the number of dimensions of the matrix or greater; ¶0038 on Yang]. 
Yang/Baeckler, however, do not explicitly teach each loop having associated with it a corresponding stride between data elements of the multidimensional array.
Schmidt, when addressing issues relating to memory management on information processing systems, teaches each loop having associated with it a corresponding stride between data elements of the multidimensional array [the address is then incremented to the next row by incrementing the line spacing (c10 L45); wherein A2 is incremented by the column counter (col.) multiplied by the burst length (BL) (c10 L60-65); and wherein, accordingly, the column counter (col.) is incremented to the next column 304B (c11 L5). That is, the loops are incremented by a stride amount equal to the size of the rows, columns, area of memory holding requested data, burst length, etc.].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to process the multidimensional array of Yang using the nested loop scheme shown in Schmidt. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. using a nested loop with a block size as determined by the buffer depth) to a known device ready for improvement to yield predictable results.
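As a hedged illustration of the "stride per loop" concept mapped to Schmidt above, the following sketch generates addresses from a loop nest in which each loop carries its own trip count and stride. The function name and parameters are hypothetical; Schmidt's FIG. 5 uses its own counters (row, column, burst length) rather than this general form.

```python
from itertools import product


def strided_addresses(base, counts, strides):
    """Yield one address per innermost iteration of a loop nest (hypothetical sketch).

    counts[k] is the trip count of loop k and strides[k] is the address step
    taken by loop k; index 0 is the outermost loop.
    """
    # product() varies its last range fastest, so index 0 acts as the outermost loop.
    for indices in product(*(range(c) for c in counts)):
        yield base + sum(i * s for i, s in zip(indices, strides))
```

For a 2 x 3 row-major array of 4-byte elements (row stride 12, column stride 4), `strided_addresses(0, [2, 3], [12, 4])` walks the array in storage order, while swapping the loops, `strided_addresses(0, [3, 2], [4, 12])`, walks it column by column, i.e. in transposed order, without changing the generator.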
Regarding claim 4, Yang/Baeckler/Schmidt teach the hardware unit of claim 2, wherein each loop of the plurality of loops is configured to perform a variable number of iterations, the variable number being selected at runtime from a group comprising: a first number of iterations to be performed when one or more outer loops of the plurality of nested loops are not in their end iteration; and a second number of iterations to be performed when the one or more outer loops of the plurality of nested loops are in their end iteration [FIG. 5; loop 508-516 iterates a variable number of times depending on blocks 512 and 514 as a function of whether all of the rows 302 for a given column 304A have been read (i.e. the loop is at its end) or if there are remaining rows (i.e. the loop is still on its initial iterations, or there are more rows to be read); c10 L35 and c11 L1-5 on Schmidt].
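To make the "first number" and "second number" of iterations concrete, a minimal sketch assuming a two-level nest that walks `total_rows` rows in tiles of `tile_rows` rows (both hypothetical parameters, not taken from Schmidt) shows the inner loop running a full count except on the outer loop's end iteration:

```python
def tile_row_counts(total_rows, tile_rows):
    """Inner (row) loop trip count for each iteration of the outer (tile) loop.

    Every tile uses tile_rows iterations except the last, which uses only the
    remaining rows when total_rows is not a multiple of tile_rows.
    """
    num_tiles = -(-total_rows // tile_rows)  # ceiling division
    counts = []
    for t in range(num_tiles):
        is_end_iteration = t == num_tiles - 1
        if is_end_iteration:
            counts.append(total_rows - t * tile_rows)  # second number of iterations
        else:
            counts.append(tile_rows)  # first number of iterations
    return counts
```

For 10 rows in tiles of 4, the inner loop runs 4, 4 and then 2 times; when the row count divides evenly, the two numbers coincide.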
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740); and further in view of Schmidt (US 8,095,745); and still further in view of Gropp (Matrix Transpose, November 23, 2018).
Regarding claim 3, Yang/Baeckler/Schmidt teach the hardware unit of claim 2 but do not explicitly teach wherein at least one loop of the plurality of nested loops is configured to iterate a different number of times depending on at least one of: (a) a loop index of at least one other loop of the plurality of nested loops; and (b) a software configurable flag.
Gropp, when disclosing how to transpose matrices in computer systems with buffer memories (i.e. caches), teaches wherein at least one loop of the plurality of nested loops is configured to iterate a different number of times depending on at least one of: (a) a loop index of at least one other loop of the plurality of nested loops [do j=jj,min(n,jj+stride-1); do i=ii,min(n,ii+stride-1); b(i,j) = a(j,i); Loop Reordering on slide 15]; and (b) a software configurable flag. Because the claim requires only at least one of (a) and (b), Gropp's teaching of (a) meets this limitation.
That is, Schmidt discloses a system and method wherein the DMA controller 102 effectively transposes the array of data as it is transferred, sequentially providing each column vector of the array [c7 L30-50]. Schmidt further shows how to transpose the matrix using nested loops [FIG. 5 and FIG. 6]. Gropp, when disclosing the more general case of transposing matrices on a general computer, shows several methods using different arrangements of nested loops [slides 2 and 13-15].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to transpose the matrix of Schmidt using the loop reordering shown in Gropp. The combination would have been obvious because a person of ordinary skill in the art would want to use the loop reordering of Gropp to account for blocking for buffers, resulting in a better matrix transpose implementation.
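For readers less familiar with the Fortran-style notation quoted from Gropp, the following is a hedged Python transcription of a blocked (tiled) transpose. The outer `jj`/`ii` loops, which the quoted excerpt omits, are assumed here to step by `block` from the start of the array, and indices are 0-based rather than Fortran's 1-based. The point of interest is that the inner loops' upper bounds depend on the outer loop indices, so the last block runs fewer iterations when the matrix size is not a multiple of the block size.

```python
def blocked_transpose(a, block):
    """Return the transpose of square matrix a (list of lists) using blocked loops."""
    n = len(a)
    b = [[None] * n for _ in range(n)]
    for jj in range(0, n, block):
        for ii in range(0, n, block):
            # Inner trip counts shrink on the last block when n % block != 0,
            # mirroring min(n, jj+stride-1) in the quoted loop.
            for j in range(jj, min(n, jj + block)):
                for i in range(ii, min(n, ii + block)):
                    b[i][j] = a[j][i]
    return b
```

With a 3 x 3 matrix and a block size of 2, the inner loops run twice for the first block and once for the trailing block, yet every element is transposed exactly once.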
Claims 5, 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740); and further in view of Schmidt (US 8,095,745); and still further in view of Kuwabara (US 2018/0343656).
Regarding claim 5, Yang/Baeckler teach the hardware unit of claim 1 but do not explicitly teach wherein the memory reading block is configured to read the data from the memory in discrete bursts and/or the memory writing block is configured to write the data to the memory in discrete bursts, the discrete bursts having a predetermined first size, wherein the memory reading block is configured to write the data to the internal buffer in discrete units and/or the memory writing block is configured to read the data from the internal buffer in discrete units, the discrete units having a second size, wherein the second size is different from the first size.
Schmidt, when addressing issues relating to memory management on information processing systems, teaches wherein the memory reading block is configured to read the data from the memory in discrete bursts and/or the memory writing block is configured to write the data to the memory in discrete bursts, the discrete bursts having a predetermined first size, wherein the memory reading block is configured to write the data to the internal buffer in discrete units and/or the memory writing block is configured to read the data from the internal buffer in discrete units, the discrete units having a second size [the exemplary memory 104 stores 256 bytes of data in each row 302 and provides the data in 16 byte bursts; c8 L1-5 on Schmidt].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to process the multidimensional array of Yang using the nested loop scheme shown in Schmidt. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. using a nested loop with a block size as determined by the buffer depth) to a known device ready for improvement to yield predictable results.
Kuwabara, in analogous art, teaches wherein the second size is different from the first size [the read request may further include burst size (ARSIZE) and burst length (ARLEN). The transfer processing unit 22 obtains the read data (RDATA) from the memory 14 via the read data channel of the bus 52 (¶0028); and wherein the write request may further include burst size (AWSIZE) and burst length (AWLEN) (¶0029)].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to write data into the buffer using a burst size that is suited for the buffer and buffer bus and to read data out of the buffer (and into the memory) using a burst size that is better suited for the memory, as disclosed in Kuwabara. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. having different configurable burst sizes for reading and writing) to a known device ready for improvement to yield predictable results.
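A minimal sketch of a data path in which the memory-side transfer size (first size) differs from the buffer-side unit size (second size) may help illustrate the combination. The 16-byte burst and 256-byte row follow Schmidt's example; the 32-byte buffer unit and the function name `reburst` are hypothetical choices made only for illustration.

```python
def reburst(data, burst_size, unit_size):
    """Split bytes into memory bursts, then regroup them into buffer units (hypothetical sketch).

    Returns (bursts, units): the data as transferred over the memory interface
    in burst_size chunks, and the same bytes regrouped into unit_size chunks
    for storage in an internal buffer.
    """
    bursts = [data[k:k + burst_size] for k in range(0, len(data), burst_size)]
    stream = b"".join(bursts)  # bytes arrive in burst order
    units = [stream[k:k + unit_size] for k in range(0, len(stream), unit_size)]
    return bursts, units
```

For a 256-byte row, this yields sixteen 16-byte bursts on the memory side and eight 32-byte units on the buffer side, carrying the same bytes in the same order.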
Regarding claim 6, Yang/Baeckler/Schmidt/Kuwabara teach the hardware unit of claim 5, wherein the data comprises a multidimensional array comprising a plurality of data elements, wherein at least one of the memory reading block and the memory writing block is configured to traverse the multidimensional array using a plurality of nested loops [Therefore, to read the input data of the form of the multidimensional matrix from the external memory 190, the first address generator 121a may generate a read address of the external memory 190 by executing multiple nested loops of the number of dimensions of the matrix or greater. In addition, the first address generator 121a may generate a write address for transferring the output data read from the internal memory 140 to the external memory 190. ¶0038 on Yang], each loop having associated with it a corresponding stride between data elements of the multidimensional array, wherein, when reading or writing a desired segment of the multidimensional array, said at least one block is configured to select the number of iterations in at least one loop, based on a relationship between the size of the desired segment and the first size [blocks 512, 514, 516 and 520 on FIG. 5 of Schmidt show how the relationship between the buffer size (depth) and the multidimensional array size controls the iteration of the nested loops].
Regarding claim 7, Yang/Baeckler/Schmidt/Kuwabara teach the hardware unit of claim 6, wherein said at least one block is configured to: determine, based on said relationship, that a discrete burst to be read or written contains extra data, which is additional to the desired segment and which is scheduled to be read or written in a later iteration of at least one of the plurality of loops; and in response, to operate on the extra data in the current iteration according to an operation scheduled for said later iteration [a temporary variable, D, is set to the value of the depth of the first non-contiguous portion, D1 (c10 L35); wherein block 512 shows that iteration is a function of the buffer depth; and wherein blocks 506 and 516 on FIG. 5 show that the address is a function of the burst length (BL); Schmidt].
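The relationship between the size of a desired segment and the burst size (claims 6 and 7) can be illustrated with a hedged sketch that assumes bursts are aligned to multiples of the burst size; this alignment is an assumption for illustration, not a statement about Schmidt's memory 104, and the function name is hypothetical. Any bytes fetched outside the segment correspond to the "extra data" recited in claim 7; whether they are consumed early or discarded is left to the caller.

```python
def bursts_for_segment(start, length, burst_size):
    """Return (first_burst_addr, num_bursts, extra_before, extra_after) (hypothetical sketch).

    Bursts are assumed aligned to burst_size. extra_before/extra_after count
    bytes that burst-aligned transfers fetch outside [start, start + length).
    """
    first = (start // burst_size) * burst_size    # round start down to a burst boundary
    end = start + length
    last_end = -(-end // burst_size) * burst_size  # round end up to a burst boundary
    num_bursts = (last_end - first) // burst_size
    return first, num_bursts, start - first, last_end - end
```

For a 40-byte segment starting at byte 20 with 16-byte bursts, three bursts covering bytes 16 to 63 are needed, with 4 extra bytes before and 4 after the segment.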
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740); and further in view of Schmidt (US 8,095,745).
Regarding claim 8, Yang/Baeckler teach the hardware unit of claim 1 but do not explicitly teach wherein the data comprises a multidimensional array comprising a plurality of data elements, wherein the multidimensional array is stored in the memory in a storage format having storage units of a predetermined third size, wherein one or more dimensions of the multidimensional array are not an integer multiple of the third size.
Schmidt, when addressing issues relating to memory management on information processing systems, teaches wherein the data comprises a multidimensional array comprising a plurality of data elements, wherein the multidimensional array is stored in the memory in a storage format having storage units of a predetermined third size, wherein one or more dimensions of the multidimensional array are not an integer multiple of the third size [the memory 104 has a defined width of n columns 304, such as 16 columns 304, by a defined number of rows 302 (X+2), such as 215 rows 302. Further, each addressable memory location in the memory 104 further stores one or more bits or bytes w of data, the amount, in one embodiment, being equivalent to the burst size of the memory, and in the given example, stores 16 bytes (4 words, each 32 bits); c7 L55-c8 L35 and FIG. 3].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to organize the data of Yang/Baeckler as shown in Schmidt. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. organizing data in a memory based on bytes, etc.) to a known device ready for improvement to yield predictable results.
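Where an array dimension is not an integer multiple of the storage unit, as recited in claim 8, the last unit holding each row is only partially occupied. The sketch below assumes 16-byte storage units (following Schmidt's 16-byte example) and a hypothetical 250-byte row; the function name is illustrative only.

```python
def storage_units_per_row(row_bytes, unit_bytes):
    """Fixed-size storage units needed to hold one row, plus unused padding bytes."""
    units = -(-row_bytes // unit_bytes)  # ceiling division
    padding = units * unit_bytes - row_bytes
    return units, padding
```

A 250-byte row therefore occupies 16 units of 16 bytes with 6 bytes of padding, whereas a 256-byte row fits exactly.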
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Baeckler (US 9330740); and further in view of Schmidt (US 8,095,745); and still further in view of Gropp (Matrix Transpose, November 23, 2018).
Regarding claim 9, Yang/Baeckler/Schmidt teach the hardware unit of claim 8 but do not explicitly teach wherein at least one of the memory reading block and the memory writing block is configured to traverse the multidimensional array using a plurality of nested loops, each loop having associated with it a corresponding stride between data elements of the multidimensional array, wherein said at least one block is configured to select, for at least one loop of the plurality of nested loops, a different number of iterations when one or more outer loops are in their end iteration, as compared with the number of iterations of said at least one loop when the one or more outer loops are not in their end iteration.
Gropp, when disclosing how to transpose matrices in computer systems with buffer memories (i.e. caches), teaches wherein at least one of the memory reading block and the memory writing block is configured to traverse the multidimensional array using a plurality of nested loops, each loop having associated with it a corresponding stride between data elements of the multidimensional array, wherein said at least one block is configured to select, for at least one loop of the plurality of nested loops, a different number of iterations when one or more outer loops are in their end iteration, as compared with the number of iterations of said at least one loop when the one or more outer loops are not in their end iteration [do j=jj,min(n,jj+stride-1); do i=ii,min(n,ii+stride-1); b(i,j) = a(j,i); Loop Reordering on slide 15].
That is, Schmidt discloses a system and method wherein the DMA controller 102 effectively transposes the array of data as it is transferred, sequentially providing each column vector of the array [c7 L30-50]. Schmidt further shows how to transpose the matrix using nested loops [FIG. 5 and FIG. 6]. Gropp, when disclosing the more general case of transposing matrices on a general computer, shows several methods using different arrangements of nested loops [slides 2 and 13-15].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to transpose the matrix of Schmidt using the loop reordering shown in Gropp. The combination would have been obvious because a person of ordinary skill in the art would want to use the loop reordering of Gropp to account for blocking for buffers, resulting in a better matrix transpose implementation.
Claims 12, 14, 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Schmidt (US 8,095,745); and further in view of Gropp (Matrix Transpose, November 23, 2018).
Regarding claim 12, Yang teaches a hardware-implemented method of manipulating data stored in a memory, the data comprising a multidimensional array comprising a plurality of data elements [the input data may be provided in the form of a multidimensional array of digitized image data; ¶0018], the method comprising:
(i) reading the data from the memory and writing the data to the internal buffer [the data moving controller 120 may move the kernel data and the input data (for example, image data) stored in the external memory 190 into the internal memory 140; ¶0026]; and
(ii) reading the data from the internal buffer and writing the data to the memory [the data moving controller 120 may move the output data, which is output as a result of the operation performed by the operator 130, from the internal memory 140 to the external memory 190; ¶0026].
Yang, however, does not explicitly teach wherein at least one of the steps (i) and (ii) is performed using a plurality of nested loops, each loop having associated with it a corresponding stride between data elements of the multidimensional array, and wherein at least one loop of the plurality of nested loops is configured to iterate a different number of times depending on a loop index of at least one other loop of the plurality of nested loops.
Schmidt, when addressing issues relating to memory management on information processing systems, teaches wherein the at least one of the steps (i) and (ii) is performed using a plurality of nested loops [FIG. 5 three nested loops: blocks 508-512; blocks 506-520; and blocks 508-516], each loop having associated with it a corresponding stride between data elements of the multidimensional array [the address is then incremented to the next row by incrementing the line spacing (c10 L45); wherein A2 is incremented by the column counter (col.) multiplied by the burst length (BL) (c10 L60-65); and wherein, accordingly, the column counter (col.) is incremented to the next column 304B (c11 L5). That is, the loops are incremented by a stride amount equal to the size of the rows, columns, area of memory holding requested data, burst length, etc.].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to process the multidimensional array of Yang using the nested loop scheme shown in Schmidt. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. using a nested loop with a block size as determined by the buffer depth) to a known device ready for improvement to yield predictable results.
Finally, when disclosing how to transpose matrices in computer systems with buffer memories (i.e. caches), Gropp teaches wherein at least one loop of the plurality of nested loops is configured to iterate a different number of times depending on a loop index of at least one other loop of the plurality of nested loops [do j=jj,min(n,jj+stride-1); do i=ii,min(n,ii+stride-1); b(i,j) = a(j,i); Loop Reordering on slide 15].
That is, Schmidt discloses a system and method wherein the DMA controller 102 effectively transposes the array of data as it is transferred, sequentially providing each column vector of the array [c7 L30-50]. Schmidt further shows how to transpose the matrix using nested loops [FIG. 5 and FIG. 6]. Gropp, when disclosing the more general case of transposing matrices on a general computer, shows several methods using different arrangements of nested loops [slides 2 and 13-15].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to transpose the matrix of Schmidt using the loop reordering shown in Gropp. The combination would have been obvious because a person of ordinary skill in the art would want to use the loop reordering of Gropp to account for blocking for buffers, resulting in a better matrix transpose implementation.
Regarding claim 14, Yang/Schmidt/Gropp teach the method of claim 12, wherein each loop of the plurality of loops is configured to perform a variable number of iterations, the variable number being selected at runtime from a group comprising: a first number of iterations to be performed when one or more outer loops of the plurality of nested loops are in their first iteration; a second number of iterations to be performed when the one or more outer loops of the plurality of nested loops are in their end iteration [FIG. 5; loop 508-516 iterates a variable number of times depending on blocks 512 and 514 as a function of whether all of the rows 302 for a given column 304A have been read (i.e. the loop is at its end) or if there are remaining rows (i.e. the loop is still on its initial iterations, or there are more rows to be read); c10 L35 and c11 L1-5 on Schmidt].
Regarding claim 16, Yang/Schmidt/Gropp teach the method of claim 12, further comprising:
maintaining a first synchronisation counter associated with a first loop among the plurality of loops performing step (i), wherein the first synchronisation counter is incremented with each iteration of the first loop [jj counter set to 1 on outer loop on slide 15 of Gropp]; 
maintaining a second synchronisation counter associated with a second loop among the plurality of loops performing step (ii), wherein the second synchronisation counter is incremented with each iteration of the second loop [j counter set to jj of inner loop on slide 15 of Gropp];
comparing a current value of the first synchronisation counter with a current value of the second synchronisation counter; and controlling the progress of step (i) and/or step (ii) based on a result of the comparison [do j=jj,min(n,jj+stride-1) translates to comparing the current value of j to the value of "jj+stride-1" and controlling the iteration of the j loop based on this comparison; slide 15 on Gropp].
Regarding claim 18, its limitations are substantially similar to those of claim 12; thus, claim 18 is rejected on the same grounds.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Schmidt (US 8,095,745); and further in view of Gropp (Matrix Transpose, November 23, 2018); and still further in view of Lee (US 2020/0202198).
Regarding claim 13, Yang/Schmidt/Gropp teach the method of claim 12 but do not explicitly teach wherein at least one loop of the plurality of nested loops is further configured to iterate a different number of times depending on a software configurable flag.
Lee, in analogous art, teaches wherein at least one loop of the plurality of nested loops is further configured to iterate a different number of times depending on a software configurable flag [the system 100 is configured to identify an occurrence of a parameter (e.g., end_flag) that is used to indicate that an end flag condition has been satisfied (e.g., end_flag=1). As indicated above, the occurrence of an end flag condition that is satisfied means a current memory word is the end of a process iteration involving the kernel location memory; ¶0139].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use a flag to indicate termination of a loop (i.e. end of a process iteration) as disclosed in Lee. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. using parameters, or flags, to indicate termination of an iterative process) to a known device ready for improvement to yield predictable results.
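As a hedged sketch of how a software-configurable flag could alter a loop's iteration count (claim 13), the function below selects between two trip counts based on a flag set before the traversal begins. The function and parameter names are hypothetical; Lee's end_flag is a parameter indicating the end of a process iteration and is not modeled here beyond that general idea.

```python
def inner_trip_count(outer_index, num_outer, full_count, tail_count, use_tail):
    """Trip count for an inner loop, selectable by a software-set flag (hypothetical sketch).

    When use_tail is set, the inner loop runs tail_count iterations on the
    outer loop's last iteration; otherwise it always runs full_count.
    """
    if use_tail and outer_index == num_outer - 1:
        return tail_count
    return full_count
```

With the flag set, an outer loop of three iterations drives inner counts of 4, 4 and 2; with the flag cleared, the inner loop runs 4 times on every outer iteration.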
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 2020/0133854) in view of Schmidt (US 8,095,745); and further in view of Gropp (Matrix Transpose, November 23, 2018); and still further in view of Kuwabara (US 2018/0343656).
Regarding claim 15, Yang/Schmidt/Gropp teach the method of claim 12, wherein: the data is read from the memory in discrete bursts, and/or the data is written to the memory in discrete bursts, the discrete bursts having a predetermined first size [the exemplary memory 104 stores 256 bytes of data in each row 302 and provides the data in 16 byte bursts; c8 L1-5 on Schmidt].
Yang/Schmidt/Gropp, however, do not explicitly teach wherein the data is written to the internal buffer in discrete units, and/or the data is read from the internal buffer in discrete units, the discrete units having a second size; and wherein the second size is different from the first size.
Kuwabara, in analogous art, teaches wherein the data is written to the internal buffer in discrete units, and/or the data is read from the internal buffer in discrete units, the discrete units having a second size; and wherein the second size is different from the first size [the read request may further include burst size (ARSIZE) and burst length (ARLEN). The transfer processing unit 22 obtains the read data (RDATA) from the memory 14 via the read data channel of the bus 52 (¶0028); and wherein the write request may further include burst size (AWSIZE) and burst length (AWLEN) (¶0029)].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to write data into the buffer using a burst size that is suited for the buffer and buffer bus and to read data out of the buffer (and into the memory) using a burst size that is better suited for the memory, as disclosed in Kuwabara. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e. having different configurable burst sizes for reading and writing) to a known device ready for improvement to yield predictable results.

Allowable Subject Matter
Claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAMON A MERCADO whose telephone number is (571)270-5744.  The examiner can normally be reached on Monday to Friday from 7:00AM to 3:00PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached on 571-270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).	
/Ramon A. Mercado/Primary Examiner, Art Unit 2132