Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

ACTION
2. 	This Office Action is taken in response to Applicants’ Amendments and Remarks filed on 2/16/2022 regarding application 16/908,707 filed on 6/22/2020.  
 	Claims 1-5, 7-11, 13-17, 19-24, and 26 are pending for consideration.

3.				Response to Amendments and Remarks 
	Applicants’ amendments and remarks have been fully and carefully considered, with the Examiner’s response set forth below.
(1) Applicant raises the question “The Applicant is therefore uncertain about which one of the memory 1900 and the HBM 1740 is considered by the examiner as correspondence to the “off-chip storage” in the present claim 1” (see page 8 of Applicant’s Remarks). The Examiner explains as follows:
First, claim 1 recites, “an electronic device comprising at least one processor and an off-chip storage …” Thus, any electronic device that comprises at least one processor and an off-chip storage meets the limitation. Significantly, the limitation does not recite where the processor is located, or whether it is on-chip or off-chip.
Second, Lau teaches [… The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands (e.g., multidimensional matrix operands). Instructions of the MPUs may take tensors as inputs or operands … These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip. For instance, data to be fetched or written to using the MPUs may be stored in tensor form, among other example features … (¶ 0044)]. 
Significantly, Lau specifically points out that “These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip.” Thus, Lau’s apparatus includes both off-chip and on-chip memory, and data is initially stored in off-chip memory and then moved into the on-chip memory for matrix operations. The memory 1900 is the corresponding on-chip memory storing data for matrix operations. Lau may not explicitly show the off-chip memory, but Lau clearly indicates its existence, stating that “These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations).”
Therefore, Lau’s electronic device meets the recited limitation of an off-chip storage.
	(2) In response to the amendments and remarks, an updated claim analysis has been made. Refer to the corresponding sections of the following Office Action for details.

4.					Examiner’s Note
(1) In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
(2) Examiner has cited particular columns/paragraph and line numbers in the references applied to the claims above for the convenience of the applicant. Although 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-5, 7-11, 13-17, 19-24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Lau et al. (US Patent Application Publication 2019/0392297, hereinafter Lau), and in view of Zhou et al. (US Patent Application Publication 2018/0232181, hereinafter Zhou).
	As to claim 1, Lau teaches A matrix storage [A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations … (abstract)] method, applied to an electronic device comprising at least one processor and an off-chip storage which is connected to and in communication with the at least one processor [as shown in figure 17; as shown in figure 19; … The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands (e.g., multidimensional matrix operands). Instructions of the MPUs may take tensors as inputs or operands … These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip. For instance, data to be fetched or written to using the MPUs may be stored in tensor form, among other example features … (¶ 0044)], the method comprising: 
dividing a matrix into a plurality of data blocks with a preset segmentation granularity of N rows x M columns, at least one of N and M is greater than 1; wherein the plurality of data blocks comprises at least one first data block of N rows x M columns [as shown in figure 19; … FIG. 19 illustrates the manner in which memory 1900 stores or arranges the elements of matrix 1910 in memory modules 1901. For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901. For example, matrix 1910 is logically partitioned into blocks A-I, which are 2×2 blocks of matrix elements in matrix 1910, and each block A-I is stored in a single entry 1902 of memory modules 1901. For example, memory 1900 stores and retrieves these respective blocks A-I of matrix 1910 using the same approach as used by memory 1800 for the respective elements A-I of matrix 1810 from FIG. 18. Thus, memory 1900 uses the same storage approach as memory 1800, but memory 1900 operates on blocks of four matrix elements while memory 1800 operates on single matrix elements (¶ 0144)]; 
if the column number of the matrix is not an integer multiple of M, the plurality of data blocks further comprises at least one second data block of N rows x P columns, the second data block is aligned with an adjacent row of first data block, and P is less than M [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19]; 
storing the data in each of the first data blocks continuously in an off-chip storage, and storing the data in each of the second data blocks continuously in the off-chip storage [as shown in figure 17, where the matrix processing chip/deep learning chip (1700) includes a read engine (1735), slice engines (1736a and 1736), and an output engine (1737), while the data storage medium, HBM (High Bandwidth Memory, 1740a, 1740b, 1740c, 1740d), is outside the chip (1700), i.e., off-chip; FIG. 17 illustrates an example embodiment of a matrix processing engine 1700. In some embodiments, matrix processing engine 1700 may be implemented by a matrix processing architecture, such as the matrix processing architectures discussed in the examples above. For example, in some embodiments, matrix processing engine 1700 may be implemented by a matrix processing cluster on a matrix processing chip … For example, the illustrated embodiment depicts high bandwidth memory (HBM) modules 1740, master control CPU (MCC) 1732, matrix processing units (MPUs) 1734, and memory resource blocks (MRBs) 1738 … (¶ 0104-0106); The illustrated example shows the control flow of matrix processing engine 1700 for matrix operation 1701 and matrix operation 1702. The control flow for a matrix operation begins with the read … For example, for matrix operation 1701, read engine 1735 may first retrieve matrix data associated with the particular matrix operation from an HBM module 1740a … In some embodiments, read engine 1735 may use the master control CPU (MCC) 1732 on its respective cluster for storing and retrieving data on HBMs 1740 and MRBs 1738 … (¶ 0112-0113); as shown in figure 19; … The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands (e.g., multidimensional matrix operands). Instructions of the MPUs may take tensors as inputs or operands. 
These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip. For instance, data to be fetched or written to using the MPUs may be stored in tensor form, among other example features … (¶ 0044)], wherein the off-chip storage comprises a plurality of storage channels, and the segmentation granularity is an integral multiple of a storage granularity of each storage channel [Lau -- as shown in figure 19; … FIG. 19 illustrates the manner in which memory 1900 stores or arranges the elements of matrix 1910 in memory modules 1901. For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901 … (¶ 0144);
Zhou more expressly teaches this limitation: 16 channels as shown in figure 2; The flash chip may be a single layer cell (SLC), or may be a multi-layer cell (MLC), or may be another storage unit … For example, if a capacity of each page is set to 16 KB, and a capacity of each block is set to 8 megabytes (MB), a value of N is set to 512, that is, each block may include 512 pages, if a capacity of each die is set to 16 gigabytes (GB), a value of M is 2048, that is, each die may include 2048 blocks. For example, if each flash chip includes two dies, a capacity of each flash chip is 32 GB, if each channel may be connected to four flash chips, it indicates that eight dies may be connected to the channel, and in this case, a capacity managed in each channel is 128 GB. Referring to FIG. 2, if the solid state disk 200 includes 16 channels, a total capacity of the solid state disk 200 is 2 terabytes (TB) … (¶ 0092)].
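For illustration only, the capacity figures quoted from Zhou (¶ 0092) can be checked with the short Python sketch below. The variable names are hypothetical, and binary units (1 KB = 1024 bytes) are an assumption; neither is taken from Zhou.

```python
# Assumed binary unit sizes (Zhou does not state which convention it uses).
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

# Figures quoted from Zhou, paragraph 0092.
page_bytes = 16 * KB        # capacity of each page
pages_per_block = 512       # Zhou's "N"
blocks_per_die = 2048       # Zhou's "M"
dies_per_chip = 2
chips_per_channel = 4
channels = 16

# Each level of the hierarchy is the level below multiplied by its count.
block_bytes = page_bytes * pages_per_block
die_bytes = block_bytes * blocks_per_die
chip_bytes = die_bytes * dies_per_chip
channel_bytes = chip_bytes * chips_per_channel
total_bytes = channel_bytes * channels

print(block_bytes // MB, "MB per block")
print(die_bytes // GB, "GB per die")
print(chip_bytes // GB, "GB per flash chip")
print(channel_bytes // GB, "GB per channel")
print(total_bytes // TB, "TB total")
```

Under these assumptions the computed values agree with the quoted passage at every level: block, die, flash chip, channel, and the solid state disk as a whole.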
Regarding claim 1, Lau teaches an off-chip storage device [… These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip. For instance, data to be fetched or written to using the MPUs may be stored in tensor form, among other example features … (¶ 0044)].
However, Zhou specifically teaches a storage comprising a plurality of storage channels, wherein the segmentation granularity is an integral multiple of a storage granularity of each storage channel [16 channels as shown in figure 2; The flash chip may be a single layer cell (SLC), or may be a multi-layer cell (MLC), or may be another storage unit … For example, if a capacity of each page is set to 16 KB, and a capacity of each block is set to 8 megabytes (MB), a value of N is set to 512, that is, each block may include 512 pages, if a capacity of each die is set to 16 gigabytes (GB), a value of M is 2048, that is, each die may include 2048 blocks. For example, if each flash chip includes two dies, a capacity of each flash chip is 32 GB, if each channel may be connected to four flash chips, it indicates that eight dies may be connected to the channel, and in this case, a capacity managed in each channel is 128 GB. Referring to FIG. 2, if the solid state disk 200 includes 16 channels, a total capacity of the solid state disk 200 is 2 terabytes (TB) … (¶ 0092)].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a storage comprising a plurality of storage channels, wherein the segmentation granularity is an integral multiple of a storage granularity of each storage channel, as demonstrated by Zhou, and to incorporate it into the existing scheme disclosed by Lau, in order to support multi-channel data transmission with the desired size/capacity.
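For illustration only, the dividing and continuous-storing steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch and is not taken from Lau, Zhou, or the application: the function names (`partition`, `store_blocks`, `granularity_ok`), the flat list standing in for the off-chip storage, and the byte sizes are all assumptions. Leftover rows, which claim 1 does not address, are simply tiled with the same column granularity here.

```python
from typing import Dict, List, Tuple

# (row_start, col_start, height, width) of one data block; hypothetical layout.
Block = Tuple[int, int, int, int]


def partition(rows: int, cols: int, n: int, m: int) -> List[Block]:
    """Divide a rows x cols matrix into blocks of at most n rows x m columns.

    Full blocks are n x m ("first data blocks"). If cols is not a multiple
    of m, the last block in each row band is n x p with p < m ("second data
    blocks").
    """
    blocks = []
    for r in range(0, rows, n):
        h = min(n, rows - r)
        for c in range(0, cols, m):
            w = min(m, cols - c)
            blocks.append((r, c, h, w))
    return blocks


def store_blocks(matrix: List[List[int]], n: int, m: int):
    """Write each block's elements contiguously into a flat list.

    Returns the flat list and a map from (row_start, col_start) to the
    offset at which that block begins.
    """
    rows, cols = len(matrix), len(matrix[0])
    flat: List[int] = []
    offsets: Dict[Tuple[int, int], int] = {}
    for r, c, h, w in partition(rows, cols, n, m):
        offsets[(r, c)] = len(flat)
        for i in range(r, r + h):
            flat.extend(matrix[i][c:c + w])
    return flat, offsets


def granularity_ok(n: int, m: int, elem_bytes: int, channel_bytes: int) -> bool:
    """True if one n x m block is an integral multiple of the channel granularity."""
    return (n * m * elem_bytes) % channel_bytes == 0


# A 4 x 5 matrix holding 0..19, divided with N = M = 2.
example = [[5 * i + j for j in range(5)] for i in range(4)]
flat, offsets = store_blocks(example, 2, 2)
```

With this 4 x 5 example, the fifth column yields blocks of 2 rows x 1 column (P = 1), and each such block is stored contiguously after the full blocks of its row band.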
	As to claim 2, The method according to claim 1, wherein if the row number of the matrix is not an integer multiple of N, the plurality of data blocks further comprises at least one third data block of L rows x S columns, and L is less than N [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 3, The method according to claim 2, wherein the number of the third data block is 1, and S is equal to the column number of the matrix [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 4, The method according to claim 2, further comprising: storing the data in each of the third data blocks continuously in the off-chip storage [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 5, The method according to claim 3, further comprising: storing the data in each of the third data blocks continuously in the off-chip storage [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 7, Lau in view of Zhou teaches A matrix access method, wherein the matrix is stored in an off-chip storage using the matrix storage method according to claim 1, the matrix access method comprising: receiving an access request for the matrix, wherein the access request comprises access parameters of the matrix; if the data block corresponding to the access parameter comprises a complete first data block or a complete second data block, reading the data of the data block from the off-chip storage [Lau -- as shown in figure 19; … FIG. 19 illustrates the manner in which memory 1900 stores or arranges the elements of matrix 1910 in memory modules 1901. For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901. For example, matrix 1910 is logically partitioned into blocks A-I, which are 2×2 blocks of matrix elements in matrix 1910, and each block A-I is stored in a single entry 1902 of memory modules 1901. For example, memory 1900 stores and retrieves these respective blocks A-I of matrix 1910 using the same approach as used by memory 1800 for the respective elements A-I of matrix 1810 from FIG. 18. Thus, memory 1900 uses the same storage approach as memory 1800, but memory 1900 operates on blocks of four matrix elements while memory 1800 operates on single matrix elements (¶ 0144); … The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands (e.g., multidimensional matrix operands). Instructions of the MPUs may take tensors as inputs or operands. These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. 
These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations). This data may be stored and transferred as tensors in on-chip and off-chip memory, and between the host and the chip. For instance, data to be fetched or written to using the MPUs may be stored in tensor form, among other example features … (¶ 0044)]; if the data block corresponding to the access parameter comprises an incomplete first data block or an incomplete second data block, reading the data of the data block from the off-chip storage is prohibited [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 8, A matrix access method, wherein the matrix is stored in an off-chip storage using the matrix storage method according to claim 2, the matrix access method comprising: receiving an access request for the matrix, wherein the access request comprises access parameters of the matrix; if the data block corresponding to the access parameter comprises a complete first data block or a complete second data block, reading the data of the data block from the off-chip storage; if the data block corresponding to the access parameter comprises an incomplete first data block or an incomplete second data block, reading the data of the data block from the off-chip storage is prohibited [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 9, A matrix access method, wherein the matrix is stored in an off-chip storage using the matrix storage method according to claim 3, the matrix access method comprising: receiving an access request for the matrix, wherein the access request comprises access parameters of the matrix; if the data block corresponding to the access parameter comprises a complete first data block or a complete second data block, reading the data of the data block from the off-chip storage; if the data block corresponding to the access parameter comprises an incomplete first data block or an incomplete second data block, reading the data of the data block from the off-chip storage is prohibited [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 10, A matrix access method, wherein the matrix is stored in an off-chip storage using the matrix storage method according to claim 4, the matrix access method comprising: receiving an access request for the matrix, wherein the access request comprises access parameters of the matrix; if the data block corresponding to the access parameter comprises a complete first data block or a complete second data block, reading the data of the data block from the off-chip storage; if the data block corresponding to the access parameter comprises an incomplete first data block or an incomplete second data block, reading the data of the data block from the off-chip storage is prohibited [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 11, A matrix access method, wherein the matrix is stored in an off-chip storage using the matrix storage method according to claim 5, the matrix access method comprising: receiving an access request for the matrix, wherein the access request comprises access parameters of the matrix; if the data block corresponding to the access parameter comprises a complete first data block or a complete second data block, reading the data of the data block from the off-chip storage; if the data block corresponding to the access parameter comprises an incomplete first data block or an incomplete second data block, reading the data of the data block from the off-chip storage is prohibited [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
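For illustration only, the access condition recited in claims 7-11 (read when the request covers complete data blocks, prohibit the read otherwise) can be sketched as a boundary check. This is a hypothetical sketch: the function names and the rectangular (row, column, height, width) form of the access parameters are assumptions and are not taken from the claims, from Lau, or from Zhou.

```python
def is_block_aligned(start: int, length: int, step: int, limit: int) -> bool:
    """True if [start, start + length) begins and ends on block boundaries.

    A boundary is a multiple of `step`, or the matrix edge `limit`, which
    closes a narrower edge block when `limit` is not a multiple of `step`.
    """
    end = start + length
    if length <= 0 or start < 0 or end > limit:
        return False
    return start % step == 0 and (end % step == 0 or end == limit)


def may_read(rows: int, cols: int, n: int, m: int,
             r0: int, c0: int, h: int, w: int) -> bool:
    """True if the requested h x w region at (r0, c0) covers only complete blocks.

    The matrix is rows x cols and was divided with granularity n rows x m
    columns; a False result corresponds to the read being prohibited.
    """
    return is_block_aligned(r0, h, n, rows) and is_block_aligned(c0, w, m, cols)


# 4 x 5 matrix divided with N = M = 2.
print(may_read(4, 5, 2, 2, 0, 0, 2, 2))  # one complete 2 x 2 block
print(may_read(4, 5, 2, 2, 0, 4, 2, 1))  # one complete 2 x 1 edge block
print(may_read(4, 5, 2, 2, 0, 1, 2, 2))  # straddles two blocks
```

In this sketch a request that starts or ends inside a block is refused rather than served partially, which is one possible reading of "prohibited".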
	As to claim 13, Lau in view of Zhou teaches An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 1 [Lau -- … Then, when a particular matrix operation needs to be performed, the matrix processor can retrieve the corresponding matrix subroutine from the matrix subroutine memory, and then execute the instructions and/or commands of the subroutine to perform the desired matrix operation (¶ 0360)].
	As to claim 14, An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 2 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 15, An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 3 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 16, An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 4 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 17, An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 5 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 19, Lau in view of Zhou teaches An electronic device, comprising: at least one processor; and a memory which is connected to and in communication with the at least one processor; wherein, instructions that can be executed by the at least one processor are stored in the memory, the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to claim 7 [Lau -- … Then, when a particular matrix operation needs to be performed, the matrix processor can retrieve the corresponding matrix subroutine from the matrix subroutine memory, and then execute the instructions and/or commands of the subroutine to perform the desired matrix operation (¶ 0360)].
	As to claim 20, Lau in view of Zhou teaches A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 1 [Lau -- An example machine accessible storage medium may have instructions stored thereon, where the instructions, when executed on a machine, cause the machine to: write a particular row or a particular column of a matrix to a memory, where the instructions that cause the machine to write the particular row or the particular column to the memory cause the machine to: shift a plurality of matrix elements of the particular row or the particular column; and write the plurality of matrix elements to a plurality of memory modules of the memory. In one example embodiment of a storage medium, the instructions further cause the machine to perform a particular number of shifts based on a row number of the particular row or a column number of the particular column (¶ 0469)].
	As to claim 21, A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 2 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 22, A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 3 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 23, A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 4 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 24, A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 5 [this limitation is conditional, and does not apply to Lau’s disclosure shown in figure 19].
	As to claim 26, Lau in view of Zhou teaches A non-transient computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed to cause the computer to perform the method according to claim 7 [Lau -- An example machine accessible storage medium may have instructions stored thereon, where the instructions, when executed on a machine, cause the machine to: write a particular row or a particular column of a matrix to a memory, where the instructions that cause the machine to write the particular row or the particular column to the memory cause the machine to: shift a plurality of matrix elements of the particular row or the particular column; and write the plurality of matrix elements to a plurality of memory modules of the memory. In one example embodiment of a storage medium, the instructions further cause the machine to perform a particular number of shifts based on a row number of the particular row or a column number of the particular column (¶ 0469)].

					Conclusion
6.	Claims 1-5, 7-11, 13-17, 19-24, and 26 are rejected as explained above. 
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHENG JEN TSAI whose telephone number is 571-272-4244.  The examiner can normally be reached on Monday-Friday, 9-6.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/SHENG JEN TSAI/
Primary Examiner, Art Unit 2136
March 29, 2022