NON-FINAL REJECTION
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
	Claims 1-23 are rejected under 35 U.S.C. 103 as being unpatentable.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 7, 10, 11, 13, 18, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Acocella et al. (US 7,750,915), Lea et al. (US 2018/0276539), and Oh et al. (US 2017/0358327).
Regarding claim 1, Acocella et al. disclose: 
A memory device comprising: 
(Col 2, line 48:  the shared memory resource comprising a plurality of banks)…and the plurality of banks including at least a first bank and a second bank (FIG. 8A Bank 0…Bank 5)…
a calculation logic including a plurality of processor…circuits arranged in correspondence to the plurality of banks (FIG. 5 Core 310 including Processing Engines 0…P-1), the plurality of PIM circuits including at least a first…circuit arranged in correspondence to the first bank and a second…circuit arranged in correspondence to the second bank (Col 11, line 16:  the number of banks B may be equal to the number of processors P), and each of the plurality of…circuits being configured to perform a calculation processing (Col 6, line 32:  multithreaded core array 202 provides a highly parallel architecture that supports concurrent execution of a large number of instances of vertex, geometry, and/or pixel shader programs in various combinations; see Col 4, line 59-Col 5, line 22; Col 8 line 13:  Each parallel processing engine 402 advantageously includes an identical set of functional units (e.g., arithmetic logic units, etc.)) using at least one selected from data provided from a host or memory information read from a corresponding bank among the plurality of banks (Col 11, line 1:  data to be accessed by processing engines 402, such as vertex data, geometry data, and pixel data, are loaded into global register file 406, which is shared among processing engines 402. The present invention allows data to be stored in shared global register file 406 such that the data can be concurrently accessed in an efficient manner by different processing units 402, in order to support parallel processing); and 
a control logic (FIG. 4 Instruction Unit 412) configured to…control a memory operation on the memory bank, and control the calculation logic to perform the calculation processing (Col 12, line 33:  The thread executed by processing engine 402(0) and the thread executed by processing engine 402(1) may be part of the same SIMD group being carried out in connection with a single instruction processed by instruction unit 412, shown in FIG. 4), 
wherein the control logic (FIG. 4 Instruction Unit 412) is further configured to control in parallel at least a first reading operation from the first bank and a second reading operation from the second bank for the calculation processing (Col 12, line 25:  As discussed previously, processing engines such as 402(0) and 402(1) are capable of executing different threads. For example, processing engine 402(0) may be executing one thread by accessing a vertex comprised of data elements A0, A1, A2, A3. Concurrently, processing engine 402(1) may be executing another thread by accessing a different vertex comprised of data elements B0, B1, B2, B3; Col 9, line 7:  instruction unit 412 issues the same instruction to all P processing engines 402 in parallel), 
wherein a first offset for the first bank and a second offset for the second bank having different values are respectively configured for at least the first bank and the second bank (FIG. 7 Offset 702 and Offset 704; Col 13, line 2:  The skipped data entry 702 introduces an offset between the storage of data elements A0, A1, A2, A3 accessed by processing engine 402(0) and the storage of data elements B0, B1, B2, B3 accessed by processing engine 402(1). This offset effectively eliminates potential "bank conflicts" that may otherwise occur as processing engines 402(0) and 402(1) attempt the concurrent accesses shown in FIG. 6. For example, in the same clock cycle, processing engine 402(0) may access data element A0 while processor 402(1) accesses data element B0, as indicated by reference 602 in FIG. 6), and 
wherein the memory operation is configured to: 
(FIG. 9 step 906 Allow processing engines to concurrently access stored data elements), and 
provide at least the first memory information to the first (FIG. 9 step 906 Allow processing engines to concurrently access stored data elements; Col 15, line 43:  Data entries 802 and 804 are skipped in order to introduce offsets that allow concurrently accessed data elements to be stored in different banks. For example, as result of the offsets introduced by the skipped data entries 802 and 804, data elements A0, B0, and C0 are stored in different banks 406(0), 406(1), and 406(2). Consequently, in one clock cycle, the following three concurrent data accesses are possible: processing engine 402(0) accesses data element A0, processing engine 402(1) accesses data element B0, and processing engine 402(2) accesses data element C0).
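For illustration only, the skipped-data-entry offsets cited from Acocella et al. can be modeled with a short sketch. The four-bank interleaving (word address modulo the number of banks), the four elements per vertex, and the names place_vertices and conflict_free are assumptions made for this example and are not taken from the reference.

```python
# Hypothetical model: a word address maps to bank (addr % num_banks)
# and row (addr // num_banks). This mapping is an assumption for illustration.

def place_vertices(num_vertices, elems_per_vertex, num_banks, skip_per_vertex):
    """Return {label: (bank, row)} for elements A0, A1, ..., B0, B1, ..."""
    layout = {}
    addr = 0
    for v in range(num_vertices):
        for e in range(elems_per_vertex):
            label = f"{chr(ord('A') + v)}{e}"
            layout[label] = (addr % num_banks, addr // num_banks)
            addr += 1
        # Skipped data entries introduce the offset between vertices.
        addr += skip_per_vertex
    return layout


def conflict_free(layout, labels):
    """True if every listed element sits in a different bank."""
    banks = [layout[label][0] for label in labels]
    return len(set(banks)) == len(banks)


no_skip = place_vertices(3, 4, num_banks=4, skip_per_vertex=0)
with_skip = place_vertices(3, 4, num_banks=4, skip_per_vertex=1)

print(conflict_free(no_skip, ["A0", "B0", "C0"]))    # all three land in bank 0
print(conflict_free(with_skip, ["A0", "B0", "C0"]))  # banks 0, 1, and 2
```

Under these assumptions, the elements A0, B0, and C0 collide in one bank without the skipped entries and fall into three different banks with them, which is the concurrent-access condition the cited passage describes.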
Acocella et al. do not appear to explicitly teach “each bank of the plurality of banks including a memory cell array…the first bank including a first memory cell array and the second bank including a second memory cell array;…processor-in-memory (PIM)…based on at least one of a command or an address received from the host.” However, Lea et al. disclose:
each bank of the plurality of banks including a memory cell array…the first bank including a first memory cell array and the second bank including a second memory cell array (FIG. 1B Each bank 112-1 comprises an array of rows and columns of memory cells 123; [0024] 123-N of memory cells in the bank 121-1);
…processor-in-memory (PIM) ([0005] Processing performance may be improved in a processing-in-memory (PIM) device, in which a processing and/or logic resource may be implemented internally and/or near to a memory (e.g., directly on a same chip as the memory array). A processing-in-memory (PIM) device may save time by reducing and eliminating external communications and may also conserve)
…based on at least one of a command or an address received from the host ([0030] executing instructions from the host 110 and accessing the memory array 130)
Acocella et al. and Lea et al. are analogous art because Acocella et al. teach concurrently accessing multiple memory banks and Lea et al. teach operating neural networks with a processing-in-memory (PIM) architecture. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al. and Lea et al. before him/her, to modify the teachings of Acocella et al. with the Lea et al. teachings of operating neural networks with a PIM architecture because implementing a neural network in a PIM architecture saves time by reducing and eliminating external communications (Lea et al. [0005]).
Acocella et al. and Lea et al. do not appear to explicitly teach “based on at least one of a command or an address received from the host.” However, Oh et al. disclose:
…based on at least one of a command or an address received from the host ([0033] accesses data DATA of a memory cell array 210A by providing a command CMD and an address ADD to memory device 200A; [0034] access memory device 200A according to a request from a host; FIG. 2 Host 100B sends CMD and ADD to memory device)
Acocella et al., Lea et al., and Oh et al. are analogous art because Acocella et al. teach concurrently accessing multiple memory banks; Lea et al. teach operating neural networks with a processing-in-memory (PIM) architecture; and Oh et al. teach a memory device for performing an internal process. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al., Lea et al., and Oh et al. before him/her, to modify the combined teachings of Acocella et al. and Lea et al. with the Oh et al. teachings of controlling a memory operation because the command and address received from the host enable the operation to be directed to a particular location within the memory.
Regarding claim 3, Acocella et al. further disclose: 
The memory device of claim 1, 
…the first memory information from the first row and the second memory information from the second row are read in parallel according to the first offset and the second offset respectively (Col 15, line 43:  Data entries 802 and 804 are skipped in order to introduce offsets that allow concurrently accessed data elements to be stored in different banks. For example, as result of the offsets introduced by the skipped data entries 802 and 804, data elements A0, B0, and C0 are stored in different banks 406(0), 406(1), and 406(2). Consequently, in one clock cycle, the following three concurrent data accesses are possible: processing engine 402(0) accesses data element A0, processing engine 402(1) accesses data element B0, and processing engine 402(2) accesses data element C0).
Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach “wherein the first memory cell array of the first bank includes a first plurality of rows including a first row, the second memory cell array of the second bank includes a second plurality of rows including a second row, and.” However, Lea et al. further disclose:
(FIG. 1B Each bank 112-1 comprises an array of rows and columns of memory cells 123; [0024] 123-N of memory cells in the bank 121-1), 
Regarding claim 4, Acocella et al. further disclose: 
The memory device of claim 3, wherein the memory operation is further configured to sequentially read the first memory information from the first plurality of rows in the first bank in response to the at least one of the command or the address received from the host (Col 11, line 18:  Each data entry may hold a particular amount of data, such as a 32-bit word. The banks allow concurrent access such that at any given time, each bank can be separately reached to access a data entry within that bank. The banks may be synchronized to the same clock timing. Thus, in one clock cycle, a selected data entry in bank 402(0), a selected data entry in bank 402(1), . . . , and a selected data entry in 406(B-1) may all be concurrently accessed in parallel); and 
wherein a first start position of the first reading operation from the first plurality of rows is different from a second start position of the second reading operation from the second plurality of rows according to the first offset and the second offset (FIG. 8A: the start of A0 is at offset position 0 (bank 0, row 1), and the start of B0 is offset after skipped data entry 802 (bank 1, row 2); Col 11, line 27:  In each bank, a register address or index may be specified to select the data entry to be accessed. The register address or index specified in one bank can be different than the register address or index specified in another bank. That is, in each clock cycle, the data entry selected in one bank may be independent of the data entry selected in another bank).
Regarding claim 7, Lea et al. further disclose: 
([0044] the plurality of neural networks 296-1. . . 296-M can simultaneously receive instructions to operate on the particular portion of data); and 
the plurality of PIM circuits are configured to perform the calculation processing using the memory information and the data ([0014] the plurality of neural networks are configured to receive a particular portion of data and wherein each of the plurality of neural networks are configured to operate on the particular portion of data during a particular time period to make a determination regarding a characteristic of the particular portion of data. In some embodiments the plurality of neural networks may include processing in memory (PIM) architecture).
Regarding claim 10, Oh et al. further disclose:
The memory device of claim 1, wherein the memory device includes a high bandwidth memory (HBM) including a plurality of channels, the plurality of channels including a first channel and a second channel ([0083] FIG. 9 is a block diagram illustrating an example embodiment of a memory device having a stacked structure. In FIG. 9, a memory device in a high bandwidth memory (HBM) form having an increased bandwidth by including a plurality of independent channels having independent interfaces for a corresponding plurality of memory cell groups), 
the first channel (FIG. 3 CH1) includes the plurality of banks ([0057] each of the memory devices 200C may include a plurality of memory cell groups, and each of the memory devices 200C may include a plurality of independent channels corresponding to the plurality of memory cell groups; corresponding to the teachings of Lea) and the plurality of PIM circuits (as taught by the combination of Acocella and Lea in claim 1), and 
(FIG. 3 CH1) includes a second plurality of banks ([0057] each of the memory devices 200C may include a plurality of memory cell groups, and each of the memory devices 200C may include a plurality of independent channels corresponding to the plurality of memory cell groups; corresponding to the teachings of Lea) and a second plurality of PIM circuits (as taught by the combination of Acocella and Lea in claim 1), the second channel configured in a manner similar to that of the first channel ([0087] each of the independent channels for the corresponding memory cell groups has a 128-bit bandwidth).
Claim 11 recites an “operating method of a memory device” with limitations substantially similar to the limitations of claim 1. Claim 18 recites an “operating method of a memory controller controlling a memory device” with limitations substantially similar to the limitations of claim 1. Claims 11 and 18 are rejected in substantially the same manner as claim 1. Claim 13 recites claim limitations substantially similar to those of claim 7. Therefore, claim 13 is rejected in substantially the same manner as claim 7.
Regarding claim 21, Acocella et al. further disclose:
The operating method of claim 18, further comprising resetting a register before the configuring of the plurality of offsets (FIG. 7 Offset 702 and Offset 704), the register being included in at least a first PIM circuit (Col 8, line 28:  Each processing engine 402 is allocated space in a local register file 404 for storing its local input data, intermediate results, and the like. In one embodiment, local register file 404 is physically or logically divided into P banks, each having some number of entries (where each entry might be, e.g., a 32-bit word). One bank is allocated to each processing unit, and corresponding entries in different banks can be populated with data for corresponding thread types to facilitate SIMD execution. The number of entries in local register file 404 is advantageously large enough to support multiple concurrent threads per processing engine 402; It would be obvious to one skilled in the art at the time of the effective filing date of the claimed invention to reset the register when performing new accesses to a shared memory resource in order to clear data that is no longer needed for the new accesses) of the plurality of PIM circuits (as taught by Lea et al. above).
Regarding claim 22, Acocella et al. further disclose:
The operating method of claim 18, wherein the controlling of the memory operation (Col 12, line 33:  The thread executed by processing engine 402(0) and the thread executed by processing engine 402(1) may be part of the same SIMD group being carried out in connection with a single instruction processed by instruction unit 412, shown in FIG. 4) includes controlling the memory operation to respectively store different items of the table information in different banks among the plurality of banks (Col 13, line 13:  Referring back to FIG. 7, this concurrent access is now possible because data element A0 is stored in bank 406(0), and data element B0 is stored in bank 406(1). In other words, the offset introduced by the skipped data element 702 allows data elements A0 and B0 to be stored in different banks. As a result, data elements A0 and B0 can be accessed in the same clock cycle by separately accessing bank 406(0) and bank 406(1)).

Claims 2, 5, 6, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Acocella et al., Lea et al., and Oh et al. as applied to claim 1 above, and further in view of Fleischer et al. (US 2014/019799).
Regarding claim 2, Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach, but Fleischer et al. disclose:
(FIG. 6 604 Read an offset address value from the register file group of the processing element; [0004] reading a base address value and an offset address value from a register file group of the processing element).
Acocella et al., Lea et al., Oh et al., and Fleischer et al. are analogous art because Acocella et al. teach concurrently accessing multiple memory banks; Lea et al. teach operating neural networks with a processing-in-memory (PIM) architecture; Oh et al. teach a memory device for performing an internal process; and Fleischer et al. teach address generation in a memory device.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al., Lea et al., Oh et al., and Fleischer et al. before him/her, to modify the combined teachings of Acocella et al., Lea et al., and Oh et al. with the Fleischer et al. teachings of address generation in a memory device because including an offset storage circuit in the memory device would greatly reduce the data movement between the location where the data processing is performed and the memory (Fleischer et al. [0020]).
Regarding claim 5, Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach, but Fleischer et al. disclose:
The memory device of claim 1, further comprising an internal address generator (FIG. 5 address generation logic 504) configured to: 
generate a first internal address indicating a first read position in the first memory cell array of the first bank based on a first calculation using the address from the host and the first offset (FIG. 6 Step 60 Read a base address value from a register file group of a processing element; Step 604 Read an offset address value from the register file group of the processing element; Step 610 Output the physical address and access a location in memory based on the physical address), and 
generate a second internal address indicating a second read position in the second memory cell array of the second bank based on a second calculation using the address from the host and the second offset (FIG. 6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al., Lea et al., Oh et al., and Fleischer et al. before him/her, to modify the combined teachings of Acocella et al., Lea et al., and Oh et al. with the Fleischer et al. teachings of address generation in a memory device because Acocella teaches a processing engine (i.e. element) for each bank and Fleischer teaches an address generating logic in each processing element (Fleischer, FIG. 2). Therefore, the combination would enable generation of an internal address for each bank of the combination.
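For illustration only, per-bank internal address generation from a host address and a bank-specific offset can be sketched as follows. The class name BankAddressGenerator, the add-and-wrap calculation, and the row count are assumptions for this example; the cited references do not specify them.

```python
from dataclasses import dataclass


@dataclass
class BankAddressGenerator:
    """Hypothetical per-bank address generator (names and sizes are assumed)."""
    offset: int    # bank-specific offset, e.g. held in an offset register
    num_rows: int  # number of rows in this bank's memory cell array

    def internal_address(self, host_address: int) -> int:
        # Assumed calculation: add the bank's offset to the host address
        # and wrap around at the end of the bank.
        return (host_address + self.offset) % self.num_rows


# One generator per bank, each configured with a different offset.
generators = [
    BankAddressGenerator(offset=0, num_rows=8),
    BankAddressGenerator(offset=3, num_rows=8),
]

host_address = 6
read_positions = [g.internal_address(host_address) for g in generators]
print(read_positions)  # [6, 1]
```

A single host address thus resolves to a different read position in each bank, which is the behavior the combination is relied upon to show.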
Regarding claim 6, the combination of Acocella et al., Lea et al., Oh et al., and Fleischer et al. further disclose: 
The memory device of claim 5, wherein the internal address generator includes a first internal address generator corresponding to the first bank, and a second internal address generator corresponding to the second bank (Acocella discloses a processing circuit for each bank and Fleischer discloses a plurality of processing elements, each with an address generating logic as discussed in claim 5), and 
(Lea et al. further discloses at [0044] the plurality of neural networks 296-1. . . 296-M can simultaneously receive instructions to operate on the particular portion of data).
Claims 15 and 16 recite claim limitations substantially similar to those of claims 2, 5 and 6. Therefore, claims 15 and 16 are rejected in substantially the same manner as claims 2, 5 and 6.

Claims 8, 9, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Acocella et al., Lea et al., and Oh et al. as applied to claim 1 above, and further in view of Ma et al. (US 2020/0226201).
Regarding claim 8, Lea et al. further disclose:
The memory device of claim 1, wherein the calculation processing includes a neural network computation ([0018] the plurality of neural networks can include an array of memory cells coupled to sensing circuitry including a sense amplifier and a compute component) using…at least a first vector ([0038] Operations described herein can include operations associated with a processing in memory (PIM) capable device. PIM capable device operations can use bit vector based operations)…
Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach “using a weight matrix and at least a first vector and a second vector, the weight matrix includes the data from the host, the first vector includes the first memory information read from the first bank, and the second vector includes the second memory information read from the second bank.” However, Ma et al. disclose:
using a weight matrix (FIG. 4B 4x4 Weight Matrix 438) and at least a first vector and a second vector (FIG. 4B Data Vectors 422), the weight matrix includes the data from the host ([0072] FIG. 1A, inputs 101 to 104 are provided with training data during training sessions and then with new input data when the artificial neural network is used to make inferences. The input data (101 to 104) are processed with a weighted matrix 120 to create output data (141 to 144); One of ordinary skill in the art would understand that input data is provided by hosts), the first vector includes the first memory information read from the first bank, and the second vector includes the second memory information read from the second bank ([0080] The Matrix Processor 200 of FIG. 2A has access to a wide Static Random Access Memory (SRAM) bank 230. The wide SRAM 230 is configured such that entire wide rows of data can be accessed in a single memory cycle. In this manner, an entire input vector or an entire row of weight values from a weight matrix can be read out from the SRAM 230 or written to the SRAM 230 in a single memory cycle).
Acocella et al., Lea et al., Oh et al., and Ma et al. are analogous art because Acocella et al. teach concurrently accessing multiple memory banks; Lea et al. teach operating neural networks with a processing-in-memory (PIM) architecture; Oh et al. teach a memory device for performing an internal process; and Ma et al. teach digital processing circuits for neural network matrix operations.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al., Lea et al., Oh et al., and Ma et al. before him/her, to modify the combined teachings of Acocella et al., Lea et al., and Oh et al. with the teachings of Ma et al. regarding matrix processing because matrix operations are required to implement artificial neural networks (ANNs) (Ma et al. [0006]) and creating specialized matrix processing circuits for performing matrix operations needed to implement the ANNs is a popular technique for computations optimization (Ma et al. [0007]). 
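For illustration only, the claimed arrangement of a weight matrix provided in common by the host and a separate vector read from each bank can be sketched as a set of independent matrix-vector products. The values, the bank labels, and the helper name matvec are assumptions for this example and do not come from the cited references.

```python
def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


# Hypothetical weight matrix, provided in common from the host.
weights = [
    [1, 0],
    [2, 1],
]

# Hypothetical vectors, one read from each bank.
bank_vectors = {
    "bank0": [3, 4],
    "bank1": [5, 6],
}

# Each per-bank circuit applies the same weights to its own bank's vector.
outputs = {bank: matvec(weights, vec) for bank, vec in bank_vectors.items()}
print(outputs)  # {'bank0': [3, 10], 'bank1': [5, 16]}
```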
Regarding claim 9, Lea et al. further disclose: 
The memory device of claim 8, wherein…is provided in common to the plurality of PIM circuits ([0044] the plurality of neural networks 296-1. . . 296-M can simultaneously receive instructions to operate on the particular portion of data); and 
the plurality of PIM circuits perform the calculation processing ([0014] the plurality of neural networks are configured to receive a particular portion of data and wherein each of the plurality of neural networks are configured to operate on the particular portion of data during a particular time period to make a determination regarding a characteristic of the particular portion of data. In some embodiments the plurality of neural networks may include processing in memory (PIM) architecture) using the weight matrix and at least the first vector and the second vector, the weight matrix including the data from the host provided in common from the host and the at least the first vector and the second vector including a third vector including third memory information read from a third bank among the plurality of banks.
Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach “the weight matrix…the plurality of PIM circuits perform the calculation processing using the weight matrix and at least the first vector and the second vector, the weight matrix including the data from the host provided in common from the host and the at least the first vector and the second vector including a third vector including third memory information read from a third bank among the plurality of banks.” However, Ma et al. further disclose:
the weight matrix ([0007] Due to the very heavy usage of matrix computations, artificial intelligence is a very computationally intensive field of computing desperately in need of computational optimizations. One of the most popular techniques is to create specialized digital matrix processing circuits for the performing matrix operations needed to implement an artificial neural network)…the plurality of PIM circuits perform the calculation processing using the weight matrix and at least the first vector and the second vector, the weight matrix including the data from the host provided in common from the host and the at least the first vector and the second vector including a third vector including third memory information read from a third bank among the plurality of banks ([0080] The Matrix Processor 200 of FIG. 2A has access to a wide Static Random Access Memory (SRAM) bank 230. The wide SRAM 230 is configured such that entire wide rows of data can be accessed in a single memory cycle. In this manner, an entire input vector or an entire row of weight values from a weight matrix can be read out from the SRAM 230 or written to the SRAM 230 in a single memory cycle).
Claims 19 and 20 recite claim limitations substantially similar to those of claims 8 and 9. Therefore, claims 19 and 20 are rejected in substantially the same manner as claims 8 and 9.

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Acocella et al., Lea et al., and Oh et al. as applied to claim 1 above, and further in view of Ma et al., and further in view of Luo et al. (US 2021/0173893).
Regarding claim 23, Acocella et al., Lea et al., and Oh et al. do not appear to explicitly teach, but Ma et al. disclose:
The memory device of claim 8, further comprising…
store the weight matrix (FIG. 4B 4x4 Weight Matrix 438), 
input the first vector to the weight matrix (FIG. 4B Data Vectors 422), and 
(FIG. 4B Output Data Vector 491).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Acocella et al., Lea et al., Oh et al., and Ma et al. before him/her, to modify the combined teachings of Acocella et al., Lea et al., and Oh et al. with the teachings of Ma et al. regarding matrix processing because matrix operations are required to implement artificial neural networks (ANNs) (Ma et al. [0006]) and creating specialized matrix processing circuits for performing matrix operations needed to implement the ANNs is a popular technique for computations optimization (Ma et al. [0007]). The combination would optimize computational tasks associated with implementing ANNs (Ma et al. [0007]).
Acocella et al., Lea et al., Oh et al., and Ma et al. do not appear to explicitly teach “a memristor array, wherein the memristor array is configured to:” store data. However, Luo et al. (‘893) disclose:
… a memristor array ([0138] the memory array 320 is composed of a resistive random access memory (ReRAM). ReRAM is a non-volatile memory that changes the resistance of memory cells across a dielectric solid-state material, sometimes referred to as a “memristor.”)
Acocella et al., Lea et al., Oh et al., Ma et al., and Luo et al. are analogous art because Acocella et al. teach concurrently accessing multiple memory banks; Lea et al. teach operating neural networks with a processing-in-memory (PIM) architecture; Oh et al. teach a memory device for performing an internal process; Ma et al. teach digital processing circuits for neural network matrix operations; and Luo et al. teach matrix operations within a memory fabric.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY A WARREN whose telephone number is (571)270-7288. The examiner can normally be reached M-Th 7:30am-5pm, Alternate F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Arpan P. Savla, can be reached on 571-272-1077. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more 





/TRACY A WARREN/Primary Examiner, Art Unit 2137