DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1, 2, 6 and 8-20 is/are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Barry (US 2015/0046675).
Regarding claim(s) 1, Barry teaches:
A memory system comprising: a first memory card, wherein the first memory card comprises: a memory device, wherein the memory device comprises a logic die and a memory die;	[0099] shows that a memory system in the memory fabric 106 (“HBM+ card”) can include a plurality of memory slices. [0117] shows that the accelerator 104 can be coupled to the AMC 204 to access memory slices in the memory fabric 106 at a high bandwidth. [0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). Each memory slice may be provided with a memory slice controller (part of “logic die”) for providing access to a related memory slice. [0099] also shows that each memory slice can include a plurality of Random Access Memory (RAM) tiles (“memory die”), where each RAM tile can include a read port and a write port. [0234] also shows that the plurality of buffers 1902 can be a part of a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). The repository can comprise a memory slice from the memory fabric 106.
a controller connected to the memory device and capable of interfacing with a host;	[0091] shows that the peripheral device 108 can be configured to provide a communication channel for sending and receiving data bits to and from external devices, such as an image sensor and an accelerometer. The peripheral device 108 can provide a communication mechanism for the vector processors 102, the hardware accelerators 104, and the memory fabric 106 to communicate with external devices. Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
a first connection configured to connect to the host; and 	[0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel between at least one of the plurality of vector processors and an external device (“host”).			
a fabric connection configured to connect to a second memory card,	Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
the second memory card comprising a second connection capable of connecting to the host, wherein the first memory card is capable of communicating with the host via the fabric connection and the second connection.	[0252] shows that alternative embodiments may include multiple instances of a particular operation.
In addition, having two HBM+ card instances instead of one instance provides more resources to process and store/retrieve data, but does not produce a new and unexpected result. Therefore, having two HBM+ card instances instead of one instance pertains to duplication of parts, which has no patentable significance. See In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960).
					
Regarding claim(s) 2, Barry teaches:				
wherein the logic die comprises an accelerator logic configured to: receive instructions from the controller;	[0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). [0098] shows that a processing unit can read/write up to 128-bits per cycle through its load-store unit (LSU) ports and read up to 128 bit program code per cycle through its instruction port. In addition to IPI 202 and AMC 204 interfaces for processors 102 and hardware accelerators 104, respectively. [0091] shows that the peripheral device 108 can be configured to provide a communication channel for sending and receiving data bits to and from external devices, such as an image sensor and an accelerometer. The peripheral device 108 can provide a communication mechanism for the vector processors 102, the hardware accelerators 104, and the memory fabric 106 to communicate with external devices. [0182] shows a data receiver module 604 for receiving one or more scan-lines of an image for processing.
input vectors to a computational component;	[0087] shows that the one or more vector processors 102 include a central processing unit (CPU) that implements an instruction set containing instructions that operate on an array of data called vectors.
execute a mathematical operation; and return an output to an accumulator.	[0182] shows a data output module 606 for outputting one or more scan-lines that have been processed by one or more generic ISP function modules 602A-602H. [0188] shows that when a first ISP function module 602A completes its operation on a scan-line of an image, the first ISP function module 602A can store the processed scan-line in a FIFO buffer 704. As the first ISP function module 602A continues to process additional scan-lines, the first ISP function module 602A can continue to store the processed scan-lines in the FIFO buffer 704 until the FIFO buffer 704 is full.				
					
Regarding claim(s) 6, Barry teaches:								
wherein the memory die comprises at least one volatile memory component.	[0099] shows that each memory slice can include a plurality of Random Access Memory (RAM) tiles (“memory die”), where each RAM tile can include a read port and a write port.				
Regarding claim(s) 8, Barry teaches:								
wherein the memory device is configured to send or receive data to another memory device in the second memory card using at least one of a buffer-based communication link or peer-to-peer communication link.	[0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). Each memory slice may be provided with a memory slice controller (part of “logic die”) for providing access to a related memory slice.
					
Regarding claim(s) 9 and 16, Barry teaches:					
wherein the host provides one or more third connections, wherein a number of the third connections is fewer than a number of memory cards included in the system.	[0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel (“third connections”) between at least one of the plurality of vector processors and an external device (“host”). 
The Examiner notes that defining the limitation “third connections” (i.e., specifying which entities the “third connections” connect) would help overcome the cited prior art.
Regarding claim(s) 10 and 17, Barry teaches:						
wherein instructions from the host is configured to be received by the second memory card via the second connection, and transmitted to the first memory card via the fabric connection.	[0252] shows that alternative embodiments may include multiple instances of a particular operation. Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
					
Regarding claim(s) 11 and 18, Barry teaches:					
wherein the logic die is configured to perform a computation based on the instructions.	[0087] shows that the one or more vector processors 102 include a central processing unit (CPU) that implements an instruction set containing instructions that operate on an array of data called vectors.
					
Regarding claim(s) 12, Barry teaches:						
A memory system comprising: a first memory card, wherein the first memory card comprises: a first memory device, 	[0099] shows that a memory system in the memory fabric 106 (“HBM+ card”) can include a plurality of memory slices (part of “HBM+ cubes”). [0117] shows that the accelerator 104 can be coupled to the AMC 204 to access memory slices in the memory fabric 106 at a high bandwidth.				
wherein the first memory device comprises a first logic die and a first memory die;	[0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). Each memory slice may be provided with a memory slice controller (part of “logic die”) for providing access to a related memory slice. [0099] also shows that each memory slice can include a plurality of Random Access Memory (RAM) tiles (“memory die”), where each RAM tile can include a read port and a write port. [0234] also shows that the plurality of buffers 1902 can be a part of a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). The repository can comprise a memory slice from the memory fabric 106.
a first controller connected to the first memory device and capable of interfacing with a host;	[0091] shows that the peripheral device 108 can be configured to provide a communication channel for sending and receiving data bits to and from external devices, such as an image sensor and an accelerometer. The peripheral device 108 can provide a communication mechanism for the vector processors 102, the hardware accelerators 104, and the memory fabric 106 to communicate with external devices. Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
a first connection configured to connect to the host; and 	[0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel between at least one of the plurality of vector processors and an external device (“host”).			
a first fabric connection configured to connect to another memory card; and	Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
a second memory card, wherein the second memory card comprises: a second memory device, wherein the second memory device comprises a second logic die and a second memory die; a second controller connected to the second memory device and capable of interfacing with the host; a second connection configured to connect to the host; and a second fabric connection configured to connect to the first memory card, wherein the first memory card is capable of communicating with the host via the first fabric connection, the second fabric connection, and the second connection.		[0252] shows that alternative embodiments may include multiple instances of a particular operation.
In addition, having two HBM+ card instances instead of one instance provides more resources to process and store/retrieve data, but does not produce a new and unexpected result. Therefore, having two HBM+ card instances instead of one instance pertains to duplication of parts, which has no patentable significance. See In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960).
					
Regarding claim(s) 13, Barry teaches:						
further comprising: a third memory card having a third connection configured to connect to the host and a third fabric connection configured to connect to the second memory card; and a fourth memory card having a fourth connection configured to connect to the host and a fourth fabric connection configured to connect to the second memory card.	[0252] shows that alternative embodiments may include multiple instances of a particular operation. [0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel between at least one of the plurality of vector processors and an external device (“host”). Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
					
Regarding claim(s) 14, Barry teaches:						
wherein: the first fabric connection is connected to the second fabric connection, the third fabric connection, and the fourth fabric connection; the second fabric connection is connected to the first fabric connection, the third fabric connection, and the fourth fabric connection; the third fabric connection is connected the first fabric connection, the second fabric connection, and the fourth fabric connection; and the fourth fabric connection is connected to the first fabric connection, the second fabric connection, and the third fabric connection.	[0252] shows that alternative embodiments may include multiple instances of a particular operation. Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
					
Regarding claim(s) 15, Barry teaches:						
wherein the second connection, the third connection, and the fourth connection are connected to the host.	[0252] shows that alternative embodiments may include multiple instances of a particular operation. [0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel between at least one of the plurality of vector processors and an external device (“host”).

Regarding claim(s) 19, Barry teaches:
A memory system comprising: a first memory card, wherein the first memory card comprises: a memory device,	[0099] shows that a memory system in the memory fabric 106 (“HBM+ card”) can include a plurality of memory slices. [0117] shows that the accelerator 104 can be coupled to the AMC 204 to access memory slices in the memory fabric 106 at a high bandwidth. [0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). Each memory slice may be provided with a memory slice controller (part of “logic die”) for providing access to a related memory slice. [0099] also shows that each memory slice can include a plurality of Random Access Memory (RAM) tiles (“memory die”), where each RAM tile can include a read port and a write port. [0234] also shows that the plurality of buffers 1902 can be a part of a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). The repository can comprise a memory slice from the memory fabric 106.
wherein the memory device is configured to send and receive data to another memory device using at least one of a buffer-based or peer-to-peer communication link,	[0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104. [0099] shows that each memory slice is associated with one of the vector processors 102 (part of “logic die”). Each memory slice may be provided with a memory slice controller (part of “logic die”) for providing access to a related memory slice.
the memory device comprising a memory and an accelerator; a controller coupled to the memory device and configured to interface with a host;	[0091] shows that the peripheral device 108 can be configured to provide a communication channel for sending and receiving data bits to and from external devices, such as an image sensor and an accelerometer. The peripheral device 108 can provide a communication mechanism for the vector processors 102, the hardware accelerators 104, and the memory fabric 106 to communicate with external devices. Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
a first connection configured to connect to the host; and 	[0014] shows that the computing device can include a peripheral device coupled to a plurality of input/output (I/O) pins, wherein the peripheral device is configured to provide a communication channel between at least one of the plurality of vector processors and an external device (“host”).
a fabric connection configured to connect to a second memory card,	Fig. 2 and [0093] show that the vector processors 102 can communicate with one another via the inter-processor interconnect (IPI) 202. The vector processors 102 can also communicate with other components in the computing device 100, including the memory fabric 106 and/or hardware accelerators 104, via the IPI 202 and the Accelerator Memory Controller (AMC) crossbar 204 or a memory-mapped processor bus 208.
the second memory card comprising a second connection capable of connecting to the host, wherein the first memory card is capable of communicating with the host via the fabric connection and the second connection.	[0252] shows that alternative embodiments may include multiple instances of a particular operation.
In addition, having two HBM+ card instances instead of one instance provides more resources to process and store/retrieve data, but does not produce a new and unexpected result. Therefore, having two HBM+ card instances instead of one instance pertains to duplication of parts, which has no patentable significance. See In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960).

Regarding claim(s) 20, Barry teaches:
wherein the first memory card is configured to operate in accordance with computer program instructions for executing operations on the accelerator and controlling communication with the memory device.	[0098] shows that a processing unit can read/write up to 128-bits per cycle through its load-store unit (LSU) ports and read up to 128 bit program code per cycle through its instruction port. In addition to IPI 202 and AMC 204 interfaces for processors 102 and hardware accelerators 104, respectively. [0091] shows that the peripheral device 108 can be configured to provide a communication channel for sending and receiving data bits to and from external devices, such as an image sensor and an accelerometer. The peripheral device 108 can provide a communication mechanism for the vector processors 102, the hardware accelerators 104, and the memory fabric 106 to communicate with external devices. [0182] shows a data receiver module 604 for receiving one or more scan-lines of an image for processing.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Barry (US 2015/0046675) in view of Fowers (US 2019/0325297).
Regarding claim(s) 3, Barry teaches:
wherein the logic die comprises an accelerator,	[0234] shows a repository of buffers that can be partitioned and exclusively assigned to one of the processing units (“logic die”). [0245] shows an example of a processing unit (“logic die”) as a vector processor 102 and a hardware accelerator 104.
wherein the accelerator comprises: a control component; a buffer; an instruction decoder; and	[0109] shows that, in some embodiments, the AMC 204 (“control engine”) can include a pair of 64 bit ports into each memory slice of the memory fabric 106. The AMC 204 (“control engine”) can be configured to route requests from a hardware accelerator 104 to an appropriate memory slice by partial address decode (“instruction decoder”). [0188] shows that when a first ISP function module 602A completes its operation on a scan-line of an image, the first ISP function module 602A can store the processed scan-line in a FIFO buffer 704. As the first ISP function module 602A continues to process additional scan-lines, the first ISP function module 602A can continue to store the processed scan-lines in the FIFO buffer 704 until the FIFO buffer 704 is full.
Barry does not explicitly teach, but Fowers teaches: a general matrix multiply (GEMM) component.	[0040] shows that one way of executing CNNs on CPUs, GPUs, or ASICs is to transform convolution operations into faster General Matrix to Matrix Multiplication (GEMM) operations.
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the image computing system/method of Barry with the convolution operation method/system taught by Fowers. The motivation for doing so would have been to enable higher performance through faster convolution operations, as taught by Fowers in [0040].
							

Claim(s) 4 and 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Barry (US 2015/0046675) and Fowers (US 2019/0325297), further in view of Koren (US 2019/0042538).
Regarding claim(s) 4, Barry teaches:
wherein the control component is configured to operate as at least one of a routing controller, a high bandwidth memory controller, a direct memory access (DMA) engine, a power controller, or 	[0106] shows that the memory fabric 106 can also include a direct memory access (DMA) controller (“DMA engine”) that coordinates the data transfer amongst vector processors 102, hardware accelerators 104, and memory. [0117] shows that the accelerator 104 can be coupled to the AMC 204 to access memory slices in the memory fabric 106 at a high bandwidth. [0092] shows that the power management module 110 can be configured to control activities of designated blocks within the computing device 100. More particularly, the power management module 110 can be configured to control the power supply voltage of designated blocks, also referred to as power islands, within the computing device 100.	
The combination of Barry and Fowers does not explicitly teach, but Koren teaches: a multiple model adaptive controller (MMAC) scheduler.	[0026] shows an accelerator for a processor with two modes: a first mode for dense layers and a second mode for sparse layers, where an element such as a software element determines the mode of the accelerator.
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the image computing system/method of Barry and Fowers with the accelerator system/method for processing data taught by Koren. The motivation for doing so would have been to efficiently handle layers or data elements that are either dense or sparse by minimizing the use of memory space for redundant zero elements when handling a sparse layer, and therefore to improve processing speed. This is taught by Koren in [0025, 0027, 0028].
					
Regarding claim(s) 5, Koren teaches:				
wherein the GEMM component comprises one or more of: a first multiple model adaptive controller (MMAC); a second MMAC; or a multiplexer configured to route first data to the second MMAC and route second data to first MMAC.	[0026] shows an accelerator for a processor with two modes: a first mode for dense layers and a second mode for sparse layers where an element such as a software element determines the mode of the accelerator.
Abstract and [0034-0039] show that, in a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing, compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer.


Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Barry (US 2015/0046675) in view of Chung (US 2012/0290793).
Regarding claim(s) 7, Barry does not explicitly teach, but Chung teaches:
wherein the logic die is stacked on top of the memory die.	[0002] shows that stacked memory, or 3D stacking, is a recent proposal that addresses this limitation by stacking memory directly on top of a processor.
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the image computing system/method of Barry with the 3D stacked memory system/method of Chung. The motivation for doing so would have been to significantly reduce wire delays between the processor and memory, as taught by Chung in [0002]. This would increase the processing speed of a memory system.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhao (US 2018/0074727): discloses heterogeneous computing that leverages different computing elements to accelerate applications.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES J CHOI, whose telephone number is (571) 270-0605. The examiner can normally be reached MON-FRI: 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JARED RUTZ, can be reached at 571-272-5535. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES J CHOI/Examiner, Art Unit 2133