DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

REJOINDER 
Claims 1-3 and 5-21 are allowable. Claims 9-16, previously withdrawn from consideration as a result of a restriction requirement, include all the limitations of an allowable claim. Pursuant to the procedures set forth in MPEP § 821.04(a), the restriction requirement between inventions I and II, as set forth in the Office action mailed on 08/16/2021, is hereby withdrawn and claims 9-16 are hereby rejoined and fully examined for patentability under 37 CFR 1.104. In view of the withdrawal of the restriction requirement, applicant(s) are advised that if any claim presented in a continuation or divisional application is anticipated by, or includes all the limitations of, a claim that is allowable in the present application, such claim may be subject to provisional statutory and/or nonstatutory double patenting rejections over the claims of the instant application. Once the restriction requirement is withdrawn, the provisions of 35 U.S.C. 121 are no longer applicable. See In re Ziegler, 443 F.2d 1211, 1215, 170 USPQ 129, 131-32 (CCPA 1971). See also MPEP § 804.01.



Allowable Subject Matter
Claims 1-3 and 5-21 are allowed.
The following is a list of the closest prior art:
Sugazaki (US 2010/0217939) teaches a plurality of crossbars connecting a plurality of system boards, each system board including a plurality of memories and processors. However, Sugazaki fails to teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality of memory units different from a second dynamically programmable 
distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory 
unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests from the first plurality of partial requests and the second 
plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole. 
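For orientation only, the request decomposition recited in claims 17 and 20 can be illustrated with a minimal Python sketch. It is not drawn from the application or from any cited reference, and it is not a construction of the claims: the 64-byte data access unit, the block-interleaved DistributionScheme, and the names decompose and served_here are all hypothetical. The sketch splits one request spanning several data access units into per-unit partial requests, maps each to a memory unit and bank under a per-processing-element distribution pattern, and lets a memory unit discard the partial requests it does not serve.

```python
from dataclasses import dataclass
from typing import List, Tuple

ACCESS_UNIT = 64  # hypothetical size of one data access unit, in bytes


@dataclass(frozen=True)
class DistributionScheme:
    """Hypothetical per-processing-element pattern for scattering data
    access units across memory units (block-interleaved)."""
    num_units: int       # number of memory units
    banks_per_unit: int  # memory banks inside each memory unit
    block: int           # consecutive access units kept on one memory unit

    def locate(self, unit_index: int) -> Tuple[int, int]:
        """Map a data access unit index to (memory unit, bank)."""
        mem_unit = (unit_index // self.block) % self.num_units
        bank = unit_index % self.banks_per_unit
        return mem_unit, bank


@dataclass(frozen=True)
class PartialRequest:
    address: int   # start byte of this partial request
    length: int    # bytes covered, never crossing an access-unit boundary
    mem_unit: int  # memory unit that holds the data
    bank: int      # bank within that memory unit


def decompose(address: int, length: int,
              scheme: DistributionScheme) -> List[PartialRequest]:
    """Split one request into one partial request per access unit touched."""
    parts = []
    cur, end = address, address + length
    while cur < end:
        idx = cur // ACCESS_UNIT
        chunk_end = min(end, (idx + 1) * ACCESS_UNIT)
        mem_unit, bank = scheme.locate(idx)
        parts.append(PartialRequest(cur, chunk_end - cur, mem_unit, bank))
        cur = chunk_end
    return parts


def served_here(parts: List[PartialRequest],
                my_unit: int) -> List[PartialRequest]:
    """Keep partial requests this memory unit serves; discard the rest."""
    return [p for p in parts if p.mem_unit == my_unit]


# Two processing elements with different distribution patterns.
pe0 = DistributionScheme(num_units=4, banks_per_unit=8, block=1)
pe1 = DistributionScheme(num_units=4, banks_per_unit=8, block=2)
print([p.mem_unit for p in decompose(32, 200, pe0)])
print([p.mem_unit for p in decompose(32, 200, pe1)])
```

Under these assumptions, the same 200-byte request starting at byte 32 is scattered over memory units 0 through 3 for the first processing element but only over memory units 0 and 1 for the second, so memory unit 0 keeps two partial requests in the second case and discards the others.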
Nag (Newton: Gravitating Towards the Physical Limits of Crossbar Acceleration, 2018) teaches a system for matrix decomposition that uses crossbars in memory to carry out computations, implying the use of crossbars with decomposition units. However, Nag fails to teach the combinations of “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for 
scattering data across the plurality of memory units different from a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is 
configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding 
a first group of partial requests from the first plurality of partial requests and the second plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole. 
Dai (GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing, April 2019) teaches a plurality of memories including processing-in-memory (PIM) units (and, by implication, decomposition units), where each memory unit is connected to a crossbar that is connected to a processor. Dai does not teach applying this structure to decompose memory requests into partial requests and therefore cannot teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first 
programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality of memory units different from a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial 
response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the 
partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests from the first plurality of partial requests and the second plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole. 
Okada (US 6,728,258) teaches memory that communicates memory requests across a crossbar and that changes distribution schemes, but fails to teach at least two different distribution schemes for scattering data across the plurality of memory units, and therefore cannot teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality 
of memory units different from a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is configured to access a plurality of 
memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests 
from the first plurality of partial requests and the second plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole. 
Hansen (US 5,778,419) teaches a plurality of memory banks but generally fails to teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality of memory units different from a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing 
element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable 
distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests from the first plurality of partial requests and the second plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial 
request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole.
Li (“CPU versus GPU: which can perform matrix computation faster, performance comparison for basic linear algebra subprograms,” 2018) teaches matrix compute operations but fails to teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality of memory units different from a second dynamically programmable distribution 
scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a 
first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests from the first plurality of partial requests and the second plurality of partial 
requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole.
Tanaka (US 6,789,173) teaches: “Yet another object of the present invention is to provide a node controller which allows a plurality of nodes to be directly connected to form a small system and can omit an external crossbar, and also to provide a multiprocessor system of a main memory shared type using such a node controller.”  Tanaka, column 2, lines 60-65.  “The node 5-i includes k CPUs 4-i-j (0 ≤ i ≤ k, k being a natural number of 1 or more), a cache 6-i, at least one main memory 7-i, an I/O controller 8-i and a node controller 2-i. The node controller 2-i has a communication controller 3-i and a crossbar 1-i.” Tanaka, column 6, lines 23-27.  “The crossbar 1-i is connected to the communication controller 3-i within its own node (having the crossbar 1-i therein) by the signal lines a-i and b-i and also connected to other nodes by the signal lines c-i and d-i.” Tanaka column  lines 58-62.  This teaches a plurality of nodes (including memories), each having its own crossbar.  However, there is no clear motivation to combine Tanaka with the other references to teach “A system, comprising: a plurality of memory units, wherein a first memory unit included in the plurality of memory units includes a first request processing unit and a first plurality of memory banks, and wherein the first request processing unit includes a first plurality of memory data request decomposition units and a first crossbar switch of the first memory unit, the first crossbar switch communicatively connecting each of the first plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, wherein at least a portion of the first plurality of decomposition units is configured to decompose a memory data request for memory data spanning a plurality of data access units into a plurality of different partial requests for different subsets of the memory data, and wherein a second memory unit included in the 
plurality of memory units includes a second crossbar switch; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine configured to perform a matrix compute operation of an artificial intelligence compute, and the control logic unit is configured to access the plurality of memory units using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of the first processing element for scattering data across the plurality of memory units different from a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of a second processing element of the plurality of processing elements for scattering data across the plurality of memory units.” as recited in claim 1, “A method comprising: receiving a memory request provided from a first processing element at a first decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is connected to a second memory unit including a second crossbar switch; decomposing the memory request for memory data spanning a plurality of data access units into at least a first partial request for a first subset of the memory data and a second partial request for a second subset of the memory data; determining that a requested first data of the first partial 
request resides in a first memory bank of the first plurality of memory banks; determining that a requested second data of the second partial request resides in a second memory bank of the first plurality of memory banks; directing the first partial request to the first memory bank via the first crossbar switch; directing the second partial request to the second memory bank via the first crossbar switch; retrieving the requested first data from the first memory bank via the first crossbar switch; retrieving the requested second data from the second memory bank via the first crossbar switch; preparing a partial response that includes the requested first and second data; and providing the partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data of the first processing element for scattering data across the plurality of memory units, and wherein a second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data of the second processing element for scattering data across the plurality of memory units” as recited in claim 17, or “A method comprising: receiving a first memory request provided from a first processing element at a first memory data request decomposition unit of a first memory unit, wherein the first memory unit includes a first plurality of memory banks and a first crossbar switch, the first crossbar switch communicatively connecting each of a plurality of decomposition units included in the first memory unit to each of the first plurality of memory banks included in the first memory unit, and wherein the first processing element is 
connected to a second memory unit including a second crossbar switch; receiving a second memory request provided from a second processing element at a second memory data request decomposition unit of the first memory unit; decomposing the first memory request for memory data spanning a first plurality of data access units into a first plurality of partial requests for a first subset of the memory data and the second memory request spanning a second plurality of data access units into a second plurality of partial requests for a second subset of the memory data; determining for each partial request of the first plurality of partial requests and the second plurality of partial requests whether the partial request is to be served from the first plurality of memory banks; discarding a first group of partial requests from the first plurality of partial requests and the second plurality of partial requests that is not to be served from the first plurality of memory banks; for each partial request of a second group of partial requests from the first plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a first partial response using the retrieved data, and providing the first partial response to the first processing element, wherein the first processing element is configured to access a plurality of memory units including the first memory unit and the second memory unit using a first dynamically programmable distribution scheme specifying a first programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units; and for each partial request of a third group of partial requests from the second plurality of partial requests that is to be served from the first plurality of memory banks, retrieving data of the partial request via the first crossbar switch, preparing a second partial response using the retrieved 
data, and providing the second partial response to the second processing element, wherein the second processing element is configured to access the plurality of memory units using a second dynamically programmable distribution scheme specifying a second programmatically modifiable distribution pattern of data for scattering data across the plurality of memory units.” as recited in claim 20, as a whole.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571)272-8646.  The examiner can normally be reached on Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached at 571-272-4204.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


PAUL M. KNIGHT
Examiner
Art Unit 2139

/PAUL M KNIGHT/Examiner, Art Unit 2139