DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This Office Action is sent in response to Applicant’s Communication received 5/16/2019 for application number 16/414,534. The Office hereby acknowledges receipt of the following, which have been placed of record in the file: Specification, Drawings, Abstract, Oath/Declaration, and claims.
3.	Claims 1 – 20 are presented for examination.

Claim Rejections - 35 USC § 101

4.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
	
5.	Claim 19 is directed to an abstract idea without significantly more.  The independent claim recites a computer implemented method for receiving a local matrix multiplication operation result formatted using a first data layout format; applying a transpose operation to transpose the local matrix multiplication operation result into a transposed result; scattering the transposed result into a shared memory using a second data layout format; gathering an input data matrix from the shared memory to finalize the distributed transpose; performing a matrix operation on the input data matrix to generate a matrix operation result; and writing the matrix operation result to the shared memory.
The limitations, as drafted, describe a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “scattering … into a shared memory” and “gathering … from the shared memory,” nothing in the claim elements precludes the steps from practically being performed in the mind, such that the “applying …” and “performing …” limitations are mental processes under Prong I of step 2A.  For example, but for the noted language, all of the remaining limitations are pre- or post-solution activity for getting, obtaining, or displaying data without significantly more.  If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
Additionally, the “applying a transpose operation …” and “performing a matrix operation …” limitations recite mathematical concepts and are thus abstract ideas.
This judicial exception is not integrated into a practical application. In particular, the components in the writing step are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of receiving information, executing a function and making a decision) such that they amount to no more than mere instructions to apply the exception using a generic computer component.  Additionally, the steps of “receiving…,” “scattering…,” “gathering…” and “writing…” are pre-solution activities, such as gathering data, that are insignificant under Prong II of step 2A and under step 2B.  Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computer to perform the noted steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.  
Claim Rejections - 35 USC § 103
6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
8.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
9.	Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lau et al. (U.S. Publication 2019/0392297) (Lau hereinafter) in view of Jacob et al. (U.S. Publication 2020/0341764) (Jacob hereinafter).
10. 	As per claim 1, Lau teaches a microprocessor [Master Control CPU 1632, fig. 16C, “Master control CPU (MCC) 1632 may be configured to control and/or manage matrix operations performed by a matrix processing cluster 1630. In some embodiments, master control CPU 1632 may be a microprocessor, an integrated circuit, and/or any other type of circuitry and/or processing logic,” ¶ 0095], comprising: 
a shared memory [“Memory resource blocks (MRBs) 1638 may be memory components on matrix processing cluster 1630 used to store matrix operands and other matrix data … memory resource blocks (MRBs) 1638 may be shared by the matrix processing units (MPUs) 1634 of a particular matrix processing cluster 1630,” ¶ 0098]; and a processing element including:
a matrix processor unit configured to perform a matrix operation [Matrix Processing Units 1634, fig. 16C]; 
a transpose hardware unit configured to perform a matrix transpose operation [Slicing Engine 1636, fig. 16C; “slicing engine 1636 and/or the associated convolution slicing engine (CSE) may be used to perform the dimension shuffle operations to reorder the dimensions of a matrix,” ¶ 0098; reordering matrix dimension mapped to matrix transpose operation; “A transpose operation, for example, is used to “transpose” the rows and columns of a matrix, by rearranging the rows as columns and the columns as rows.  A transpose operation can be performed on a matrix processor, for example, by retrieving each row of a matrix from memory, and then storing each row back in memory as a column,” ¶ 0125; “A particular dimension shuffle operation may involve one or more non-transpose and/or transpose convolutional reads,” ¶ 0348];
a scatter hardware unit; and a gather hardware unit [“scatter/gather DMA is supported in hardware, and the DMA descriptors are generalized to support multiple configurations (e.g. ring buffers, linear buffer, etc.), allowing for different types of host driver optimization,” ¶ 0053].
Lau does not explicitly disclose but Jacob discloses a scatter hardware unit configured to place data to the shared memory at locations selected for an output data layout conversion; and a gather hardware unit configured to obtain input data from the shared memory from non-contiguous locations for an input data layout conversion [“performing one or more scatter gather operations by accessing a shared memory that is shared amongst multiple nodes interconnected through one or more networks, the shared memory comprising a coordination namespace that is shared amongst the multiple nodes, the operations comprising: gathering data from multiple processes at corresponding multiple nodes into a one or more locations in the coordination namespace, and creating one or more tuples having a same tuple name in the coordination namespace, wherein the one or more tuples have information referencing the gathered data in the one or more locations; or scattering data that has been gathered using the same tuple name to multiple processes participating in the coordination namespace, the scattering using the one or more tuples in the coordination namespace, the scattering performed from the one or more locations into other locations at one or multiple nodes for one or multiple processes at the corresponding one or multiple nodes; or performing both the gathering data and the scattering data,” cl. 1].
          It would have been obvious to one of ordinary skill in the art, having the teachings of Lau and Jacob available before the effective filing date of the claimed invention, to modify the capability of managing deep learning hardware as disclosed by Lau to include the capability of data scatter and gather as taught by Jacob, thereby providing a mechanism to enhance system efficiency by facilitating the optimization of data storage and access in the context of the available system resources.
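For illustration only, the scatter and gather limitations addressed above can be sketched in software. The sketch below is not drawn from Lau, Jacob, or the application; the function names scatter and gather, the flat-list model of the shared memory, and the 2x3 example data are all assumptions chosen to show data being placed at selected locations for an output layout conversion and read back from non-contiguous locations.

```python
# Hypothetical software model of scatter/gather over a flat shared memory.
# Nothing here reflects the actual hardware of any cited reference.

def scatter(shared, values, locations):
    """Place each value at the shared-memory location selected for it."""
    if len(values) != len(locations):
        raise ValueError("one location is needed per value")
    for value, loc in zip(values, locations):
        shared[loc] = value


def gather(shared, locations):
    """Read values from (possibly non-contiguous) shared-memory locations."""
    return [shared[loc] for loc in locations]


# Assumed 2x3 result held in row-major order: [[1, 2, 3], [4, 5, 6]].
rows, cols = 2, 3
result_row_major = [1, 2, 3, 4, 5, 6]

# Output layout conversion: scatter so that shared memory is column-major.
shared_memory = [None] * (rows * cols)
col_major_locations = [c * rows + r for r in range(rows) for c in range(cols)]
scatter(shared_memory, result_row_major, col_major_locations)

# Input layout conversion: gather logical row 1 from non-contiguous slots.
row1 = gather(shared_memory, [c * rows + 1 for c in range(cols)])
```

In this model the locations passed to scatter determine the stored layout, and the locations passed to gather determine the layout seen by the consumer.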
11. 	As per claim 2, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the transpose hardware unit, the scatter hardware unit, and the gather hardware unit are different units configured to be operated at least in part in parallel [“Matrix processing clusters 1630 may include processing resources configured to perform matrix operations, such as matrix multiplication, convolutions, and/or dimension shuffling, among other examples. In some embodiments, matrix processing clusters 1630 may be collectively used to execute a particular matrix operation by performing matrix processing in parallel. In the illustrated embodiment, matrix processing chip 1620 includes twelve matrix processing clusters 1630a-l,” ¶ 0093].
12. 	As per claim 3, Lau and Jacob teach the microprocessor of claim 2.  Lau further teaches wherein operations of the transpose hardware unit, the scatter hardware unit, and the gather hardware unit are configured to be scheduled to execute in parallel [“The diagram 600 of FIG. 6 shows how flit segmentation and re-assembly may be performed at the interface between the on-chip and inter-chip networks. The primary components of the Inter-chip network, in one example, may be an inter-chip link (ICL) blocks (e.g., 325a-f) and an inter-chip crossbar (ICC) hardware (e.g., 510).  An on-chip fabric 605 may be used to interconnect the DLH processing clusters (which, in turn, may connect to HBMs (e.g., 320a-d)). In some implementations multiple (e.g., 12) ICLs (e.g., 325a-f) may be provided to support multiple interconnect topologies, among other example implementations,” ¶ 0056; “Matrix processing clusters 1630 may include processing resources configured to perform matrix operations, such as matrix multiplication, convolutions, and/or dimension shuffling, among other examples. In some embodiments, matrix processing clusters 1630 may be collectively used to execute a particular matrix operation by performing matrix processing in parallel. In the illustrated embodiment, matrix processing chip 1620 includes twelve matrix processing clusters 1630a-l,” ¶ 0093].
13. 	As per claim 4, Lau and Jacob teach the microprocessor of claim 2.  Lau further teaches wherein the transpose hardware unit, the scatter hardware unit, and the gather hardware unit are configured for pipelined operation [“Design of an example DLH device may be fully pipelined and can take in up to four sets of 32 operands (e.g., tensor operands) per cycle to perform matrix multiplication, as well as partial product addition and pre- and post-multiplication operations,” ¶ 0066].
14. 	As per claim 5, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the data placed by the scatter hardware unit includes at least a portion of a result data of the matrix processor unit [“the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations, such as distributed matrix multiplication and/or convolution operations, dimension shuffle operations, reshape operations, and so forth,” ¶ 0117; “output engine 1737 may provide the result 1738d to other components of the matrix processing architecture. For example, in some cases, matrix operation 1701 may be a partial matrix operation associated with a larger matrix operation distributed across multiple processing resources, and thus the result of matrix operation 1701 may be a partial result associated with the larger distributed operation. Moreover, the partial result 1738d may be needed by other processing resource(s) involved in the distributed matrix operation. Accordingly, output engine 1737 may provide the partial result 1738d to the appropriate resource, for example, for further processing and/or storage,” ¶ 0120].
15.	As per claim 6, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the matrix processor unit is configured to process the input data obtained by the gather hardware unit [“The image data stored in one of the MRBs 1738a may be used by slice engine 1736a to extract a sliced matrix operand. The sliced matrix operand, for example, may be a particular portion of the image data involved in the convolution related operations,” ¶ 0115].
16. 	As per claim 7, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein performing the output data layout conversion includes converting an output data layout format of a first neural network layer to a different input data layout format of a second neural network layer [“DLH device may include multiple processing clusters. For instance, as shown in the diagram 800 of FIG. 8, in one example, each processing cluster 305 may store local tensor information, processes instruction streams from the host, and perform the computations required by the artificial neural networks,” ¶ 0060].
17. 	As per claim 8, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein performing the output data layout conversion includes converting a first data layout format associated with a matrix processor result of a first neural network layer to a second data layout format associated with a second neural network layer, wherein the first and second data layout formats are different [“DLH device may include multiple processing clusters. For instance, as shown in the diagram 800 of FIG. 8, in one example, each processing cluster 305 may store local tensor information, processes instruction streams from the host, and perform the computations required by the artificial neural networks,” ¶ 0060].
18. 	As per claim 9, Lau and Jacob teach the microprocessor of claim 8.  Lau further teaches wherein an inner dimension of the first data layout format corresponds to one of the outer dimensions of the second data layout format [“dimension shuffling is performed for a three-dimensional (3D) matrix stored in two-dimensional (2D) memory. The example 3D matrix includes dimensions A, B, and C (or AxBxC). In the illustrated examples, the 3D matrix is stored in 2D memory with its dimensions arranged as ABxC, and dimension shuffling is used to reorder the dimensions into other 2D permutations, such as from ABxC to BAxC, and from ABxC to BCxA,” ¶ 0342].
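For illustration only, the dimension shuffling quoted above from Lau ¶ 0342 can be sketched as index arithmetic on a 3D matrix stored in 2D form. The sketch is an assumption-laden software model, not Lau's implementation: the function names, the row-index convention (row = a * B + b for the ABxC arrangement), and the sample values are hypothetical.

```python
# Hypothetical model: a 3D matrix with dimensions A, B, C stored as a 2D
# list with A*B rows and C columns, where row index = a * B + b (ABxC).

def shuffle_abxc_to_baxc(matrix_2d, a_dim, b_dim):
    """Reorder rows so that row index = b * A + a (BAxC arrangement)."""
    if len(matrix_2d) != a_dim * b_dim:
        raise ValueError("row count must equal A * B")
    out = [None] * (a_dim * b_dim)
    for a in range(a_dim):
        for b in range(b_dim):
            out[b * a_dim + a] = list(matrix_2d[a * b_dim + b])
    return out


def shuffle_abxc_to_bcxa(matrix_2d, a_dim, b_dim):
    """Produce B*C rows and A columns, where row index = b * C + c (BCxA)."""
    if len(matrix_2d) != a_dim * b_dim:
        raise ValueError("row count must equal A * B")
    c_dim = len(matrix_2d[0])
    out = [[None] * a_dim for _ in range(b_dim * c_dim)]
    for a in range(a_dim):
        for b in range(b_dim):
            for c in range(c_dim):
                out[b * c_dim + c][a] = matrix_2d[a * b_dim + b][c]
    return out


# Sample data with A=2, B=3, C=2; each element encodes its indices as
# 100*a + 10*b + c so the reordering is easy to read.
A, B, C = 2, 3, 2
abxc = [[100 * a + 10 * b + c for c in range(C)]
        for a in range(A) for b in range(B)]
baxc = shuffle_abxc_to_baxc(abxc, A, B)
bcxa = shuffle_abxc_to_bcxa(abxc, A, B)
```

In the BCxA result, the former outer dimension A becomes the inner (column) dimension, which is the kind of inner/outer correspondence the claim language refers to.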
19. 	As per claim 10, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein performing the input data layout conversion includes converting an output data layout format of a first neural network layer to a different input data layout format of a second neural network layer [“FIGS. 21A - 21D illustrate examples of max pooling using a matrix processing engine. An artificial neural network, such as a convolutional neural network, includes a series of connected layers. In some cases, the neural network may include one or more max pooling layers. Max pooling is a down-sampling operation that reduces the spatial size of an input feature map, for example, to reduce the amount of parameters and computation in the neural network. A max pooling layer, for example, is often inserted between successive convolutional layers in a convolutional neural network. Max pooling is performed by sliding a “max filter” throughout the input feature map, identifying the maximum value within each filter position on the input feature map, and storing the respective maximum values in an output feature matrix,” ¶ 0154].
20. 	As per claim 11, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein performing the input data layout conversion includes converting a first data layout format associated with a first neural network layer to a second data layout format associated with a second neural network layer, wherein the first and second data layout formats are different, and wherein the first data layout format is an output data layout format and the second data layout format is an input data layout format [“FIGS. 21A - 21D illustrate examples of max pooling using a matrix processing engine. An artificial neural network, such as a convolutional neural network, includes a series of connected layers. In some cases, the neural network may include one or more max pooling layers. Max pooling is a down-sampling operation that reduces the spatial size of an input feature map, for example, to reduce the amount of parameters and computation in the neural network. A max pooling layer, for example, is often inserted between successive convolutional layers in a convolutional neural network. Max pooling is performed by sliding a “max filter” throughout the input feature map, identifying the maximum value within each filter position on the input feature map, and storing the respective maximum values in an output feature matrix,” ¶ 0154].
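For illustration only, the max pooling operation described in the passage of Lau ¶ 0154 quoted above can be sketched as follows. This is a generic software rendering of sliding a max filter over a feature map, not Lau's matrix-processing-engine implementation; the function name max_pool_2d, its parameters, and the 4x4 sample input are assumptions.

```python
# Hypothetical sketch of 2D max pooling on a feature map held as a list
# of rows. Window size and stride are assumed parameters.

def max_pool_2d(feature_map, size, stride):
    """Slide a size x size max filter and collect the maximum per position."""
    n_rows = len(feature_map)
    n_cols = len(feature_map[0])
    output = []
    for top in range(0, n_rows - size + 1, stride):
        out_row = []
        for left in range(0, n_cols - size + 1, stride):
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            out_row.append(max(window))
        output.append(out_row)
    return output


sample_input = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool_2d(sample_input, size=2, stride=2)
```

With a 2x2 filter and a stride of 2, the 4x4 input is down-sampled to a 2x2 output feature matrix.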
21. 	As per claim 12, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the matrix processor unit is a dot product engine [“a filter (e.g., filter 3004) for a color image may be represented by a 3D matrix with dimensions corresponding to the number of channels (C), filter height (R), and filter width (S). In these embodiments, a convolution operation 3000 may be performed by moving the filter 3004 throughout the image 3002 and computing the dot product between the filter 3004 and the various portions of the image 3002. For example, in some embodiments, the filter 3004 may be moved along the height and width of the image 3002 using a certain stride or interval, the dot product may be computed at each location, and the result may be stored in the corresponding location of a result matrix 3006,” ¶ 0243].
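For illustration only, the dot-product formulation of convolution quoted above from Lau ¶ 0243 can be sketched in software. The sketch assumes an image indexed as [channel][row][column] and a filter with dimensions C x R x S; the function name conv_dot_product and the sample data are hypothetical and do not describe Lau's hardware.

```python
# Hypothetical sketch: convolution computed as a dot product between a
# C x R x S filter and each C x R x S portion of a C x H x W image.

def conv_dot_product(image, filt, stride=1):
    """Move the filter along height and width, storing one dot product per stop."""
    channels = len(filt)
    r_dim = len(filt[0])
    s_dim = len(filt[0][0])
    height = len(image[0])
    width = len(image[0][0])
    result = []
    for top in range(0, height - r_dim + 1, stride):
        result_row = []
        for left in range(0, width - s_dim + 1, stride):
            acc = 0
            for c in range(channels):
                for i in range(r_dim):
                    for j in range(s_dim):
                        acc += image[c][top + i][left + j] * filt[c][i][j]
            result_row.append(acc)
        result.append(result_row)
    return result


# Two-channel 3x3 image and a two-channel 2x2 filter (assumed values).
sample_image = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
sample_filter = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
result_matrix = conv_dot_product(sample_image, sample_filter, stride=1)
```

Each entry of the result matrix is a single dot product over all channels at one filter position.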
22. 	As per claim 13, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the transpose hardware unit, the scatter hardware unit, and the gather hardware unit are each configured to operate at a throughput that at least meets a maximum throughput of the matrix processor unit [“An example DLH device may be designed to have the ability to scale-out processing across multiple chips/boards/systems so that larger computational models can be transparently deployed by the end user. In artificial neural networks, inter-chip communication may be utilized for instance to scale up the capacity of a network (i.e. more layers, nodes, more parameters, etc.), speed up the training of a network by splitting the computation of the network across multiple nodes, among other example functions,” ¶ 0055].
23. 	As per claim 14, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the gather hardware unit is configured to obtain the input data from the shared memory including by being configured to perform cache-line block reads [“An example DLH device may include a Super Memory Block (SMB) that groups together all the memory resource blocks (MRBs) in that corresponding processing cluster. Multiple on-chip clients have both read and write access to the MRBs within the SMB … , each MRB may be configured to read and write 32 matrix values either row-wise or column-wise every cycle.  As an example, a MRB (e.g., 830a - n) may be composed of 16 logical memories with individual addressing and input and output data rotation to support both the row and column access,” ¶ 0067].
24.	As per claim 15, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the matrix operation is a depthwise convolution or a three-dimensional convolution [“FIG. 34C illustrates an example of matrix dimension shuffling using non-transpose convolutional read operations. In the illustrated example, matrix 3400C-1 corresponds to a 3D matrix (e.g., 3D matrix 3400A of FIG. 34A) that is stored in 2D memory with its dimensions arranged as ABxC. Similarly, matrix 3400C-2 corresponds to the same matrix but with its dimensions arranged as BAxC. In some embodiments, matrix 3400C-1 may be converted into matrix 3400C-2, or from ABxC to BAxC, using non-transpose convolutional read operations,” ¶ 0349].
25. 	As per claim 16, Lau and Jacob teach the microprocessor of claim 1.  Jacob further teaches wherein the locations selected for the output data layout conversion are specified using arguments to a scatter operation primitive [“scattering data that has been gathered using the same tuple name to multiple processes participating in the coordination namespace, the scattering using the one or more tuples in the coordination namespace, the scattering performed from the one or more locations into other locations at one or multiple nodes for one or multiple processes at the corresponding one or multiple nodes; or performing both the gathering data and the scattering data,” cl. 1].
          It would have been obvious to one of ordinary skill in the art, having the teachings of Lau and Jacob available before the effective filing date of the claimed invention, to modify the capability of managing deep learning hardware as disclosed by Lau to include the capability of data scatter and gather as taught by Jacob, thereby providing a mechanism to enhance system efficiency by facilitating the optimization of data storage and access in the context of the available system resources.
26. 	As per claim 17, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the non-contiguous locations for the input data layout conversion are specified using arguments to a gather operation primitive [“Slice engine 1736a may then “slice” the matrix data stored in MRBs 1738a to extract the particular matrix operands associated with matrix operation 1701. For example, in some cases, the associated matrix operands may only include a subset of the matrix data stored in MRBs 1738a, and/or the matrix operands may not be arranged contiguously in the matrix data stored in MRBs 1738a. Accordingly, slice engine 1736a may extract particular “slices” or pieces of the matrix data stored in MRBs 1738a, and may then arrange the slices in a particular manner to form the respective matrix operands,” ¶ 0114].
27. 	As per claim 18, Lau and Jacob teach the microprocessor of claim 1.  Lau further teaches wherein the processing element further includes a scheduler unit configured to schedule overlapping operations to the matrix processor unit, the transpose hardware unit, the scatter hardware unit, and the gather hardware unit [“An MCC (Master Control CPU) may take in a stream of instructions from a host CPU connected to a DLH device. These instructions can be thought of as macro commands from the host CPU. These instructions may pass or include tensor data for operation using processing clusters of the DLH. With each instruction, the MCC may invoke a series of operations on the MPUs of one or more processing clusters. The MCC may coordinate the data flow and arithmetic operations that are sequenced to the MPUs.” ¶ 0061].
28. 	As per claim 19, Lau teaches a method, comprising: 
receiving a local matrix multiplication operation result formatted using a first data layout format [“Design of an example DLH device may be fully pipelined and can take in up to four sets of 32 operands (e.g., tensor operands) per cycle to perform matrix multiplication, as well as partial product addition and pre- and post-multiplication operations,” ¶ 0066];
applying a transpose operation to transpose the local matrix multiplication operation result into a transposed result [Slicing Engine 1636, fig. 16C; “slicing engine 1636 and/or the associated convolution slicing engine (CSE) may be used to perform the dimension shuffle operations to reorder the dimensions of a matrix,” ¶ 0098; reordering matrix dimension mapped to matrix transpose operation; “A transpose operation, for example, is used to “transpose” the rows and columns of a matrix, by rearranging the rows as columns and the columns as rows.  A transpose operation can be performed on a matrix processor, for example, by retrieving each row of a matrix from memory, and then storing each row back in memory as a column,” ¶ 0125; “A particular dimension shuffle operation may involve one or more non-transpose and/or transpose convolutional reads,” ¶ 0348];
performing a matrix operation on the input data matrix to generate a matrix operation result [“The flowchart may begin at block 3302 by receiving a command to perform a matrix operation. The matrix operation, for example, may comprise an operation on a plurality of input matrices (e.g., matrix operands). Moreover, the matrix operation may be associated with one or more convolution operations,” ¶ 0331, fig. 33]; and 
writing the matrix operation result to the shared memory [“the result of each convolutional read may be stored in a result matrix based on the order in which the data is accessed using strided memory access,” ¶ 0348].
Lau does not explicitly disclose but Jacob discloses scattering the transposed result into a shared memory using a second data layout format; and gathering an input data matrix from the shared memory to finalize the distributed transpose [“performing one or more scatter gather operations by accessing a shared memory that is shared amongst multiple nodes interconnected through one or more networks, the shared memory comprising a coordination namespace that is shared amongst the multiple nodes, the operations comprising: gathering data from multiple processes at corresponding multiple nodes into a one or more locations in the coordination namespace, and creating one or more tuples having a same tuple name in the coordination namespace, wherein the one or more tuples have information referencing the gathered data in the one or more locations; or scattering data that has been gathered using the same tuple name to multiple processes participating in the coordination namespace, the scattering using the one or more tuples in the coordination namespace, the scattering performed from the one or more locations into other locations at one or multiple nodes for one or multiple processes at the corresponding one or multiple nodes; or performing both the gathering data and the scattering data,” cl. 1].
          It would have been obvious to one of ordinary skill in the art, having the teachings of Lau and Jacob available before the effective filing date of the claimed invention, to modify the capability of managing deep learning hardware as disclosed by Lau to include the capability of data scatter and gather as taught by Jacob, thereby providing a mechanism to enhance system efficiency by facilitating the optimization of data storage and access in the context of the available system resources.
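For illustration only, the sequence of steps recited in claim 19 (receive a local result, transpose, scatter, gather, perform a matrix operation, write) can be sketched as a small software model. The sketch is an assumption, not the claimed hardware or the method of either reference: the two processing elements, the dictionary modeling shared memory, the function names, and the all-ones weight matrix are hypothetical.

```python
# Hypothetical model of a distributed transpose followed by a matrix
# operation. Shared memory is modeled as a dict holding a flat staging
# buffer and an output slot.

def transpose(block):
    """Swap rows and columns of a local block."""
    return [list(col) for col in zip(*block)]


def scatter_block(staging, block_t, col_offset, total_cols):
    """Place a transposed block at its column offset in a row-major buffer."""
    for r, row in enumerate(block_t):
        for c, value in enumerate(row):
            staging[r * total_cols + col_offset + c] = value


def gather_matrix(staging, n_rows, n_cols):
    """Read the staging buffer back as an n_rows x n_cols matrix."""
    return [staging[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def matmul(x, y):
    """Plain matrix multiplication on lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


# Two assumed processing elements each hold a 2x2 row block of a 4x2 matrix.
local_results = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
shared_memory = {"staging": [None] * 8, "output": None}

# Each element transposes its block and scatters it into the second layout.
for k, block in enumerate(local_results):
    scatter_block(shared_memory["staging"], transpose(block),
                  col_offset=2 * k, total_cols=4)

# Gathering the full buffer finalizes the distributed transpose (2x4).
input_matrix = gather_matrix(shared_memory["staging"], 2, 4)

# Perform a matrix operation and write the result back to shared memory.
weights = [[1], [1], [1], [1]]
shared_memory["output"] = matmul(input_matrix, weights)
```

In this model no single element transposes the whole matrix; the global transpose emerges from the locations each element scatters to.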
29. 	As per claim 20, Lau teaches a microprocessor [Master Control CPU 1632, fig. 16C, “Master control CPU (MCC) 1632 may be configured to control and/or manage matrix operations performed by a matrix processing cluster 1630. In some embodiments, master control CPU 1632 may be a microprocessor, an integrated circuit, and/or any other type of circuitry and/or processing logic,” ¶ 0095], comprising: 
a shared memory [“Memory resource blocks (MRBs) 1638 may be memory components on matrix processing cluster 1630 used to store matrix operands and other matrix data … memory resource blocks (MRBs) 1638 may be shared by the matrix processing units (MPUs) 1634 of a particular matrix processing cluster 1630,” ¶ 0098]; and 
a plurality of processing elements configured to operate in parallel [“workloads involving a convolution or matrix multiplication operation may be performed by orchestrating portions of the work to be performed substantially in parallel by multiple MPUs. Data transferred between MPUs or even between multiple DLHs (e.g., as in the example of FIG. 2) may be transferred as tensors. Additionally, specialized memory blocks may be provided, with access to the memory shared by the multiple MPUs to limit data exchanges and simplify and expedite workloads involving multiple cooperating MPUs, among other example functions and advantages,” ¶ 0044], wherein each processing element includes: 
a matrix processor unit configured to perform a matrix operation [Matrix Processing Units 1634, fig. 16C]; 
a transpose hardware unit configured to perform a matrix transpose operation [Slicing Engine 1636, fig. 16C; “slicing engine 1636 and/or the associated convolution slicing engine (CSE) may be used to perform the dimension shuffle operations to reorder the dimensions of a matrix,” ¶ 0098; reordering matrix dimension mapped to matrix transpose operation; “A transpose operation, for example, is used to “transpose” the rows and columns of a matrix, by rearranging the rows as columns and the columns as rows.  A transpose operation can be performed on a matrix processor, for example, by retrieving each row of a matrix from memory, and then storing each row back in memory as a column,” ¶ 0125; “A particular dimension shuffle operation may involve one or more non-transpose and/or transpose convolutional reads,” ¶ 0348];
a scatter hardware unit; and a gather hardware unit [“scatter/gather DMA is supported in hardware, and the DMA descriptors are generalized to support multiple configurations (e.g. ring buffers, linear buffer, etc.), allowing for different types of host driver optimization,” ¶ 0053].
Lau does not explicitly disclose but Jacob discloses a scatter hardware unit configured to place data to the shared memory at locations selected for an output data layout conversion; and a gather hardware unit configured to obtain input data from the shared memory from non-contiguous locations for an input data layout conversion [“performing one or more scatter gather operations by accessing a shared memory that is shared amongst multiple nodes interconnected through one or more networks, the shared memory comprising a coordination namespace that is shared amongst the multiple nodes, the operations comprising: gathering data from multiple processes at corresponding multiple nodes into a one or more locations in the coordination namespace, and creating one or more tuples having a same tuple name in the coordination namespace, wherein the one or more tuples have information referencing the gathered data in the one or more locations; or scattering data that has been gathered using the same tuple name to multiple processes participating in the coordination namespace, the scattering using the one or more tuples in the coordination namespace, the scattering performed from the one or more locations into other locations at one or multiple nodes for one or multiple processes at the corresponding one or multiple nodes; or performing both the gathering data and the scattering data,” cl. 1].
          It would have been obvious to one of ordinary skill in the art, having the teachings of Lau and Jacob available before the effective filing date of the claimed invention, to modify the capability of managing deep learning hardware as disclosed by Lau to include the capability of data scatter and gather as taught by Jacob, thereby providing a mechanism to enhance system efficiency by facilitating the optimization of data storage and access in the context of the available system resources.
Conclusion
30.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM C WOOD whose telephone number is (571)272-5285. The examiner can normally be reached Monday - Friday, 8:00 am - 4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat C Do can be reached on 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/WILLIAM C WOOD/Examiner, Art Unit 2193                                         


/Chat C Do/Supervisory Patent Examiner, Art Unit 2193