DETAILED ACTION
Claims 1-21 are pending in this application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 16-21 are objected to because of the following informalities: 
Claim 16 (line 1) appears to include a typographical error. Specifically, “the microcomputer” may have been used in error. 
Appropriate correction is required, for instance, “the microcomputers”.
	Claims 17-21 are objected to for the same reason as claim 16 above.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.  
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
Claims 16-21 do not invoke 35 U.S.C. 112(f) because they recite defined or sufficient structure as described in the specification.
Claim 16 recites “…the microprocessor configured to...” together with functional language, and therefore meets two prongs of the three-prong analysis. 
However, claim 16 recites sufficiently definite structure because the structure (“…the microprocessor configured to...”) is described in the specification (Processors 1 206-1-4) as structure for performing the respective functions. As such, it is not a generic placeholder (for instance, “means to”, “means for”, “module for” and the like), so the third prong of the analysis is not met and claim 16 is presumed not to invoke 35 U.S.C. 112(f).

Claims 17-21 are presumed not to invoke 35 U.S.C. 112(f) for the same reason as claim 16 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 6, 8-10, 12, 14, 16, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. 2020/0274733 A1 to Graham et al. in view of U.S. Pub. No. 2009/0063529 A1 to Gustavson et al.

As to claim 1, Graham teaches a method comprising: 
at a first computing task of a plurality of computing tasks (Process 0/1/2/3) executing at a plurality of processors (a plurality of CPUs (comprising CPU 1, CPU 2, and CPU N)) (“...Exemplary methods of operation of the system of FIG. 1A are described below. In FIG. 1A, by way of non-limiting example, a plurality of CPUs (comprising CPU 1, CPU 2, and CPU N), interconnected in a system-on-chip are shown running the plurality of processes 120. Other system examples, by way of non-limiting example, include: a single CPU; a plurality of systems or servers connected by a network; or any other appropriate system. As described above, the concept of operations on processes, as described herein, is decoupled from any particular hardware infrastructure, although it is appreciated that in any actual implementation some hardware infrastructure (as shown in FIG. 1A or otherwise as described above) would be used....” paragraph 0029): 
determining submatrix data to send to a second computing task of the plurality of computing tasks (“...FIG. 8 is a simplified pictorial illustration depicting all-to-all submatrix distribution, in accordance with another exemplary embodiment of the present invention...FIG. 9 is a simplified pictorial illustration depicting transposition of a sub-block, in accordance with exemplary embodiments of the present invention...” paragraphs 0025/0026); 
dividing the submatrix data in-place into a first plurality of submatrix data blocks (“...The collective operation performs a transpose of the data blocks...3. To transpose the distributed matrix, the matrix is subdivided into rectangular submatrices, of dimension d.sub.h×d.sub.v, where d.sub.h is the size in the horizontal dimension and d.sub.v is the size in the vertical dimension. Subblocks need not be logically contiguous. The submatrices may be predefined, or may be determined at run-time based on some criteria, such as, by way of non-limiting example, an order of entry into the all-to-all operation...” paragraphs 0058/0059/0061); 
using a message passing interface application programming interface to send each data block of the first plurality of submatrix data blocks to the second computing task (“...The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group, and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communications standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a “communicator”, while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a “team”...” paragraph 0027); and 
using a message passing interface application programming interface to receive a second plurality of submatrix data blocks from the second computing task (“...The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group, and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communications standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a “communicator”, while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a “team”...” paragraph 0027).
Graham is silent with reference to overwriting in-place submatrix data corresponding to one or more submatrix data blocks of the first plurality of submatrix data blocks with submatrix data corresponding to one or more submatrix data blocks of the second plurality of submatrix data blocks received from the second computing task after using the message passing interface application programming interface to send one or more submatrix data blocks of the first plurality of the submatrix data blocks to the second computing task.  
Gustavson teaches overwriting (replaced) in-place submatrix data corresponding to one or more submatrix data blocks of the first plurality of submatrix data blocks (Sub-Matrix A1) with submatrix data corresponding to one or more submatrix data blocks of the second plurality of submatrix data blocks received (Contents of T(A1)) after sending one or more submatrix data blocks of the first plurality of the submatrix data blocks (“...The present invention generally relates to improving efficiency of in-place data transformations such as a matrix transposition. More specifically, part of the data to be transformed is pre-arranged, if necessary, to first be contiguously arranged in memory as contiguous blocks of contiguous data, which data is then available to be retrieved from memory into cache in units of the blocks of contiguous data, for application of a transformation on the data such as a matrix transposition, and then replaced in the same memory space... In a first exemplary aspect of the present invention, described herein is a computerized method for an in-place transformation of matrix data, including, for a matrix A stored in one of a standard full format or a packed format and a transformation T having a compact representation, choosing blocking parameters MB and NB based on a cache size; working on a sub-matrix A1 of A, A1 having size M1=m*MB by N1=n*NB and saving any of a residual remainder of A in a buffer B, the sub-matrix being worked on as follows: contiguously moving and contiguously transforming A1 in-place into a New Data Structure (NDS), applying the transformation T in units of MB*NB contiguous double words to the NDS format of A1, thereby replacing A1 with the contents of T(A1), moving and transforming NDS T(A1) to standard data format T(A1) with holes for the remainder of A in buffer B, and contiguously copying buffer B into the holes of A2, thereby providing in-place transformed matrix T(A)...” paragraphs 0007/0018/0022).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Graham with the teaching of Gustavson because the teaching of Gustavson would improve the system of Graham by providing an in-place transformation that manages memory utilization effectively and optimally.
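
For illustration only, the combination discussed above (block-wise sending followed by in-place overwriting of the sent region) can be sketched as a small Python simulation. This is not code from Graham, Gustavson, or the application. The names `BLOCK`, `split_in_place`, and `exchange` are hypothetical, the two `deque` objects merely stand in for a message passing transport, and both buffers are assumed to have the same length.

```python
from collections import deque

BLOCK = 4  # hypothetical block size, in elements


def split_in_place(buf, block):
    """Return (start, stop) index ranges that partition buf without copying it."""
    return [(i, min(i + block, len(buf))) for i in range(0, len(buf), block)]


def exchange(buf_a, buf_b, block=BLOCK):
    """Swap the contents of two equal-length task buffers, block by block."""
    if len(buf_a) != len(buf_b):
        raise ValueError("this sketch assumes equal-length buffers")
    a_to_b, b_to_a = deque(), deque()  # stand-ins for the message transport
    for start, stop in split_in_place(buf_a, block):
        # Each task first "sends" its block; the bytes() copy models the
        # message leaving the task.
        a_to_b.append(bytes(buf_a[start:stop]))
        b_to_a.append(bytes(buf_b[start:stop]))
        # Only after the send is the same region overwritten in place with
        # the block received from the other task.
        buf_a[start:stop] = b_to_a.popleft()
        buf_b[start:stop] = a_to_b.popleft()
```

In this sketch no task-sized temporary buffer is allocated; at most one block per direction is in flight at a time, which is the memory-utilization point relied upon in the rationale above.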

As to claim 2, Graham teaches the method of claim 1, wherein: each submatrix data block of the first plurality of submatrix data blocks has up to at most a predetermined size; and each submatrix data block of the second plurality of submatrix data blocks has up to at most the predetermined size (“...The all-to-all-v/w operation is used for each process to exchange unique data with every other process in the group of processes participating in this collective operation. The size of data exchanged between two given processes may be asymmetric, and each pair may have a different data pattern than other pairs, with large variations in the data sizes being exchanged. A given rank need only have local, API-level information on the data exchanges in which it participates...” paragraph 0036).  

As to claim 4, Graham teaches the method of claim 1, further comprising: at the first computing task: using a message passing interface application programming interface to send each submatrix data block of the first plurality of submatrix data blocks in a separate message passing interface message (is unique) (“...The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group, and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communications standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a “communicator”, while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a “team”...” paragraph 0027).  

As to claim 6, Gustavson teaches the method of claim 1, further comprising: at the first computing task: overwriting in-place the first plurality of submatrix data blocks sent with the second plurality of submatrix data blocks received (“...The present invention generally relates to improving efficiency of in-place data transformations such as a matrix transposition. More specifically, part of the data to be transformed is pre-arranged, if necessary, to first be contiguously arranged in memory as contiguous blocks of contiguous data, which data is then available to be retrieved from memory into cache in units of the blocks of contiguous data, for application of a transformation on the data such as a matrix transposition, and then replaced in the same memory space... In a first exemplary aspect of the present invention, described herein is a computerized method for an in-place transformation of matrix data, including, for a matrix A stored in one of a standard full format or a packed format and a transformation T having a compact representation, choosing blocking parameters MB and NB based on a cache size; working on a sub-matrix A1 of A, A1 having size M1=m*MB by N1=n*NB and saving any of a residual remainder of A in a buffer B, the sub-matrix being worked on as follows: contiguously moving and contiguously transforming A1 in-place into a New Data Structure (NDS), applying the transformation T in units of MB*NB contiguous double words to the NDS format of A1, thereby replacing A1 with the contents of T(A1), moving and transforming NDS T(A1) to standard data format T(A1) with holes for the remainder of A in buffer B, and contiguously copying buffer B into the holes of A2, thereby providing in-place transformed matrix T(A)...” paragraphs 0007/0018/0022).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Graham with the teaching of Gustavson because the teaching of Gustavson would improve the system of Graham by providing an in-place transformation that manages memory utilization effectively and optimally.

As to claim 8, Graham teaches the method of claim 1, further comprising: at the first computing task: using a message passing interface application programming interface to send all of the first plurality of submatrix data blocks to the second computing task before using the message passing interface application programming interface to receive any of the second plurality of submatrix data blocks (“...The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group, and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communications standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a “communicator”, while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a “team”...” paragraph 0027).  
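
For illustration only, the all-to-all semantics quoted from Graham (each process sends a distinct block to every other process and receives one block from each) can be modeled with the following Python simulation. It is not code from Graham. The function name `all_to_all` and the `mailboxes` structure are hypothetical, and the two-phase ordering (all sends posted before any receive) is an assumption of this sketch that mirrors the ordering recited in claim 8.

```python
def all_to_all(send):
    """Simulate an all-to-all exchange among n processes.

    send[i][j] is the block that process i addresses to process j.
    The result recv satisfies recv[j][i] == send[i][j], i.e. the exchange
    is a transpose of the blocks.
    """
    n = len(send)
    mailboxes = [[None] * n for _ in range(n)]
    # Phase 1: every process posts all of its sends.
    for src in range(n):
        for dst in range(n):
            mailboxes[dst][src] = send[src][dst]
    # Phase 2: only then does each process receive its blocks.
    return [list(mailboxes[dst]) for dst in range(n)]
```

With two processes holding `[["a0", "a1"], ["b0", "b1"]]`, the result is `[["a0", "b0"], ["a1", "b1"]]`, consistent with the statement in Graham that the collective operation performs a transpose of the data blocks.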

As to claim 9, Graham teaches the method of claim 1, further comprising at the first computing task: using a message passing interface application programming interface to send one submatrix data block to each other computing task of the plurality of computing tasks before using a message passing interface application programming interface to receive one submatrix data block from each other computing task of the plurality of computing tasks (“...This data may be viewed as a sub-matrix of the distributed matrix. A single aggregator may handle multiple blocks of the submatrix from a single individualized all-to-all or all-to-all-v/w algorithm. [0049] 3. The sub-blocks may be formed by discontinuous groups of processes, which are in certain exemplary embodiments formed on-the-fly to handle load imbalance in the calling application. In such a case, the matrix sub-blocks may be non-contiguous. [0050] 4. The term “aggregator” is used herein to refer to an entity which aggregates a sub-matrix, transposes the same, and then says results to their final destination. In certain exemplary embodiments of the present invention, the aggregator is a logic block within an HCA. Then the present step 4 may comprise having the aggregator: [0051] a. Gather data from all the sources [0052] b. Shuffle the data to prepare so that data destined to a specific process may be sent in as a single message to this destination. T...” paragraph 0046).  

	As to claims 10 and 16, see the rejection of claim 1 above, except for one or more non-transitory storage media and a plurality of microprocessors.
	Graham teaches one or more non-transitory storage media and a plurality of microprocessors (“...Exemplary methods of operation of the system of FIG. 1A are described below. In FIG. 1A, by way of non-limiting example, a plurality of CPUs (comprising CPU 1, CPU 2, and CPU N), interconnected in a system-on-chip are shown running the plurality of processes 120. Other system examples, by way of non-limiting example, include: a single CPU; a plurality of systems or servers connected by a network; or any other appropriate system. As described above, the concept of operations on processes, as described herein, is decoupled from any particular hardware infrastructure, although it is appreciated that in any actual implementation some hardware infrastructure (as shown in FIG. 1A or otherwise as described above) would be used...It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example: as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention...” paragraph 0029).

	As to claims 12 and 18, see the rejection of claim 4 above.

	As to claims 14 and 20, see the rejection of claim 6 above.

Claims 3, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. 2020/0274733 A1 to Graham et al. in view of U.S. Pub. No. 2009/0063529 A1 to Gustavson et al. as applied to claims 2, 10 and 16, above, and further in view of U.S. Pub. No. 2001/0052119 A1 to Inaba.

As to claim 3, Graham as modified by Gustavson teaches the method of claim 2; however, it is silent with reference to at the first computing task: determining the predetermined size based on a type of a message passing interface network interconnect between the plurality of computing tasks.  
Inaba teaches at the first computing task: determining the predetermined size based on a type of a message passing interface network interconnect between the plurality of computing tasks (“...According to another embodiment of the present invention, an argument relating to the communication data size of the invoked MPI procedure is checked, and if the communication data size can be determined during compiling, an object program is outputted such that the optimum procedure according to the communication data size is invoked...According to this embodiment, when the communication size is found during compiling, a reduction of the execution time can be realized by altering such that an MPI routine is invoked that uses the optimum protocol according to the size... It is assumed that, a procedure is employed that uses protocol A when the communication data size is less than, for example, 80 bytes and protocol B when the communication data size is equal to or greater than 80 bytes. Source program 10 is described in Fortran, and an MPI procedure is invoked as shown in FIG. 7. In this case, the communication data size can be determined during compiling, and optimization is performed such that, if "MPI_REAL" is 4 bytes, the first line having a communication data size of 40 bytes uses protocol A and the second line having a communication data size of 100 bytes uses protocol B. The result of optimization is equivalent to a case of applying source program 10 that includes the statement shown in FIG. 9. In this case, "MPI_Send2" is a procedure that uses protocol A, and "MPI_SEND" is a procedure that uses protocol B...” paragraphs 0014/0015/0049).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Graham and Gustavson with the teaching of Inaba because the teaching of Inaba would improve the system of Graham and Gustavson by providing a technique for managing or reducing the latency of communicating messages.
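
For illustration only, the size-based protocol selection in the quoted passage of Inaba (protocol A below 80 bytes, protocol B at or above 80 bytes, with "MPI_REAL" taken as 4 bytes) can be worked through in Python. The names `THRESHOLD_BYTES`, `MPI_REAL_BYTES`, and `choose_protocol` are hypothetical, and the element counts of 10 and 25 are inferred from the 40-byte and 100-byte sizes in the quotation rather than stated in it.

```python
THRESHOLD_BYTES = 80  # threshold taken from the quoted example in Inaba
MPI_REAL_BYTES = 4    # element size assumed in the quoted example


def choose_protocol(count, element_bytes=MPI_REAL_BYTES):
    """Return "A" for messages under the threshold and "B" otherwise."""
    size = count * element_bytes
    return "A" if size < THRESHOLD_BYTES else "B"
```

Under these assumptions, 10 elements give 40 bytes and select protocol A, while 25 elements give 100 bytes and select protocol B. A message of exactly 80 bytes selects protocol B.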

As to claims 11 and 17, see the rejection of claims 2 and 3 above.

Claims 5, 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. 2020/0274733 A1 to Graham et al. in view of U.S. Pub. No. 2009/0063529 A1 to Gustavson et al. as applied to claims 1, 10 and 16, above, and further in view of U.S. Pub. No. 2017/0262410 A1 to Usui.

As to claim 5, Graham as modified by Gustavson teaches the method of claim 1; however, it is silent with reference to at the first computing task: using in-place each submatrix data block of the first plurality of submatrix data blocks as a send buffer for a message passing interface send operation to send the submatrix data block to the second computing task.  
Usui teaches at the first computing task: using in-place (in-place) each submatrix data block of the first plurality of submatrix data blocks as a send buffer for a message passing interface send operation to send the submatrix data block to the second computing task (“...Namely, assuming that an individual local array is divided into small cuboids for overlapping of a communication and an operation, buffer areas having at least approximately four segment sizes are needed (two transmission buffers and two reception buffers for overlapping processing) (if in-place processing is performed without omitting a communication for restoring the original data arrangement)... FIG. 12 illustrates an example of data (second segment) handled by an individual process. Assuming that the segmentation illustrated in FIGS. 11 and 12 is performed, buffer areas having at least approximately four segment sizes are needed for an FFT calculation (two transmission buffers and two reception buffers for overlapping processing) (if in-place processing is performed without omitting a communication for restoring the original data arrangement)...” paragraphs 0108/0114).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Graham and Gustavson with the teaching of Usui because the teaching of Usui would improve the system of Graham and Gustavson by providing buffers for temporarily storing data for later retrieval. 
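
For illustration only, the limitation of claim 5 (using each data block in place as the send buffer) can be contrasted with copying into a separate transmission buffer by the following Python sketch. It is not code from Usui or from the application. The function name `send_views` is hypothetical, and a `memoryview` slice is used here merely as a stand-in for passing a pointer into the original array to a send operation.

```python
def send_views(buf, block):
    """Return zero-copy views over buf, one per block of at most `block` bytes.

    Each view refers to the original storage, so handing a view to a send
    operation would use the data block itself as the send buffer.
    """
    view = memoryview(buf)
    return [view[s:s + block] for s in range(0, len(view), block)]
```

Because each view shares storage with `buf`, a later write to `buf` is visible through the view, which shows that no separate transmission buffer was allocated.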

As to claims 13 and 19, see the rejection of claim 5 above.

Allowable Subject Matter
Claims 7, 15 and 21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
U.S. Pub. No. 2006/0190517 A1 to Guerrero, directed to a process that transposes items of media information and stores the transposed items in the memory.
U.S. Pub. No. 2009/0063607 A1 to Gustavson et al., directed to a method and structure for an in-place transformation of matrix data.
U.S. Pub. No. 2020/0409664 A1 to Li et al., directed to an MPI communication protocol for exchanging messages among processes in high-performance computing (HPC) systems.
U.S. Pub. No. 2016/0057068 A1 to Arakawa et al., directed to a system and method for transmitting data embedded into control information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES E ANYA whose telephone number is (571)272-3757. The examiner can normally be reached Mon-Fri, 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SOUGH HYUNG can be reached on 571-272-6799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHARLES E ANYA/Primary Examiner, Art Unit 2194