DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Request for Continued Examination filed on December 5, 2022, and the Applicant's Amendment and Arguments filed on November 18, 2022.
Claims 21, 24-27, 30-33 and 36-38 are pending in this application. Claims 1-20, 22-23, 28-29 and 34-35 were cancelled. 


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on November 18, 2022 has been entered.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 21, 24-27, 30-33 and 36-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1, Statutory Category: Yes. Claim 21 is a method that recites a series of steps and therefore falls within the statutory category of a process.
Step 2A- Prong 1: Judicial Exception Recited: Yes. The claim recites “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order”, “merging and sorting, the reloaded data elements” and “enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load”. As drafted, the claim as a whole recites a method including steps that, but for the recitation of generic computing components, could be performed in the human mind. For example, a person can readily evaluate, determine, or judge the values or sizes of the data elements; sort or reorder different portions of the data elements in different orders (i.e., a descending order, and a reverse and ascending order); group, combine, or merge the reordered, sorted, or reloaded data elements; and evaluate or judge whether the applied sorting would help the computer processing (i.e., enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load). Therefore, but for the recitation of generic computing components, these steps fall within the mental processes grouping of abstract ideas (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion).
Therefore, yes, the claim does recite a judicial exception.
Step 2A- Prong 2: Integrated into a Practical Application: No. This judicial exception is not integrated into a practical application. In particular, the additional limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant extra-solution activity (i.e., mere data loading and storing; see MPEP §2106.05(g)). In addition, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “a first portion of the plurality of data elements loaded into register rows of a first half of the plurality of slabs of registers”, “a second portion of the plurality of data elements loaded into register rows of a second half of the plurality of slabs of registers”, “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” and “such that multiple processing tasks can be performed simultaneously” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function; see MPEP §2106.05(b)). Moreover, the limitation “such that multiple processing tasks can be performed simultaneously” merely states an intended result; it is not a step that is actually performed. The combination of these additional elements is no more than mere instructions to apply the exception using generic computer components (e.g., “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns”). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Therefore, the claim is directed to the abstract idea.
Step 2B: Claim Provides an Inventive Concept: No. As discussed with respect to Step 2A Prong Two, the additional elements “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “a first portion of the plurality of data elements loaded into register rows of a first half of the plurality of slabs of registers”, “a second portion of the plurality of data elements loaded into register rows of a second half of the plurality of slabs of registers”, “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” and “such that multiple processing tasks can be performed simultaneously” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function; see MPEP §2106.05(b)). In addition, the limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant extra-solution activity (i.e., mere data loading and storing; see MPEP §2106.05(g)), which is also well-understood, routine, conventional activity (see MPEP §2106.05(d)). The same analysis applies here in Step 2B: mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B. These additional elements, individually and in combination, do not amount to significantly more than the exception itself and do not provide an inventive concept in Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the steps of “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” were considered to be insignificant extra-solution activity (data gathering and storing) in Step 2A, and thus they are re-evaluated in Step 2B to determine whether they are more than what is well-understood, routine, conventional activity in the field. The specification does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” are anything other than generic, off-the-shelf computer functions. To the contrary, paragraph [0003], lines 1-2, of the specification recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10 recite “the data parallel device…for data to be loaded or stored”.
Accordingly, a conclusion that the steps of “loading/reloading” and “storing” are well-understood, routine, conventional activity is supported under Berkheimer option 1.
For these reasons, there is no inventive concept in the claim, and thus the claim is ineligible. 

Independent claims 27 (system claim) and 33 (non-transitory computer-readable medium claim) are rejected for the same reasons as claim 21 above. Claim 33 further recites “A non-transitory computer-readable medium comprising instructions”. This additional element is a generic computer component providing generic computer functions (see MPEP § 2106.05(b)).

With respect to dependent claim 24, the claim further recites wherein upon each processor in the group of parallel processors reloading the subset of data elements, performing by the group of parallel processors, a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers. (“Performing by the group of parallel processors” is treated as a generic computing device performing a generic computer function; see MPEP §2106.05(b). In addition, performing the “bitonic merge” is treated as part of the abstract idea and is analogous to mental processes, i.e., a concept that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).)

With respect to dependent claim 25, the claim further recites wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors. (The “number of the register columns” corresponding to “a number of processors” is treated as a generic computing device performing a generic computer function; see MPEP §2106.05(b).)

With respect to dependent claim 26, the claim further recites wherein the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order. (Loading the data elements “in a transposed order” is treated as a generic computing device performing a generic computer function; see MPEP §2106.05(b). In addition, the “transposed order” is treated as part of the abstract idea and is analogous to mental processes, i.e., a concept that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).)

Dependent claims 30-32 and 36-38 recite features corresponding to those of claims 24-26, respectively, and are therefore rejected under the same rationale.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 21, 24-27, 30-33 and 36-38 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
As per claims 21, 27 and 33 (line numbers refer to claim 21):
Lines 13-15 recite “reloading…from the shared memory…a subset of data elements of each of the plurality of register rows stored in the shared memory”. It is unclear whether the “subset of data elements” refers to data elements from the “sorted data elements” stored in the shared memory (as recited in line 12) or to any subset of data elements stored in the shared memory. For examination purposes, the examiner interprets the “subset of data elements” as referring to data elements from the “sorted data elements” stored in the shared memory.

As per claims 24-26, 30-32 and 36-38:
They are method, system and non-transitory computer readable medium claims that depend on claims 21, 27 and 33 above. Therefore, they have the same deficiencies as claims 21, 27 and 33 above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 24, 26-27, 30, 32-33, 36 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren et al. (US Pub. 2002/0198911 A1) in view of Nordquist (US Patent 7,489,315 B1), and further in view of Suryono (US Pub. 2012/0086591 A1), Sano (US Pub. 2012/0259714 A1) and Jatin Chhugani et al. (Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture; hereafter Jatin).
Blomgren, Nordquist and Sano were cited in the previous Office Action.
Jatin was cited in the IDS filed on 05/13/2019.

As per claim 21, Blomgren teaches the invention substantially as claimed including A computer-implemented method for sorting data, the method comprising (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0056] line 10, SIMD data parallel operation; Claim 1, lines 16-19,  simultaneously swaps row or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix): 
loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers (Blomgren, Fig. 2, 16, 40-70 (as parallel processors); Fig. 5, 130 (including data elements), 140, 142, 144, 146 (as slabs of registers); [0012] lines 5-7, The matrix processor 16 comprises 16 processing elements 40-70 (as parallel processors); [0019] lines 1-2, Fig. 5 shows the results of loading 4 matric registers from memory (as shared memory since it is used by processing elements 40-70); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement (as sorting) operation), wherein each slab of registers includes a two- dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors (Blomgren, Fig. 5, 140 (including two dimensional array of registers, rows and columns), 142, 144, 146 (as slabs of registers); [0012] lines 5-8, The matrix processor 16 comprises 16 processing elements 40-70  where an individual processing element (PE) 80 comprises 16 PE register entries M0-M15. lines 13-16, An individual matrix register is a combination of register entries that includes an individual PE register entry from each PE register file from each individual processing element (as each slab of registers is associated with at least one parallel processor) in the matrix processor)); 
sorting, by the group of parallel processors, a first portion of the plurality of data elements loaded into register rows of the plurality of slabs of registers in a descending order, and a second portion of the plurality of data elements loaded into register rows of the plurality of slabs of registers in a second ordering process (Blomgren, Figs. 6-9 (as including the descending order and the second ordering process for sorting); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0047] lines 2-8, The block4 instruction is implemented on 4 contiguous matrix registers and exchanges row data between the four matrix registers 140, 142, 144, and 146 in 3 steps. In step 1, the block4 operation swaps matrix register 140, row 1 with matrix register 142, row 0; and matrix register 144, row 3 with matrix register 146, row 2. This to exchange is performed by simultaneously (as sorting the first portion of the plurality of slabs of registers in a descending order; see also Fig. 9, 140 row 1, 142 row 1, 144 row 1 and 146 row 1; and see Specification [0100]: “As shown in Figure 16A, the first half of the processor groups (i.e., Processor Groups 1 and 2,) may add the data elements stored in their respective slabs in a descending order by register row to the shared memory 1680 (e.g., left to right starting from the top left)”); [0048] lines 1-4, FIG. 7 shows step 2 of the block4 instruction where the block4 operation swaps matrix register 140, row 2 with matrix register 144, row 0; and matrix register 142, row 3 with matrix register 146, row 1. This is performed by simultaneously; [0049] lines 1-4, FIG. 8 shows step 3 of the block4 instruction where the block4 operation swaps matrix register 140, row 3 with matrix register 146, row 0; and matrix register 142, row 2 with matrix register 144, row 1. This is performed by simultaneously (as sorting the second portion of the plurality of slabs of registers with a second ordering process); [0050] lines 1-2, FIG. 9 shows the final state of the matrix registers 140, 142, 144, and 146 at the end of the block4 operation; see Fig. 9, 140, 142, 144 and 146 [Examiner notes: the final state of matrix registers 140, 142, 144 and 146 is sorted. For example, row 2 of matrix register 140 is sorted from (0,4 0,5 0,6 0,7) in Fig. 5 to (1,0 1,1 1,2 1,3) in Fig. 9]);
storing, by the group of parallel processors, the sorted data elements in the shared memory (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement operation; [0050] lines 3-12, the contents of the four matrix registers have been rearranged from four 1x16 vectors (at the beginning of the swapping) to four contiguous 4x4 sub-matrices, as illustrated by the 4x4 sub-matrices 160, 162, 164, and 166 in AxB matrix 120. Since all of the matrix data rearrangement is based upon swap operations…which is suitable for storing back to memory).
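For illustration only, the three-step block4 exchange quoted above from Blomgren [0047]-[0049] can be modeled in a short Python sketch. The register indices 0-3 (standing for matrix registers 140, 142, 144 and 146), the (register, row, column) labels, and the function names are hypothetical conveniences, not taken from Blomgren's figures, and the sketch is not Blomgren's hardware implementation. It checks that the six row swaps exchange the register index and the row index of every element.

```python
# Hypothetical model: four 4x4 matrix registers (140, 142, 144, 146 -> 0..3).
# Each element is labeled with its starting (register, row, column) position.
N = 4

def load_registers():
    """Return regs[r][i][c] holding the label (r, i, c)."""
    return [[[(r, i, c) for c in range(N)] for i in range(N)] for r in range(N)]

# Row-swap schedule quoted from Blomgren [0047]-[0049]: each step swaps two
# (register, row) pairs; the pairs within one step touch disjoint slots.
BLOCK4_STEPS = [
    [((0, 1), (1, 0)), ((2, 3), (3, 2))],  # step 1: 140/142 and 144/146
    [((0, 2), (2, 0)), ((1, 3), (3, 1))],  # step 2: 140/144 and 142/146
    [((0, 3), (3, 0)), ((1, 2), (2, 1))],  # step 3: 140/146 and 142/144
]

def block4(regs):
    """Apply the three swap steps in order, swapping whole rows in place."""
    for step in BLOCK4_STEPS:
        for (ra, ia), (rb, ib) in step:
            regs[ra][ia], regs[rb][ib] = regs[rb][ib], regs[ra][ia]
    return regs

regs = block4(load_registers())
# Register r, row i now holds what started in register i, row r.
assert all(regs[r][i][c] == (i, r, c)
           for r in range(N) for i in range(N) for c in range(N))
```

Because the two swaps in each step touch four distinct (register, row) slots, they can be carried out at the same time, consistent with the quoted description that each exchange is performed simultaneously.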

Blomgren does not specifically teach that, when the first portion of the plurality of data elements is sorted, it is from a first half of the plurality of slabs of registers, and when the second portion of the plurality of data elements is sorted, it is from a second half of the plurality of slabs of registers; reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory; merging and sorting, by the group of parallel processors, the reloaded data elements; and storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory.

However, Nordquist teaches that, when sorting a first portion of the plurality of data elements, it is from a first half of the plurality of slabs of registers, and when sorting a second portion of the plurality of data elements, it is from a second half of the plurality of slabs of registers (Nordquist, Fig. 3, 320, 315, 310, 305 (as the plurality of slabs of registers), and 330, 335, 340, 345, which include the data elements sorted from 320, 315, 310 and 305 (i.e., 330: Q4, Q5, Q6, Q7, S0, S1, S2, S3; 335: S4, S5, S6, S7, T0, T1, T2, T3; 340: T4, T5, T6, T7, R0, R1, R2, R3; 345: R4, R5, R6, R7, Q0, Q1, Q2, Q3), as the second half, which comes from half of the plurality of slabs of registers 320, 315, 310, 305 (see left side); the same applies to the first half, which comes from the first half of the plurality of slabs of registers); Col 7, lines 32-65, In FIG. 3 the S0, S1, . . . , S15, data from the first register file output 305 goes into crossbar 325 and is then reorganized (as sorted) and routed so that S0 through S3 is stored in the first row of the first RAM 330, S4 through S7 is stored in the second row of the second RAM 335, S8 through SB is stored in the third row of the third RAM 340, and SC through SF is stored in the fourth row of the fourth RAM 345. The T0, T1, . . . , T15, data from the second register file output 310 goes into crossbar 325 and is then reorganized and routed so that T0 through T3 is stored in the first row of the second RAM 335, T4 through T7 is stored in the second row of the third RAM 340, T8 through TB is stored in the third row of the fourth RAM 345, and TC through TF is stored in the fourth row of the first RAM 330. The R0, R1, . . . , R15, data from the third register file output 315 goes into crossbar 325 and is then reorganized and routed so that R0 through R3 is stored in the first row of the third RAM 340, R4 through R7 is stored in the second row of the fourth RAM 345, R8 through RB is stored in the third row of the first RAM 330, and RC through RF is stored in the fourth row of the second RAM 335. The Q0, Q1, . . . , Q15, data from the fourth register file output 320 goes into crossbar 325 and is then reorganized and routed so that Q0 through Q3 is stored in the first row of the fourth RAM 345, Q4 through Q7 is stored in the second row of the first RAM 330, Q8 through QB is stored in the third row of the second RAM 335, and QC through QF is stored in the fourth row of the third RAM 340);
reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 325, (330, 335, 340, 345, as whole as shared memory), 350, (370, 365, 360, 355, as respective slab of the plurality of slabs of registers); Col 20, line 6, provide an on-chip shared memory; Col 4, lines 55-56, Cores 205 and 210 can be SIMD processors (as group of parallel processors) which execute instructions for 16 threads in parallel; Col 8, lines 5-31, Avoiding bank conflicts can improve the performance of the system…The second crossbar outputs (as to reloading) the first transpose buffer output 355, the second transpose buffer output 360, the third transpose buffer output 365, and the fourth transpose buffer output 370. The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345…The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345. (as reloading from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory)); 
merging and sorting, by the group of parallel processors, the reloaded data elements (Nordquist, Fig. 1, 105 Multiprocessor; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350...The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; Col 8, lines 33-49, the S, T, R, and Q texture coordinates in the first transpose buffer output 355, second transpose buffer output 360, third transpose buffer output 365, and fourth transpose buffer output 370 are arranged (as sorted) as quads because for each clock cycle all of the data for an entire quad is obtained…Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format; [Examiner notes: Fig. 3, 370, 365, 360 and 355 (as slabs of the plurality of slabs of registers) are reorganized. For example, in Fig. 3, 370, (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) are reorganized in order, such that (SC SD SE SF) from 345 row 1 is combined/merged with (TC TD TE TF) from 330 within 370]); and
storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 13A, 1305 Quad; Fig. 13B, 1310 Render target memory; Pixel 1320-1323; Col 20, line 6, provide an on-chip shared memory; Col 8, lines 47-55, Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format…The advantage of having quads is that many of the other graphics modules such as the texture module 220 and the ROP module 225 use quads. Since most graphics modules are designed to process quads; Col 3, lines 63-65, FIG. 13B is an illustration showing pixels of the quad stored in a pitch format memory, in accordance with one embodiment of the present invention).
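The write and read pattern quoted from Nordquist can likewise be sketched in Python for illustration. The sketch assumes RAMs 330, 335, 340 and 345 are indices 0-3 and models the routing by crossbar 325 as a rotation (chunk k of source s goes to row k of RAM (s + k) mod 4), which reproduces each placement recited in Col 7, lines 32-65. The reordering by the second crossbar 350 is modeled, as an assumption, as undoing that rotation; the function names are hypothetical.

```python
# Hypothetical model of the Nordquist Fig. 3 transpose buffer.
# Sources S, T, R, Q (outputs 305, 310, 315, 320) each carry 16 elements;
# RAMs 330, 335, 340, 345 are indices 0..3, each with four rows of 4-element chunks.
SOURCES = ["S", "T", "R", "Q"]
HEX = "0123456789ABCDEF"

def write_skewed():
    """Store chunk k of source s in row k of RAM (s + k) % 4."""
    rams = [[None] * 4 for _ in range(4)]
    for s, name in enumerate(SOURCES):
        for k in range(4):
            rams[(s + k) % 4][k] = [name + HEX[4 * k + j] for j in range(4)]
    return rams

def read_row(rams, k):
    """Read row k of all four RAMs and reorder the chunks as S, T, R, Q."""
    out = []
    for s in range(4):
        # Source s was written to RAM (s + k) % 4 in row k; undo that rotation.
        out.extend(rams[(s + k) % 4][k])
    return out

rams = write_skewed()
print(read_row(rams, 0))  # S0-S3, then T0-T3, R0-R3, Q0-Q3
print(read_row(rams, 3))  # SC-SF, then TC-TF, RC-RF, QC-QF
```

In this model each source's four chunks land in four different RAMs, and each output row reads every RAM exactly once, which is consistent with Nordquist's stated aim of avoiding bank conflicts (Col 8, lines 5-31).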

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren with Nordquist because Nordquist’s teaching of sorting, and of reloading/outputting the data from the memory back to the registers/arrays for the purpose of reorganizing its order, would have provided Blomgren’s system with the advantage and capability of converting the data elements into a quad format so that the particular modules can process the transferred data, thereby improving system efficiency and performance.

Blomgren and Nordquist do not specifically teach that the first portion, of a first half of the plurality of slabs of registers, is sorted in a descending order, and that the second portion, of a second half of the plurality of slabs of registers, is sorted in a reverse and ascending order.

However, Suryono teaches when sorting a first portion of a first half of the plurality of slabs of registers, it is in a descending order, and a second portion of a second half of the plurality of slabs of registers, it is in an ascending order (Suryono, [0042] lines 1-8, Now the problem is reduced from sorting into the problem of splitting. To correctly split the input values into larger-value group and smaller-value group, the input values are first split and then the first half is sorted ascending and the second half is sorted descending. This will create a sequence of bitonic descending values. Then a bitonic split may be done on this sequence which will yield desired larger-value group and smaller-value group).
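The splitting step quoted from Suryono [0042] can be sketched as follows for illustration, assuming an even number of inputs held in a Python list. The compare-exchange of element i with element i + n/2 is the standard bitonic half-cleaner; it is named here as an assumption about how Suryono's "bitonic split" is carried out, and the function name is hypothetical.

```python
def bitonic_split(values):
    """Split values into a smaller-value group and a larger-value group.

    First half sorted ascending, second half sorted descending (Suryono [0042]),
    which yields a bitonic sequence; then element i is compare-exchanged with
    element i + n/2.
    """
    n = len(values)
    if n % 2:
        raise ValueError("sketch assumes an even number of inputs")
    half = n // 2
    seq = sorted(values[:half]) + sorted(values[half:], reverse=True)
    low, high = [], []
    for i in range(half):
        a, b = seq[i], seq[i + half]
        low.append(min(a, b))
        high.append(max(a, b))
    return low, high

low, high = bitonic_split([7, 3, 9, 1, 8, 2, 6, 5])
print(low, high)  # [1, 3, 5, 2] [8, 6, 7, 9]
assert max(low) <= min(high)
```

Every element of the smaller-value group is no larger than any element of the larger-value group, which is the property Suryono relies on when it reduces sorting to splitting.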

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren and Nordquist with Suryono because Suryono’s teaching of sorting half of the data elements in descending order and the other half in ascending order would have provided Blomgren and Nordquist’s system with the advantage and capability of splitting the data elements into a larger-value group and a smaller-value group, which allows the system to process the data elements more easily, thereby improving system efficiency and performance.

Blomgren, Nordquist and Suryono do not specifically teach that the ascending order is a reverse and ascending order.

However, Sano teaches the second ordering process is a reverse and ascending order (Sano, Fig. 4; [0030] lines 9-13, the four key buttons 50-13 to 50-16 in the fourth row (the bottom row) are orderly numbered, from right to left, 13, 14, 15 and 16, so that the numbers added for the key buttons in each row are ascended in reverse direction [See specs [0100], add the data elements stored in their respective slabs a reverse and ascending order by register row (e.g., right to left starting from the bottom right)]).
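For illustration, the two traversal orders described in Specification [0100] (left to right starting from the top left, and right to left starting from the bottom right, similar to the reverse-direction numbering in Sano [0030]) can be sketched as below. The slab contents and function names are hypothetical, and the slabs are assumed to be already ascending in row-major order; under that assumption, writing the first half forward and the second half in reverse produces an ascending-then-descending (bitonic) sequence.

```python
def forward_row_order(slab):
    """Start at the top-left register; read left to right, row by row downward."""
    return [value for row in slab for value in row]

def reverse_row_order(slab):
    """Start at the bottom-right register; read right to left, row by row upward."""
    return [value for row in reversed(slab) for value in reversed(row)]

def is_up_then_down(seq):
    """True if seq is non-decreasing up to some peak and non-increasing after it."""
    k = 0
    while k + 1 < len(seq) and seq[k] <= seq[k + 1]:
        k += 1
    return all(seq[j] >= seq[j + 1] for j in range(k, len(seq) - 1))

# Hypothetical slabs whose contents are already ascending in row-major order.
first_half_slab = [[1, 4], [6, 9]]
second_half_slab = [[2, 3], [5, 8]]

combined = forward_row_order(first_half_slab) + reverse_row_order(second_half_slab)
print(combined)  # [1, 4, 6, 9, 8, 5, 3, 2]
assert is_up_then_down(combined)
```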

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren, Nordquist and Suryono with Sano because Sano’s teaching of ordering the data in each row in an ascending and reverse direction would have provided Blomgren, Nordquist and Suryono’s system with the advantage and capability of processing the operations more efficiently, thereby improving system performance.

Blomgren, Nordquist, Suryono and Sano do not specifically teach wherein the computer-implemented method is adapted to enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load such that multiple processing tasks can be performed simultaneously.

However, Jatin teaches wherein the computer-implemented method is adapted to enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load such that multiple processing tasks can be performed simultaneously (Jatin, Page 1313, Abstract, lines 1-14, Sorting a list of input numbers is one of the most fundamental problems in the field of computer science in general and high-throughput database applications in particular. Although literature abounds with various flavors of sorting algorithms, different architectures call for customized implementations to achieve faster sorting times… sorts 64 million floating point numbers in less than 0.5 seconds on a commodity 4-core Intel processor. This measured performance compares favorably with all previously published results; Page 1314, right column, lines 14-24, efficient implementation of sorting on the latest processors depends heavily on careful tuning of the algorithm and the code. First, although SIMD has been shown as an efficient way to achieve good power/performance (as reducing energy consumption); lines 38-42, We present the fastest sorting performance (as enhance speed and efficiency) for modern computer architectures. Our algorithm also avoids the expensive unaligned load/store operations (as reducing memory load); Page 1314, right column, 3.1 ILP, lines 1-4, First, modern processors with super-scalar architecture can execute multiple instructions simultaneously on different functional units. For example, on Intel Core 2 Duo processors, we can execute a min/max and a shuffle instruction on two separate units simultaneously).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren, Nordquist, Suryono and Sano with Jatin because Jatin’s teaching of sorting algorithms would have provided Blomgren, Nordquist, Suryono and Sano’s system with the advantage and capability of achieving faster sorting times while avoiding expensive unaligned load/store operations, thereby improving system performance and efficiency (see Jatin, Page 1313, Abstract; Page 1314, right column, lines 38-42).

As per claim 24, Blomgren, Nordquist, Suryono, Sano and Jatin teach the invention according to claim 21 above. Nordquist further teaches wherein upon each processor in the group of parallel processors reloading the subset of data elements, performing by the group of parallel processors, merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 370, 365, 360, 355; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350...The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; [Examiner notes: Fig. 3, 370, 365, 360 and 355 (as slabs of the plurality of slabs of registers) are reorganized. For example, in Fig. 3, 370, (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) (16 columns), (SC) from 345 column 1 is merged with (TC) from 330 within 370]).
In addition, Jatin further teaches a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Jatin, Page 1318, left column, Fig. 5, Bitonic merge network for merging sequences of length 16 elements each (4 of 4x4 matrix/slabs); Page 1314, left column, 2. Related work, paragraph 4, lines 10-13, GPUABiSort [9] was proposed, that is based on adaptive bitonic sort [2] and rearranges the data using bitonic trees to reduce the number of comparisons).
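For illustration only, the following Python sketch shows the general bitonic merge technique referenced in Jatin (Fig. 5): a bitonic sequence (one that rises then falls, or a rotation of such a sequence) of power-of-two length is brought into sorted order by compare-exchange rounds at strides n/2, n/4, ..., 1. The function names `bitonic_merge` and `merge_columns`, and the representation of a slab as a list of rows, are hypothetical; this is not the implementation of Jatin, Nordquist, or the claimed invention.

```python
def bitonic_merge(seq, ascending=True):
    """Sort a bitonic sequence whose length is a power of two using
    compare-exchange rounds at strides n/2, n/4, ..., 1."""
    data = list(seq)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    stride = n // 2
    while stride > 0:
        for i in range(n):
            j = i ^ stride  # partner index differs only in the 'stride' bit
            if j > i:       # visit each disjoint pair once
                out_of_order = data[i] > data[j] if ascending else data[i] < data[j]
                if out_of_order:
                    data[i], data[j] = data[j], data[i]
        stride //= 2
    return data


def merge_columns(slab, ascending=True):
    """Hypothetical helper: apply bitonic_merge independently to each
    column of a 2-D slab given as a list of equal-length rows."""
    columns = [bitonic_merge(col, ascending) for col in zip(*slab)]
    return [list(row) for row in zip(*columns)]


# An ascending run followed by a descending run forms a bitonic sequence.
print(bitonic_merge([1, 4, 6, 8, 7, 5, 3, 2]))
print(merge_columns([[1, 40], [4, 30], [7, 10], [2, 20]]))
```

Each column in the usage example is bitonic, so every column is merged into ascending order independently of the others.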

As per claim 26, Blomgren, Nordquist, Suryono, Sano and Jatin teach the invention according to claim 21 above. Blomgren further teaches wherein the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order (Blomgren, [0019] lines 1-2, Fig. 5 shows the results of loading 4 matrix registers from memory; [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose…and block rearrangement operation; [0057] lines 1-8, FIG. 15 illustrates another way to view the block4 and block4v operations by illustrating that the instructions are swapping indices. If one takes the 4 matrix registers, 140, 142, 144, and 146, then one would have a 4x4x4 array 139 of elements (register/index, row, column). Given that the block4, block4v, and transpose operations can be viewed as swapping two of the 3 indices, these operations would then produce the following results: block4: (register/index, row, column) → (row, register/index, column); block4v: (register/index, row, column) → (column, row, register/index); transpose: (register/index, row, column) → (register/index, column, row)).
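For illustration only, the index-swapping view of Blomgren [0057] can be sketched in Python on a 4x4x4 nested list indexed as (register/index, row, column). The functions `swap_indices`, `block4`, `block4v` and `transpose` below are hypothetical stand-ins named after the operations Blomgren describes; they model only the resulting element placement, not Blomgren's hardware instructions.

```python
from itertools import product


def swap_indices(a, axis1, axis2):
    """Return a new cubic nested list in which the element at index
    (i0, i1, i2) of `a` moves to the index with positions axis1 and axis2 swapped."""
    n = len(a)
    out = [[[None] * n for _ in range(n)] for _ in range(n)]
    for idx in product(range(n), repeat=3):
        new = list(idx)
        new[axis1], new[axis2] = new[axis2], new[axis1]
        out[new[0]][new[1]][new[2]] = a[idx[0]][idx[1]][idx[2]]
    return out


def block4(a):     # (register, row, column) -> (row, register, column)
    return swap_indices(a, 0, 1)


def block4v(a):    # (register, row, column) -> (column, row, register)
    return swap_indices(a, 0, 2)


def transpose(a):  # (register, row, column) -> (register, column, row)
    return swap_indices(a, 1, 2)


# Encode each element's original position so its movement is visible.
regs = [[[16 * r + 4 * i + j for j in range(4)] for i in range(4)] for r in range(4)]
print(block4(regs)[1][2][3] == regs[2][1][3])
```

Because each operation swaps exactly two indices, applying the same operation twice restores the original arrangement.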

As per claims 27, 30 and 32, they are system claims corresponding to claims 21, 24 and 26, respectively, above. Therefore, they are rejected for the same reasons as claims 21, 24 and 26, respectively, above.

As per claims 33, 36 and 38, they are non-transitory computer readable medium claims corresponding to claims 21, 24 and 26, respectively, above. Therefore, they are rejected for the same reasons as claims 21, 24 and 26, respectively, above.


Claims 25, 31 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren, Nordquist, Suryono, Sano and Jatin, as applied to claim 21 above, and further in view of Lin et al. (US Pub. 2006/0126726 A1).
Lin was cited in the previous Office Action.

As per claim 25, Blomgren, Nordquist, Suryono, Sano and Jatin teach the invention according to claim 21 above. Blomgren, Nordquist, Suryono, Sano and Jatin fail to specifically teach wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors.

	However, Lin teaches wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors (Lin, Fig. 4, Columns 0-7 (as number of register columns), processor 0-7 (each column corresponding to number of processors in the group of parallel processors); [0024] line 1, A parallel processing DSP structure).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren, Nordquist, Suryono, Sano and Jatin with Lin, because Lin’s teaching of a number of the register columns corresponding to a number of processors would have provided Blomgren, Nordquist, Suryono, Sano and Jatin’s system with the advantage and capability of allowing each processor to process the data elements of a respective column, thereby improving system efficiency and power consumption (see Lin, [0024], efficiency and power consumption).

As per claim 31, it is a system claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.

As per claim 37, it is a non-transitory computer readable medium claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.


Response to Arguments
Applicant’s arguments with respect to claims 21, 24-27, 30-33 and 36-38 under 103 rejection have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

In the remarks, Applicant argues in substance:
(a). Applicant respectfully disagrees and submits that the Examiner is misquoting the claim (i.e., claim 21). Before being amended herein to address the Examiner's 112 rejection, claim 21 recited "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order" and "merging and sorting, by the group of parallel processors, the reloaded data elements" (emphasis added to show portions of claim 21 omitted from the Examiner's quote). Applicant respectfully submits that this step of sorting could not possibly be performed in the human mind, as alleged by the Examiner. How could a human possibly load data elements into the register rows of a first subset of the plurality of slabs of registers and into register rows of a second subset of the plurality of slabs of registers? Applicant respectfully submits that the claimed method cannot be performed using mental processes that can be performed in the human mind.

(b), Applicant respectfully disagrees and submits that selecting only certain words and making a generalized statement about mental processes shows that the entire basis for the rejection is fundamentally incorrect, and fails to comport with the requirements for a proper § 101 rejection.

(c), Claim 21, as amended, expressly recites technically detailed features that cannot properly be considered to be done "in the mind", such as "loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers, wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors", "sorting, by the group of parallel processors, a first portion of the plurality of data elements loaded into register rows of a first half of the plurality of slabs of registers in a descending order, and a second portion of the plurality of data elements loaded into register rows of a second half of the plurality of slabs of registers in a reverse and ascending order", "storing, by the group of parallel processors, the sorted data elements in the shared memory", "reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory", "merging and sorting, by the group of parallel processors, the reloaded data elements"; and "storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory".
	
(d), While the Examiner contends that the claimed features "could be performed in the human mind" (Office Action, p.3), this formulaic statement simply ignores the requirements (e.g., loading "a plurality of data elements into a plurality of slabs of registers" and sorting "a first portion of the plurality of data elements loaded into register rows of a first half of the plurality of slabs of registers in a descending order, and a second portion of the plurality of data elements loaded into register rows of a second half of the plurality of slabs of registers in a reverse and ascending order") without articulating how it could be possible to do this "in the mind" when the technology necessarily requires computing components to be carried out.

(e), Applicant submits that the rejection as presented fails to follow USPTO guidelines and the case law regarding § 101. First, "Claims do not recite a mental process when they do not contain limitations that can practically be performed in the human mind, for instance when the human mind is not equipped to perform the claim limitations" (MPEP § 2106.04(a)(2)(III)(A)). The MPEP provides numerous relevant examples of computer-related operations that cannot be considered "mental processes", including "a claim to a specific data encryption method for computer communication involving a several-step manipulation of data", and Applicant notes that such "Mental Processes" (see Office Action at pages 3-4) are irrelevant to the claimed technology, and there is in fact no mention in such examples of specific computing approaches that meaningfully relate to the pending claims.

(f), The Prong #2 (Step 2A) conclusory assertion in the Office Action is devoid of any actual analysis of the claimed features as presented. As explained by the Supreme Court in the context of an obviousness analysis, conclusory statements by examiners cannot be the basis for a statutory rejection.

(g), Applicant submits that a detailed review pursuant to Prong #2 of Step 2A unquestionably demonstrates that the claimed features incorporate a practical application. When looking at the claimed limitations as an ordered combination in the manner set forth by the USPTO's guidelines, the invention as a whole amounts to significantly more than the alleged abstract idea, and in fact is integrated into a practical application. As noted in the application, parallelized processing offers significantly faster, more efficient sorting than offered by current technology, thereby improving the functioning of computing devices, whereby high performance can be achieved on large, bandwidth-rich data-parallel devices, and high energy efficiency can be achieved by minimizing off-chip memory loads and stores such that the system's CPU(s) may be free to perform other processing tasks simultaneously (Specification [0043], emphasis added).

(h), For instance, when the examiner has concluded that certain claim elements recite well understood, routine, conventional activities in the relevant field, the examiner must expressly support the rejection in writing with one of the four options specified in Subsection III. The rejection fails to comply with these requirements and the guidance from the MPEP. 

(i), As should be clear from the entire discussion above and the application as filed, data elements cannot be sorted using slabs of registers without the corresponding elements of the shared memory and a group of parallel processors. Thus, the technology as claimed is tied to a particular machine that implements the steps of the method. See MPEP § 2106.05(b).

(j), Finally, as noted above, the claimed approaches involve a particular transformation. See MPEP § 2106.05(c). Here, by way of example, the transformation involves sorting data elements such that the order of data elements stored in the shared memory is altered to allow for effective and efficient data-parallel computing. 
When fully analyzed, the additional elements thus yield claims as a whole that amount to significantly more than the purported abstract concept, and the rejection should be withdrawn for at least this reason. 

Examiner respectfully disagrees with Applicant’s arguments for the following reasons:
As to point (a), in response to Applicant’s argument that the Examiner misquoted the claim, Examiner points out that under the 2019 PEG, Step 2A, Prong 1, the abstract idea recited in the claim is identified (i.e., is a judicial exception recited?). Here, the limitations of claim 21 reciting “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order”, “merging and sorting, the reloaded data elements” and “enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load” are identified as an abstract idea, because a person can readily evaluate/determine/judge the values of the data elements (i.e., size, number, larger or smaller), perform the sorting/reordering based on those values with different orders (descending order, reverse and ascending order), group/combine/merge the reordered/sorted/reloaded data elements, and evaluate/determine/judge whether the applied sorting will help the computing processing (i.e., enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load). Therefore, the claim does recite a judicial exception.

In addition, in response to Applicant’s argument that the “step of sorting could not possibly be performed in the human mind, as alleged by the Examiner. How could a human possibly load data elements into the register rows of a first subset of the plurality of slabs of registers and into register rows of a second subset of the plurality of slabs of registers? Applicant respectfully submits that the claimed method cannot be performed using mental processes that can be performed in the human mind”:

Examiner respectfully disagrees. As indicated in the 101 rejection, these cited additional limitations have been treated under Step 2A, Prong 2 and Step 2B, and Examiner directs Applicant to that analysis, where these additional elements are further analyzed. For example, “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant extra-solution activity (i.e., mere data storing; see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). Examiner specifically provided evidence from specification paragraph [0003], lines 1-2, “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”. That is, the “loading”, “storing”, “reloading” and “storing” steps are well-understood, routine, conventional data operations, and this conclusion is supported under Berkheimer option 1 (see MPEP § 2106.05(d)).

As to points (b) and (e), Examiner again directs Applicant to the 2019 PEG, Step 2A, Prong 1, under which the abstract idea recited in the claim is identified (i.e., is a judicial exception recited?). Examiner has correctly identified the limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order”, “merging and sorting, the reloaded data elements” and “enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load” as an abstract idea, because a person can readily evaluate/determine/judge the values of the data elements (i.e., size, number, larger or smaller), perform the sorting/reordering based on those values with different orders (descending order, reverse and ascending order), group/combine/merge the reordered/sorted/reloaded data elements, and evaluate/determine/judge whether the applied sorting will help the computing processing (i.e., enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load). Therefore, the claim does recite a judicial exception. See point (a) above.
	
As to points (c) and (d), Examiner points out that Applicant recites all of the limitations of claim 21 and asserts that they cannot be performed in the human mind. However, the 101 analysis evaluates the limitations in distinct steps. Again, Examiner has evaluated all of the additional limitations recited in claim 21 under Step 2A, Prong 2 and Step 2B of the 101 rejection above (see point (a) above).

As to point (f), Examiner points out that the additional elements are clearly analyzed under Steps 2A and 2B. For instance, the additional elements “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “a first portion of the plurality of data elements loaded into register rows of a first half of the plurality of slabs of registers”, “a second portion of the plurality of data elements loaded into register rows of a second half of the plurality of slabs of registers”, “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” and “such as multiple processing tasks can be performed simultaneously” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function; see MPEP § 2106.05(b)). In addition, the limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant extra-solution activity (i.e., mere data storing; see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). The same analysis applies in Step 2B, i.e., mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A. These additional elements, individually and in combination, do not amount to significantly more than the exception itself or provide an inventive concept in Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the steps of “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” were considered to be extra-solution activity in Step 2A as insignificant pre-solution data gathering, and they are therefore re-evaluated in Step 2B to determine whether they are more than well-understood, routine, conventional activity in the field.
Under the 2019 PEG, in a Step 2B analysis, an additional element (or combination of elements) is not well-understood, routine or conventional unless the examiner finds, and expressly supports a rejection in writing with, one or more of the following four options:
Option 1 – Statement(s) by Applicant
Option 2 – Court Decisions in MPEP § 2106.05(d)(II)
Option 3 – Publication(s)
Option 4 – Official Notice
Examiner specifically cites specification paragraph [0003] to provide support for the rejection. For example, the background section of the specification does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” are anything other than steps performed by a generic, off-the-shelf computer component, and specification paragraph [0003], lines 1-2, specifically recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”.
Accordingly, a conclusion that the steps of “loading/reloading” and “storing” are well-understood, routine, conventional activity is supported under Berkheimer option 1.
Option 1 – Statement(s) by Applicant
An explanation based on an express statement in the specification (e.g., citation to a relevant portion of the specification) that demonstrates the well-understood, routine, conventional nature of the additional element(s) 
A specification demonstrates the well-understood, routine, conventional nature of additional elements when it describes the additional element(s) as conventional (or an equivalent term); as a commercially available product; or, in a way that shows the element is widely prevalent or in common use.
Therefore, Applicant’s argument has not been found to be persuasive.

As to point (g), Applicant alleges that the claimed features incorporate a practical application by relying upon specification paragraph [0043] (i.e., high performance can be achieved on large, bandwidth-rich data-parallel devices, high energy efficiency can be achieved by minimizing off-chip memory load and stores such that the system's CPU(s) may be free to perform other processing tasks simultaneously). However, Examiner reminds Applicant that this limitation can also be readily evaluated/determined/judged mentally. For example, a person can readily evaluate/determine/judge whether the applied sorting will help the computing processing (i.e., enhance speed and efficiency of the sorting of the data elements while reducing energy consumption and memory load). Therefore, the claimed limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., mental processes).

As to point (h), Examiner has provided specific support for the additional limitations by citing specification paragraph [0003]. The background section of the specification does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” are anything other than steps performed by a generic, off-the-shelf computer component. In particular, specification paragraph [0003], lines 1-2, recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”. Therefore, the conclusion that the steps of “loading/reloading” and “storing” are well-understood, routine, conventional activities is supported under Berkheimer option 1.
Option 1 – Statement(s) by Applicant
An explanation based on an express statement in the specification (e.g., citation to a relevant portion of the specification) that demonstrates the well-understood, routine, conventional nature of the additional element(s) 
A specification demonstrates the well-understood, routine, conventional nature of additional elements when it describes the additional element(s) as conventional (or an equivalent term); as a commercially available product; or, in a way that shows the element is widely prevalent or in common use.
Therefore, Applicant’s argument has not been found to be persuasive.

As to point (i), Examiner points out that the additional limitations have been evaluated under Steps 2A and 2B of the 101 rejection (see point (a) and the 101 rejection above).

As to point (j), Examiner directs Applicant to MPEP § 2106.05(a), which indicates that the examiner must evaluate "the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107" MPEP § 2106.05(a). Here, it is not enough for claim 21 to merely state the outcome, i.e., to “enhance the speed and efficiency, energy consumption and memory load is reduced”; the claim must actually describe the particular way in which that outcome is achieved. In addition, the limitation “such as multiple processing tasks can be performed simultaneously” is not actually performed in the claim. See the 101 rejection above. Therefore, Applicant’s argument has not been found to be persuasive.
For the reasons above, Applicant’s argument has not been found to be persuasive, and therefore the rejections are maintained. 


  Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195




/Z.X./Examiner, Art Unit 2195