DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Request for Continued Examination and Applicant's Amendment and Arguments filed on 23 November 2021.
Claims 21-38 are pending for examination. Claims 1-20 were cancelled. 


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 23 November, 2021 has been entered.


Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 


Claim Rejections - 35 USC § 101

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 21-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1, Statutory Category: Yes, claim 21 is a method that recites a series of steps and therefore falls in the statutory category of a process.
Step 2A- Prong 1: Judicial Exception Recited: Yes, the claim recites: “sorting, the data elements, in accordance with a first ordering process and the data elements in accordance with a second ordering process” and “merging and sorting, the reloaded data elements”. As drafted, the claim as a whole recites a method including steps that could be performed in the human mind, but for the recitation of generic computing components. For example, a person can easily evaluate/determine/judge the values of the data elements, perform the sorting/reordering with different orders (first ordering and second ordering) for the different data elements, and group/combine/merge the reordered/sorted/reloaded data elements. Therefore, but for the recitation of generic computing components, these steps fall within the Mental Processes grouping, i.e., concepts that can be performed in the human mind (including an observation, evaluation, judgment, opinion).
Therefore, yes, the claim does recite a judicial exception.
Step 2A- Prong 2: Integrated into a practical Application: No, this judicial exception is not integrated into a practical application. In particular, the claim recites the additional limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements”, which are insignificant pre-solution data gathering (see MPEP § 2106.05(g)). In addition, “a data-parallel computing device”, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “the data elements loaded into the register rows of a first subset of the plurality of slabs of registers” and “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). The combination of these additional elements is no more than mere instructions to apply the exception using the generic computer components (i.e., “a data-parallel computing device”, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns”). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application.
Step 2B: Claim provides an Inventive Concept: No. As discussed with respect to Step 2A Prong Two, the additional elements “a data-parallel computing device”, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). In addition, the limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). The same analysis applies here in 2B, i.e., mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A. These additional elements, individually and in combination, do not amount to significantly more than the exception itself or provide an inventive concept in Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the steps of “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” were considered to be extra-solution activity in Step 2A as insignificant pre-solution data gathering, and thus they are re-evaluated in Step 2B to determine if they are more than what is loaded or stored.
Accordingly, a conclusion that the steps of “loading/reloading” and “storing” are well-understood, routine, conventional activity is supported under Berkheimer option 1.
For these reasons, there is no inventive concept in the claim, and thus the claim is ineligible. 

Independent claims 27 (system claim) and 33 (non-transitory computer-readable medium claim) are rejected for the same reason as claim 21 above. Claim 33 further recites “A non-transitory computer-readable medium comprising instructions”. These additional elements are directed to generic computer components providing generic computer functions (see MPEP § 2106.05(b)). 

With respect to dependent claim 22, the claim elaborates that the first ordering process is used to sort the data elements loaded into the register rows of the first subset of the plurality of slabs of registers in a descending order (“sorting” with the “first ordering process” in “a descending order” is being treated as part of abstract 

With respect to dependent claim 23, the claim elaborates that the second ordering process is used to sort the data elements loaded into the register rows of the second subset of the plurality of slabs of registers in a reverse and ascending order (“sorting” with the “second ordering process” in “reverse and ascending order” is being treated as part of the abstract idea and is analogous to Mental Processes, such that the concept can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

With respect to dependent claim 24, the claim elaborates that, upon each of the processors in the group of parallel processors reloading the subset of data elements, the group of parallel processors performs a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (“performing by the group of parallel processors” is being treated as a generic computing device performing a generic computer function, see MPEP § 2106.05(b). In addition, performing the “bitonic merge” is being treated as part of the abstract idea and is analogous to Mental Processes, such that the concept can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

With respect to dependent claim 25, the claim elaborates that a number of the register columns of the two-dimensional array of registers corresponds 

With respect to dependent claim 26, the claim elaborates that the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order (the data elements “loading/loaded” in “a transposed order” is being treated as a generic computing device performing a generic computer function, see MPEP § 2106.05(b). In addition, “transposed order” is being treated as part of the abstract idea and is analogous to Mental Processes, such that the concept can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

Dependent claims 28-32 and 34-38 recite the same features as applied to claims 22-26 respectively above, therefore they are also rejected under the same rationale.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 21-38 are rejected under 35 U.S.C. 112(b), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
As per claims 21, 27 and 33 (line# refers to claim 21):
In line 8, the claim recites the phrase “the data elements”. However, prior to this phrase, line 3 recites “a plurality of data elements”. Thus, it is unclear whether the second recitation, “the data elements”, is the same as or different from the first recitation, “a plurality of data elements”. If they are the same, the same name should be used.

In line 8, the claim recites the phrase “the register rows”. However, prior to this phrase, line 5 recites “a plurality of register rows”. Thus, it is unclear whether the second recitation, “the register rows”, is the same as or different from the first recitation, “a plurality of register rows”. If they are the same, the same name should be used.

In lines 8-11, the claim recites “sorting…the data elements loaded into the register rows of a first subset of the plurality of slabs…with a first ordering process…and the data elements loaded into the register rows of a second subset of the plurality of slabs…with second ordering process”. It is unclear how the same “data elements”, loaded into two subsets of the plurality of slabs, are sorted with two different ordering processes (i.e., it is unclear whether the sorting is performed by sorting a first portion of the plurality of data elements, loaded into the first subset, with the first ordering process, and sorting a second portion of the plurality of data elements, loaded into the second subset, with the second ordering process).
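
For illustration only, the reading suggested in the parenthetical above (a first portion of the data elements sorted with one ordering process, a second portion sorted with another) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or the specification: the modeling of a slab as a list of register rows, the choice of descending order for the first ordering process and ascending order for the second, the sample data, and the helper name sort_slab_rows are all hypothetical.

```python
# Hypothetical model: each "slab" is a list of register rows (lists of values).
def sort_slab_rows(slabs, descending):
    """Return new slabs with every register row sorted in the given direction."""
    return [[sorted(row, reverse=descending) for row in slab] for slab in slabs]

# Assumed sample data: four slabs, each with two register rows of four elements.
slabs = [
    [[7, 2, 9, 4], [1, 8, 3, 6]],
    [[5, 0, 11, 10], [15, 12, 14, 13]],
    [[3, 9, 1, 7], [2, 6, 8, 4]],
    [[10, 5, 15, 0], [13, 11, 12, 14]],
]

# Assumed split: the first two slabs form the first subset, the rest the second.
first_subset, second_subset = slabs[:2], slabs[2:]

# First ordering process (assumed descending) applied only to the first subset;
# second ordering process (assumed ascending) applied only to the second subset.
sorted_first = sort_slab_rows(first_subset, descending=True)
sorted_second = sort_slab_rows(second_subset, descending=False)
```

Under this reading, no single data element is sorted by both ordering processes; each portion is handled by exactly one of them.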



In lines 14-15, the claim recites “the plurality of register rows stored in the shared memory”. The relationship between “a plurality of register rows” in each slab of registers and “the plurality of register rows stored in the shared memory” is unclear (i.e., are they the same register rows or different ones? Which plurality of register rows is being referred to? Is it the plurality of register rows of a single slab of registers that is stored in the shared memory, or the plurality of register rows drawn from all of the slabs of registers? Is the number of register rows within the shared memory the same as the number of register rows in all the slabs of registers?).

As per claims 24, 30 and 36 (line# refers to claim 24):
Line 1, “each of the processors” lacks antecedent basis.

As per claims 22-23, 25-26, 28-29, 31-32, 34-35 and 37-38:
They are method, system and non-transitory computer readable medium claims that depend on claims 21, 27 and 33 above. Therefore, they have the same deficiencies as claims 21, 27 and 33 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-22, 26-28, 32-34 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren et al. (US Pub. 2002/0198911 A1) in view of Nordquist (US Patent 7,489,315 B1).

As per claim 21, Blomgren teaches the invention substantially as claimed including a method for sorting data in parallel on a data-parallel computing device, the method comprising (Blomgren, Fig. 2, 16 (as data-parallel computing device); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as SIMD data parallel operation); Claim 1, lines 16-19, simultaneously swaps row or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix): 
loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers (Blomgren, Fig. 2, 16, 40-70 (as parallel processors); Fig. 5, 130 (including data elements), 140, 142, 144, 146 (as slabs of registers); [0012] lines 5-7, The matrix processor 16 comprises 16 processing elements 40-70 (as parallel processors); [0019] lines 1-2, Fig. 5 shows the results of loading 4 matrix registers from memory (as shared memory since it is used by processing elements 40-70); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement (as sorting) operation), wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors (Blomgren, Fig. 5, 140 (including two dimensional array of registers, rows and columns), 142, 144, 146 (as slabs of registers); [0012] lines 5-8, The matrix processor 16 comprises 16 processing elements 40-70 where an individual processing element (PE) 80 comprises 16 PE register entries M0-M15; lines 13-16, An individual matrix register is a combination of register entries that includes an individual PE register entry from each PE register file from each individual processing element (as each slab of registers is associated with at least one parallel processor) in the matrix processor); 
sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in accordance with a first ordering process and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in accordance with a second ordering process (Blomgren, Figs 6-9 (as including first ordering processing and second ordering processing for sorting); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0047] lines 2-8, The block4 instruction is implemented on 4 contiguous matrix registers and exchanges row data between the four matrix registers 140, 142, 144, and 146 in 3 steps. In step 1, the block4 operation swaps matrix register 140, row 1 with matrix register 142, row 0; and matrix register 144, row 3 with matrix register 146, row 2. This to exchange is performed by simultaneously (as sorting first subset of the plurality of slabs of registers with a first ordering process); [0048] lines 1-4, FIG. 7 shows step 2 of the block4 instruction where the block4 operation swaps matrix register 140, row 2 with matrix register 144, row 0; and matrix register 142, row 3 with matrix register 146, row 1. This is performed by simultaneously; [0049] lines 1-4, FIG. 8 shows step 3 of the block4 instruction where the block4 operation swaps matrix register 140, row 3 with matrix register 146, row 0; and matrix register 142, row 2 with matrix register 144, row 1. This is performed by simultaneously (as sorting second subset of the plurality of slabs of registers with a second ordering process); [0050] lines 1-2, FIG. 9 shows the final state of the matrix registers 140, 142, 144, and 146 at the end of the block4 operation; See ; 
storing, by the group of parallel processors, the sorted data elements in the shared memory (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement operation; [0050] lines 3-12, the contents of the four matrix registers have been rearranged from four 1x16 vectors (at the beginning of the swapping) to four contiguous 4x4 sub-matrices, as illustrated by the 4x4 sub-matrices 160, 162, 164, and 166 in AxB matrix 120. Since all of the matrix data rearrangement is based upon swap operations…which is suitable for storing back to memory).
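
The three exchange steps quoted from Blomgren [0047]-[0049] above can be traced with a short sketch. This is a hypothetical model written for illustration, not Blomgren's implementation: the indices 0-3 stand in for matrix registers 140, 142, 144 and 146, each register is modeled as four rows, and the function name block4 and the labeled sample data are assumptions.

```python
# Hypothetical model: regs[k][r] is row r of the k-th matrix register,
# where k = 0..3 stands in for registers 140, 142, 144 and 146.
def block4(regs):
    """Apply the three row-exchange steps quoted from [0047]-[0049]."""
    regs = [[list(row) for row in reg] for reg in regs]  # work on a copy
    # Each entry is ((register, row), (register, row)) to be exchanged.
    steps = [
        [((0, 1), (1, 0)), ((2, 3), (3, 2))],  # step 1 (FIG. 6)
        [((0, 2), (2, 0)), ((1, 3), (3, 1))],  # step 2 (FIG. 7)
        [((0, 3), (3, 0)), ((1, 2), (2, 1))],  # step 3 (FIG. 8)
    ]
    for step in steps:
        for (ka, ra), (kb, rb) in step:
            regs[ka][ra], regs[kb][rb] = regs[kb][rb], regs[ka][ra]
    return regs

# Label every element with its starting (register, row, column) position.
start = [[[(k, r, c) for c in range(4)] for r in range(4)] for k in range(4)]
final = block4(start)
```

In this model, the net effect of the three steps is that row r of register k ends up holding what was originally row k of register r, while rows with r equal to k stay in place.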

Blomgren fails to specifically teach reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory; merging and sorting, by the group of parallel processors, the reloaded data elements; and storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory.

However, Nordquist teaches reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 325, (330, 335, 340, 345, as whole as shared memory), 350, (370, 365, 360, 355, as respective slab of the plurality of slabs of registers); Col 20, line 6, provide an on-chip shared memory; Col 4, lines 55-56, Cores 205 and 210 can be SIMD processors (as group of parallel processors) which execute instructions for 16 threads in parallel; Col 8, lines 5-31, Avoiding bank conflicts can improve the performance of the system…The second crossbar outputs (as to reloading) the first transpose buffer output 355, the second transpose buffer output 360, the third transpose buffer output 365, and the fourth transpose buffer output 370. The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345…The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345. (as reloading from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory)); 
merging and sorting, by the group of parallel processors, the reloaded data elements (Nordquist, Fig. 1, 105 Multiprocessor; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; Col 8, lines 33-49, the S, T, R, and Q texture coordinates in the first transpose buffer output 355, second transpose buffer output 360, third transpose buffer output 365, and fourth transpose buffer output 370 are arranged (as sorted) as quads because for each clock cycle all of the data for an entire quad is obtained…Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format; [Examiner's note: Fig. 3, 370, 365, 360 and 355 (as slab of the plurality of slabs of registers) are reorganized. For example, Fig. 3, 370 (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) are reorganized in order, such that (SC SD SE SF) from 345 row 1 is combined/merged with (TC TD TE TF) from 330 within the 370]); and 
storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 13A, 1305 Quad; Fig. 13B, 1310 Render target memory; Pixel 1320-1323; Col 20, line 6, provide an on-chip shared memory; Col 8, lines 47-55, Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format…The advantage of having quads is that many of the other graphics modules such as the texture module 220 and the ROP module 225 use quads. Since most graphics modules are designed to process quads; Col 3, lines 63-65, FIG. 13B is an illustration showing pixels of the quad stored in a pitch format memory, in accordance with one embodiment of the present invention).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren with Nordquist because Nordquist’s teaching of reloading/outputting the data from the memory again to the registers/arrays for the purpose of reorganizing the order would have provided Blomgren’s system with the advantage and capability to allow the data elements to be transferred to a quad format so that the particular module can process the transferred data, which improves the system efficiency and performance. 

As per claim 22, Blomgren and Nordquist teach the invention according to claim 21 above. Blomgren further teaches wherein the first ordering process is used to sort the data elements loaded into the register rows of the first subset of the plurality of slabs of registers in a descending order (Blomgren, Fig. 9, 140, first row (0,0 0,1 0,2 0,3), 142 first row (0,4 0,5 0,6 0,7) (as rows of the first subset of the plurality of slabs in a descending order); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0047] lines 3-8, In step 1, the block4 operation swaps matrix register 140, row 1 with matrix register 142, row 0; and matrix register 144, row 3 with matrix register 146, row 2. This to exchange is performed by simultaneously (as sorting first subset of the plurality of slabs of registers with a first ordering process)).

As per claim 26, Blomgren and Nordquist teach the invention according to claim 21 above. Blomgren further teaches wherein the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order (Blomgren, [0019] lines 1-2, Fig. 5 shows the results of loading 4 matrix registers from memory; [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose…and block rearrangement operation; [0057] lines 1-8, FIG. 15 illustrates another way to view the block4 and block4v operations by illustrating that the instructions are swapping indices. If one takes the 4 matrix registers, 140, 142, 144, and 146, then one would have a 4x4x4 array 139 of elements (register/index, row, column). Given that the block4, block4v, and transpose operations can be viewed as swapping two of the 3 indices, these operations would then produce the following results: block4: (register/index, row, column); (row, register/index, column); block4v: (register/index, row, column); (column, row, register/index); transpose: (register/index, row, column); (register/index, column, row)).
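
The index-swapping view quoted from Blomgren [0057] can be made concrete with a small sketch. It is a hypothetical illustration only: the nested-list model of the 4x4x4 array, the helper name swap_axes, and the variable names are assumptions and do not come from Blomgren.

```python
N = 4

# Assumed model of the 4x4x4 array: a[k][r][c] is labeled with its own
# (register/index, row, column) coordinates so that moves are easy to read.
a = [[[(k, r, c) for c in range(N)] for r in range(N)] for k in range(N)]

def swap_axes(src, x, y):
    """Return a copy of src in which index positions x and y are exchanged."""
    n = len(src)
    dst = [[[None] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for l in range(n):
                idx = [i, j, l]
                idx[x], idx[y] = idx[y], idx[x]
                dst[idx[0]][idx[1]][idx[2]] = src[i][j][l]
    return dst

# block4 is described as swapping register/index and row.
block4_view = swap_axes(a, 0, 1)
# block4v is described as swapping register/index and column.
block4v_view = swap_axes(a, 0, 2)
# transpose is described as swapping row and column.
transpose_view = swap_axes(a, 1, 2)
```

Reading position [1][2][3] of each result shows which original element landed there: (2, 1, 3) for the block4 view, (3, 2, 1) for the block4v view, and (1, 3, 2) for the transpose view.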

As per claims 27-28 and 32, they are system claims corresponding to claims 21-22 and 26 respectively above. Therefore, they are rejected for the same reason as claims 21-22 and 26 respectively above.

As per claims 33-34 and 38, they are non-transitory computer readable medium claims corresponding to claims 21-22 and 26 respectively above. Therefore, they are rejected for the same reason as claims 21-22 and 26 respectively above.

Claims 23, 29 and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren and Nordquist, as applied to claim 22 above, and further in view of Sano (US Pub. 2012/0259714 A1).

As per claim 23, Blomgren and Nordquist teach the invention according to claim 22 above. Blomgren further teaches wherein the second ordering process is used to sort the data elements loaded into the register rows of the second subset of the plurality of slabs of registers (Blomgren, Figs 7-9 (as including second ordering processing for sorting); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0048] lines 1-4, FIG. 7 shows step 2 of the block4 instruction where the block4 operation swaps matrix register 140, row 2 with matrix register 144, row 0; and matrix register 142, row 3 with matrix register 146, row 1. This is performed by simultaneously; [0049] lines 1-4, FIG. 8 shows step 3 of the block4 instruction where the block4 operation swaps matrix register 140, row 3 with matrix register 146, row 0; and matrix register 142, row 2 with matrix register 144, row 1. This is performed by simultaneously (as sorting second subset of the plurality of slabs of registers with a second ordering process)).  

Blomgren and Nordquist fail to specifically teach sorting the data elements loaded into the register rows in a reverse and ascending order.

However, Sano teaches sorting the data elements loaded into the register rows in a reverse and ascending order (Sano, Fig. 4; [0030] lines 9-13, the four key buttons 50-13 to 50-16 in the fourth row (the bottom row) are orderly numbered, from right to left, 13, 14, 15 and 16, so that the numbers added for the key buttons in each row are ascended in reverse direction).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren and Nordquist with Sano because Sano’s teaching of sorting the data in each row in an ascending and reverse direction would have provided Blomgren and Nordquist’s system with the advantage and capability to process the operations more efficiently, which improves the system performance. 

As per claim 29, it is a system claim of claim 23 above. Therefore, it is rejected for the same reason as claim 23 above.

As per claim 35, it is a non-transitory computer readable medium claim of claim 23 above. Therefore, it is rejected for the same reason as claim 23 above.


Claims 24, 30 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren and Nordquist, as applied to claim 21 above, and further in view of Jatin Chhugani et al. (Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture; hereafter Jatin).
Jatin was cited in the IDS filed on 05/13/2019.

As per claim 24, Blomgren and Nordquist teach the invention according to claim 21 above. Nordquist further teaches wherein upon each of the processors in the group of parallel processors reloading the subset of data elements, performing by the group of parallel processors, merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 370, 365, 360, 355; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; [Examiner's note: Fig. 3, 370, 365, 360 and 355 (as slab of the plurality of slabs of registers) are reorganized. For example, in Fig. 3, 370 (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) (16 columns), (SC) from 345 column 1 is merged with (TC) from 330 within the 370]).


Blomgren and Nordquist fail to specifically teach a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers.

However, Jatin teaches a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Jatin, Page 1318, left column, Fig. 5, Bitonic merge network for merging sequences of length 16 elements each (4 of 4x4 matrix/slabs); Page 1314, left column, 2. Related work, paragraph 4, lines 10-13, GPUABiSort [9] was proposed, that is based on adaptive bitonic sort [2] and rearranges the data using bitonic trees to reduce the number of comparisons).
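
For readers unfamiliar with the term, a bitonic merge can be sketched as below. This is a generic textbook formulation written for illustration; it is not the SIMD implementation of Jatin, the merge network of Jatin's Fig. 5, or the claimed method. The function names bitonic_merge and merge_sorted_runs, the power-of-two length restriction, and the sample values are assumptions of this sketch.

```python
def bitonic_merge(seq, ascending=True):
    """Sort a bitonic sequence whose length is a power of two.

    A bitonic sequence first rises and then falls (or the reverse). Each level
    compares element i of the lower half with element i of the upper half,
    exchanges them if they are out of order, and then recurses on both halves.
    """
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    for i in range(half):
        # Exchange when the pair violates the requested direction.
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

def merge_sorted_runs(a, b):
    """Merge two ascending runs of equal power-of-two length.

    Reversing the second run turns the concatenation into a bitonic sequence,
    which bitonic_merge can then sort.
    """
    return bitonic_merge(list(a) + list(b)[::-1])

merged = merge_sorted_runs([1, 4, 6, 9], [2, 3, 7, 8])
```

Because the pattern of comparisons at each level is fixed and independent of the data values, the compare-exchange operations within a level do not depend on one another, which is why networks of this kind are commonly associated with data-parallel hardware.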

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren and Nordquist with Jatin because Jatin’s teaching of bitonic merge for rearranging the data would have provided Blomgren and Nordquist’s system with the advantage and capability to reduce the number of comparisons, which improves the data processing performance and efficiency.

As per claim 30, it is a system claim of claim 24 above. Therefore, it is rejected for the same reason as claim 24 above.

As per claim 36, it is a non-transitory computer readable medium claim of claim 24 above. Therefore, it is rejected for the same reason as claim 24 above.

Claims 25, 31 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren and Nordquist, as applied to claim 21 above, and further in view of Lin et al. (US Pub. 2006/0126726 A1).

As per claim 25, Blomgren and Nordquist teach the invention according to claim 21 above. Blomgren and Nordquist fail to specifically teach wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors.

However, Lin teaches wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors (Lin, Fig. 4, Columns 0-7 (as number of register columns), processor 0-7 (each column corresponding to number of processors in the group of parallel processors); [0024] line 1, A parallel processing DSP structure).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren and Nordquist with Lin because Lin’s teaching of a number of the register columns corresponding to a number of processors would have provided Blomgren and Nordquist’s system with the advantage and capability to allow each processor to process the 

As per claim 31, it is a system claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.

As per claim 37, it is a non-transitory computer readable medium claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.


Response to Arguments
The Amendment filed on 11/23/2021 has been entered. Applicant’s amendment has overcome the previous rejections under 35 U.S.C. § 112(b). However, a new 112(b) rejection has been made in response to the Applicant’s amendment.

Applicant’s arguments with respect to claims 21-38 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195

/Z.X./Examiner, Art Unit 2195