DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The Office Action is in response to Applicant’s Amendment and Remarks filed on 15 June 2022. 
Claims 21, 24-27, 30-33 and 36-38 are pending in this application. Claims 1-20, 22-23, 28-29 and 34-35 were cancelled. 


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 21, 24-27, 30-33 and 36-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1, Statutory Category: Yes, claim 21 is a method that recites a series of steps and therefore falls within the statutory category of a process.
Step 2A- Prong 1: Judicial Exception Recited: Yes, the claim recites: “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements”. As drafted, the claim as a whole recites a method including steps that could be performed in the human mind, but for the recitation of generic computing components. For example, a person can easily evaluate/determine/judge the values/sizes of the data elements, perform the sorting/reordering in different orders (i.e., descending order, and reverse and ascending order) for the different data elements, and group/combine/merge the reordered/sorted/reloaded data elements. Therefore, but for the recitation of generic computing components, these steps may be Mental Processes that can be performed in the human mind (including an observation, evaluation, judgment, opinion).
Therefore, yes, the claim does recite a judicial exception.
Step 2A- Prong 2: Integrated into a practical Application: No, this judicial exception is not integrated into a practical application. In particular, the claim recites the additional limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements”, which are insignificant pre-solution data gathering (see MPEP § 2106.05(g)). In addition, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “the data elements loaded into the register rows of a first subset of the plurality of slabs of registers” and “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP §2106.05(b)). The combination of these additional elements is no more than mere instructions to apply the exception using generic computer components (i.e., “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns”). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to the abstract idea.
Step 2B: Claim provides an Inventive Concept: No. As discussed with respect to Step 2A Prong Two, the additional elements “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP §2106.05(b)). In addition, the limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well understood, routine, conventional activity (see MPEP § 2106.05(d)). The same analysis applies here in 2B, i.e., mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A. These additional elements, individually and in combination, do not amount to significantly more than the exception itself or provide an inventive concept in Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the steps of “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” were considered to be extra-solution activity in Step 2A as insignificant pre-solution data gathering, and thus they are re-evaluated in Step 2B to determine if they are more than what is well understood, routine, conventional activity in the field. The background of the example does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” are performed by anything other than a generic, off-the-shelf computer component, and the specification paragraph [0003] lines 1-2 specifically recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”.
Accordingly, a conclusion that the steps of “loading/reloading” and “storing” are well understood, routine, conventional activity is supported under Berkheimer option 1.
For these reasons, there is no inventive concept in the claim, and thus the claim is ineligible. 

Independent claims 27 (system claim) and 33 (non-transitory computer-readable medium claim) are rejected for the same reason as claim 21 above. Claim 33 further recites “A non-transitory computer-readable medium comprising instructions”. These additional elements are directed to generic computer components providing generic computer functions (see MPEP § 2106.05(b)). 


With respect to dependent claim 24, the claim elaborates that wherein upon each of the processors in the group of parallel processors reloading the subset of data elements, performing by the group of parallel processors, a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (“performing by the group of parallel processors” is treated as a generic computing device performing a generic computer function, see MPEP §2106.05(b). In addition, performing the “bitonic merge” is treated as part of the abstract idea and is analogous to Mental Processes, such that the concept can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

With respect to dependent claim 25, the claim elaborates that wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors (“number of the register columns” corresponding to “a number of processors” is treated as a generic computing device performing a generic computer function, see MPEP §2106.05(b)).

With respect to dependent claim 26, the claim elaborates that wherein the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order (the data elements being “loaded” in “a transposed order” is treated as a generic computing device performing a generic computer function, see MPEP §2106.05(b). In addition, “transposed order” is treated as part of the abstract idea and is analogous to Mental Processes, such that the concept can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

Dependent claims 30-32 and 36-38 recite the same features as applied to claims 24-26 respectively above, therefore they are also rejected under the same rationale.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 21, 24-27, 30-33 and 36-38 are rejected under 35 U.S.C. 112(b), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
As per claims 21, 27 and 33 (line# refers to claim 21):
In line 8, the claim recites the phrase “the data elements”. However, prior to this phrase, at line 3, it recites “a plurality of data elements”. Thus, it is unclear whether the second recitation of “the data elements” is the same as or different from the first recitation of “a plurality of data elements”. If they are the same, the same name should be used.

In line 8, the claim recites the phrase “the register rows”. However, prior to this phrase, at line 5, it recites “a plurality of register rows”. Thus, it is unclear whether the second recitation of “the register rows” is the same as or different from the first recitation of “a plurality of register rows”. If they are the same, the same name should be used.

In lines 8-11, the claim recites “sorting…the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order”. It is unclear how the same “data elements”, loaded into two subsets of the plurality of slabs, are sorted with two different ordering processes (i.e., it is unclear whether the sorting is performed by sorting a first portion of the plurality of data elements loaded into the first subset in a descending order, and sorting a second portion of the plurality of data elements loaded into the second subset in a reverse and ascending order).

As per claims 24, 30 and 36 (line# refers to claim 24):
Line 1, “each of the processors” lacks antecedent basis.

As per claims 25-26, 31-32 and 37-38:
They are method, system and non-transitory computer readable medium claims that depend on claims 21, 27 and 33 above. Therefore, they have the same deficiencies as claims 21, 27 and 33 above.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 26-27, 32-33 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren et al. (US Pub. 2002/0198911 A1) in view of Sano (US Pub. 2012/0259714 A1) and further in view of Nordquist (US Patent 7,489,315 B1).
Blomgren and Nordquist were cited in the previous Office Action.
Sano was cited in the PTO-892 mailed on 03/31/2022.

As per claim 21, Blomgren teaches the invention substantially as claimed including A computer-implemented method for sorting data, the method comprising (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0056] line 10, SIMD data parallel operation; Claim 1, lines 16-19,  simultaneously swaps row or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix): 
loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers (Blomgren, Fig. 2, 16, 40-70 (as parallel processors); Fig. 5, 130 (including data elements), 140, 142, 144, 146 (as slabs of registers); [0012] lines 5-7, The matrix processor 16 comprises 16 processing elements 40-70 (as parallel processors); [0019] lines 1-2, Fig. 5 shows the results of loading 4 matrix registers from memory (as shared memory since it is used by processing elements 40-70); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement (as sorting) operation), wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors (Blomgren, Fig. 5, 140 (including two dimensional array of registers, rows and columns), 142, 144, 146 (as slabs of registers); [0012] lines 5-8, The matrix processor 16 comprises 16 processing elements 40-70 where an individual processing element (PE) 80 comprises 16 PE register entries M0-M15; lines 13-16, An individual matrix register is a combination of register entries that includes an individual PE register entry from each PE register file from each individual processing element (as each slab of registers is associated with at least one parallel processor) in the matrix processor);
sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a second ordering process (Blomgren, Figs. 6-9 (as including descending order and second ordering process for sorting); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0047] lines 2-8, The block4 instruction is implemented on 4 contiguous matrix registers and exchanges row data between the four matrix registers 140, 142, 144, and 146 in 3 steps. In step 1, the block4 operation swaps matrix register 140, row 1 with matrix register 142, row 0; and matrix register 144, row 3 with matrix register 146, row 2. This exchange is performed by simultaneously… (as sorting first subset of the plurality of slabs of registers in descending order; also see Fig. 9 (140 row 1, 142 row 1, 144 row 1 and 146 row 1); [See specs: [0100]: “As shown in Figure 16A, the first half of the processor groups (i.e., Processor Groups 1 and 2,) may add the data elements stored in their respective slabs in a descending order by register row to the shared memory 1680 (e.g., left to right starting from the top left)”]); [0048] lines 1-4, FIG. 7 shows step 2 of the block4 instruction where the block4 operation swaps matrix register 140, row 2 with matrix register 144, row 0; and matrix register 142, row 3 with matrix register 146, row 1. This is performed by simultaneously…; [0049] lines 1-4, FIG. 8 shows step 3 of the block4 instruction where the block4 operation swaps matrix register 140, row 3 with matrix register 146, row 0; and matrix register 142, row 2 with matrix register 144, row 1. This is performed by simultaneously… (as sorting second subset of the plurality of slabs of registers with a second ordering process); [0050] lines 1-2, FIG. 9 shows the final state of the matrix registers 140, 142, 144, and 146 at the end of the block4 operation; See Fig. 9, 140, 142, 144 and 146 [Examiner noted: the final state of matrix registers 140, 142, 144 and 146 is sorted. For example, matrix register 140, row 2 is sorted from (0,4 0,5 0,6 0,7) in Fig. 5 to (1,0 1,1 1,2 1,3) in Fig. 9]);
storing, by the group of parallel processors, the sorted data elements in the shared memory (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement operation; [0050] lines 3-12, the contents of the four matrix registers have been rearranged from four 1x16 vectors (at the beginning of the swapping) to four contiguous 4x4 sub-matrices, as illustrated by the 4x4 sub-matrices 160, 162, 164, and 166 in AxB matrix 120. Since all of the matrix data rearrangement is based upon swap operations…which is suitable for storing back to memory).
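For illustration only, the three block4 swap steps quoted above from Blomgren [0047]-[0049] can be traced in a short sketch. The list indices 0-3 standing in for matrix registers 140, 142, 144 and 146, the (matrix row, matrix column) element labels, and the helper names are assumptions made for this sketch and do not appear in Blomgren.

```python
def make_registers():
    # Assumed starting layout: register k holds row k of a 4x16 matrix as four
    # register rows of four elements, each labelled (matrix_row, matrix_column).
    return [[[(k, 4 * row + col) for col in range(4)] for row in range(4)]
            for k in range(4)]


def swap_rows(regs, reg_a, row_a, reg_b, row_b):
    # Exchange one whole register row between two matrix registers.
    regs[reg_a][row_a], regs[reg_b][row_b] = regs[reg_b][row_b], regs[reg_a][row_a]


def block4(regs):
    # Step 1 ([0047]): 140 row 1 <-> 142 row 0; 144 row 3 <-> 146 row 2.
    swap_rows(regs, 0, 1, 1, 0)
    swap_rows(regs, 2, 3, 3, 2)
    # Step 2 ([0048]): 140 row 2 <-> 144 row 0; 142 row 3 <-> 146 row 1.
    swap_rows(regs, 0, 2, 2, 0)
    swap_rows(regs, 1, 3, 3, 1)
    # Step 3 ([0049]): 140 row 3 <-> 146 row 0; 142 row 2 <-> 144 row 1.
    swap_rows(regs, 0, 3, 3, 0)
    swap_rows(regs, 1, 2, 2, 1)
    return regs


regs = block4(make_registers())
for row in regs[0]:
    print(row)
```

Under these assumptions, the second register row of the first register ends up holding (1,0) (1,1) (1,2) (1,3), which agrees with the example given in the Examiner's note, and each register ends up holding one 4x4 sub-matrix as described in [0050].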

Blomgren fails to specifically teach the second ordering process is a reverse and ascending order.

However, Sano teaches the second ordering process is a reverse and ascending order (Sano, Fig. 4; [0030] lines 9-13, the four key buttons 50-13 to 50-16 in the fourth row (the bottom row) are orderly numbered, from right to left, 13, 14, 15 and 16, so that the numbers added for the key buttons in each row are ascended in reverse direction [See specs [0100], add the data elements stored in their respective slabs a reverse and ascending order by register row (e.g., right to left starting from the bottom right)]).
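For illustration only, the two traversal orders quoted from specification paragraph [0100] (left to right starting from the top left, and right to left starting from the bottom right) and the right-to-left row numbering quoted from Sano [0030] can be sketched as follows. The function names, the 2x2 example slab, and the assumption that every keypad row follows the same right-to-left pattern as the quoted fourth row are hypothetical and are not taken from either document.

```python
def top_left_traversal(slab):
    # Visit register rows from the top; within a row, go left to right.
    return [value for row in slab for value in row]


def bottom_right_traversal(slab):
    # Visit register rows from the bottom; within a row, go right to left.
    return [value for row in reversed(slab) for value in reversed(row)]


def right_to_left_numbering(rows, cols):
    # Number each row so that the numbers ascend from right to left
    # (assumed to hold for every row, as for the quoted fourth row 13..16).
    return [[r * cols + (cols - c) for c in range(cols)] for r in range(rows)]


slab = [[1, 2], [3, 4]]
print(top_left_traversal(slab))
print(bottom_right_traversal(slab))

keypad = right_to_left_numbering(4, 4)
print(keypad[3])
```

Read from right to left, the last keypad row gives 13, 14, 15, 16, which corresponds to the numbering quoted from Sano.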

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren with Sano because Sano’s teaching of sorting the data in each row in an ascending and reverse direction would have provided Blomgren’s system with the advantage and capability to process the operations more efficiently, which improves the system performance.

Both Blomgren and Sano fail to specifically teach reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory; merging and sorting, by the group of parallel processors, the reloaded data elements; and storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory.

However, Nordquist teaches reloading, by the group of parallel processors, from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 325, (330, 335, 340, 345, as whole as shared memory), 350, (370, 365, 360, 355, as respective slab of the plurality of slabs of registers); Col 20, line 6, provide an on-chip shared memory; Col 4, lines 55-56, Cores 205 and 210 can be SIMD processors (as group of parallel processors) which execute instructions for 16 threads in parallel; Col 8, lines 5-31, Avoiding bank conflicts can improve the performance of the system…The second crossbar outputs (as to reloading) the first transpose buffer output 355, the second transpose buffer output 360, the third transpose buffer output 365, and the fourth transpose buffer output 370. The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345…The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345. (as reloading from the shared memory into each respective slab of the plurality of slabs of registers, a subset of data elements of each of the plurality of register rows stored in the shared memory)); 
merging and sorting, by the group of parallel processors, the reloaded data elements (Nordquist, Fig. 1, 105 Multiprocessor; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350...The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; Col 8, lines 33-49, the S, T, R, and Q texture coordinates in the first transpose buffer output 355, second transpose buffer output 360, third transpose buffer output 365, and fourth transpose buffer output 370 are arranged (as sorted) as quads because for each clock cycle all of the data for an entire quad is obtained…Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format; [Examiner noted: Fig. 3, 370, 365, 360 and 355 (as slab of the plurality of slabs of registers) are reorganized. For example, Fig. 3, 370 (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) are reorganized in order, such that (SC SD SE SF) from 345 row 1 is combined/merged with (TC TD TE TF) from 330) within the 370); and 
storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 13A, 1305 Quad; Fig. 13B, 1310 Render target memory, Pixels 1320-1323; Col 20, line 6, provide an on-chip shared memory; Col 8, lines 47-55, Therefore the transpose buffer has transposed the data format that originally required four clock cycles to get one entire quad into a data format…The advantage of having quads is that many of the other graphics modules such as the texture module 220 and the ROP module 225 use quads. Since most graphics modules are designed to process quads; Col 3, lines 63-65, FIG. 13B is an illustration showing pixels of the quad stored in a pitch format memory, in accordance with one embodiment of the present invention).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren and Sano with Nordquist because Nordquist’s teaching of reloading/outputting the data from the memory again to the registers/arrays for the purpose of reorganizing the order would have provided Blomgren and Sano’s system with the advantage and capability to allow the data elements to be converted to a quad format so that a particular module can process the converted data, which improves the system efficiency and performance.

As per claim 26, Blomgren, Sano and Nordquist teach the invention according to claim 21 above. Blomgren further teaches wherein the plurality of data elements are loaded into the plurality of slabs of registers in a transposed order (Blomgren, [0019] lines 1-2, Fig. 5 shows the results of loading 4 matrix registers from memory; [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose…and block rearrangement operation; [0057] lines 1-8, FIG. 15 illustrates another way to view the block4 and block4v operations by illustrating that the instructions are swapping indices. If one takes the 4 matrix registers, 140, 142, 144, and 146, then one would have a 4x4x4 array 139 of elements (register/index, row, column). Given that the block4, block4v, and transpose operations can be viewed as swapping two of the 3 indices, these operations would then produce the following results: block4: (register/index, row, column); (row, register/index, column), block4v: (register/index, row, column); (column, row, register/index), transpose: (register/index, row, column); (register/index, column, row)).
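For illustration only, the index-swapping view quoted above from Blomgren [0057] can be sketched by storing each element of the 4x4x4 array under its (register/index, row, column) triple and exchanging two of the three positions. The dictionary representation and the helper name swap_indices are assumptions made for this sketch.

```python
from itertools import product


def swap_indices(array, axis_a, axis_b):
    # Move every element to the position obtained by exchanging two of its
    # three indices (0 = register/index, 1 = row, 2 = column).
    out = {}
    for idx, value in array.items():
        new_idx = list(idx)
        new_idx[axis_a], new_idx[axis_b] = idx[axis_b], idx[axis_a]
        out[tuple(new_idx)] = value
    return out


# Each element is labelled with its original (register/index, row, column).
array = {idx: idx for idx in product(range(4), repeat=3)}

block4 = swap_indices(array, 0, 1)      # (register, row, column) ; (row, register, column)
block4v = swap_indices(array, 0, 2)     # (register, row, column) ; (column, row, register)
transpose = swap_indices(array, 1, 2)   # (register, row, column) ; (register, column, row)

print(transpose[(0, 2, 3)])
```

In this sketch each operation is its own inverse, since applying the same index swap twice restores the original arrangement.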

As per claims 27 and 32, they are system claims of claims 21 and 26 respectively above. Therefore, they are rejected for the same reason as claims 21 and 26 respectively above.

As per claims 33 and 38, they are non-transitory computer readable medium claims of claims 21 and 26 respectively above. Therefore, they are rejected for the same reason as claims 21 and 26 respectively above.


Claims 24, 30 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren, Sano and Nordquist, as applied to claim 21 above, and further in view of Jatin Chhugani et al. (Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture; hereafter Jatin).
Jatin was cited in the previous Office Action.

As per claim 24, Blomgren, Sano and Nordquist teach the invention according to claim 21 above. Nordquist further teaches wherein upon each of the processors in the group of parallel processors reloading the subset of data elements, performing by the group of parallel processors, merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Nordquist, Fig. 1, 105 Multiprocessor; Fig. 3, 370, 365, 360, 355; Col 8, lines 11-31, The first transpose buffer output 355 is generated by reading the first row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350...The second transpose buffer output 360 is generated by reading the second row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350…The third transpose buffer output 365 is generated by reading the third row of the four RAMs 330, 335, 340, 345, reorganizing the order (as merging and sorting) with the second crossbar 350…The fourth transpose buffer output 370 is generated by reading the fourth row of the four RAMs 330, 335, 340, 345, reorganizing the order with the second crossbar 350; [Examiner noted: Fig. 3, 370, 365, 360 and 355 (as slab of the plurality of slabs of registers) are reorganized. For example, Fig. 3, 370 (SC SD SE SF) (TC TD TE TF) (RC RD RE RF) (QC QD QE QF) (16 columns), (SC) from 345 column 1 is merged with (TC) from 330 within the 370)]).

Blomgren, Sano and Nordquist fail to specifically teach a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers.

However, Jatin teaches a bitonic merge of each register column of the plurality of register columns in each of the plurality of slabs of registers (Jatin, Page 1318, left column, Fig. 5, Bitonic merge network for merging sequences of length 16 elements each (4 of 4x4 matrix/slabs); Page 1314, left column, 2. Related work, paragraph 4, lines 10-13, GPUABiSort [9] was proposed, that is based on adaptive bitonic sort [2] and rearranges the data using bitonic trees to reduce the number of comparisons).
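For illustration only, a bitonic merge of the general kind referred to above can be sketched in scalar form. This is a textbook recursive formulation written for this sketch, assuming an input that is already bitonic and whose length is a power of two; it is not the SIMD merge network of Jatin's Fig. 5, and the function name is hypothetical.

```python
def bitonic_merge(seq, ascending=True):
    # Sort a bitonic sequence (one that rises then falls, or falls then rises).
    n = len(seq)
    if n <= 1:
        return list(seq)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    first, second = [], []
    # Compare element i with element i + n/2 and route the smaller and larger
    # values to the two halves; each half is again bitonic.
    for a, b in zip(seq[:half], seq[half:]):
        small, large = (a, b) if a <= b else (b, a)
        if ascending:
            first.append(small)
            second.append(large)
        else:
            first.append(large)
            second.append(small)
    return bitonic_merge(first, ascending) + bitonic_merge(second, ascending)


# A descending run followed by an ascending run forms a bitonic sequence.
descending_run = [9, 7, 4, 2]
ascending_run = [1, 3, 5, 8]
merged = bitonic_merge(descending_run + ascending_run)
print(merged)
```

The example input is built from one run in descending order and one run in ascending order because that concatenation is bitonic, which is the precondition the merge relies on.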

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren, Sano and Nordquist with Jatin because Jatin’s teaching of a bitonic merge for rearranging the data would have provided Blomgren, Sano and Nordquist’s system with the advantage and capability to reduce the number of comparisons, which improves the data processing performance and efficiency.

As per claim 30, it is a system claim of claim 24 above. Therefore, it is rejected for the same reason as claim 24 above.

As per claim 36, it is a non-transitory computer readable medium claim of claim 24 above. Therefore, it is rejected for the same reason as claim 24 above.


Claims 25, 31 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Blomgren, Sano and Nordquist, as applied to claim 21 above, and further in view of Lin et al. (US Pub. 2006/0126726 A1).
Lin was cited in the previous Office Action.

As per claim 25, Blomgren, Sano and Nordquist teach the invention according to claim 21 above. Blomgren, Sano and Nordquist fail to specifically teach wherein a number of the register columns of the two- dimensional array of registers corresponds to a number of processors in the group of parallel processors.

However, Lin teaches wherein a number of the register columns of the two-dimensional array of registers corresponds to a number of processors in the group of parallel processors (Lin, Fig. 4, Columns 0-7 (as number of register columns), processor 0-7 (each column corresponding to number of processors in the group of parallel processors); [0024] line 1, A parallel processing DSP structure).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Blomgren, Sano and Nordquist with Lin because Lin’s teaching of a number of the register columns corresponding to a number of processors would have provided Blomgren, Sano and Nordquist’s system with the advantage and capability to allow each processor to process the data elements of a respective column, which improves the system efficiency and power consumption (see Lin, [0024], efficiency and power consumption).

As per claim 31, it is a system claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.

As per claim 37, it is a non-transitory computer readable medium claim of claim 25 above. Therefore, it is rejected for the same reason as claim 25 above.


Response to Arguments
The Amendment filed on 06/15/2022 has been entered. However, Applicant fails to address the 112(b) issues raised in the previous Office Action. Therefore, the 112(b) rejection is maintained.

In the Remarks, Applicant argues in substance:
(a). Applicant respectfully disagrees and submits that selecting only certain words and making a generalized statement about mental processes shows that the entire basis for the rejection is fundamentally incorrect, and fails to comport with the requirements for a proper § 101 rejection. Claim 21 expressly recites technically detailed features that cannot properly be considered to be done "in the mind", such as "loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers…", "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers…" and "merging and sorting, by the group of parallel processors, the reloaded data elements", as recited by amended claim 21.

(b). While the Examiner contends that the claimed features "could be performed in the human mind", this formulaic statement simply ignores the requirements (e.g., loading "a plurality of data elements into a plurality of slabs of registers" and sorting "the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order") without articulating how it could be possible to do this "in the mind".

(c). In fact, as best understood it would not be humanly possible to perform the claimed features "in the mind". Applicant's specification notes that "many applications require data sorting in substantially real-time, such as search, data query processing, graphics, sparse linear algebra, machine learning, etc." (specification paragraph 0002). Solutions to such application as disclosed in the specification "within the disclosure relate generally to sorting data in parallel on a data- parallel computing device" (specification paragraph 0003). Applicant's specification also makes clear that the technology is firmly rooted in a specialized computing system itself, which provides significant technical advantages not found in conventional approaches: [0043].

(d). Applicant submits that a claim that recites "A computer-implemented method for sorting data, the method comprising: loading, from a shared memory by a group of parallel processors, a plurality of data elements into a plurality of slabs of registers… sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order; storing, by the group of parallel processors, the sorted data elements in the shared memory; reloading, by the group of parallel processors…merging and sorting, by the group of parallel processors, the reloaded data elements; and storing, by the group of parallel processors, the merged and sorted reloaded data elements in the shared memory" (emphasis added) necessarily cannot be performed in the human mind.

(e). The term "certain" qualifies the "certain methods of organizing human activity" grouping as a reminder of several important points. First, not all methods of organizing human activity are abstract ideas (e.g., "a defined set of steps for combining particular ingredients to create a drug formulation" is not a certain "method of organizing human activity").

(f). The Examiner has not properly evaluated claim 21 for a practical application. In particular, the Step 2A Prong 2 discussion on page 4, line 4 through page 5, line 2 of the Office Action, as reproduced below, starts with a conclusion, "this judicial exception is not integrated into a practical application" without addressing the substance of the specification…The prong #2 (step 2A) conclusory assertion in the Office Action is devoid of any actual analysis of the claimed features as presented.

(g). It is the examiner's burden to provide the evidence necessary to examine the application and if a rejection is made provide an explanation of the basis for the rejection", and when the "examiner has failed to favor this record with a clear explanation", it is grounds for vacating a rejection (Ex parte Luu, Appeal No. 2006-1222, nonprecedential decision at 6 and 8). In view of this, because there is no actual analysis provided in the Office Action regarding Prong #2 of Step 2A, the burden did not shift to Applicant and the rejection is per se deficient. For at least these reasons, Applicant submits that the § 101 rejection should be withdrawn. 

(h). Applicant submits that a detailed review pursuant to Prong #2 of Step 2A unquestionably demonstrates that the claimed features incorporate a practical application. When looking at the claimed limitations as an ordered combination in the manner set forth by the USPTO's guidelines, the invention as a whole amounts to significantly more than the alleged abstract idea, and in fact is integrated into a practical application. As noted in the application, parallelized processing offers significantly faster, more efficient sorting than offered by current technology, thereby improving the functioning of computing devices, whereby high performance can be achieved on large, bandwidth-rich data-parallel devices, and high energy efficiency can be achieved by minimizing off-chip memory load and stores such that the system's CPU(s) may be free to perform other processing tasks simultaneously (Specification 0043, emphasis added).

(i). In view of the above, Applicant submits that independent claim 21, when taken as an ordered combination, provides unconventional steps that confine any purported abstract idea to a particular useful application, including an integrated computer-based informational approach that allows for effective and efficient data-parallel computing. This is done, in part, by applying the technical features to the practical application via aspects of "sorting data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order", as recited by amended claim 21.

(j). This conclusory statement is devoid of any actual analysis of the claimed features themselves. This expressly contradicts what the MPEP explains regarding this exact type of assertion. Furthermore, per MPEP § 2106.07(a)(III): "Examiners should not assert that an additional element (or combination of elements) is well-understood, routine, or conventional unless the examiner finds, and expressly supports the rejection in writing ..." (emphasis added).

(k). The Examiner did not articulate any facts on the record to support the rejection's Step 2B "analysis". Applicant's discussion above regarding Prong #2 of Step 2A is incorporated herein by reference. That discussion shows that claim 21 provides "Improvements to the Functioning of a Computer or To Any Other Technology or Technical Field" per MPEP 2106.05(a). As discussed above, claim 21 covers a specific solution to allow for effective and efficient data-parallel computing by (in part) "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order".

(l). As should be clear from the entire discussion above and the application as filed, there cannot be sorting data elements using slabs of registers without the corresponding elements of the shared memory and a group of parallel processors. Thus, the technology as claimed is tied to a particular machine that implements the steps of the method.

(m). The claimed approaches involve a particular transformation. See MPEP § 2106.05(c). Here, by way of example, the transformation involves sorting data elements such that the order of data elements stored in the shared memory is altered to allow for effective and efficient data-parallel computing. When fully analyzed, the additional elements thus yield claims as a whole that amount to significantly more than the purported abstract concept, and the rejection should be withdrawn for at least this reason. 

(n). On page 17, lines 8-20 of the Office Action, the Examiner alleges that Fig. 9 of Blomgren, as reproduced below, teaches sorting data elements loaded in register rows of a first subset of slabs in a descending order. Applicant respectfully disagrees.

(o). Blomgren fails to teach or suggest "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order" and "storing, by the group of parallel processors, the sorted data elements in the shared memory", as recited by amended claim 21.

(p). Nordquist fails to teach or suggest "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order" and "storing, by the group of parallel processors, the sorted data elements in the shared memory", as recited by amended claim 21.

(q). Sato does not teach or suggest "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order" and "storing, by the group of parallel processors, the sorted data elements in the shared memory", as recited by amended claim 21. Thus, Sato fails to cure the admitted deficiencies of Blomgren and Nordquist as applied.

Examiner respectfully disagrees with Applicant’s arguments for the following reasons:
As to point (a), Examiner would like to point out that under the 2019 PEG, Step 2A Prong 1, the abstract idea recited in the claim is identified (i.e., Judicial Exception Recited?). Here, the limitations of claim 21 reciting “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea, because a person can easily evaluate/determine/judge the values of the data elements (i.e., size, number, larger or smaller), perform the sorting/reordering based on those values in different orders (descending order; reverse and ascending order), and group/combine/merge the reordered/sorted/reloaded data elements. Therefore, yes, the claim does recite a judicial exception.

In addition, in response to Applicant’s argument that claim 21 “expressly recites technically detailed features that cannot properly be considered to be done "in the mind"”, Examiner respectfully disagrees. As indicated in the 101 rejection, the additional elements amount to the recitation of generic computing components; but for that recitation, these steps may be Mental Processes that can be performed in the human mind (including an observation, evaluation, judgment, opinion). Moreover, Examiner would like to point out that these additional elements are further analyzed in Step 2A Prong 2 and Step 2B. Please refer to the 101 rejection above.

As to point (b), Examiner would like to point out that the claimed limitations of “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes).
In response to Applicant’s argument that the “formulaic statement simply ignores the requirements”, Examiner would like to direct Applicant to Step 2A Prong 2 and Step 2B of the 101 rejection, where these additional elements are further analyzed. For example, “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). Examiner specifically provided evidence from specification paragraph [0003], lines 1-2, “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”. That is, the “loading”, “storing”, “reloading” and “storing” steps are well-understood, routine, conventional activity for data operations. This is supported under Berkheimer option 1 (see MPEP § 2106.05(d)).

As to point (c), Applicant alleges that it would not be humanly possible to perform the claimed features "in the mind", relying upon specification paragraphs [0003] and [0043]. However, there is nothing in the claim to suggest the improvement described in the specification (i.e., the claimed limitations fail to recite how they will be implemented to provide significant technical advantages).
Again, the claimed limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes).

As to point (d), again, the claimed limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes). The other additional elements recited in the claim are further analyzed under Step 2A Prong 2 and Step 2B.
For example, the additional elements “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). “Loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). Examiner specifically provided evidence from specification paragraph [0003], lines 1-2, “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”. That is, the “loading”, “storing”, “reloading” and “storing” steps are well-understood, routine, conventional activity for data operations. This is supported under Berkheimer option 1 (see MPEP § 2106.05(d)).

As to point (e), Examiner would like to point out that the 101 rejection is based on a judicial exception without significantly more, and that the judicial exception is identified as Mental Processes that can be performed in the human mind (including an observation, evaluation, judgment, opinion). Examiner does not rely on the “certain methods of organizing human activity” grouping as the abstract idea. Please see the 101 rejection above.

As to point (f), Examiner would like to point out that under Step 2A Prong 2, the examiner needs to determine whether the claim recites additional elements that integrate the judicial exception into a practical application. Here, the examiner has clearly analyzed all of the additional limitations in the claim to determine whether they integrate the judicial exception into a practical application. For example, the claim recites the additional limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements”, which are insignificant pre-solution data gathering (see MPEP § 2106.05(g)).
In addition, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “the data elements loaded into the register rows of a first subset of the plurality of slabs of registers” and “wherein each slab of registers includes a two-dimensional array of registers having a plurality of register rows and a plurality of register columns, and each slab of registers is associated with at least one parallel processor in the group of parallel processors” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). Examiner specifically indicated that the combination of these additional elements is no more than mere instructions to apply the exception using generic computer components (i.e., “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns”). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to the abstract idea.
Again, the additional limitations are further analyzed under Step 2B. Therefore, Applicant’s argument has not been found to be persuasive.

As to point (g), please see point (f) above. Examiner specifically indicated that the combination of these additional elements is no more than mere instructions to apply the exception using generic computer components (i.e., “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns”). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Again, the additional limitations are further analyzed under Step 2B. Therefore, Applicant’s argument has not been found to be persuasive.

As to point (h), Applicant alleges that the claimed features incorporate a practical application, relying upon specification paragraph [0043] (i.e., high performance can be achieved on large, bandwidth-rich data-parallel devices, high energy efficiency can be achieved by minimizing off-chip memory load and stores such that the system's CPU(s) may be free to perform other processing tasks simultaneously). However, Examiner would like to remind Applicant that the claimed limitations fail to recite how they will be implemented to provide significant technical advantages. There is nothing in the claim to suggest the improvement described in the specification.
Again, the claimed limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes).

As to point (i), Examiner would like to point out that the claim fails to recite how the claimed limitations will allow for effective and efficient data-parallel computing. Again, “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes).

As to point (j), Examiner would like to point out that the additional elements are clearly analyzed under Step 2B. For example, the additional elements “a data-parallel computing device”, “shared memory”, “a group of parallel processors”, “plurality of slabs of registers”, “plurality of register rows” and “plurality of register columns” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). In addition, the limitations “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” are insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). The same analysis applies here in Step 2B, i.e., mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A. These additional elements, alone and in combination, do not amount to significantly more than the exception itself or provide an inventive concept in Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the steps of “loading, a plurality of data elements”, “storing, the sorted data elements”, “reloading, a subset of data elements” and “storing, the merged and sorted reloaded data elements” were considered to be extra-solution activity in Step 2A as insignificant pre-solution data gathering, and thus they are re-evaluated in Step 2B to determine whether they are more than what is well-understood, routine, conventional activity in the field.
Under the 2019 PEG, in a Step 2B analysis, an additional element (or combination of elements) is not well-understood, routine or conventional unless the examiner finds, and expressly supports a rejection in writing with, one or more of the following four options:
Option 1 – Statement(s) by Applicant
Option 2 – Court Decisions in MPEP § 2106.05(d)(II)
Option 3 – Publication(s)
Option 4 – Official Notice
Examiner specifically cited specification paragraph [0003] to support the rejection. For example, the background of the example does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” involve anything other than generic, off-the-shelf computer components, and specification paragraph [0003], lines 1-2, specifically recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”.
Accordingly, the conclusion that the steps of “loading/reloading” and “storing” are well-understood, routine, conventional activity is supported under Berkheimer option 1.
Option 1 – Statement(s) by Applicant
An explanation based on an express statement in the specification (e.g., citation to a relevant portion of the specification) that demonstrates the well-understood, routine, conventional nature of the additional element(s) 
A specification demonstrates the well-understood, routine, conventional nature of additional elements when it describes the additional element(s) as conventional (or an equivalent term); as a commercially available product; or, in a way that shows the element is widely prevalent or in common use.
Therefore, Applicant’s argument has not been found to be persuasive.

As to point (k), please refer to point (j) above for the facts on the record that support the rejection's Step 2B "analysis".
In addition, in response to Applicant’s argument that “claim 21 covers a specific solution to allow for effective and efficient data-parallel computing by…”, Examiner respectfully disagrees. Again, Applicant alleges Improvements to the Functioning of a Computer by relying upon the specification. However, there is nothing in the claim to suggest the improvement described in the specification (i.e., the claimed limitations fail to recite how they will be implemented to achieve effective and efficient data-parallel computing).
Again, the claimed limitations “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” are identified as an abstract idea (i.e., Mental Processes).

As to point (l), Examiner would like to point out that the examiner has clearly identified “sorting, the data elements, in a descending order and the data elements in a reverse and ascending order” and “merging and sorting, the reloaded data elements” as reciting an abstract idea. As drafted, the claim as a whole recites a method including steps that could be performed in the human mind, but for the recitation of generic computing components. For example, a person can easily evaluate/determine/judge the values/sizes of the data elements, perform the sorting/reordering in different orders (i.e., descending order; reverse and ascending order) for the different data elements, and group/combine/merge the reordered/sorted/reloaded data elements. Therefore, but for the recitation of generic computing components, these steps may be Mental Processes that can be performed in the human mind (including an observation, evaluation, judgment, opinion).
In addition, the additional elements “slabs of registers” and “shared memory and a group of parallel processors” are recited at a high level of generality (i.e., as a generic computing device performing a generic computer function, see MPEP § 2106.05(b)). Therefore, Applicant’s argument has not been found to be persuasive.

As to point (m), Examiner would like to point out that “data elements stored in the shared memory” is merely insignificant pre-solution data gathering (see MPEP § 2106.05(g)), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). Mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A. These additional elements, alone and in combination, do not amount to significantly more than the exception itself or provide an inventive concept in Step 2B.
Here, “data elements stored in the shared memory” was considered to be extra-solution activity as insignificant pre-solution data gathering, and thus it is re-evaluated in Step 2B to determine whether it is more than what is well-understood, routine, conventional activity in the field. Examiner specifically cited specification paragraph [0003] to support the rejection. For example, the background of the example does not provide any indication that the steps of “loading”, “storing”, “reloading” and “storing” involve anything other than generic, off-the-shelf computer components, and specification paragraph [0003], lines 1-2, specifically recites “sorting methods include semi-parallelized and parallelized algorithms being performed by data-parallel devices, such as graphics processing units (GPUs)”, and lines 9-10, “the data parallel device…for data to be loaded or stored”. Accordingly, the conclusion that “data elements stored in the shared memory” is well-understood, routine, conventional activity is supported under Berkheimer option 1.
Again, “sorting, the data elements” is identified as an abstract idea (i.e., Mental Processes). Therefore, Applicant’s argument has not been found to be persuasive.

As to point (n), Applicant alleges that Blomgren does not teach sorting data elements loaded in register rows of a first subset of slabs in a descending order, relying upon Fig. 16A of the drawings. However, the claim fails to recite all the details of how the data elements are sorted in a descending order as shown in Fig. 16A (i.e., the first data elements in each of the slabs of the first subset (see circled "3" in SLAB1 and circled "53" in SLAB2) are sorted in a descending order and stored in the first column of the shared memory 1680). In fact, the claim only recites sorting data elements loaded in register rows of a first subset of slabs in a descending order, and this is clearly taught by Blomgren.
For example, Blomgren teaches a system that performs data element rearrangement operations including: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement operation. As indicated in FIG. 9 of Blomgren, the final state of matrix registers 140, 142, 144 and 146 is sorted. For example, matrix register 140, row 1 (0,0 0,1 0,2 0,3) and matrix register 142, row 1 (0,4 0,5 0,6 0,7) (as the descending order) [see specification [0100]: “As shown in Figure 16A, the first half of the processor groups (i.e., Processor Groups 1 and 2,) may add the data elements stored in their respective slabs in a descending order by register row to the shared memory 1680 (e.g., left to right starting from the top left)”] (see Blomgren, Figs. 6-9 (as including descending order and second ordering processing for sorting); [0037] lines 16-20; [0047] lines 2-8; [0048] lines 1-4; [0049] lines 1-4; [0050] lines 1-2, FIG. 9 shows the final state of the matrix registers 140, 142, 144, and 146 at the end of the block4 operation; see Fig. 9, 140, 142, 144 and 146). Please refer to the 103 rejection above. Therefore, Applicant’s argument has not been found to be persuasive.

As to point (o), Examiner would like to point out that the rejection is a 103 rejection based on multiple references. Examiner relied on Blomgren for teaching "sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a second ordering process" (Blomgren, Figs. 6-9 (as including descending order and second ordering processing for sorting); [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector, select, select row, select column, transpose, shift row/column, matrix multiply, sum row, sum column, and block rearrangement (as sorting) operation; [0047] lines 2-8, The block4 instruction is implemented on 4 contiguous matrix registers and exchanges row data between the four matrix registers 140, 142, 144, and 146 in 3 steps. In step 1, the block4 operation swaps matrix register 140, row 1 with matrix register 142, row 0; and matrix register 144, row 3 with matrix register 146, row 2. This exchange is performed by simultaneously (as sorting first subset of the plurality of slabs of registers in descending order; also see Fig. 9 (140, row 1; 142, row 1; 144, row 1; and 146, row 1); [see specification [0100]: “As shown in Figure 16A, the first half of the processor groups (i.e., Processor Groups 1 and 2,) may add the data elements stored in their respective slabs in a descending order by register row to the shared memory 1680 (e.g., left to right starting from the top left)”]); [0048] lines 1-4, FIG. 7 shows step 2 of the block4 instruction where the block4 operation swaps matrix register 140, row 2 with matrix register 144, row 0; and matrix register 142, row 3 with matrix register 146, row 1. This is performed by simultaneously; [0049] lines 1-4, FIG. 8 shows step 3 of the block4 instruction where the block4 operation swaps matrix register 140, row 3 with matrix register 146, row 0; and matrix register 142, row 2 with matrix register 144, row 1. This is performed by simultaneously (as sorting second subset of the plurality of slabs of registers with a second ordering process); [0050] lines 1-2, FIG. 9 shows the final state of the matrix registers 140, 142, 144, and 146 at the end of the block4 operation; see Fig. 9, 140, 142, 144 and 146 [Examiner's note: the final state of matrix registers 140, 142, 144 and 146 is sorted. For example, matrix register 140, row 2 is sorted from (0,4 0,5 0,6 0,7) in Fig. 5 to (1,0 1,1 1,2 1,3) in Fig. 9]) and
storing, by the group of parallel processors, the sorted data elements in the shared memory (Blomgren, [0037] lines 16-20, The operations performed by the matrix unit 16 of this invention include: load/store matrix or vector…and block rearrangement operation; [0050] lines 3-12, the contents of the four matrix registers have been rearranged from four 1x16 vectors (at the beginning of the swapping) to four contiguous 4x4 sub-matrices, as illustrated by the 4x4 sub-matrices 160, 162, 164, and 166 in AxB matrix 120. Since all of the matrix data rearrangement is based upon swap operations…which is suitable for storing back to memory).
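
For orientation only, the three block4 exchange steps quoted above from Blomgren [0047]-[0049] can be traced with a short sketch. The register names follow reference numerals 140, 142, 144 and 146. The zero-based row indices, the initial layout (each register holding one 1x16 vector as four rows of four, per the cited [0050]) and the names `initial_registers`, `BLOCK4_STEPS` and `block4` are assumptions made for illustration and are not Blomgren's implementation.

```python
def initial_registers():
    """Four matrix registers, each holding one 1x16 vector as four rows of four.

    Element (r, c) stands for row r, column c of the source matrix
    (assumed layout, not taken verbatim from Blomgren).
    """
    return {
        reg: [[(r, 4 * row + col) for col in range(4)] for row in range(4)]
        for r, reg in enumerate((140, 142, 144, 146))
    }


# Row exchanges as quoted from Blomgren [0047]-[0049].
# Each entry is ((register, row), (register, row)); rows assumed zero-based.
BLOCK4_STEPS = [
    [((140, 1), (142, 0)), ((144, 3), (146, 2))],  # step 1
    [((140, 2), (144, 0)), ((142, 3), (146, 1))],  # step 2
    [((140, 3), (146, 0)), ((142, 2), (144, 1))],  # step 3
]


def block4(regs):
    """Apply the three swap steps in order and return the registers."""
    for step in BLOCK4_STEPS:
        for (reg_a, row_a), (reg_b, row_b) in step:
            regs[reg_a][row_a], regs[reg_b][row_b] = (
                regs[reg_b][row_b],
                regs[reg_a][row_a],
            )
    return regs


regs = block4(initial_registers())
for reg in (140, 142, 144, 146):
    print(reg, regs[reg])
```

Under these assumptions, each register ends up holding one 4x4 sub-matrix, and the second row of register 140 changes from (0,4 0,5 0,6 0,7) to (1,0 1,1 1,2 1,3), which corresponds to the Examiner's note on Fig. 9 when its "row 2" is read as one-based.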

Blomgren’s data element sorting/arrangement mechanism does not recite that the second ordering process is a reverse and ascending order. However, Sano teaches that the second ordering process is a reverse and ascending order (Sano, Fig. 4; [0030] lines 9-13, the four key buttons 50-13 to 50-16 in the fourth row (the bottom row) are orderly numbered, from right to left, 13, 14, 15 and 16, so that the numbers added for the key buttons in each row are ascended in reverse direction [see specification [0100], add the data elements stored in their respective slabs a reverse and ascending order by register row (e.g., right to left starting from the bottom right)]).
Please refer to the 103 rejection above. Therefore, Applicant’s argument has not been found to be persuasive.
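
As a rough illustration of the numbering pattern cited from Sano [0030], the sketch below labels a grid so that the numbers in each row ascend from right to left. Only the bottom-row values (13, 14, 15, 16 from right to left) come from the cited passage. The 4x4 grid size, the top-to-bottom row order starting at 1, and the function name `reverse_ascending_labels` are assumptions for illustration.

```python
def reverse_ascending_labels(rows=4, cols=4):
    """Return the grid as seen left to right, with labels ascending right to left."""
    grid = []
    for r in range(rows):
        start = r * cols + 1                      # first label of this row (assumed)
        ascending = list(range(start, start + cols))
        grid.append(ascending[::-1])              # reverse so the smallest label sits at the right
    return grid


labels = reverse_ascending_labels()
for row in labels:
    print(row)
```

Read from right to left, the bottom row of `labels` gives 13, 14, 15, 16, matching the quoted key-button numbering.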

As to point (p), Examiner would like to point out that Applicant is attacking the references individually. Examiner would like to remind Applicant that the rejection is a 103 rejection based on multiple references. The limitations “sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order” and “storing, by the group of parallel processors, the sorted data elements in the shared memory” are taught by the combination of Blomgren, Sano and Nordquist. Please refer to point (o) above.

As to point (q), again, Examiner would like to point out that Applicant is attacking the references individually. Examiner would like to remind Applicant that the rejection is a 103 rejection based on a combination of references. The limitations of “sorting, by the group of parallel processors, the data elements loaded into the register rows of a first subset of the plurality of slabs of registers in a descending order, and the data elements loaded into the register rows of a second subset of the plurality of slabs of registers in a reverse and ascending order” and “storing, by the group of parallel processors, the sorted data elements in the shared memory” were taught by Blomgren, Sano and Nordquist. Please refer to point (o) above.
To the extent that applicants are arguing against the references individually, the examiner reminds the applicants that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). 
In addition, Examiner would like to point out that the claim fails to specifically recite the details of how the data elements are sorted in a reverse and ascending order. Examiner would like to direct Applicant to specification paragraph [0100], which states “add the data elements stored in their respective slabs a reverse and ascending order by register row (e.g., right to left starting from the bottom right)”; this is clearly taught by Sano. Sano clearly teaches that the second ordering process is a reverse and ascending order (see Sano, Fig. 4; [0030] lines 9-13, the four key buttons 50-13 to 50-16 in the fourth row (the bottom row) are orderly numbered, from right to left, 13, 14, 15 and 16, so that the numbers added for the key buttons in each row are ascended in reverse direction). Please refer to the 103 rejection above.

For the reasons above, Applicant’s argument has not been found to be persuasive, and therefore the rejections are maintained. 


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An, can be reached at (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MENG AI T AN/ Supervisory Patent Examiner, Art Unit 2195

/Z.X./ Examiner, Art Unit 2195