Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
2. 	This Office Action is taken in response to Applicants’ Amendments and Remarks filed on 7/14/2022 regarding application 16/986,643 filed on 8/6/2020.  
 	Claims 1, 3-8, 10-15, and 17-23 are pending for consideration.

3.				Response to Amendments and Remarks 
	Applicants’ amendments and remarks have been fully and carefully considered, with the Examiner’s response set forth below.
	(1) In response to the amendments and remarks, an updated claim analysis has been made. Refer to the corresponding sections of the following Office Action for details.

4.					Examiner’s Note
(1) When amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification that dictate(s) the structure relied on for proper interpretation, and to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” Amendments that do not point to specific support in the disclosure may be deemed not to comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and may therefore be held not fully responsive. Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
(2) Examiner has cited particular columns/paragraphs and line numbers in the references applied to the claims for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of each passage as taught by the prior art or cited by the Examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1, 8, 15, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrara et al. (US Patent Application Publication 2014/0164745, hereinafter Mehrara), in view of Wei (US Patent Application Publication 2010/0138612, hereinafter Wei), and further in view of DeVale et al. (US Patent Application Publication 2006/0010292, hereinafter DeVale).
	As to claim 1, Mehrara teaches A method comprising: 
executing a first work-group on a processor [as shown in figure 2, PPU (220), and a plurality of GPCs (208); the Streaming Multiprocessors (SM) as shown in figure 3, 310; The series of instructions transmitted to a particular GPC 208 constitutes a thread, as previously defined herein, and the collection of a certain number of concurrently executing threads across the parallel processing engines (not shown) within an SM 310 is referred to herein as a "warp" or "thread group." As used herein, a "thread group" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different processing engine within an SM 310 … (¶ 0048-0049)], wherein the first work-group comprises a plurality of work-items that are executed in parallel to perform a defined function [The series of instructions transmitted to a particular GPC 208 constitutes a thread, as previously defined herein, and the collection of a certain number of concurrently executing threads across the parallel processing engines (not shown) within an SM 310 is referred to herein as a "warp" or "thread group." 
As used herein, a "thread group" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different processing engine within an SM 310 … (¶ 0048-0049); Wei also teaches this limitation – as shown in figure 3, where a group of service processing units are concurrently performing the operations of accessing a shared cache memory], wherein the processor comprises internal register files [local register file, figure 3, 304; Local Register File (LRF), figure 4, 404(0), 404(1), …, 404(x)], wherein the processor uses a cache hierarchy including a lowest level cache and at least one other cache [L1 cache, figure 3, 320; L1.5 cache, figure 3, 208; … Each SM 310 includes an instruction L1 cache 370 that is configured to receive instructions and constants from memory via an L1.5 cache (not shown) within the GPC 208 … (¶ 0047)], determining a space requirement to store a shared local memory region that is shared by the plurality of work-items of the first work-group [local memory as shown in figure 2, 204, the PP memory; shared memory, figure 3, 306; also shown in figure 3, 306, the shared memory; Wei more expressly teaches determining a space requirement to store a local memory content – as shown in figure 3, where a group of service processing units are concurrently performing the operations of accessing a shared cache memory; According to still another embodiment of the present invention, in a method for implementing cache sharing based on the above system, a first service processing unit initiates a message for allocating a cache space; the message includes: the first service processing unit and a second service processing unit which are members sharing the cache space, and a size of the cache space. 
The method includes: after receiving the message, issuing, by the main controller to the shared cache unit, a command of allocating the cache space; after receiving the command, transmitting, by the shared cache unit to the main controller, information of the cache space and reading and writing authorities of the members sharing the cache (¶ 0013-0015); FIG. 7 is a flowchart illustrating an implementation of high-speed data sharing. The utilization of the shared cache unit is not fixed. Instead, the shared cache unit is requested according to requirements. For example, if a service processing unit 1 initiates a data visit to service processing units 3 and 4, it requests the main control unit for a shared cache unit after defining the size of the required cache space and the format of exchanged data. The implementation includes the following … Step s702: The main control unit receives the request message, determines whether the shared cache unit has enough space; if enough, proceed to Step s704; otherwise, proceed to Step s703 (¶ 0090-0092)], wherein the shared local memory region is only accessible by the plurality of work-items included in the first work-group [as shown in figure 3, where the shared memory (306) is designated to a plurality of execute units (302(0)-302(N-1)), exclusively; A sequence of per-thread instructions may include at least one instruction that defines a cooperative behavior between the representative thread and one or more other threads of the thread array. 
For example, the sequence of per-thread instructions might include an instruction to suspend execution of operations for the representative thread at a particular point in the sequence until such time as one or more of the other threads reach that particular point, an instruction for the representative thread to store data in a shared memory to which one or more of the other threads have access, an instruction for the representative thread to atomically read and update data stored in a shared memory to which one or more of the other threads have access based on their thread IDs, or the like … (¶ 0051); Shared memory 306 is accessible to threads within a single CTA; in other words, any location in shared memory 306 is accessible to any thread within the same CTA (or to any processing engine within SM 310). Shared memory 306 can be implemented as a shared register file or shared on-chip cache memory with an interconnect that allows any processing engine to read from or write to any location in the shared memory … (¶ 0058);
Wei more expressly teaches determining a space requirement to store a local memory content – as shown in figure 3, where a group of service processing units are concurrently performing the operations of accessing a shared cache memory; According to still another embodiment of the present invention, in a method for implementing cache sharing based on the above system, a first service processing unit initiates a message for allocating a cache space; the message includes: the first service processing unit and a second service processing unit which are members sharing the cache space, and a size of the cache space. The method includes: after receiving the message, issuing, by the main controller to the shared cache unit, a command of allocating the cache space; after receiving the command, transmitting, by the shared cache unit to the main controller, information of the cache space and reading and writing authorities of the members sharing the cache (¶ 0013-0015)];
determining whether an available space in the internal register files is sufficient to store the shared local memory region [this limitation is taught by DeVale – as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract)], and in response to a determination that the available space in the internal register files is not sufficient to store the shared local memory region [this limitation is taught by DeVale – as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract)], storing a first portion of the shared local memory region in the available space in the internal register files [this limitation is taught by DeVale – as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract); FIG. 7 is a flow diagram illustrating a decision criteria to determine whether to store write data to a register file or register cache, according to one embodiment of the invention. For example, in FIG. 7, if at operation 701 there there is no space in the register file or no available write ports, an attempt is made at operation 710 to write the data to the register cache. However, if there are no unlocked entries in the register cache, another attempt is made to write the data to the register file at operation 701. 
If there are no unlocked available entries in the register cache and there are no available write ports or space in the register file, the embodiment may stall. Furthermore, in some embodiments, operations 701 and 710 may occur in parallel … (¶ 0044-0048)], and storing a second portion of the shared local memory region in the lowest level cache [this limitation is taught by DeVale – as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract); FIG. 2 illustrates a shared bus computer system in which at least one embodiment of the invention may be used. The shared bus computer system of FIG. 2 contains a processor 205, a level one (L1) cache memory 210, and main memory 215. In other embodiments of the invention, the cache memory may be a level two (L2) cache or other memory within a computer system memory hierarchy. The processor and cache reside on the shared bus 207. Also illustrated within the processor of FIG. 2 is one embodiment of the invention 206. Other embodiments of the invention, however, may be implemented within other devices within the system, such as a separate bus agent, or distributed throughout the system in hardware, software, or some combination thereof (¶ 0030); FIG. 7 is a flow diagram illustrating a decision criteria to determine whether to store write data to a register file or register cache, according to one embodiment of the invention. For example, in FIG. 7, if at operation 701 there there is no space in the register file or no available write ports, an attempt is made at operation 710 to write the data to the register cache. However, if there are no unlocked entries in the register cache, another attempt is made to write the data to the register file at operation 701. 
If there are no unlocked available entries in the register cache and there are no available write ports or space in the register file, the embodiment may stall. Furthermore, in some embodiments, operations 701 and 710 may occur in parallel … (¶ 0044-0048)].
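For illustration only, the placement logic recited in the last limitations of claim 1 may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the names Placement and place_shared_local_memory are hypothetical, and sizes are assumed to be expressed in bytes.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    # First portion: bytes kept in the internal register files.
    register_file_bytes: int
    # Second portion: bytes placed in the lowest level cache.
    lowest_level_cache_bytes: int


def place_shared_local_memory(required_bytes: int, rf_available_bytes: int) -> Placement:
    """Split a shared local memory region between register files and cache.

    Hypothetical helper: required_bytes is the determined space requirement,
    rf_available_bytes is the available space in the internal register files.
    """
    if required_bytes < 0 or rf_available_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if rf_available_bytes >= required_bytes:
        # Sufficient space: the whole region stays in the register files.
        return Placement(required_bytes, 0)
    # Insufficient space: fill the register files, place the remainder in cache.
    return Placement(rf_available_bytes, required_bytes - rf_available_bytes)
```

Under these assumptions, a 4096-byte region with 1024 bytes of available register file space would be split into a 1024-byte first portion and a 3072-byte second portion.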
	Regarding claim 1, Mehrara teaches a local memory [local memory as shown in figure 2, 204, the PP memory; also shown in figure 3, 306, the shared memory], but does not expressly teach determining a space requirement to store a local memory content for the first work-group.
	However, it is a well-known and commonly used practice in the art to determine a space requirement before storing data in a local shared memory, to ensure that sufficient space is available to accommodate new data without overwriting existing data.
	For example, Wei specifically teaches determining a space requirement to store data in a local shared cache memory [as shown in figure 3, where a group of service processing units are concurrently performing the operations of accessing a shared cache memory; According to still another embodiment of the present invention, in a method for implementing cache sharing based on the above system, a first service processing unit initiates a message for allocating a cache space; the message includes: the first service processing unit and a second service processing unit which are members sharing the cache space, and a size of the cache space. The method includes: after receiving the message, issuing, by the main controller to the shared cache unit, a command of allocating the cache space; after receiving the command, transmitting, by the shared cache unit to the main controller, information of the cache space and reading and writing authorities of the members sharing the cache (¶ 0013-0015); FIG. 7 is a flowchart illustrating an implementation of high-speed data sharing. The utilization of the shared cache unit is not fixed. Instead, the shared cache unit is requested according to requirements. For example, if a service processing unit 1 initiates a data visit to service processing units 3 and 4, it requests the main control unit for a shared cache unit after defining the size of the required cache space and the format of exchanged data. The implementation includes the following … Step s702: The main control unit receives the request message, determines whether the shared cache unit has enough space; if enough, proceed to Step s704; otherwise, proceed to Step s703 (¶ 0090-0092)].
	Therefore, it would have been obvious for one of ordinary skill in the art prior to Applicant’s invention to determine a space requirement to store a local memory content, as demonstrated by Wei, and to incorporate it into the existing scheme disclosed by Mehrara, to ensure that sufficient space is available to accommodate new data without overwriting existing data.
Further regarding claim 1, Mehrara in view of Wei does not teach determining whether an available space in the internal register files is sufficient to store the local memory content for the first work-group, and in response to a determination that the available space in the internal register files is not sufficient to store the shared local memory region: storing a first portion of the shared local memory region in the available space in the internal register files, and storing a second portion of the shared local memory region in the lowest level cache.
However, DeVale specifically teaches the cited limitations [as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract); FIG. 2 illustrates a shared bus computer system in which at least one embodiment of the invention may be used. The shared bus computer system of FIG. 2 contains a processor 205, a level one (L1) cache memory 210, and main memory 215. In other embodiments of the invention, the cache memory may be a level two (L2) cache or other memory within a computer system memory hierarchy. The processor and cache reside on the shared bus 207. Also illustrated within the processor of FIG. 2 is one embodiment of the invention 206. Other embodiments of the invention, however, may be implemented within other devices within the system, such as a separate bus agent, or distributed throughout the system in hardware, software, or some combination thereof (¶ 0030); FIG. 7 is a flow diagram illustrating a decision criteria to determine whether to store write data to a register file or register cache, according to one embodiment of the invention. For example, in FIG. 7, if at operation 701 there there is no space in the register file or no available write ports, an attempt is made at operation 710 to write the data to the register cache. However, if there are no unlocked entries in the register cache, another attempt is made to write the data to the register file at operation 701. If there are no unlocked available entries in the register cache and there are no available write ports or space in the register file, the embodiment may stall. Furthermore, in some embodiments, operations 701 and 710 may occur in parallel … (¶ 0044-0048)].
Therefore, it would have been obvious for one of ordinary skill in the art prior to Applicant’s invention to determine whether an available space in the internal register files is sufficient to store the local memory content for the first work-group, and in response to a determination that the available space in the internal register files is sufficient to store the local memory content for the first work-group, to store the local memory content in the available space in the internal register files, as demonstrated by DeVale, and to incorporate it into the existing scheme disclosed by Mehrara in view of Wei, to ensure that sufficient space is available in the register file to accommodate new data without overwriting existing data.
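For illustration only, the decision flow quoted from DeVale (FIG. 7, operations 701 and 710) may be sketched as a single-pass selection. This is a minimal sketch under stated assumptions: the names Target and select_writeback_target are hypothetical, the three inputs are assumed to be already-computed status flags, and the retry at operation 701 described in the quoted passage is modeled here simply as a stall for the current attempt.

```python
from enum import Enum


class Target(Enum):
    REGISTER_FILE = "register file"
    REGISTER_CACHE = "register cache"
    STALL = "stall"


def select_writeback_target(rf_has_space: bool,
                            rf_write_port_free: bool,
                            cache_has_unlocked_entry: bool) -> Target:
    """Choose where writeback data goes for one attempt (hypothetical helper)."""
    # Operation 701: write to the register file when space and a port exist.
    if rf_has_space and rf_write_port_free:
        return Target.REGISTER_FILE
    # Operation 710: otherwise attempt the register cache.
    if cache_has_unlocked_entry:
        return Target.REGISTER_CACHE
    # Neither resource is available: the embodiment may stall and retry.
    return Target.STALL
```

The sketch evaluates the two operations sequentially; the quoted passage notes that in some embodiments operations 701 and 710 may occur in parallel, which this sketch does not model.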
	As to claim 8, it recites substantially the same limitations as claim 1 and is rejected for the same reasons as claim 1. Refer to “As to claim 1” presented earlier in this Office Action for details.
As to claim 15, it recites substantially the same limitations as claim 1 and is rejected for the same reasons as claim 1. Refer to “As to claim 1” presented earlier in this Office Action for details.
As to claim 21, Mehrara in view of Wei & DeVale teaches The system of claim 15, the processor to: in response to a determination that the available space in the internal register files is sufficient to store the shared local memory region, store all of the shared local memory region in the available space in the internal register files [DeVale -- as shown in figure 7, steps 710-721; A technique to use available register cache resources if register file resources are unavailable. Embodiments of the invention pertain to a register cache writeback algorithm for storing writeback data to a register cache if register file write ports or space is unavailable (abstract); FIG. 7 is a flow diagram illustrating a decision criteria to determine whether to store write data to a register file or register cache, according to one embodiment of the invention. For example, in FIG. 7, if at operation 701 there is no space in the register file or no available write ports, an attempt is made at operation 710 to write the data to the register cache. However, if there are no unlocked entries in the register cache, another attempt is made to write the data to the register file at operation 701. If there are no unlocked available entries in the register cache and there are no available write ports or space in the register file, the embodiment may stall. Furthermore, in some embodiments, operations 701 and 710 may occur in parallel … (¶ 0044-0048)].
As to claim 22, it recites substantially the same limitations as claim 21 and is rejected for the same reasons as claim 21. Refer to “As to claim 21” presented earlier in this Office Action for details.
As to claim 23, it recites substantially the same limitations as claim 21 and is rejected for the same reasons as claim 21. Refer to “As to claim 21” presented earlier in this Office Action for details.

6.	Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrara in view of Wei & DeVale, and further in view of Doerr et al. (US Patent Application Publication 2014/0351551, hereinafter Doerr).
	As to claim 3, Mehrara in view of Wei & DeVale teaches a work group [Mehrara -- The series of instructions transmitted to a particular GPC 208 constitutes a thread, as previously defined herein, and the collection of a certain number of concurrently executing threads across the parallel processing engines (not shown) within an SM 310 is referred to herein as a "warp" or "thread group." As used herein, a "thread group" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different processing engine within an SM 310 … (¶ 0048-0049)], but does not teach that the work group is an OpenCL work group.
	However, OpenCL work groups are well known and commonly used in the art.
	For example, Doerr specifically teaches an OpenCL work group [The Open Computing language (OpenCL) is a framework for writing programs with the objective to enable execution across heterogeneous platforms comprising central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors. It is designed to support close-to-hardware interface with limited abstraction … The language is extended to support parallelism with vector types and operations, synchronization, and functions to work with work-items/groups. An application programming interface (API) is used to define and then control the platform. OpenCL, at a course-level, supports parallel computing using task-based and data-based parallelism (¶ 0022-0023)].
	Therefore, it would have been obvious for one of ordinary skills in the art prior to Applicant’s invention to use an OpenCL work group, as demonstrated by Doerr, and to incorporate it into the existing scheme disclosed by Mehrara in view of Wei & DeVale, because Doerr teaches doing so would support parallel operations [… The language is extended to support parallelism with vector types and operations, synchronization, and functions to work with work-items/groups. An application programming interface (API) is used to define and then control the platform. OpenCL, at a course-level, supports parallel computing using task-based and data-based parallelism (¶ 0022-0023)].
As to claim 10, it recites substantially the same limitations as claim 3 and is rejected for the same reasons as claim 3. Refer to “As to claim 3” presented earlier in this Office Action for details.
As to claim 17, it recites substantially the same limitations as claim 3 and is rejected for the same reasons as claim 3. Refer to “As to claim 3” presented earlier in this Office Action for details.
 7.	Claims 4, 6-7, 11, 13-14, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrara in view of Wei & DeVale, and further in view of Vaidya et al. (US Patent Application Publication 2014/0181477, hereinafter Vaidya).
	As to claim 4, Mehrara in view of Wei & DeVale does not teach detecting a non-aligned write as recited in the claim.
	However, Vaidya teaches the cited limitations. Specifically, Vaidya teaches detecting a non-aligned write from a first register location in a source register to a second register location in a destination register, wherein the first register location is not aligned with the second register location [as shown in figure 4, where register 50a is the first/source register and register 50b is the second/destination register; SCC in accordance with an embodiment of the present invention is shown in FIG. 4.  In an embodiment, rearranging channel positions is done through operand swizzling (permutation) hardware prior to being dispatched to the execution pipeline.  In turn, destination operand positions are correspondingly unswizzled prior to writeback to the register file or other portion of a memory hierarchy … (¶ 0034-0036)]; in response to the detected non-aligned write, generating a plurality of instructions based on permutation group theory; and transforming the non-aligned write using the generated plurality of instructions [as shown in figure 4, where a plurality of SIMD4 instructions are generated in a pipeline fashion during the following cycles; SCC in accordance with an embodiment of the present invention is shown in FIG. 4.  In an embodiment, rearranging channel positions is done through operand swizzling (permutation) hardware prior to being dispatched to the execution pipeline.  In turn, destination operand positions are correspondingly unswizzled prior to writeback to the register file or other portion of a memory hierarchy … (¶ 0034-0036)].
	Therefore, it would have been obvious for one of ordinary skill in the art prior to Applicant’s invention to detect and handle a non-aligned write as recited in claim 4, as demonstrated by Vaidya, and to incorporate it into the existing scheme disclosed by Mehrara in view of Wei & DeVale, because Vaidya teaches doing so allows taking advantage of cycle compression [Note that some divergence patterns do not favor BCC. In particular, when disabled channels in an instruction are not contiguous, or are contiguous but not favorably aligned to the SIMD pipeline width, BCC cannot be used to take advantage of cycle compression opportunities … (¶ 0031-0036)].
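For illustration only, the swizzle and unswizzle of channel positions described in the quoted passage of Vaidya may be sketched as a permutation and its inverse. This is a minimal sketch under stated assumptions: the function names swizzle, invert, and unswizzle are hypothetical, a register is modeled as a plain list of channel values, and the permutation is assumed to be given as a list in which output position i takes the value from input position perm[i].

```python
def swizzle(values: list, perm: list) -> list:
    """Rearrange channels so that output position i holds values[perm[i]]."""
    return [values[p] for p in perm]


def invert(perm: list) -> list:
    """Return the inverse permutation, so that invert(perm)[perm[i]] == i."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse


def unswizzle(values: list, perm: list) -> list:
    """Undo swizzle(values, perm) by applying the inverse permutation."""
    return swizzle(values, invert(perm))


# Example: permute four channels before execution, restore them before writeback.
source = ["a", "b", "c", "d"]
perm = [2, 0, 3, 1]
dispatched = swizzle(source, perm)        # ["c", "a", "d", "b"]
written_back = unswizzle(dispatched, perm)  # ["a", "b", "c", "d"]
```

The sketch shows only that a permutation applied before dispatch can be reversed before writeback; it does not model the SIMD pipeline width or the cycle compression discussed in the reference.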
	As to claim 6, Mehrara in view of Wei & DeVale & Vaidya teaches The method of claim 4 wherein generating the plurality of instructions comprises generating log (k-1) instructions to transform the source register, wherein k is a value calculated using permutation group theory [Vaidya -- as shown in figure 4, where a plurality of SIMD4 instructions are generated in a pipeline fashion during the following cycles; SCC in accordance with an embodiment of the present invention is shown in FIG. 4.  In an embodiment, rearranging channel positions is done through operand swizzling (permutation) hardware prior to being dispatched to the execution pipeline.  In turn, destination operand positions are correspondingly unswizzled prior to writeback to the register file or other portion of a memory hierarchy … (¶ 0034-0036)].
	As to claim 7, Mehrara in view of Wei & DeVale & Vaidya teaches The method of claim 6, wherein k is a value calculated using permutation group theory [Vaidya -- as shown in figure 4, where a plurality of SIMD4 instructions are generated in a pipeline fashion during the following cycles; SCC in accordance with an embodiment of the present invention is shown in FIG. 4.  In an embodiment, rearranging channel positions is done through operand swizzling (permutation) hardware prior to being dispatched to the execution pipeline.  In turn, destination operand positions are correspondingly unswizzled prior to writeback to the register file or other portion of a memory hierarchy … (¶ 0034-0036)].
As to claim 11, it recites substantially the same limitations as claim 4 and is rejected for the same reasons as claim 4. Refer to “As to claim 4” presented earlier in this Office Action for details.
As to claim 13, it recites substantially the same limitations as claim 6 and is rejected for the same reasons as claim 6. Refer to “As to claim 6” presented earlier in this Office Action for details.
As to claim 14, it recites substantially the same limitations as claim 7 and is rejected for the same reasons as claim 7. Refer to “As to claim 7” presented earlier in this Office Action for details.
	As to claim 18, it recites substantially the same limitations as claim 4 and is rejected for the same reasons as claim 4. Refer to “As to claim 4” presented earlier in this Office Action for details.
	As to claim 20, it recites substantially the same limitations as claim 6 and is rejected for the same reasons as claim 6. Refer to “As to claim 6” presented earlier in this Office Action for details.
8.	Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrara in view of Wei & DeVale & Vaidya, and further in view of Doerr et al. (US Patent Application Publication 2014/0351551, hereinafter Doerr).
	As to claim 5, Mehrara in view of Wei & DeVale & Vaidya teaches a work group [Mehrara -- The series of instructions transmitted to a particular GPC 208 constitutes a thread, as previously defined herein, and the collection of a certain number of concurrently executing threads across the parallel processing engines (not shown) within an SM 310 is referred to herein as a "warp" or "thread group." As used herein, a "thread group" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different processing engine within an SM 310 … (¶ 0048-0049)], but does not teach the use of an OpenCL compiler.
	However, OpenCL compilers are well known and commonly used in the art.
	For example, Doerr specifically teaches an OpenCL compiler [The Open Computing language (OpenCL) is a framework for writing programs with the objective to enable execution across heterogeneous platforms comprising central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors. It is designed to support close-to-hardware interface with limited abstraction … The language is extended to support parallelism with vector types and operations, synchronization, and functions to work with work-items/groups. An application programming interface (API) is used to define and then control the platform. OpenCL, at a course-level, supports parallel computing using task-based and data-based parallelism (¶ 0022-0023); The source code may be processed, e.g., by a compiler or other tool, including analyzing the information representing the multiple views specified or defined for the system. For example, in one embodiment, the compiler may be configured to recognize the information representing the multiple views in the application source code, and may extract and analyze the information. In other embodiments, the compiler may analyze the information in situ … (¶ 0045-0046)].
	Therefore, it would have been obvious for one of ordinary skill in the art prior to Applicant’s invention to use an OpenCL compiler, as demonstrated by Doerr, and to incorporate it into the existing scheme disclosed by Mehrara in view of Wei & DeVale & Vaidya, because Doerr teaches doing so would support parallel operations [… The language is extended to support parallelism with vector types and operations, synchronization, and functions to work with work-items/groups. An application programming interface (API) is used to define and then control the platform. OpenCL, at a course-level, supports parallel computing using task-based and data-based parallelism (¶ 0022-0023)].
	As to claim 12, it recites substantially the same limitations as claim 5 and is rejected for the same reasons as claim 5. Refer to “As to claim 5” presented earlier in this Office Action for details.
	As to claim 19, it recites substantially the same limitations as claim 5 and is rejected for the same reasons as claim 5. Refer to “As to claim 5” presented earlier in this Office Action for details.

Conclusion
9.	Claims 1, 3-8, 10-15, and 17-23 are rejected as explained above. 
10. 	THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHENG JEN TSAI whose telephone number is 571-272-4244.  The examiner can normally be reached on Monday-Friday, 9-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at 571-272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/SHENG JEN TSAI/Primary Examiner, Art Unit 2136
July 29, 2022