DETAILED ACTION
Response to Amendment
This communication is responsive to the amendment filed on 5/31/2022.  Claims 1-20 are pending and have been examined.  Claims 5, 8, 13 and 16 have been amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


3.	Claims 1, 5-6, 9, 13-14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over the NPL reference “Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs” (hereinafter “Crago”) in view of Zamsky, PGPUB No. 2015/0032929 (cited on the PTO-892 filed on 7/30/2021), and further in view of Orenstien, PGPUB No. 2009/0172365.

	In regards to claim 1, Crago teaches “An apparatus” (See Fig. 1(a-b) and section 2:  wherein a system (apparatus) comprising a stream multiprocessor (SM) is disclosed) “comprising: a load store unit” (See Figs. 1(b)/(c) and section 2:  wherein a stream multiprocessor (SM) includes a load/store unit) “wherein the load store unit is configured to perform a gather operation, based on a memory address, to concurrently gather a plurality of subsets of data from a memory” (See section 2 and sections 4.1-4.2:  wherein the load store unit performs a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory, based on a memory address, because the instruction is executed in a SIMT environment, which executes a single instruction across a plurality of slices/threads concurrently.  The load store unit is made of a plurality of slices, and each slice gathers a subset of data from memory (see section 6 and Figure 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors and display the output of an LSU slice, including a memory address, for more clarity)) “wherein the gather operation loads data” (See section 2 and sections 4.1-4.2:  wherein the load store unit performs a ld.indirect (gather) operation which loads data) “and a register that is partitioned into a plurality of portions to hold the plurality of subsets of data provided by the load store unit.” (See Section 4.2:  wherein a destination vector register is used to hold the plurality of subsets of data provided by the load/store unit.  A vector register is known in the art to store a plurality of elements, with each element stored in a partitioned portion of that register.  For example, a 128-bit register holding 32-bit elements can be considered a register partitioned into 4 portions, each including one 32-bit element (see Figs. 1(b)/(c) for details of the load/store unit))
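	For illustration only, the register-partitioning reading set forth above (a 128-bit register treated as four 32-bit portions, each filled by one lane of a gather) can be sketched in Python.  The sketch is hypothetical: the function names, the flat word-addressed memory list, and the lane loop are assumptions made for this illustration and are not taken from Crago or from the application.  The loop stands in for slices (lanes) that would operate concurrently in hardware.

```python
from typing import List, Sequence

ELEMENT_BITS = 32                      # assumed element width (from the example above)
REGISTER_BITS = 128                    # assumed register width (from the example above)
LANES = REGISTER_BITS // ELEMENT_BITS  # 4 portions, one per lane


def gather_into_register(memory: Sequence[int], base: int, offsets: Sequence[int]) -> int:
    """Gather one 32-bit element per lane and pack the results into a 128-bit value.

    Portion i of the register occupies bits [32*i, 32*i + 31].
    """
    if len(offsets) != LANES:
        raise ValueError("expected one offset per lane")
    register = 0
    for lane, offset in enumerate(offsets):  # conceptually concurrent lanes
        element = memory[base + offset] & 0xFFFFFFFF  # each lane loads one element
        register |= element << (ELEMENT_BITS * lane)  # store it in its own portion
    return register


def portions(register: int) -> List[int]:
    """Split a 128-bit value back into its four 32-bit portions."""
    mask = (1 << ELEMENT_BITS) - 1
    return [(register >> (ELEMENT_BITS * i)) & mask for i in range(LANES)]
```

	For example, with memory holding the value 10*i at address i, gathering offsets [0, 5, 3, 9] from base 2 fills the four portions with 20, 70, 50 and 110.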
	Crago does not explicitly teach “a plurality of load buses; load/store unit comprising a plurality of load ports to access the plurality of load buses”, “wherein the load store unit is configured to perform a gather operation from memory via the plurality of load buses in a first mode” nor “loads data via each of the plurality of buses”.  Crago teaches a load/store unit, including a plurality of slices, for performing a gather operation (see Figs. 1(b-c)), and one of ordinary skill in the art would recognize that Fig. 1(c) shows a single slice of the LSU with input and output arrows indicating data flow on a data bus.  However, Crago does not explicitly illustrate that each of the slices includes a plurality of data buses used to load the data.  Therefore, to the extent that Crago does not teach load ports used to access load buses in order to gather the data, another reference is brought in for that teaching.
	Zamsky teaches “a plurality of load buses” ([0021-0023 and Fig. 1]:  wherein an LSU uses a plurality of load buses (read buses, elements 24 and 32)) “load/store unit comprising a plurality of load ports to access the plurality of load buses” ([0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) comprises two load ports to access the plurality of load buses (elements 24 and 32) (See Fig. 1:  wherein the load/store unit (combination of elements 12 and 14) includes two ports in order to access the read buses (elements 24 and 32))) “loads data via each of the load buses” ([0023 and Fig. 1]:  wherein read buses (elements 24 and 32) are used to load data from memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to include load ports and load buses in order to gather data from memory using a load/store unit as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using load ports and load buses to gather data from memory as taught in Zamsky) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses load ports and buses in conjunction with a load/store unit in order to gather data from memory) for the benefit of simplifying memory access circuitry and reducing hardware cost by using dual ports and buses.  (MPEP 2143, Example D)  Furthermore, it would have been beneficial to use separate buses to perform a load that loads multiple pieces of data in parallel because doing so can improve bandwidth and speed up processing (See Zamsky [0024 and 0034]).
	The overall combination of Crago and Zamsky does not teach “perform a gather operation from memory in a first mode”.  The overall combination of Crago and Zamsky teaches performing a gather operation but does not explicitly teach performing the gather in a particular mode of operation.  Therefore, another reference is brought in for that teaching.
	Orenstien teaches “perform a gather operation from memory in a first mode” ([0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago to be performed speculatively, as taught in Orenstien.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (speculatively executing a gather operation) to a known device (apparatus of Crago which performs a gather operation) ready for improvement to yield predictable results (an apparatus which speculatively performs a gather operation) for the benefit of improving processor performance by performing speculative execution (Orenstien [0029]).  (MPEP 2143, Example D)

	Claim 9 is similarly rejected on the same basis as claim 1 above, as claim 9 is the method claim corresponding to the apparatus of claim 1 above.  (Note:  claim 9 includes an additional limitation stating “a load store unit in a floating-point unit (FPU)”.  The examiner notes that the base reference Crago teaches that the load store unit is in a stream multiprocessor, which can be considered a floating-point unit because it operates on floating-point data, as discussed in sections 4.2 and 8 and shown in Fig. 1(b).  Therefore, the above combination teaching claim 1 also similarly teaches claim 9.)

	In regards to claim 5, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 1” (see rejection of claim 1 above) “wherein the load store unit is configured to ignore exceptions or faults while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein, in a first, speculative mode, page faults that occur while executing a load operation are ignored, and are instead handled in a non-speculative mode (See Fig. 2; for clarity, see element 260) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 1, and the overall combination of references teaches the above limitation.  Also note that the examiner is interpreting the claims as indicated in the 35 U.S.C. 112(b) rejection of the previous Office action)) 

	Claim 13 is similarly rejected on the same basis as claim 5 above as claim 13 is the method claim corresponding to the apparatus of claim 5 above.  

	In regards to claim 6, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 5” (see rejection of claim 5 above) “wherein the load store unit is configured to transition from the first mode to a second mode in response to an exception or fault occurring while performing the gather operation.” (Orenstien [0029 and 0034-0035]:  wherein if a page fault occurs in a first speculative mode of operation, while performing a load operation, then the mode of operation transitions as control passes to a non-speculative mode of execution (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 1 and the overall combination of references teaches the above limitation)) 
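	For illustration only, the two-mode behavior relied upon in the rejections of claims 5 and 6 (faults ignored in a first, speculative mode; a transition to a second mode when a fault occurs) can be sketched in Python.  This is a hypothetical model, not Orenstien's implementation: the dictionary memory, the PageFault class, and the handle_fault callback are assumptions introduced only for this sketch.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class Mode(Enum):
    FIRST = "speculative"       # faults are ignored (suppressed)
    SECOND = "non-speculative"  # faults are handled, one lane at a time


class PageFault(Exception):
    """Hypothetical fault raised when an address is not mapped."""


def load(memory: Dict[int, int], address: int) -> int:
    # Hypothetical memory model: an unmapped address raises a page fault.
    if address not in memory:
        raise PageFault(address)
    return memory[address]


def gather(
    memory: Dict[int, int],
    addresses: Sequence[int],
    handle_fault: Callable[[Dict[int, int], int], None],
) -> Tuple[List[Optional[int]], List[Mode]]:
    modes = [Mode.FIRST]
    results: List[Optional[int]] = []
    faulted = False
    for address in addresses:  # stands in for concurrently operating lanes
        try:
            results.append(load(memory, address))
        except PageFault:
            results.append(None)  # first mode: the fault is ignored, not handled
            faulted = True
    if not faulted:
        return results, modes

    modes.append(Mode.SECOND)  # transition in response to the fault
    results = []
    for address in addresses:  # second mode: lanes performed one at a time
        try:
            results.append(load(memory, address))
        except PageFault:
            handle_fault(memory, address)  # e.g., map the missing page, then retry
            results.append(load(memory, address))
    return results, modes
```

	Under these assumptions, a gather with no faults completes in the first mode alone, while a single unmapped address causes the transition to the second mode, where the fault is handled and the load is retried.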

	Claim 14 is similarly rejected on the same basis as claim 6 above as claim 14 is the method claim corresponding to the apparatus of claim 6 above.  

	In regards to claim 17, Crago teaches “An apparatus” (See Fig. 1(b) and section 2:  wherein a stream multiprocessor (SM) is disclosed) “comprising: a load store unit configured to perform a gather operation” (See Figs. 1(b)/(c) and sections 2 and 4.1-4.2:  wherein a stream multiprocessor (SM) includes a load/store unit, and the load store unit performs a ld.indirect (gather) operation) “wherein the load store unit is configured to concurrently gather, based on a memory address, a plurality of subsets of data from a memory” (See section 2 and sections 4.1-4.2:  wherein the load store unit performs a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory, based on a memory address, because the instruction is executed in a SIMT environment, which executes a single instruction across a plurality of slices/threads concurrently.  The load store unit is made of a plurality of slices, and each slice gathers a subset of data from memory (see section 6 and Figure 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors and display the output of an LSU slice, including a memory address, for more clarity)) “and a register configured to store the data provided by the load store unit.” (See Section 4.2:  wherein a destination vector register is used to hold the plurality of subsets of data provided by the load/store unit (see Figs. 1(b)/(c) for details of the load/store unit))
	Crago does not explicitly teach “a plurality of load buses”, “perform a gather operation selectively in a first mode or a second mode”, “and gather a plurality of subsets of data from a memory via the plurality of load buses in the first mode”, “loads data via each of the plurality of load buses” nor “wherein the load store unit is configured to gather the plurality of subsets of data from the memory using partial updating in the second mode”.  Crago teaches a load/store unit, including a plurality of slices, for performing a gather operation (see Figs. 1(b-c)), and one of ordinary skill in the art would recognize that Fig. 1(c) shows a single slice of the LSU with input and output arrows indicating data flow on a data bus.  However, Crago does not explicitly illustrate that each of the slices includes a plurality of data buses used to load the data.  Therefore, to the extent that Crago does not teach load buses used to gather the data, another reference is brought in for that teaching.
	Zamsky teaches “a plurality of load buses” ([0021-0023 and Fig. 1]:  wherein an LSU uses a plurality of load buses (read buses, elements 24 and 32)) “and gather data from a memory via the plurality of load buses” ([0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) gathers data from memory via the plurality of load buses (elements 24 and 32)) “loads data via each of the plurality of load buses” ([0023 and Fig. 1]:  wherein read buses (elements 24 and 32) are used to load data from memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to include load buses in order to gather data from memory using a load/store unit as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using load buses to gather data from memory as taught in Zamsky) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses load buses in conjunction with a load/store unit in order to gather data from memory) for the benefit of simplifying memory access circuitry and reducing hardware cost by using dual buses.  (MPEP 2143, Example D)  Furthermore, it would have been beneficial to use separate buses to perform a load that loads multiple pieces of data in parallel because doing so can improve bandwidth and speed up processing (See Zamsky [0024 and 0034]).
	The overall combination of Crago and Zamsky does not teach “perform a gather operation selectively in a first mode or a second mode”, “and gather data from a memory in the first mode” nor “wherein the load store unit is configured to gather the data from the memory using partial updating in the second mode”.  The overall combination of Crago and Zamsky teaches performing a gather operation but does not explicitly teach performing the gather selectively in one of two modes.  Therefore, another reference is brought in for that teaching.
	Orenstien teaches “perform a gather operation selectively in a first mode or a second mode” ([0029-0035]:  wherein a load (gather) operation is selectively performed in a first, speculative mode or in a non-speculative mode depending upon whether a page fault occurs) “and gather a plurality of subsets of data from a memory in the first mode” ([0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2)) “wherein the load store unit is configured to gather the plurality of subsets of data from the memory using partial updating in the second mode” ([0035 and 0042-0043]:  wherein a load (gather) operation gathers data from memory in a non-speculative mode which uses partial updating, wherein a sequence of loads is used to update the destination one load at a time, as opposed to concurrent load operations (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 17 above, and the overall combination of references teaches the above limitation)) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago to be able to be performed in either a speculative or a non-speculative mode, as taught in Orenstien.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using either a speculative or non-speculative mode to execute a gather operation) to a known device (apparatus of Crago which performs a gather operation) ready for improvement to yield predictable results (an apparatus which speculatively or non-speculatively executes a gather operation) for the benefit of improving processor performance by performing a gather operation speculatively (Orenstien [0029]).  It would also have added flexibility to Crago by allowing a processor to execute non-speculatively if a fault or exception occurs, in order to efficiently handle the fault or exception.  (MPEP 2143, Example D)

	In regards to claim 18, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 17” (see rejection of claim 17 above) “wherein the load store unit ignores exceptions or faults while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein in a first speculative mode page faults are ignored, while executing a load operation, because they are not handled in the speculative mode; and are handled in a non-speculative mode (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 17 and the overall combination of references teaches the above limitation)) 

	In regards to claim 19, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 18” (see rejection of claim 18 above) “wherein the load store unit transitions from the first mode to the second mode in response to an exception or fault occurring while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein if a page fault occurs in a first speculative mode of operation, while performing a load operation, then the mode of operation transitions as control passes to a non-speculative mode of execution (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 17 and the overall combination of references teaches the above limitation)) 

	In regards to claim 20, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 19” (see rejection of claim 19 above) “wherein lanes are dispatched to concurrently perform the gather operation per clock cycle in the first mode” (Crago:  Sections 4.2 and 6.1:  wherein ld.indirect operations are dispatched to all slices (lanes) to concurrently perform the gather operation (see section 6 and Figures 1 and 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors, for more clarity) (Note:  the overall combination of claim 17 uses Orenstien to teach the first mode, and the overall combination of Crago and Orenstien teaches the limitation above (see Orenstien [0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2)))) “and a single lane is dispatched to perform the gather operation per clock cycle in the second mode.” (Orenstien [0004 and 0041-0043]:  wherein the load (gather) operation is performed by dispatching a single load (gather) operation per cycle (instructions are dispatched in a sequence and are therefore executed sequentially in different clock cycles) in a non-speculative mode (wherein each element of a vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes).  For example, if one of the loads of the sequence causes a fault in the second mode, the sequence restarts in a second iteration based on the fault occurring in a first iteration (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 17, and the overall combination of references teaches the above limitation)) 
	The overall combination of Crago, Zamsky and Orenstien thus far does not teach “wherein two lanes are dispatched to concurrently perform the operation per clock cycle”.
	Zamsky teaches “wherein two lanes are dispatched to concurrently perform the operation per clock cycle” ([0021-0023 and Fig. 1]:  wherein a load/store unit comprises two load buses used to perform a multiple load operation including two load operations per cycle (See Fig. 1:  wherein the load/store unit includes two read buses (elements 24 and 32) to perform two operations per cycle) (Note:  the overall combination of references teaches two lanes, and Zamsky is brought in to teach performing two gather/load operations per clock cycle))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to perform two gather operations per clock cycle as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a load/store unit to perform two gather operations per clock cycle) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses a load/store unit to perform two gather operations per clock cycle) for the benefit of simplifying memory access operations in a processor, by reducing a number of operations that occur each cycle.  (MPEP 2143, Example D)
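	For illustration only, the dispatch rates mapped above for claim 20 (two lanes per clock cycle in the first mode, a single lane per clock cycle in the second mode) can be compared with a short Python sketch.  The dispatch_schedule function is hypothetical, and the lane count of eight is an assumption chosen for the illustration; neither is taken from the cited references.

```python
from typing import List


def dispatch_schedule(num_lanes: int, lanes_per_cycle: int) -> List[List[int]]:
    """Return, for each clock cycle, the lane indices dispatched in that cycle."""
    if lanes_per_cycle < 1:
        raise ValueError("at least one lane must be dispatched per cycle")
    return [
        list(range(start, min(start + lanes_per_cycle, num_lanes)))
        for start in range(0, num_lanes, lanes_per_cycle)
    ]


NUM_LANES = 8  # assumed lane count for this illustration
first_mode = dispatch_schedule(NUM_LANES, lanes_per_cycle=2)   # two lanes per cycle
second_mode = dispatch_schedule(NUM_LANES, lanes_per_cycle=1)  # one lane per cycle
```

	Under these assumptions, the first mode covers all eight lanes in 4 cycles ([0, 1], [2, 3], [4, 5], [6, 7]), while the second mode takes 8 cycles, one lane per cycle.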

4.	Claims 2-4 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Crago in view of Zamsky and Orenstien, and further in view of Boettcher, PGPUB No. 2019/0340054.

	In regards to claim 2, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 1” (see rejection of claim 1 above) “wherein the load store unit comprises a plurality of lanes that are configured to concurrently execute the gather operation to gather the plurality of subsets of data.” (Crago:  See section 2 and sections 4.1-4.2:  wherein the load store unit uses a plurality of slices (lanes) to perform a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory, because the instruction is executed in a SIMT environment, which executes a single instruction across a plurality of slices/threads concurrently.  The load store unit is made of a plurality of slices (lanes), and each slice gathers a subset of data from memory (see section 6 and Figures 1(c)/6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors, for more clarity))
	The overall combination of Crago, Zamsky and Orenstien does not teach “wherein the load store unit comprises a plurality of lanes that is partitioned into lane subsets that are configured to concurrently execute.”  Crago does teach a load/store unit which comprises a plurality of lanes that concurrently execute a gather operation.  However, Crago does not teach partitioning the lanes into lane subsets which execute concurrently.
	Boettcher discloses a vector unit comprising a plurality of lanes that are partitioned into lane subsets that are configured to concurrently execute ([0036, 0103 and 0112]:  wherein vector processing circuitry lanes can be partitioned into lane groups which execute concurrently (See Fig. 9:  wherein element 170 comprises a lane subset comprising a first and second vector lane))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Crago to execute concurrent operations on partitioned lane subsets as taught in Boettcher.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (performing concurrent operations on lane subsets as taught in Boettcher) to a known device (apparatus of Crago which gathers data concurrently using a plurality of lanes) ready for improvement to yield predictable results (an apparatus which performs concurrent operations on lane subsets in order to perform a gather operation) for the benefit of added flexibility, by allowing a plurality of lanes to be used for execution in a variety of ways (i.e., executing on all lanes in parallel or executing on a subset of lanes in parallel).  (MPEP 2143, Example D)

	Claim 10 is similarly rejected on the same basis as claim 2 above as claim 10 is the method claim corresponding to the apparatus of claim 2 above.  

	In regards to claim 3, the overall combination of Crago, Zamsky, Orenstien and Boettcher teaches “The apparatus of claim 2” (see rejection of claim 2 above) “wherein the load store unit includes two load ports and two load buses” (Zamsky [0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) comprises two load ports to access the plurality of load buses (elements 24 and 32)) “and wherein the plurality of lanes is partitioned into even lane subsets and odd lane subsets” (Boettcher [0036, 0103 and 0112]:  wherein the plurality of lanes can be partitioned into even lane subsets and odd lane subsets.  For example, in Fig. 9, the first lane subset containing elements A0 and A1 and the third lane subset containing elements A4 and A5 can be considered odd lane subsets, while the second lane subset containing elements A2 and A3 and the fourth lane subset containing elements A6 and A7 can be considered even lane subsets)  
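	For illustration only, the even/odd reading of Boettcher's Fig. 9 set forth above can be sketched in Python.  The helper names are hypothetical.  The sketch assumes eight lanes holding elements A0-A7 grouped into lane subsets of two, numbers the subsets from one as in the reading above, and further assumes (this is not stated in either reference) that odd lane subsets are served by one load port and even lane subsets by the other.

```python
from typing import Dict, List


def partition_lanes(num_lanes: int, subset_size: int) -> List[List[int]]:
    """Group consecutive lanes into lane subsets of subset_size lanes each."""
    return [
        list(range(i, min(i + subset_size, num_lanes)))
        for i in range(0, num_lanes, subset_size)
    ]


def assign_ports(subsets: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Subsets numbered 1, 3, ... are odd; subsets numbered 2, 4, ... are even."""
    odd = [s for n, s in enumerate(subsets, start=1) if n % 2 == 1]
    even = [s for n, s in enumerate(subsets, start=1) if n % 2 == 0]
    # Hypothetical port assignment: odd subsets to port 0, even subsets to port 1.
    return {"port0_odd": odd, "port1_even": even}


# Lanes 0-7 hold elements A0-A7.
ports = assign_ports(partition_lanes(8, 2))
```

	Under these assumptions, port 0 receives the subsets holding A0/A1 and A4/A5, and port 1 receives the subsets holding A2/A3 and A6/A7.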

	Claim 11 is similarly rejected on the same basis as claim 3 above as claim 11 is the method claim corresponding to the apparatus of claim 3 above.  

	In regards to claim 4, the overall combination of Crago, Zamsky, Orenstien and Boettcher teaches “The apparatus of claim 3” (see rejection of claim 3 above) “wherein gather operations are dispatched to two lanes are dispatched to concurrently perform the gather operation in the first mode” (Crago:  Sections 4.2 and 6.1:  wherein ld.indirect operations are dispatched to two slices (lanes) to concurrently perform the gather operation (see section 6 and Figures 1 and 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors, for more clarity) (Note:  the overall combination of claim 1 uses Orenstien to teach the first mode, and the overall combination of Crago and Orenstien teaches the limitation above (see Orenstien [0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2))))
	The overall combination of Crago, Zamsky, Orenstien and Boettcher thus far does not teach “wherein two lanes concurrently perform the gather operation per clock cycle”.  
	Zamsky teaches “wherein two lanes are dispatched to concurrently perform the operation per clock cycle” ([0021-0023 and Fig. 1]:  wherein a load/store unit comprises two load buses that are used to perform a multiple load operation including two load operations per cycle (See Fig. 1:  wherein the load/store unit includes two read buses (elements 24 and 32) to perform two operations per cycle) (Note:  the overall combination of references teaches two lanes, and Zamsky is brought in to teach performing two gather/load operations per clock cycle))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to perform two gather operations per clock cycle as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a load/store unit to perform two gather operations per clock cycle) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses a load/store unit to perform two gather operations per clock cycle) for the benefit of simplifying memory access operations in a processor, by reducing a number of operations that occur each cycle.  (MPEP 2143, Example D)

	Claim 12 is similarly rejected on the same basis as claim 4 above as claim 12 is the method claim corresponding to the apparatus of claim 4 above.  

5.	Claims 7-8 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Crago in view of Zamsky and Orenstien, and further in view of Sperber, PGPUB No. 2013/0326160.

	In regards to claim 7, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 6” (see rejection of claim 6 above) “wherein the load store unit is configured to perform the gather operation in order by a plurality of lanes based on a previous iteration of the second mode.” (Orenstien [0041-0043]:  wherein the load (gather) operation is performed in a sequence by a plurality of lanes (wherein each element of a vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes) based on a previous iteration of the second mode.  For example, if one of the loads of the sequence causes a fault in the second mode, the sequence restarts in a second iteration based on the fault occurring in a first iteration (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 1, and the overall combination of references teaches the above limitation)) 
	The overall combination of Crago, Zamsky and Orenstien does not teach “perform the gather operation based on a mask that indicates lanes that successfully gathered data and stored the successfully gathered data in the register”.
	Sperber teaches “perform the gather operation based on a mask that indicates lanes that successfully gathered data and stored the successfully gathered data in the register” ([0034, 0101-0107 and 0123]:  wherein a gather operation is performed based on a completion mask which indicates vector elements that were successfully gathered and stored in a destination vector register (Note:  Crago teaches the lanes, and the overall combination of references would teach the limitation above))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago, Zamsky and Orenstien to use a mask which tracks vector elements that were successfully loaded as taught in Sperber.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a completion mask to track which elements were successfully gathered during a gather operation) to a known device (apparatus of Crago, Zamsky and Orenstien which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which executes a gather operation based on a completion mask which indicates which elements were gathered successfully) for the benefit of avoiding the execution of redundant gather operations when a memory fault causes a gather operation to restart.  (MPEP 2143, Example D)
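	For illustration only, the benefit relied upon in the combination with Sperber (a completion mask that records which lanes were successfully gathered and stored, so that a restart after a fault does not repeat completed loads) can be sketched in Python.  This is a hypothetical model rather than Sperber's implementation; the dictionary memory, the PageFault class, the map_page callback, and the iteration limit are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class PageFault(Exception):
    """Hypothetical fault raised when a gathered address is unmapped."""

    def __init__(self, address: int) -> None:
        super().__init__(address)
        self.address = address


def gather_with_mask(
    memory: Dict[int, int],
    addresses: Sequence[int],
    map_page: Callable[[Dict[int, int], int], None],
    max_iterations: int = 8,
) -> Tuple[List[int], List[bool], int]:
    dest = [0] * len(addresses)      # destination register, one element per lane
    mask = [False] * len(addresses)  # completion mask: True = gathered and stored
    loads = 0                        # number of memory loads actually issued
    for _ in range(max_iterations):
        try:
            for lane, address in enumerate(addresses):  # lanes in order
                if mask[lane]:
                    continue  # completed in a previous iteration; not reloaded
                if address not in memory:
                    raise PageFault(address)
                dest[lane] = memory[address]
                mask[lane] = True
                loads += 1
            return dest, mask, loads
        except PageFault as fault:
            map_page(memory, fault.address)  # handle the fault, then restart
    raise RuntimeError("gather did not complete within max_iterations")
```

	In this model, a fault on lane 1 causes one restart, yet only three loads are issued for three lanes, because the mask lets the second iteration skip lane 0.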

	Claim 15 is similarly rejected on the same basis as claim 7 above as claim 15 is the method claim corresponding to the apparatus of claim 7 above.  

	In regards to claim 8, the overall combination of Crago, Zamsky, Orenstien and Sperber teaches “The apparatus of claim 7” (see rejection of claim 7 above) “wherein the gather operation is dispatched to a single lane to perform the gather operation per clock cycle in the second mode.” (Orenstien [0004 and 0041-0043]:  wherein the load (gather) operation is performed by dispatching a single load (gather) operation per cycle (instructions are dispatched in a sequence and are therefore executed sequentially in different processing cycles) in a non-speculative mode (wherein each element of a vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes) (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 1, and the overall combination of references teaches the above limitation)) 

	Claim 16 is similarly rejected on the same basis as claim 8 above as claim 16 is the method claim corresponding to the apparatus of claim 8 above.  
	
Examiner Notes
6.	The examiner notes that claims 1, 5-6 and 17 use the language “load store unit configured to perform/ignore/transition”, which some may construe as invoking 35 U.S.C. 112(f).  However, the limitations do not invoke 35 U.S.C. 112(f) because they fail the 3-prong test: the term “load/store unit” would be understood by persons of ordinary skill in the art to have a sufficiently definite meaning as a name for structure.  (See MPEP 2181 I (A))

Response to Arguments
7.	Applicant’s arguments, see page 6 of the remarks filed on 5/31/22, with respect to the rejection(s) of claim(s) 5-8 and 10-16 under 35 USC 112(b) have been fully considered and are persuasive.  Therefore, the rejection(s) have been withdrawn.  

8.	Applicant's arguments filed on 5/31/2022 have been fully considered but they are not persuasive.  Therefore, the previous 35 USC 103 rejections in view of Crago, Zamsky and Orenstien with regards to similar independent claims 1, 9 and 17 are maintained.
	Furthermore, the dependent claims 2-8, 10-16 and 18-20 remain rejected for at least being dependent upon the rejected claims 1, 9 and 17 above.

9.	Applicant argues in regards to similar independent claims 1, 9 and 17, on pages 6-7 of the remarks filed on 5/31/2022, in substance, that:
	“The cited references do not teach or suggest each and every feature of claim 1. In particular, the Zamsky reference fails to teach or suggest an apparatus comprising "a load store unit comprising a plurality of load ports to access the plurality of load buses.." (emphasis added). Instead, Zamsky teaches "circuitry for a computing system" that is comprised of "a first load/store unit" and a "second load/store unit." Each of the load/store units taught in Zamsky are coupled to only a single read bus that "may be a unidirectional bus via which data may only be transferred from the memory arrangement..." See Zamsky, paras. [0021] and [0022]. There is no suggestion or teaching in Zamsky of a single load/store unit comprised of a plurality of load ports to access a plurality of load buses. Rather, this reference teaches circuitry consisting of two distinct load/store units (elements 12 and 14) each of which include only a single load bus.”
	
	It appears that the applicant is arguing above that the Zamsky reference does not teach "a load store unit comprising a plurality of load ports to access the plurality of load buses.." as claimed because Zamsky teaches two distinct load store units that each individually comprise a single read bus.  However, the examiner respectfully disagrees with the applicant’s arguments.
	The examiner respectfully disagrees because the applicant has disregarded the examiner’s interpretation of the Zamsky reference provided in the non-final office action mailed on 3/1/2022.  For example, the previous and current claim mappings of the reference Zamsky state “load/store unit comprising a plurality of load ports to access the plurality of load buses” ([0021-0023] and Fig. 1:  wherein a load/store unit (combination of elements 12 and 14) comprises two load ports to access the plurality of load buses (elements 24 and 32) (See Fig. 1:  wherein the load/store unit (combination of elements 12 and 14) includes two ports in order to access the read buses (elements 24 and 32))) (see rejection of claims above).  Based on the claim mapping above, it is clear that the examiner is interpreting the load store unit to be the combination of LSUs (elements 12 and 14); said another way, a single load store unit is made up of, or comprises, two load store units (elements 12 and 14).  
	The examiner asserts this is a reasonable interpretation at least because paragraph [0023] indicates that both load store units (elements 12 and 14) are used to concurrently perform a multiple load instruction.  Therefore, the two units work together as a single load store unit to perform a single multiple load instruction which loads multiple pieces of data.  Furthermore, based upon the examiner’s interpretation and claim mapping, it is clear that the reference Zamsky does teach the claimed limitation “load/store unit comprising a plurality of load ports to access the plurality of load buses” because the single load store unit is interpreted as the combination of elements 12 and 14 of Fig. 1, and that combination would include two load buses (elements 24 and 32).
	
10.	It appears that the applicant attempts to argue broad aspects of the current independent claims. The examiner notes that, in order to move prosecution forward, the independent claims of the application should include further details, at least about the first mode recited in the independent claims.  Currently, the independent claims are very broad, as they merely claim a load store unit used to perform a gather operation concurrently in a first mode.  However, SIMD gather instructions are well known in the art, as are load store units or other execution units used to process such instructions.  Further, under the broadest reasonable interpretation, “a first mode” can mean anything, e.g., a speculative mode, a non-speculative mode, a SIMD mode, etc.  Therefore, the independent claims should at least specify exactly what the first mode is doing in order to move prosecution forward.  
	Alternatively, the independent claims could focus on other features of the case, such as the second mode discussed in claim 5, executing the instruction using two lanes as discussed in claim 4, or any combination of these features (e.g., amend the independent claims to specify what the first mode is, along with aspects of claim 4 or 5).  In any event, the applicant is encouraged to amend the independent claims to recite distinguishing features of the invention.

Conclusion
11.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

12.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P CARMICHAEL-MOODY whose telephone number is (571)431-0692. The examiner can normally be reached M-F, 10am-7pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/COURTNEY P CARMICHAEL-MOODY/Primary Examiner, Art Unit 2183