DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/24/2022 has been entered.
	This communication is responsive to the request for continued examination filed on 1/31/2022.  Claims 1-20 are pending and have been examined.  Claims 1, 5, 9, 13 and 17 have been amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


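Claims 5-8 and 10-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.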

In regards to claim 5, the limitation stating “the load store unit is configured to not perform partial updates in response to exceptions or faults while performing the gather operation in the first mode” lacks clarity.  It is unclear whether the limitation indicates that the load store unit does not perform partial updates in response to exceptions or faults because the unit is operating in a first mode that, based on paragraphs [0011 and 0030-0033] and Fig. 7 of applicant’s disclosure, does not perform partial updating because it does not handle exceptions or faults, or whether the limitation indicates that partial updating is not performed in response to exceptions or faults while operating in the first mode because the first mode handles exceptions or faults by not performing partial updates.  The limitation is further unclear in light of the wording “not performing partial updates in response to exceptions or faults” because it is clear from Fig. 7 that partial updating is performed in response to exceptions or faults; the wording of the limitation appears to be the source of the ambiguity.  Therefore, the limitation lacks clarity in light of the specification, and the examiner suggests that the applicant amend claim 5 using language similar to that of claim 18, which indicates that the load store unit ignores exceptions or faults while performing the gather operation in the first mode.  For purposes of examination, the examiner will interpret the claim in accordance with the above suggestion.
Claim 13 is similarly rejected on the same basis as claim 5 above.
In regards to claim 8, line 2, the limitation “the gather operation” lacks clarity because it is unclear whether the limitation refers to “a gather operation” of claim 1, line 4 or “a gather operation” of claim 8, line 1.
In regards to claim 10, line 3, the limitation “the gather operation” lacks clarity because it is unclear whether the limitation refers to “a gather operation” of claim 9, line 5 or “a gather operation” of claim 10, line 2.
In regards to claim 12, line 2, the limitation “the gather operation” lacks clarity because it is unclear whether the limitation refers to “a gather operation” of claim 9, line 5 or “a gather operation” of claim 10, line 2.
In regards to claim 16, line 2, the limitation “the gather operation” lacks clarity because it is unclear whether the limitation refers to “a gather operation” of claim 9, line 5 or “a gather operation” of claim 16, line 2.
Claims 6-8, 11-12 and 14-16 are dependent upon one or more rejected claims above and are further rejected for including the deficiencies of one or more rejected claims above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


6.	Claims 1, 5-6, 9, 13-14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over NPL reference “Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs”, hereinafter referred to as Crago, in view of Zamsky, PGPUB No.: 2015/0032929 (cited on PTO-892 filed on 7/30/2021), and further in view of Orenstien, PGPUB No.: 2009/0172365.

	In regards to claim 1, Crago teaches “An apparatus” (See Fig. 1(a-b) and section 2:  wherein a system (apparatus) comprising a stream multiprocessor (SM) is disclosed) “comprising: a load store unit” (See Figs. 1(b)/(c) and section 2:  wherein a stream multiprocessor (SM) includes a load/store unit) “wherein the load store unit is configured to perform a gather operation, based on a memory address, to concurrently gather a plurality of subsets of data from a memory” (See section 2 and sections 4.1-4.2:  wherein  the load store unit performs a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory, based on a memory address, because the instruction is executed in a SIMT environment which executes a single instruction across a plurality of slice/threads concurrently.  Wherein the load store unit is made of a plurality of slices and each slice gathers a subset of data from memory (see section 6 and Figure 6 which discuss how the instruction is executed on existing GPU’s with stream multiprocessors and displays the output of an LSU slice including a memory address for more clarity)) “wherein the gather operation load data” (See section 2 and sections 4.1-4.2:  wherein  the load store unit performs a ld.indirect (gather) operation which loads data) “and a register that is partitioned into a plurality of portions to hold the plurality of subsets of data provided by the load store unit.” (See Section 4.2:  wherein a destination vector register is used to hold the plurality of subsets of data provided by the load/store unit.  Wherein a vector register is known in the art to store a plurality of elements and each element is stored in a partitioned portion of that register.  For example, if a register is a 128-bit register with 32-bit elements, it can be considered a register partitioned into 4 portions each including one 32-bit element (see Figs. 1(b)/(c) for details of the load/store unit))
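	As a purely illustrative sketch of the register example above (a 128-bit register holding four 32-bit elements), the following Python fragment gathers one element per lane from a base address plus a per-lane index and packs the elements into four portions of a single register value.  The names `gather`, `pack_register` and `unpack_register`, the memory contents and the index values are hypothetical and are not taken from Crago or from applicant’s disclosure.

```python
# Illustrative sketch only; all names and values below are hypothetical.
ELEMENT_BITS = 32
REGISTER_BITS = 128
NUM_PORTIONS = REGISTER_BITS // ELEMENT_BITS  # four 32-bit portions
ELEMENT_MASK = (1 << ELEMENT_BITS) - 1


def gather(memory, base, indices):
    """Return one element per lane, read from memory[base + index]."""
    return [memory[base + index] for index in indices]


def pack_register(elements):
    """Pack the elements into one integer, portion 0 in the low-order bits."""
    assert len(elements) == NUM_PORTIONS
    value = 0
    for portion, element in enumerate(elements):
        value |= (element & ELEMENT_MASK) << (portion * ELEMENT_BITS)
    return value


def unpack_register(value):
    """Split the register value back into its 32-bit portions."""
    return [(value >> (p * ELEMENT_BITS)) & ELEMENT_MASK for p in range(NUM_PORTIONS)]


memory = list(range(100, 164))            # 64 words of sample data
elements = gather(memory, base=8, indices=[0, 5, 2, 9])
register = pack_register(elements)
```

	Each portion of `register` holds the subset of data gathered by one lane, so unpacking the register recovers the per-lane elements in order.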
	Crago does not explicitly teach “a plurality of load buses; load/store unit comprising a plurality of load ports to access the plurality of load buses”, “wherein the load store unit is configured to perform a gather operation from memory via the plurality of load buses in a first mode” nor “loads data via each of the plurality of buses”.  Crago teaches a load/store unit, including a plurality of slices, for performing a gather operation (see Figs. 1(b-c)), and one of ordinary skill in the art would see Fig. 1(c) and see a single slice of the LSU including input and output arrows indicating data flow on a data bus.  However, Crago does not explicitly illustrate that each of the slices includes a plurality of data buses used to load the data. Therefore, to any extent that Crago does not teach load ports used to access load buses in order to gather the data another reference is brought in for that teaching.
	Zamsky teaches “a plurality of load buses” ([0021-0023 and Fig. 1]:  wherein an LSU uses a plurality of load buses (read buses, elements 24 and 32)) “load/store unit comprising a plurality of load ports to access the plurality of load buses” ([0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) comprises two load ports to access the plurality of load buses (elements 24 and 32) (See Fig. 1:  wherein the load/store unit (combination of elements 12 and 14) includes two ports in order to access the read buses (elements 24 and 32))) “loads data via each of the load buses” ([0023 and Fig. 1]:  wherein read buses (elements 24 and 32) are used to load data from memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to include load ports and load buses in order to gather data from memory using a load/store unit as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using load ports and load buses to gather data from memory as taught in Zamsky) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses load ports and buses in conjunction with a load/store unit in order to gather data from memory) for the benefit of simplifying memory access circuitry and reducing hardware cost by using dual ports and buses.  (MPEP 2143, Example D) Furthermore, it would have been beneficial to use separate buses to perform a load that loads multiple pieces of data in parallel because it can provide improved bandwidth and speed up processing (See Zamsky [0024 and 0034]).
	The overall combination of Crago and Zamsky does not teach “perform a gather operation from memory in a first mode”.  The overall combination of Crago and Zamsky teaches performing a gather operation but does not explicitly teach performing the gather in a particular mode of operation.  Therefore, another reference is brought in for that teaching.
	Orenstien teaches “perform a gather operation from memory in a first mode”. ([0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago to be performed speculatively as the gather operation of Orenstien.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (speculatively executing a gather operation) to a known device (apparatus of Crago which performs a gather operation) ready for improvement to yield predictable results (an apparatus which speculatively performs a gather operation) for the benefit of improving processor performance by performing speculative execution (Orenstien [0029]).  (MPEP 2143, Example D)

	Claim 9 is similarly rejected on the same basis as claim 1 above as claim 9 is the method claim corresponding to the apparatus of claim 1 above.  (Note:  claim 9 includes an additional limitation stating “a load store unit in a floating-point unit (FPU)”.  The examiner notes that the base reference Crago teaches the load store unit is in a stream multiprocessor (wherein this processor can be considered a floating-point unit because it operates on floating point data as discussed in sections 4.2, 8 and shown in fig. 1 (b)).  Therefore, the above combination teaching claim 1 also similarly teaches claim 9.)

	In regards to claim 5, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 1” (see rejection of claim 1 above) “wherein the load store unit is configured to not perform partial updates in response to exceptions or faults while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein in a first speculative mode page faults are ignored, while executing a load operation in the speculative mode of execution; and are handled in a non-speculative mode (See Fig. 2, for clarity see element 260) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 1 and the overall combination of references teaches the above limitation.  Also note examiner is interpreting claims as indicated previously in 112(b) rejection above)) 

	Claim 13 is similarly rejected on the same basis as claim 5 above as claim 13 is the method claim corresponding to the apparatus of claim 5 above.  

	In regards to claim 6, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 5” (see rejection of claim 5 above) “wherein the load store unit is configured to transition from the first mode to a second mode in response to an exception or fault occurring while performing the gather operation.” (Orenstien [0029 and 0034-0035]:  wherein if a page fault occurs in a first speculative mode of operation, while performing a load operation, then the mode of operation transitions as control passes to a non-speculative mode of execution (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 1 and the overall combination of references teaches the above limitation)) 
	Claim 14 is similarly rejected on the same basis as claim 6 above as claim 14 is the method claim corresponding to the apparatus of claim 6 above.  

	In regards to claim 17, Crago teaches “An apparatus” (See Fig. 1(b) and section 2:  wherein a system (apparatus) comprising a stream multiprocessor (SM) is disclosed) “comprising: a load store unit configured to perform a gather operation” (See Figs. 1(b)/(c), section 2 and sections 4.1-4.2:  wherein a stream multiprocessor (SM) includes a load/store unit.  Wherein the load store unit performs a ld.indirect (gather) operation) “wherein the load store unit is configured to concurrently gather, based on a memory address, a plurality of subsets of data from a memory” (See section 2 and sections 4.1-4.2:  wherein the load store unit performs a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory, based on a memory address, because the instruction is executed in a SIMT environment which executes a single instruction across a plurality of slices/threads concurrently.  Wherein the load store unit is made of a plurality of slices and each slice gathers a subset of data from memory (see section 6 and Figure 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors and display the output of an LSU slice including a memory address, for more clarity)) “and a register configured to store the data provided by the load store unit.” (See Section 4.2:  wherein a destination vector register is used to hold the plurality of subsets of data provided by the load/store unit (see Figs. 1(b)/(c) for details of the load/store unit))
	Crago does not explicitly teach “a plurality of load buses”, “perform a gather operation selectively in a first mode or a second mode”, “and gather a plurality of subsets of data from a memory via the plurality of load buses in the first mode”, “loads data via each of the plurality of load buses” nor “wherein the load store unit is configured to gather the plurality of subsets of data from the memory using partial updating in the second mode”.  Crago teaches a load/store unit, including a plurality of slices, for performing a gather operation (see Figs. 1(b-c)), and one of ordinary skill in the art would see Fig. 1(c) and see a single slice of the LSU including input and output arrows indicating data flow on a data bus.  However, Crago does not explicitly illustrate that each of the slices includes a plurality of data buses used to load the data. Therefore, to any extent that Crago does not teach load ports used to access load buses in order to gather the data another reference is brought in for that teaching.
	Zamsky teaches “a plurality of load buses” ([0021-0023 and Fig. 1]:  wherein a LSU uses a plurality of load buses (read buses elements 24 and 32)) “and gather data from a memory via the plurality of load buses” ([0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) gathers data from memory via the plurality of load buses (elements 24 and 32)) “loads data via each of the plurality of load buses” ([0023 and Fig. 1]:  wherein read buses (elements 24 and 32) are used to load data from memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to include load buses in order to gather data from memory using a load/store unit as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using load buses to gather data from memory as taught in Zamsky) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses load buses in conjunction with a load/store unit in order to gather data from memory) for the benefit of simplifying memory access circuitry and reducing hardware cost by using dual buses.  (MPEP 2143, Example D) Furthermore, it would have been beneficial to use separate buses to perform a load that loads multiple pieces of data in parallel because it can provide improved bandwidth and speed up processing (See Zamsky [0024 and 0034]).
	The overall combination of Crago and Zamsky does not teach “perform a gather operation selectively in a first mode or a second mode”, “and gather data from a memory in the first mode” nor “wherein the load store unit is configured to gather the data from the memory using partial updating in the second mode”.  The overall combination of Crago and Zamsky teaches performing a gather operation but does not explicitly teach performing the gather selectively in one of two modes.  Therefore, another reference is brought in for that teaching.
	Orenstien teaches “perform a gather operation selectively in a first mode or a second mode” ([0029-0035]:  wherein a load (gather) operation is selectively performed in a first speculative mode or a non-speculative mode depending upon whether a page fault occurs) “and gather a plurality of subsets of data from a memory in the first mode” ([0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2)) “wherein the load store unit is configured to gather the plurality of subsets of data from the memory using partial updating in the second mode”.  ([0035 and 0042-0043]:  wherein a load (gather) operation gathers data from memory in a non-speculative mode which uses partial updating, wherein a sequence of loads is used to update the destination one load at a time, as opposed to concurrent load operations (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 1 and the overall combination of references teaches the above limitation)) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago to be able to be performed in one of a speculative or non-speculative mode as the gather operation of Orenstien.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using either a speculative or non-speculative mode to execute a gather operation) to a known device (apparatus of Crago which performs a gather operation) ready for improvement to yield predictable results (an apparatus which speculatively or non-speculatively executes a gather operation) for the benefit of improving processor performance by performing a gather operation speculatively  (Orenstien [0029]). It would also have added flexibility to Crago by allowing a processor to also execute non-speculatively if a fault or exception occurs, in order to efficiently handle the fault or exception.  (MPEP 2143, Example D)
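	The two-mode behavior relied upon above can be illustrated with a short Python sketch.  It is a simplified model under stated assumptions, not code from Orenstien or Crago: memory is modeled as a dictionary, a missing address stands in for a page fault, and the names `PageFault`, `gather_first_mode`, `gather_second_mode` and `map_page` are hypothetical.

```python
# Illustrative model only; all names are hypothetical.
class PageFault(Exception):
    """Stands in for a fault raised by an access to an unmapped address."""


def load(memory, address):
    if address not in memory:
        raise PageFault(address)
    return memory[address]


def gather_first_mode(memory, addresses):
    """First (speculative) mode: faults are not handled and nothing is committed."""
    try:
        return [load(memory, a) for a in addresses]
    except PageFault:
        return None  # no partial result is written to the destination


def gather_second_mode(memory, addresses, handle_fault):
    """Second (non-speculative) mode: the destination is updated one load at a time."""
    dest = [None] * len(addresses)
    for lane, address in enumerate(addresses):
        try:
            dest[lane] = load(memory, address)
        except PageFault:
            handle_fault(memory, address)         # the fault is handled in this mode
            dest[lane] = load(memory, address)    # then the load is retried once
    return dest


def gather(memory, addresses, handle_fault):
    """Try the first mode; on a fault, transition to the second mode."""
    result = gather_first_mode(memory, addresses)
    if result is not None:
        return result, "first"
    return gather_second_mode(memory, addresses, handle_fault), "second"


def map_page(memory, address):
    """Hypothetical fault handler: make the address valid with a placeholder value."""
    memory[address] = 99
```

	Under this model, a gather whose addresses are all mapped completes in the first mode, while a gather that touches an unmapped address produces no result in the first mode and is completed element by element in the second mode.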

	In regards to claim 18, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 17” (see rejection of claim 17 above) “wherein the load store unit ignores exceptions or faults while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein in a first speculative mode page faults are ignored, while executing a load operation, because they are not handled in the speculative mode; and are handled in a non-speculative mode (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 17 and the overall combination of references teaches the above limitation)) 

	In regards to claim 19, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 18” (see rejection of claim 18 above) “wherein the load store unit transitions from the first mode to the second mode in response to an exception or fault occurring while performing the gather operation in the first mode.” (Orenstien [0029 and 0034-0035]:  wherein if a page fault occurs in a first speculative mode of operation, while performing a load operation, then the mode of operation transitions as control passes to a non-speculative mode of execution (See Fig. 2) (Note:  the base reference Crago has already taught the load/store unit in the rejection of claim 17 and the overall combination of references teaches the above limitation)) 

	In regards to claim 20, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 19” (see rejection of claim 19 above) “wherein lanes are dispatched to concurrently perform the gather operation per clock cycle in the first mode” (Crago:  Sections 4.2 and 6.1:  wherein ld.indirect operations are dispatched to all slices (lanes) to concurrently perform the gather operation (see section 6 and Figures 1 and 6 which discuss how the instruction is executed on existing GPU’s with stream multiprocessors for more clarity) (Note: the overall combination of claim 17 uses Orenstien to teach the first mode and the overall combination of Crago and Orenstien teaches the limitation above (see Orenstien [0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2))) “and a single lane is dispatched to perform the gather operation per clock cycle in the second mode.” (Orenstien [0004 and 0041-0043]:  wherein the load (gather) operation is performed by dispatching a single load (gather) operation per cycle (dispatches instructions in a sequence and therefore instructions are executed sequentially in different clock cycles) in a non-speculative mode (wherein each element of vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes).  For example, if one of the loads of the sequence causes a fault in the second mode, the sequence restarts in a second iteration based on a fault occurring in a first iteration (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 17 and the overall combination of references teaches the above limitation)) 
	The overall combination of Crago, Zamsky and Orenstien thus far does not teach “wherein two lanes are dispatched to concurrently perform the operation per clock cycle”.
	Zamsky teaches “wherein two lanes are dispatched to concurrently perform the operation per clock cycle” ([0021-0023 and Fig. 1]:  wherein a load/store unit comprises two load buses used to perform a multiple load operation including two load operations per cycle (See Fig. 1:  wherein the load/store unit includes two read buses (elements 24 and 32) to perform two operations per cycle) (Note:  the overall combination of references teaches two lanes and Zamsky is brought in to teach performing two gather/load operations per clock cycle))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to perform two gather operations per clock cycle as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a load/store unit to perform two gather operations per clock cycle) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses a load/store unit to perform two gather operations per clock cycle) for the benefit of simplifying memory access operations in a processor, by reducing a number of operations that occur each cycle.  (MPEP 2143, Example D)
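	The difference between dispatching two lanes per clock cycle in the first mode and a single lane per clock cycle in the second mode can be sketched as a simple schedule.  The function name `dispatch_schedule` and the choice of eight lanes are assumptions made only for illustration and do not come from the cited references.

```python
# Illustrative sketch only; the lane count and names are hypothetical.
def dispatch_schedule(num_lanes, lanes_per_cycle):
    """Return, for each clock cycle, the list of lane numbers dispatched in that cycle."""
    return [
        list(range(start, min(start + lanes_per_cycle, num_lanes)))
        for start in range(0, num_lanes, lanes_per_cycle)
    ]


first_mode = dispatch_schedule(num_lanes=8, lanes_per_cycle=2)   # two lanes per cycle
second_mode = dispatch_schedule(num_lanes=8, lanes_per_cycle=1)  # one lane per cycle
```

	With eight lanes, the first schedule occupies four cycles and the second occupies eight, which is the throughput distinction between the two modes as the limitation is read above.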

7.	Claims 2-4 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Crago, Zamsky and Orenstien, and further in view of Boettcher, PGPUB No.: 2019/0340054.

	In regards to claim 2, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 1” (see rejection of claim 1 above) “wherein the load store unit comprises a plurality of lanes that are configured to concurrently execute the gather operation to gather the plurality of subsets of data.” (Crago:  See section 2 and sections 4.1-4.2:  wherein the load store unit uses a plurality of slices (lanes) to perform a ld.indirect (gather) operation which concurrently gathers a plurality of subsets of data from memory because the instruction is executed in a SIMT environment which executes a single instruction across a plurality of slice/threads concurrently.  Wherein the load store unit is made of a plurality of slices (lanes) and each slice gathers a subset of data from memory (see section 6 and Figures 1(c)/6 which discuss how the instruction is executed on existing GPU’s with stream multiprocessors for more clarity))
	The overall combination of Crago, Zamsky and Orenstien does not teach “wherein unit comprises a plurality of lanes that is partitioned into lane subsets that are configured to concurrently execute.”  Crago does teach a load/store unit which comprises a plurality of lanes which concurrently execute a gather operation.  However, Crago does not teach partitioning the lanes into lane subsets which execute concurrently.
	Boettcher discloses a vector unit comprising a plurality of lanes that are partitioned into lane subsets that are configured to concurrently execute ([0036, 0103 and 0112]:  wherein vector processing circuitry lanes can be partitioned in lane groups which execute concurrently (See Fig. 9:  wherein element 170 comprises a lane subset comprising a first and second vector lane))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Crago to execute concurrent operations on partitioned lane subsets as taught in Boettcher.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (performing concurrent operations on lane subsets as taught in Boettcher) to a known device (apparatus of Crago which gathers data concurrently using a plurality of lanes) ready for improvement to yield predictable results (an apparatus which performs concurrent operations on lane subsets in order to perform a gather operation) for the benefit of added flexibility, by allowing a plurality of lanes to be used for execution in a variety of methods (i.e. executing on all lanes in parallel or executing a subset of lanes in parallel).  (MPEP 2143, Example D)

	Claim 10 is similarly rejected on the same basis as claim 2 above as claim 10 is the method claim corresponding to the apparatus of claim 2 above.  

	In regards to claim 3, the overall combination of Crago, Zamsky, Orenstien and Boettcher teaches “The apparatus of claim 2” (see rejection of claim 2 above) “wherein the load store unit includes two load ports and two load buses” (Zamsky [0021-0023 and Fig. 1]:  wherein a load/store unit (combination of elements 12 and 14) comprises two load ports to access the plurality of load buses (elements 24 and 32)) “and wherein the plurality of lanes is partitioned into even lane subsets and odd lane subsets” (Boettcher [0036, 0103 and 0112]:  wherein the plurality of lanes can be partitioned into even lane subsets and odd subsets.  For example, in Fig. 9, the first lane subset containing elements A0 and A1 and third lane subset containing elements A4 and A5 can be considered odd lane subsets.  While, the second lane subset containing elements A2 and A3 and fourth lane subset containing elements A6 and A7 can be considered even lane subsets)  
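	The even/odd grouping described in the example above can be sketched as follows.  This is only an illustration of that reading of Fig. 9 of Boettcher (eight elements A0-A7 in subsets of two, with the first and third subsets treated as odd and the second and fourth as even); the function name `partition_lanes` and the subset size parameter are hypothetical.

```python
# Illustrative sketch only; names are hypothetical.
def partition_lanes(lanes, subset_size=2):
    """Split lanes into consecutive subsets, then into odd and even subsets.

    Subsets are counted from one, so the first and third subsets are "odd"
    and the second and fourth subsets are "even".
    """
    subsets = [lanes[i:i + subset_size] for i in range(0, len(lanes), subset_size)]
    odd_subsets = subsets[0::2]
    even_subsets = subsets[1::2]
    return odd_subsets, even_subsets


lanes = [f"A{i}" for i in range(8)]
odd_subsets, even_subsets = partition_lanes(lanes)
```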

	Claim 11 is similarly rejected on the same basis as claim 3 above as claim 11 is the method claim corresponding to the apparatus of claim 3 above.  

	In regards to claim 4, the overall combination of Crago, Zamsky, Orenstien and Boettcher teaches “The apparatus of claim 3” (see rejection of claim 3 above) “wherein two lanes are dispatched to concurrently perform the gather operation in the first mode” (Crago:  Sections 4.2 and 6.1:  wherein ld.indirect operations are dispatched to two slices (lanes) to concurrently perform the gather operation (see section 6 and Figures 1 and 6, which discuss how the instruction is executed on existing GPUs with stream multiprocessors, for more clarity) (Note: the overall combination of claim 1 uses Orenstien to teach the first mode and the overall combination of Crago and Orenstien teaches the limitation above (see Orenstien [0027-0031]:  wherein a load (gather) operation from memory is performed in a speculative mode of operation, wherein faults are not handled (See Fig. 2))))
	The overall combination of Crago, Zamsky, Orenstien and Boettcher thus far does not teach “wherein two lanes concurrently perform the gather operation per clock cycle”.  
	Zamsky teaches “wherein two lanes are dispatched to concurrently perform the operation per clock cycle” ([0021-0023 and Fig. 1]:  wherein a load/store unit comprises two load buses used to perform a multiple load operation including two load operations per cycle (See Fig. 1:  wherein the load/store unit includes two read buses (elements 24 and 32) to perform two operations per cycle) (Note:  the overall combination of references teaches two lanes and Zamsky is brought in to teach performing two gather/load operations per clock cycle))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of Crago to perform two gather operations per clock cycle as taught in Zamsky.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a load/store unit to perform two gather operations per clock cycle) to a known device (apparatus of Crago which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which uses a load/store unit to perform two gather operations per clock cycle) for the benefit of simplifying memory access operations in a processor, by reducing a number of operations that occur each cycle.  (MPEP 2143, Example D)

	Claim 12 is similarly rejected on the same basis as claim 4 above as claim 12 is the method claim corresponding to the apparatus of claim 4 above.  

8.	Claims 7-8 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Crago, Zamsky and Orenstien, and further in view of Sperber, PGPUB No.: 2013/0326160.
	In regards to claim 7, the overall combination of Crago, Zamsky and Orenstien teaches “The apparatus of claim 6” (see rejection of claim 6 above) “wherein the load store unit is configured to perform the gather operation in order by a plurality of lanes based on a previous iteration of the second mode.” (Orenstien [0041-0043]:  wherein the load (gather) operation is performed in a sequence by a plurality of lanes (wherein each element of vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes) based on a previous iteration of the second mode.  For example, if one of the loads of the sequence causes a fault in the second mode, the sequence restarts in a second iteration based on a fault occurring in a first iteration (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 1 and the overall combination of references teaches the above limitation)) 
	The overall combination of Crago, Zamsky and Orenstien does not teach “perform the gather operation based on a mask that indicates lanes that successfully gathered data and stored the successfully gathered data in the register”.
	Sperber teaches “perform the gather operation based on a mask that indicates lanes that successfully gathered data and stored the successfully gathered data in the register”. ([0034, 0101-0107 and 0123]:  wherein a gather operation is performed based on a completion mask which indicates vector elements that were successfully gathered and stored in a destination vector register (Note:  Crago teaches the lanes and the overall combination of references would teach the limitation above))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the gather operation of Crago, Zamsky and Orenstien to use a mask which tracks vector elements that were successfully loaded as taught in Sperber.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a completion mask to track which elements were successfully gathered during a gather operation) to a known device (apparatus of Crago, Zamsky and Orenstien which gathers data from memory using a load/store unit) ready for improvement to yield predictable results (an apparatus which executes a gather operation based on a completion mask which indicates which elements were gathered successfully) for the benefit of avoiding the execution of redundant gather operations when a memory fault causes a gather operation to restart.  (MPEP 2143, Example D)

	Claim 15 is similarly rejected on the same basis as claim 7 above as claim 15 is the method claim corresponding to the apparatus of claim 7 above.  

	In regards to claim 8, the overall combination of Crago, Zamsky, Orenstien and Sperber teaches “The apparatus of claim 7” (see rejection of claim 7 above) “wherein a gather operation is dispatched to a single lane to perform the gather operation per clock cycle in the second mode.” (Orenstien [0004 and 0041-0043]:  wherein the load (gather) operation is performed by dispatching a single load (gather) operation per cycle (instructions are dispatched in a sequence and are therefore executed sequentially in different processing cycles) in a non-speculative mode (wherein each element of a vector can be considered to be stored in a lane; for example, a vector with four elements can be considered to include 4 lanes) (See Fig. 3) (Note:  the base reference Crago has already taught the load/store unit with the plurality of lanes in the rejection of claim 1 and the overall combination of references teaches the above limitation))

	Claim 16 is similarly rejected on the same basis as claim 8 above as claim 16 is the method claim corresponding to the apparatus of claim 8 above.  
	
Examiner Notes
9.	The examiner notes that claims 1, 5-6 and 17 use the language “load store unit configured to perform/ignore/transition”, which some may construe as invoking 35 U.S.C. 112(f).  However, the limitations do not invoke 112(f) because they fail the 3-prong test: the term “load/store unit” would be understood by persons of ordinary skill in the art to have a sufficiently definite meaning as a name for structure.  (See MPEP 2181 I (A))

	Response to Arguments
10.	Applicant’s arguments, see page 6 of the remarks filed on 1/24/2022, with respect to the rejection(s) of claim(s) 5-8 and 13-16 under 35 USC 112(a) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of 35 USC 112(b) (see 35 USC 112(b) rejections above).

11.	Applicant’s arguments, see page 7 of the remarks, filed on 1/24/2022, with respect to the rejection(s) of claim(s) 1, 9 and 17 under 35 USC 103 in view of Crago, Wilson and Orenstien have been fully considered and are persuasive.  Therefore, the rejection(s) have been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of Crago, Zamsky and Orenstien.
	Furthermore, the dependent claims 2-8, 10-16 and 18-20 remain rejected at least by virtue of the respective dependencies upon the rejected claims 1, 9 and 17 above.

Conclusion
12.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P CARMICHAEL-MOODY whose telephone number is (571)431-0692. The examiner can normally be reached M-F, 10am-7pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/COURTNEY P CARMICHAEL-MOODY/
Primary Examiner, Art Unit 2183