DETAILED ACTION
Response to Amendment
This communication is responsive to the amendment filed on 3/3/2022.  Claims 1-2, 5-8, 11-14, 17-20 and 23-24 are pending and have been examined.  Claims 8 and 20 have been amended.  Claims 3-4, 9-10, 15-16 and 21-22 have been canceled.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
2.	The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

3.	Claims 8 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint 

4.	In regards to claim 8, the limitation “an execution unit of the execution pipeline is configured to perform the single load operation for each enabled lane of the plurality of lanes in parallel and using the loaded operand data” fails to comply with the written description requirement because the original disclosure does not properly describe an execution unit performing the single load operation on enabled lanes in parallel using the loaded operand data in sufficient detail such that one skilled in the art can reasonably conclude that the inventor had possession of the claimed invention.
	Specifically, while original paragraph [0026] and Fig. 3 appear to teach a load/store unit executing and completing the single load operation, they do not teach an execution unit other than the load/store unit performing the single load operation.  Rather, paragraph [0026] and Fig. 3 of the original disclosure describe the load/store unit (element 212 of Fig. 2) completing a single load operation, after which an execution unit (element 228 of Fig. 2) performs SIMD operations for each enabled lane in parallel using the data loaded by the single load operation (i.e., the load completed by the load/store unit).  Therefore, the specification fails to provide proper written description for the above claimed limitation, which indicates an execution unit other than a load/store unit performing a single load operation.
	The examiner suggests the applicant amend the limitation to state “an execution unit of the execution pipeline is configured to perform the SIMD operation for each enabled lane of the plurality of lanes in parallel and using the loaded operand data”. For 

5.	In regards to claim 20, the limitation “an execution unit of the execution pipeline is configured to perform the single store operation for each enabled lane of the plurality of lanes in parallel” fails to comply with the written description requirement because the original disclosure does not properly describe an execution unit performing the single store operation on enabled lanes in parallel in sufficient detail such that one skilled in the art can reasonably conclude that the inventor had possession of the claimed invention.
	Specifically, while original paragraphs [0037-0038] and Fig. 5 appear to teach a load/store unit executing and completing the single store operation, they do not teach an execution unit other than the load/store unit performing the single store operation.  Rather, paragraphs [0037-0038] and Fig. 5 of the original disclosure describe the load/store unit (element 212 of Fig. 2) completing a single store operation, while an execution unit (element 228 of Fig. 2) performs SIMD operations for each enabled lane in parallel.  Therefore, the specification fails to provide proper written description for the above claimed limitation, which indicates an execution unit other than a load/store unit performing a single store operation.
	The examiner suggests the applicant amend the limitation to state “an execution unit of the execution pipeline is configured to perform the SIMD operation for each enabled lane of the plurality of lanes in parallel”. For purposes of examination the examiner will interpret the claim as discussed in the specification and claimed similarly in method claim 14 which corresponds to the processor of claim 20.

Claim Rejections - 35 USC § 103
6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1-2, 5-8, 11, 13-14, 17-20 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Col, USPAT No. 6,330,657 (cited on PTO-892 filed on 11/13/2020) and further in view of Goveas, 2012/0117420 (cited on PTO-892 filed on 11/13/2020).

	In regards to claim 7, Col teaches “A processor, comprising: a load/store unit” (Column 5, lines 9-58 and see Fig. 1:  wherein a microprocessor comprises an integer execution unit used to load and store data to and from memory and is therefore a load/store unit (see Fig. 2, element 250)) “and an execution pipeline configured to execute an instruction representing a single-instruction-multiple-data (SIMD) operation” (See Figs. 1-2:  wherein an execution pipeline is configured to execute a load-ALU SIMD macro instruction, and therefore the instruction represents a SIMD operation (See Column 5, lines 9-58 and Column 7, lines 10-39)) “wherein the execution pipeline is configured to attempt to execute the instruction by decoding the instruction into a single load operation and the SIMD operation and attempting to generate a source memory address for the single load operation” (Column 7, lines 10-56:  wherein the execution pipeline attempts to execute the load-ALU SIMD instruction by decoding the instruction into a single SIMD micro-instruction to load a SIMD operand and a SIMD micro-instruction to perform an ALU operation.  Wherein memory operations require generating a memory address (i.e. source address) to load the data from memory, and therefore address stage (element 108) would generate an address for the load SIMD micro-instruction (See Figs. 1-2)) “wherein: the instruction references a memory block storing operand data for each lane of a plurality of lanes” (Column 5, lines 9-58 and Column 7, lines 10-39:  wherein the load-ALU SIMD macro instruction represents a SIMD ALU operation and a SIMD load microinstruction which loads operand data stored in memory.  Wherein the operand data stored in memory is for one or more lanes of a plurality of lanes of a SIMD register; for example a SIMD register stores multiple sets of data (bits) which are considered multiple elements of a SIMD register and each element is stored in a lane of the SIMD register (i.e. a 512-bit SIMD register can include four 128-bit elements, considered to be stored in four lanes of the SIMD register)(See Fig. 1 and Column 2, lines 21-38 which describes a SIMD instruction operation)) “single load operation is attempted to access the memory block via the load/store unit” (Column 5, lines 9-58 and Column 7, lines 10-39:  wherein a load microinstruction, for a load-ALU SIMD macro instruction, loads operand data for each lane of the SIMD register via a load/store unit (see Fig. 2, element 250) from memory) “a load operation is attempted to access the memory block and is performed by the load/store unit for each lane of the plurality of lanes prior to executing the SIMD operation.” (Column 5, lines 9-58 and Column 7, lines 10-39:  wherein a load microinstruction, of a load-ALU SIMD macroinstruction, loads operand data for each lane of the SIMD register prior to the SIMD operation (i.e. ALU operation) performing an operation on the loaded data)
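	For illustration only, the lane arrangement used in the example above (a 512-bit SIMD register holding four 128-bit elements) can be sketched in Python as follows.  This is a minimal sketch and is not taken from Col or from the application; the function name split_lanes and its parameters are hypothetical and are introduced solely for this illustration.

```python
def split_lanes(value, total_bits=512, lane_bits=128):
    """Split an integer modeling a SIMD register into equal-width lanes.

    Hypothetical helper: lane 0 holds the least significant bits.
    """
    lane_mask = (1 << lane_bits) - 1
    lane_count = total_bits // lane_bits
    return [(value >> (i * lane_bits)) & lane_mask for i in range(lane_count)]


# A 512-bit value whose four 128-bit elements are 1, 2, 3 and 4.
register = (4 << 384) | (3 << 256) | (2 << 128) | 1
lanes = split_lanes(register)
```

	In this sketch each element of the returned list corresponds to one lane of the modeled register.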
	Col does not teach “execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated”, “wherein the execution pipeline is configured to attempt to execute the instruction in the first execution mode by attempting to generate a source memory address for the single load operation”, “and in response to the memory fault to re-execute the instruction in a second execution mode, wherein: the instruction references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation”,  “in the first execution mode single load operation is attempted to access the memory block via the load/store unit”,  “and in the second execution mode a separate load operation is attempted to access the memory block and is performed by the load/store unit for each enabled lane of the plurality of lanes”.  While Col discloses a processor with a pipeline that executes a load-ALU SIMD instruction by decoding it into two SIMD microinstructions (a SIMD load microinstruction and a SIMD ALU microinstruction), Col includes no discussion of executing the load-ALU SIMD instruction using a first or second mode, wherein the modes are dependent upon a memory fault being triggered by executing the SIMD load microinstruction; nor does Col discuss the load-ALU SIMD instruction including an instruction mask.
	However, Goveas discloses execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated ([0027-0032]:  wherein a masked SIMD load instruction is executed in a first fastpath mode of operation unless a memory fault is detected (see [0003 and 0016] for explicit details of SIMD)), wherein a processor is configured to attempt to execute the instruction in the first execution mode by attempting to generate a source memory address for the single load operation ([0023-0024 and 0031-0032]:  wherein the processor executes the instruction in the fastpath mode by attempting to generate a source memory address for the single masked load operation.  Wherein in order to execute a load instruction a source memory address would be generated in order to load the data from memory (See Fig. 4 and [0016, 0019 and 0027-0028]:  wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single load operation with one 128-bit operand)), and in response to the memory fault to re-execute the instruction in a second execution mode ([0029-0031]:  wherein in response to a memory fault occurring the instruction is re-executed in a slow mode using microcode) wherein: the instruction references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation ([0016, 0019-0022 and see details of Fig. 4]:  wherein the VMASKMOV load instruction references a memory block storing operand data for each lane of a plurality of lanes (i.e. each element of a SIMD operand corresponds to a lane of a SIMD register) and further references a mask indicating whether each lane is enabled or disabled (masked or unmasked) for a SIMD load operation) in the first execution mode single load operation is attempted to access the memory block ([0023-0024 and 0031-0032]:  wherein the processor executes the single load operation in the fastpath mode by attempting to access the memory block.  (See Fig. 4 and [0016, 0019 and 0027-0028]:  wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single load operation with one 128-bit operand)), and in the second execution mode a separate load operation is attempted to access the memory block and is performed by the processor for each enabled lane of the plurality of lanes. (See Fig. 4 and [0018-0021 and 0030-0032]:  wherein in a slow mode microcode is used to generate separate load operations to access the memory block and is performed for each enabled lane (unmasked lane) of the plurality of lanes) The combination would have a processor like the one of Col that executes a masked load-ALU SIMD instruction in one of two modes depending on whether a memory fault is detected as taught in Goveas.  One of ordinary skill in the art would have been motivated to detect a memory fault and determine a mode of execution for a SIMD load instruction as taught in Goveas based on the fault detection for the benefit of detecting faults in order to perform efficient execution of a faulting instruction; which may include executing microcode to reduce the amount of hardware on a processor (Goveas [0033]).
	It would have first been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the SIMD instruction of Col to reference a mask as the SIMD (vector) instruction of Goveas.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a mask to enable or disable operations of a SIMD instruction) to a known device (SIMD instruction of Col) ready for improvement to yield predictable results (a SIMD instruction which uses a mask to enable and disable operations for a SIMD instruction) for the benefit of adding flexibility to an instruction. (MPEP 2143, Example D)
	It would have then been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Col that executes a SIMD load instruction to execute the load in one of two modes (fast mode or slow mode) depending on whether a memory fault occurs as taught in Goveas.  It would have been obvious to one of ordinary skill in the art because executing a masked SIMD load instruction in a fast mode increases the speed at which a masked SIMD load instruction is executed and thereby increases processor efficiency (Goveas [0027]).  Furthermore, executing a masked SIMD load instruction in a slow mode upon the occurrence of a memory fault allows for efficient fault handling that uses microcode to execute the instruction; wherein microcode reduces the amount of hardware required to implement the SIMD load instruction (Goveas [0027 and 0033]).
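	For illustration only, the two-mode load behavior relied upon in the combination above can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not code from Col, Goveas, or the application: the names MemoryFault, PAGE_SIZE, read_word and masked_load, the page-residency model, and the choice to return zero for disabled lanes are all hypothetical.

```python
class MemoryFault(Exception):
    """Hypothetical fault raised when a load touches a non-resident page."""


PAGE_SIZE = 4  # hypothetical: four lane-sized words per page


def read_word(memory, resident_pages, addr):
    """Read one lane-sized word, faulting if its page is not resident."""
    if addr // PAGE_SIZE not in resident_pages:
        raise MemoryFault(addr)
    return memory[addr]


def masked_load(memory, resident_pages, base, mask):
    """Attempt a single block-wide load; on a fault, retry lane by lane."""
    try:
        # First mode: one load covering every lane, regardless of the mask.
        block = [read_word(memory, resident_pages, base + lane)
                 for lane in range(len(mask))]
        return "fast", [v if m else 0 for v, m in zip(block, mask)]
    except MemoryFault:
        # Second mode: separate loads, issued only for enabled lanes.
        return "slow", [read_word(memory, resident_pages, base + lane) if m else 0
                        for lane, m in enumerate(mask)]


memory = list(range(100, 108))
# Lanes 2 and 3 fall on a non-resident page but are disabled by the mask.
mode, data = masked_load(memory, {0}, 2, [1, 1, 0, 0])
```

	In this sketch the block-wide attempt faults because the block spans a non-resident page, and the lane-by-lane retry succeeds because only enabled lanes are accessed.  A fault on an enabled lane would still propagate from the second mode.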

	Claim 1 is similarly rejected on the same basis as claim 7 above as claim 1 is the method claim corresponding to the processor of claim 7 above. (Note:  claim 1 differs from claim 7 as the method explicitly claims “fetching, at a processor, an instruction” and “a memory fault resulting from the attempt to generate the source memory address”.  The examiner notes that the base reference Col teaches a fetch stage (element 102) of Fig. 1 which is used to fetch instructions.  The examiner further notes that the memory fault of Goveas does result from the attempt to generate the source memory address because the memory fault can occur from a TLB miss (See Goveas [0023-0024]) and therefore the references above teach all of the corresponding limitations of claim 1 as well.)

	In regards to claim 8, the overall combination of Col and Goveas teaches “The processor of claim 7” (see rejection of claim 7 above) “wherein: responsive to an absence of a memory fault during execution of the instruction in the first execution mode: the load/store unit is configured to complete the single load operation to load the operand data” (Goveas [0027-0029]:  wherein, responsive to the absence of a memory fault during execution in the fastpath mode, the processor is configured to complete the single load operation to load the operand data (See Fig. 4 and [0016, 0019 and 0027-0028]:  wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single load operation with one 128-bit operand; also note base reference Col teaches the load/store unit and the combination of references teaches the above limitation)) “and an execution unit of the execution pipeline is configured to perform the single load operation for each enabled lane of the plurality of lanes in parallel and using the loaded operand data.” (Col:  See Fig. 1:  wherein SIMD unit (element 114) of the pipeline is used to perform the SIMD ALU operation for each lane in parallel using data loaded by a SIMD load micro-operation (See Column 5, lines 9-58 and Column 7, lines 10-39)(Note: Goveas teaches using a mask that will indicate enabled lanes and the combination teaches the limitation above)|Goveas [0022 and 0027-0029]:  wherein the SIMD load is performed for each of the enabled data elements (elements where mask bits are set to one) of the operand lanes in parallel using the loaded data (See Fig. 2 and [0013]) (Note:  the overall combination of Col and Goveas teaches the SIMD operation with enabled lanes which performs a SIMD operation on loaded data (See Claim 7 rejection) and therefore the overall combination of references teaches the limitation above))

	Claim 2 is similarly rejected on the same basis as claim 8 above as claim 2 is the method claim corresponding to the processor of claim 8 above. (Note:  claim 2 differs from claim 8 as the method explicitly claims “a memory fault from the attempt to generate the source memory address”.   The examiner notes that the memory fault of Goveas does result from the attempt to generate the source memory address because the memory fault can occur from a TLB miss (See Goveas [0023-0024]) and therefore the references above teach all of the corresponding limitations of claim 2 as well.)

	In regards to claim 11, the overall combination of Col and Goveas teaches “The processor of claim 7” (see rejection of claim 7 above) “wherein the execution pipeline is configured to re-execute the instruction at the processor in the second execution mode comprises: implementing a resynchronization for the instruction” (Goveas [0021-0022 and 0030-0032]:  wherein re-executing the instruction in a slow mode using microcode comprises re-starting the instruction (implementing a resynchronization for the instruction)(See Figs. 2 and 4) (Note:  Col teaches the execution pipeline, and the combination of references is used to teach the limitation above as modifying the pipeline of Col to operate in the second mode as taught in the combination of claim 7)) “decoding the instruction into a microcode preamble and the SIMD operation” (Goveas [0018, 0021-0022 and 0030-0032]:  wherein a sequence of load operations must be performed in the second mode in place of the masked load instruction and therefore the masked load instruction is decoded by a decoder into a microcode preamble (sequence of individual load operations) (see Figs. 2 and 4 for further clarity)|Col:  Column 5, lines 29-59 and Column 7, lines 10-39: teaches decoding the load-ALU SIMD macro instruction into a single load micro-operation and a SIMD micro-operation and in combination with the second mode of Goveas would teach decoding the single load into a preamble in order to handle a memory fault and would still decode the SIMD microinstruction) “wherein: the microcode preamble includes a load operation for each enabled lane of the plurality of lanes, the load operation configured to load the operand data for a corresponding lane from the memory block to a corresponding position in a temporary storage location” (Goveas [0018, 0021-0022 and 0030-0032]:  wherein the microcode preamble (sequence of individual load operations) includes a load operation for each lane which includes a corresponding mask bit which is set to one.  Wherein the load operation loads operand data for the corresponding lane from memory to a temporary register (See Figs. 2 and 4)) “and the SIMD operation is configured to reference the temporary storage location in place of a memory location originally identified in the instruction as a source address of the memory block” (Col:  Column 6, lines 56-67, Column 7, lines 1-39 and Column 8, lines 35-42:  wherein a SIMD operation of a load-ALU SIMD macro instruction references a temporary register in place of an original memory address specified by the instruction, because the SIMD operation uses the operand which has already been loaded into the processor by the previous load operation)  “directing the load/store unit to perform each load operation to load the operand data for each enabled lane into the temporary storage location” (Goveas [0018, 0021-0022 and 0030-0032]:  wherein the processor performs each load operation of the microcode preamble (sequence of individual load operations) for each lane which includes a corresponding mask bit which is set to one and loads the data into a temporary register (Note:  the combination of Col and Goveas teaches a load/store unit that executes an instruction which requires loading data into a temporary register, based on enabled mask bits, so that the subsequent SIMD operation can access the data retrieved from memory)) “and performing the SIMD operation using the operand data from the temporary storage location.” (Col:  Column 6, lines 56-67, Column 7, lines 1-39 and Column 8, lines 35-42:  wherein a SIMD operation of a load-ALU SIMD macroinstruction performs a SIMD operation on operand data of a temporary register)
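	For illustration only, the second-mode sequence discussed for claim 11 (a preamble of per-lane loads into a temporary storage location, followed by the SIMD operation reading that temporary location) can be sketched in Python as follows.  This is a minimal sketch, not microcode from Col or Goveas; the tuple encoding of micro-operations, the names decode_second_mode and execute, and the choice to leave disabled lanes of the destination unchanged are hypothetical assumptions.

```python
import operator


def decode_second_mode(base, mask, simd_fn):
    """Build a hypothetical micro-op list: one load per enabled lane, then the SIMD op."""
    preamble = [("load", base + lane, lane)
                for lane, enabled in enumerate(mask) if enabled]
    # The SIMD micro-op reads the temporary location, not the original memory address.
    return preamble + [("simd", simd_fn, list(mask))]


def execute(micro_ops, memory, dest):
    """Run the micro-ops; `tmp` models the temporary storage location."""
    tmp = [0] * len(dest)
    out = list(dest)
    for kind, a, b in micro_ops:
        if kind == "load":
            tmp[b] = memory[a]  # a = address, b = position in the temporary location
        else:
            # a = lane-wise function, b = mask; disabled lanes keep their old value.
            out = [a(d, t) if m else d for d, t, m in zip(dest, tmp, b)]
    return out


ops = decode_second_mode(0, [1, 0, 1, 0], operator.add)
result = execute(ops, [10, 20, 30, 40], [1, 2, 3, 4])
```

	In this sketch only lanes 0 and 2 generate load micro-operations, and the lane-wise addition consumes the values staged in the temporary location.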

Claim 5 is similarly rejected on the same basis as claim 11 above as claim 5 is the method claim corresponding to the processor of claim 11 above.

	In regards to claim 6, the overall combination of Col and Goveas teaches “The method of claim 5” (see rejection of claim 5 above) “wherein the memory fault comprises a page fault responsive to the memory block including a page that is not resident in memory.” (Goveas [0023-0024]:  wherein a memory fault comprises a page fault responsive to memory block including a page that is not resident in memory)

	In regards to claim 19, Col teaches “A processor, comprising: a load/store unit” (Column 5, lines 9-58 and see Fig. 1:  wherein a microprocessor comprises an integer execution unit used to load and store data to and from memory and is therefore a load/store unit (see Fig. 2, element 250)) “and an execution pipeline configured to execute an instruction representing a single-instruction-multiple-data (SIMD) operation” (See Figs. 1-2:  wherein an execution pipeline is configured to execute a SIMD ALU store macro instruction (See Column 7, lines 10-39 and Column 9, lines 39-52)) “wherein the execution pipeline is configured to attempt to execute the instruction by decoding the instruction into the SIMD operation and a single store operation and attempting to generate a destination address for the single store operation” (Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-52:  wherein a microprocessor executes a SIMD store-ALU macroinstruction by decoding the SIMD store-ALU instruction into a SIMD ALU microinstruction and a SIMD store instruction. Wherein store microinstructions require generating a memory address (i.e. destination address) to store the data operated on by the SIMD ALU operation, and therefore address stage (element 108) would generate an address for the store SIMD micro-instruction (See Figs. 1-2)) “wherein: the instruction references a memory block that is to serve as a destination for the result data generated by the execution of the SIMD operation for each lane of a plurality of lanes” (Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-52:  wherein the SIMD-ALU store macro instruction represents a SIMD ALU operation and a SIMD store micro operation which stores a SIMD result to memory.  Wherein the result data stored to memory is data from a SIMD register which includes one or more lanes of a plurality of lanes; for example a SIMD register stores multiple sets of data (bits) which are considered multiple elements of a SIMD register and each element is stored in a lane of the SIMD register (i.e. a 512-bit SIMD register can include four 128-bit elements, considered to be stored in four lanes of the SIMD register)(See Fig. 1 and Column 2, lines 21-38 which describes a SIMD instruction operation)) “the single store operation is attempted to store result data generated from the execution of the SIMD operation to the memory block via the load/store unit” (Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-52:  wherein a store micro-instruction, for a SIMD-ALU store macro instruction, stores SIMD ALU result data to memory via a load/store unit (see Fig. 2, element 250)) “a store operation is performed by the load/store unit to store result data to the memory block for each lane of the plurality of lanes subsequent to executing the SIMD operation” (Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-52:  wherein a store micro operation, for a SIMD-ALU store macro instruction, stores result data for each lane of the SIMD register to the memory block subsequent to the SIMD operation performing an operation to produce the result)
	Col does not teach “execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated”, “wherein the execution pipeline is configured to attempt to execute the instruction in the first execution mode and attempting to generate a destination address for the single store operation”, “and in response to the memory fault to re-execute the instruction in a second execution mode, wherein: the instruction references a memory block that is to serve as a destination for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation”,  “in the first execution mode the single store operation is attempted to store result data to the memory block via the load/store unit”,  “in the second execution mode a store operation is performed by the load/store unit to store the result data to the memory block for each enabled lane of the plurality of lanes”.  While Col discloses a processor with a pipeline that executes a SIMD-ALU store instruction by decoding it into two SIMD microinstructions (a SIMD ALU microinstruction and a SIMD store microinstruction), Col includes no discussion of executing the SIMD-ALU store instruction using a first or second mode, wherein the modes are dependent upon a memory fault being triggered by executing the SIMD store microinstruction; nor does Col discuss the SIMD-ALU store instruction including an instruction mask.
	However, Goveas discloses execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated (abstract and [0016-0018, 0023-0024 and 0030-0031]:  wherein a VMASKMOV store instruction can be performed in a fastpath mode unless a memory fault is generated) “wherein the execution pipeline is configured to attempt to execute the instruction in the first execution mode and attempting to generate a destination address for the single store operation” (abstract and [0016-0018 and 0023-0026]:  wherein the processor executes the instruction in the fastpath mode by attempting to generate a destination memory address for the single masked store operation.  Wherein in order to execute a store instruction a destination memory address would be generated in order to store the data to memory (See Fig. 4 and [0016, 0019 and 0027-0028]:  wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single store operation with one 128-bit operand)) and in response to the memory fault to re-execute the instruction in a second execution mode ([0017-0018, 0026 and 0029-0031]: wherein, in response to a memory fault occurring in a fastpath mode, the instruction is re-executed in a slow mode using microcode (also see abstract)) wherein: the instruction references a memory block that is to serve as a destination for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation ([0016, 0026 and see Fig. 3]:  wherein the VMASKMOV store instruction references a memory block that is to serve as a destination for each lane of a plurality of lanes (i.e. each element of a SIMD operand corresponds to a lane) and further references a mask indicating whether each lane is enabled or disabled (masked or unmasked) for a SIMD store operation) in the first execution mode the single store operation is attempted to store result data to the memory block (abstract and [0016-0018, 0026 and 0030-0031]:  wherein the processor executes the single store operation in the fastpath mode by attempting to store data to the memory block.  (See Fig. 3 and [0016, 0019 and 0027-0028]:  wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single store operation with one 128-bit operand)) in the second execution mode a store operation is performed by the processor to store the result data to the memory block for each enabled lane of the plurality of lanes. (abstract and [0018, 0022 and 0026]:  wherein in a slow mode microcode is used to generate separate store operations to store data to the memory block and is performed for each enabled lane (unmasked lane) of the plurality of lanes (See Figs. 3-4 for further clarity)) The combination would have a processor like the one of Col that executes a masked store SIMD instruction in one of two modes depending on whether a memory fault is detected as taught in Goveas.  One of ordinary skill in the art would have been motivated to detect a memory fault and determine a mode of execution for a SIMD store instruction as taught in Goveas based on the fault detection for the benefit of detecting faults in order to perform efficient execution of a faulting instruction; which may include executing microcode to reduce the amount of hardware on a processor (Goveas [0033]).
	It would have first been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the SIMD instruction of Col to reference a mask as the SIMD (vector) instruction of Goveas.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a mask to enable or disable operations of a SIMD instruction) to a known device (SIMD instruction of Col) ready for improvement to yield predictable results (a SIMD instruction which uses a mask to enable and disable operations for a SIMD instruction) for the benefit of adding flexibility to an instruction. (MPEP 2143, Example D)
	It would have then been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Col that executes a SIMD store instruction to execute the store in one of two modes (fast mode or slow mode) depending on whether a memory fault occurs as taught in Goveas.  It would have been obvious to one of ordinary skill in the art because executing a masked SIMD store instruction in a fast mode increases the speed at which a masked SIMD store instruction is executed and thereby increases processor efficiency by implementing fewer processing cycles (Goveas [0017 and 0027]).  Furthermore, executing a masked SIMD store instruction in a slow mode upon the occurrence of a memory fault allows for efficient fault handling that uses microcode to execute the instruction, wherein microcode reduces the amount of hardware required to implement the SIMD store instruction (Goveas [0033]).
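	For illustration only, and not as a reproduction of any cited disclosure, the two execution modes discussed above can be sketched as follows.  The names MemoryFault, fast_store, slow_store, masked_store and is_resident are hypothetical, memory is modeled as a dictionary keyed by lane address, and the sketch assumes that the fast mode checks the whole memory block for faults while the slow mode checks only enabled lanes.

```python
class MemoryFault(Exception):
    """Hypothetical stand-in for a page fault or TLB miss."""


def fast_store(memory, base, data, mask, is_resident):
    # First mode: one store operation covering the whole block.
    # Assumption: the fault check spans every lane, enabled or not.
    if not all(is_resident(base + lane) for lane in range(len(data))):
        raise MemoryFault(base)
    for lane, value in enumerate(data):
        if mask[lane]:
            memory[base + lane] = value


def slow_store(memory, base, data, mask, is_resident):
    # Second mode: a separate store operation per enabled lane.
    for lane, value in enumerate(data):
        if not mask[lane]:
            continue  # disabled lanes are never touched
        if not is_resident(base + lane):
            raise MemoryFault(base + lane)  # fault on an enabled lane
        memory[base + lane] = value


def masked_store(memory, base, data, mask, is_resident):
    # Attempt the first mode; fall back to the second mode on a fault.
    try:
        fast_store(memory, base, data, mask, is_resident)
        return "fast"
    except MemoryFault:
        slow_store(memory, base, data, mask, is_resident)
        return "slow"
```

	In this sketch a fault caused only by a disabled lane is resolved by the per-lane mode without being reported, whereas a fault on an enabled lane still propagates.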

	Claim 13 is similarly rejected on the same basis as claim 19 above as claim 13 is the method claim corresponding to the processor of claim 19 above. (Note:  claim 13 differs from claim 19 as the method explicitly claims “fetching, at a processor, an instruction” and “a memory fault resulting from the attempt to generate the destination memory address”.  The examiner notes that the base reference Col teaches a fetch stage (element 102) of Fig. 1 which is used to fetch instructions.  The examiner further notes that the memory fault of Goveas does result from the attempt to generate the destination address for a store operation because the memory fault can occur from a TLB miss (See Goveas [0023-0024]) and therefore the references above teach all of the corresponding limitations of claim 13 as well.)

	In regards to claim 20, the overall combination of Col and Goveas teaches “The processor of claim 19” (see rejection of claim 19 above) “wherein: responsive to an absence of a memory fault during execution of the instruction in the first execution mode: an execution unit of the execution pipeline is configured to perform the single store operation for each enabled lane of the plurality of lanes in parallel to generate the result data for each enabled lane” (Col:  See Fig. 1:  wherein SIMD unit (element 114) of the pipeline is used to perform the SIMD ALU operation for each lane in parallel in order to generate result data (See Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-50 (Note:  Goveas teaches using a mask that will perform a store operation for enabled lanes as well as executing in a fastpath mode responsive to an absence of a memory fault (See Goveas [0017-0018 and Figs. 3-4]). Therefore the overall combination of references teaches the limitation above)) “the load/store unit configured to complete the single store operation to store the generated result data to the memory block” (Col: See Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-50:  wherein a store micro-instruction, for a SIMD-ALU store macro instruction, stores SIMD ALU result data to memory via a load/store unit (see Fig. 2, element 250))| Goveas [abstract and 0016-0017 and 0028]:  wherein a single store operation is completed when executed in a fastpath mode, and wherein, when VMASKMOVPS includes a 128-bit operand and the processor has a maximum native data size of 128 bits, the instruction would break down into one single store operation with one 128-bit operand (See Fig. 4) (Note:  Col is the base reference teaching the store operation and the overall combination of Col and Goveas teaches the limitation above)) 

	Claim 14 is similarly rejected on the same basis as claim 20 above as claim 14 is the method claim corresponding to the processor of claim 20 above.

	In regards to claim 23, the overall combination of Col and Goveas teaches “The processor of claim 19” (see rejection of claim 19 above) “wherein the execution pipeline is configured to re-execute the instruction at the processor in the second execution mode by: implementing a resynchronization for the instruction” (Goveas [0026 and 0030-0032]:  wherein re-executing the instruction in a slow mode using microcode comprises re-starting the instruction (implementing a resynchronization for the instruction) (See Figs. 3 and 4) (Note:  Col teaches the execution pipeline, and the combination of references is used to teach the limitation above as modifying the pipeline of Col to operate in the second mode as taught in the combination of claim 19)) “decoding the instruction into the SIMD operation and a microcode postamble” (Goveas [0018, 0026 and 0030-0032]:  wherein microcode of store operations must be performed in the second mode in place of the masked store instruction and therefore the masked store instruction is decoded by a decoder into a microcode postamble (sequence of individual store operations) (see Figs. 2 and 4 for further clarity)| Col:  Column 5, lines 29-59, Column 7, lines 10-39 and Column 9, lines 39-52: teaches decoding the SIMD ALU store macro instruction into a SIMD microinstruction and a single store operation and in combination with the second mode of Goveas would teach decoding the single store into a postamble in order to handle a memory fault and would still decode the SIMD microinstruction) “wherein:  the microcode postamble includes a store operation for each enabled lane of the plurality of lanes, the store operation configured to store the result data for the corresponding lane” (Goveas [0018, 0021-0022 and 0030-0032]:  wherein the microcode postamble (sequence of individual store operations) includes a store operation for each lane which includes a corresponding mask bit which is set to one, and wherein the store operation stores data for the corresponding lane to memory (See Figs. 3 and 4)) “in a temporary storage location to a corresponding position in the memory block” (Col:  Column 6, lines 56-67, Column 7, lines 1-39 and Column 9, lines 39-52:  wherein a store operation of a SIMD-ALU store macro instruction stores data to memory from a temporary register) “and the SIMD operation is configured to reference a temporary storage location in place of a memory location originally identified in the instruction as a destination address of the memory block” (Col:  Column 6, lines 56-67, Column 7, lines 1-39, Column 8, lines 35-42 and Column 9, lines 39-52:  wherein a SIMD operation of a SIMD-ALU store macro instruction references a temporary register in place of an original destination address specified by the instruction, because the SIMD operation uses the temporary register to hold data used by a subsequent store operation) “perform the SIMD operation to generate corresponding  result data for each enabled lane and store the result data in a corresponding position in the temporary storage location” (Col:  Column 6, lines 56-67, Column 7, lines 1-39 and Column 8, lines 35-42:  wherein a SIMD operation of a SIMD-ALU store macro instruction generates result data for each lane of a SIMD register and stores the result data for each lane at a corresponding position of a temporary register (Also see Column 9, lines 39-52)) “direct the load/store unit to perform each store operation to store the result data for each enabled lane in the temporary storage location in a corresponding position of the memory block.” (Goveas [0018, 0021-0022 and 0030-0032]:  wherein store operations of the microcode postamble (individual store operations) are performed for each lane which includes a corresponding mask bit which is set to one (Note:  Col teaches the instruction which requires storing SIMD result data from a temporary register to memory, while Goveas stores data based on enabled mask bits and therefore the combination of Col and Goveas would teach storing result data for the enabled lanes from a temporary register to memory)) 
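	For illustration only, the second-mode decode mapped above (a SIMD operation that writes to a temporary storage location, followed by a postamble of one store operation per enabled lane) can be sketched as follows.  The micro-operation tuples and the names decode_second_mode and run are hypothetical and are not taken from Col or Goveas; for simplicity the sketch computes a result for every lane and lets the postamble select the enabled ones.

```python
def decode_second_mode(mask, dest_base):
    # The SIMD operation targets a temporary location ("tmp") instead of
    # the destination address named in the instruction.
    uops = [("simd_op", "tmp")]
    # Postamble: one store operation per enabled lane.
    for lane, enabled in enumerate(mask):
        if enabled:
            uops.append(("store", "tmp", lane, dest_base + lane))
    return uops


def run(uops, operands, memory, alu):
    tmp = None
    for uop in uops:
        if uop[0] == "simd_op":
            # Lane-wise result held in the temporary location.
            tmp = [alu(x) for x in operands]
        else:
            _, _, lane, addr = uop
            # Copy one lane from the temporary location to its
            # corresponding position in the memory block.
            memory[addr] = tmp[lane]
    return memory
```

	With a mask of [1, 0, 1, 0] the sketch produces one SIMD operation and two store operations, so only positions 0 and 2 of the memory block are written.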

	Claim 17 is similarly rejected on the same basis as claim 23 above as claim 17 is the method claim corresponding to the processor of claim 23 above.

	In regards to claim 18, the overall combination of Col and Goveas teaches “The method of claim 13” (see rejection of claim 13 above) “wherein the memory fault comprises a page fault responsive to the memory block including a page that is not resident in memory.” (Goveas [0023-0024]:  wherein a memory fault comprises a page fault responsive to memory block including a page that is not resident in memory)

8.	Claims 12 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Col, Goveas and further in view of Hasenplaugh, PGPUB No. 2018/0095756 (cited in PTO-892 filed on 11/13/2020).

	In regards to claim 12, the overall combination of Col and Goveas teaches “The processor of claim 11” (see rejection of claim 11 above) “wherein the temporary storage location is in a memory of the processor” (Col:  Column 6, lines 56-67 and Column 37-38:  wherein a register stored in a register file of the microprocessor provides temporary storage |Goveas:  See Figs. 2 and 4:  wherein a temporary register is in a memory of the processor)
	The overall combination of Col and Goveas thus far does not teach “wherein the storage location is in a scratchpad memory.”  Col and Goveas both teach using a temporary register but do not explicitly teach the register being a part of scratchpad memory.
	Hasenplaugh teaches “wherein the storage location is in a scratchpad memory” ([0044]:  wherein a storage location is in scratchpad memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the temporary storage location of the processor of Col and Goveas to be included in a scratchpad memory as taught in Hasenplaugh.  It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (using a temporary storage location in a scratchpad memory) for another (using a temporary storage location in a generic register file) for the benefit of using a scratchpad memory which is a high speed memory used for rapid data retrieval. (MPEP 2143, Example B)


	In regards to claim 24, the overall combination of Col and Goveas teaches “The processor of claim 23” (see rejection of claim 23 above) “wherein the temporary storage location is in a memory of the processor” (Col:  Column 6, lines 56-67 and Column 37-38:  wherein a register stored in a register file of the microprocessor provides temporary storage)
	The overall combination of Col and Goveas thus far does not teach “wherein the storage location is in a scratchpad memory.”  Col teaches using a temporary register but does not explicitly teach the register being a part of scratchpad memory.
	Hasenplaugh teaches “wherein the storage location is in a scratchpad memory” ([0044]:  wherein a storage location is in scratchpad memory)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the temporary storage location of the processor of Col to be included in a scratchpad memory as taught in Hasenplaugh.  It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (using a temporary storage location in a scratchpad memory) for another (using a temporary storage location in a generic register file) for the benefit of using a scratchpad memory which is a high speed memory used for rapid data retrieval. (MPEP 2143, Example B)
Response to Arguments
9.	Applicant’s arguments, see page 9 of the remarks filed on 3/3/2022, with respect to the 35 USC 112(d) rejections have been fully considered and are persuasive.  Therefore the 35 USC 112(d) rejections have been withdrawn. 

10.	Applicant’s arguments, see page 9 of the remarks filed on 3/3/2022, with respect to the rejection(s) of claim(s) 8 and 20 under 35 USC 112(b) have been fully considered and are persuasive.  Therefore, the rejection(s) have been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of 35 USC 112(a) with regards to claims 8 and 20.

11.	Applicant's arguments filed on 3/3/2022, with respect to the 35 USC 103 rejections of independent claims 1, 7, 13 and 19 have been fully considered but they are not persuasive. Therefore, the previous 35 USC 103 rejections of claims 1, 7, 13 and 19 in view of Col and Goveas have been maintained.  
	Furthermore, dependent claims 2, 5-6, 8, 11-12, 14, 17-18, 20 and 23-24 are similarly argued based at least on their dependency from one of claims 1, 7, 13 and 19, and therefore remain rejected based at least on their dependency from one of claims 1, 7, 13 and 19 above.

12.	Applicant argues claim 1, on pages 9-10 of the remarks filed on 3/3/2022, in the substance that:
	“Claim 1 recites "attempting to generate a source memory address for the single load operation" and "responsive to a memory fault resulting from the attempt to generate the source memory address, re-executing the instruction. These features are not taught or rendered obvious by the cited references. The Office Action acknowledges at pages 7-8 of the Office Action that Col does not teach these features, and instead alleges that Goveas reaches these features. However, the Goveas does not overcome the deficiencies of Col. Goveas teaches a processor that handles a masked load instruction, which may be a 256-bit memory operand, where "the processor.... after receiving the instruction, breaks up the 256-bit memory operand into eight 32-bit sub-operands...8 separate load pops are issued from microcode to load each 32-bit piece individually. Therefore, 8 addresses are generated." Goveas, para. [0021]. Goveas nowhere teaches or suggests generating a source memory address, nor re-executing an instruction responsive to a memory fault resulting from an attempt to generate a source memory address. Further, one skilled in the art would not understand Goveas' process requiring the breaking of a memory operand into eight distinct sub-operands for processing as suggesting these features of claim 1. 
	In view of the foregoing, present claim 1 is novel and non-obvious in view of the combination of Col and Goveas. Independent claim 7 includes similar features. As such, claim 7 is novel and non- obvious in view of the cited references, mutatis mutandis.”

	The examiner first notes that the previous office action does not indicate that Col does not teach “attempting to generate a source memory address for the single load operation” as the applicant asserts above.  On page 6 of the previous office action the examiner states Col teaches “wherein the execution pipeline is configured to attempt to execute the instruction by decoding the instruction into a single load operation and the SIMD operation and attempting to generate a source memory address for the single load operation” (Column 7, lines 10-56:  wherein the execution pipeline attempts to execute the load-ALU SIMD instruction by decoding the instruction into a single SIMD micro-instruction to load a SIMD operand and a SIMD micro-instruction to perform an ALU operation.  Wherein memory operations require generating a memory address (i.e. source address) to load the data from memory, and therefore address stage (element 108) would generate an address for the load SIMD micro-instruction (See Figs. 1-2))”.  Therefore, it is clear that Col does teach the first argued limitation above.
	The examiner notes that the office action acknowledges that Col does not teach specifically “execute the instruction in the first execution mode by attempting to generate a source memory address for the single load operation” because Col does not teach a first execution mode.  However, Col does teach attempting to generate a source memory address for the single load operation.
	In addition, Goveas does teach “attempting to generate a source memory address for the single load operation”, as well as attempting the generation in a first mode as claimed.  Goveas indicates in paragraph [0028], as cited in the previous office action, that a processor can have a maximum native data size; it explicitly states “if the maximum native data size of the processor is 128-bits and the mask load instruction has a 256-bit operand, the processor will break the operand into two 128-bit sub-operands. The method 400 will be explained using a 256-bit variant of the VMASKMOVPS masked load instruction; however, the method can be modified to handle any of the various types of masked load instructions discussed herein.”  The examiner cited this previously and clarified that the examiner is interpreting the instruction in this case to be a VMASKMOVPS instruction that has a 128-bit operand in a processor with a maximum native data size of 128 bits; therefore the processor would only generate a single memory address for the single load operation with a single 128-bit operand.  One of ordinary skill in the art would see that the paragraph states that any of the various types of instruction can be used and would know that the example using 256 bits is merely an example and that a 128-bit VMASKMOVPS instruction can be used as the examiner cited in the previous office action.  The examiner further indicates that based on the above explanation/clarification it is clear that the examiner is not using Goveas's teaching of breaking memory operands into eight sub-operands to teach the above argued limitation.  Additionally, the examiner notes that the applicant argues paragraph [0021] of Goveas; however, the examiner has not cited that paragraph for teaching the limitation and suggests the applicant read from the citations provided in the office action for further clarity.
	In addition, the applicant argues that Goveas does not teach “responsive to a memory fault resulting from the attempt to generate the source memory address, re-executing the instruction”.  However, the examiner respectfully disagrees as it is clear from the above that Goveas teaches generating a source memory address, and paragraphs [0030-0032] indicate responsive to a memory fault resulting from the attempt to generate the source memory address, re-execution of the instruction occurs (Also see Goveas Fig. 4 for further clarification).  For further clarity the examiner notes that paragraph [0023] of Goveas states “a fault can occur from a page fault, a protection faults, a data cache miss (line is not in the cache) or translation look aside buffer (TLB) miss (page mapping does not exist in the TLB).”  One of ordinary skill in the art would know that a TLB miss occurs when a page table entry required for conversion of virtual address to physical address is not present in the TLB, as that is the definition of a TLB miss.  Therefore, when a fault occurs due to a TLB miss, a fault is occurring in an attempt to generate a source memory address. It is therefore clear that Goveas does teach re-executing an instruction responsive to a memory fault resulting from an attempt to generate a source memory address.
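	For illustration only, the reasoning above (a TLB miss arises while translating a virtual address, i.e. during the attempt to generate the memory address, and triggers re-execution) can be sketched as follows.  The names TLBMiss, translate and execute_load, the 4096-byte page size, and the dictionary model of the TLB are assumptions made for the sketch and are not taken from Goveas.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class TLBMiss(Exception):
    """Hypothetical fault: no mapping for the page in the TLB."""


def translate(tlb, vaddr):
    # Address generation: split the virtual address into page number
    # and offset, then look the page number up in the TLB.
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:
        raise TLBMiss(vpn)
    return tlb[vpn] * PAGE_SIZE + offset


def execute_load(tlb, vaddr):
    # First mode: attempt to generate the source address.
    try:
        return ("first_mode", translate(tlb, vaddr))
    except TLBMiss:
        # The fault arose from the address-generation attempt, so the
        # instruction is flagged for re-execution in the second mode.
        return ("reexecute_second_mode", None)
```

	Here the fault is raised inside translate, before any data is accessed, which is the sense in which the fault results from the attempt to generate the address.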

	Claim 7 is argued for similar reasons as claim 1 above, and therefore remains rejected for the same reasons as claim 1 above. 
	
13.	Applicant argues claim 13, on page 10 of the remarks filed on 3/3/2022, in the substance that:
	“In addition, the cited references do not teach or suggest each limitation of claim 13. In particular, the cited references do not teach or suggest a method in which attempting to execute an instruction in a first execution mode that includes "attempting to generate a destination address for the single store operation." Once again, the Office Action acknowledges that Col does not teach or suggest this feature. However, Goveas still fails to address or overcome this deficiency. In particular, Goveas describes how a 256-bit mask store instruction is ultimately broken eight different 32-bit operands and how the processor will issue eight different store-check operations, one for each of the eight 32-bit operands. See Goveas, para. [0026]. Correspondingly, this reference provides no teaching or suggestion of any method for attempting to generate a destination address for a single store operation.”

	The examiner notes that the previous office action does not indicate that Col does not teach “attempting to generate a destination address for the single store operation” as the applicant asserts above.  On page 16 of the previous office action the examiner states Col teaches “wherein the execution pipeline is configured to attempt to execute the instruction by decoding the instruction into the SIMD operation and a single store operation and attempting to generate a destination address for the single store operation” (Column 5, lines 9-58, Column 7, lines 10-39 and Column 9, lines 39-52:  wherein a microprocessor executes a SIMD store-ALU macroinstruction by decoding the SIMD store-ALU instruction into a SIMD ALU microinstruction and a SIMD store instruction. Wherein store microinstructions require generating a memory address (i.e. destination address) to store the data operated on by the SIMD ALU operation, and therefore address stage (element 108) would generate an address for the store SIMD micro-instruction (See Figs. 1-2)).  Therefore, it is clear that Col does teach the argued limitation above.
	The examiner notes that the office action acknowledges that Col does not teach specifically “execute the instruction in the first execution mode and attempting to generate a destination address for the single store operation” because Col does not teach a first execution mode.  However, Col does teach attempting to generate a destination address for the single store operation.
	In addition, Goveas does teach “attempting to generate a destination address for the single store operation”, as well as attempting the generation in a first mode as claimed.  Goveas indicates in paragraph [0028], as cited in the previous office action, that a processor can have a maximum native data size; it explicitly states “if the maximum native data size of the processor is 128-bits and the mask load instruction has a 256-bit operand, the processor will break the operand into two 128-bit sub-operands. The method 400 will be explained using a 256-bit variant of the VMASKMOVPS masked load instruction; however, the method can be modified to handle any of the various types of masked load instructions discussed herein.” (Note:  In addition, paragraph [0016] of Goveas indicates that VMASKMOVPS instructions are load or store instruction variations; therefore the VMASKMOVPS can be a 128-bit masked store instruction, as indicated in the previous office action.) The examiner cited this previously and clarified that the examiner is interpreting the instruction in this case to be a VMASKMOVPS instruction that has a 128-bit operand in a processor with a maximum native data size of 128 bits; therefore the processor would only generate a single destination address for the single store operation with a single 128-bit operand.  One of ordinary skill in the art would see that the paragraph states that any of the various types of instruction can be used and would know that the example using 256 bits is merely an example and that a 128-bit VMASKMOVPS instruction can be used as the examiner cited in the previous office action.  The examiner further indicates that based on the above explanation/clarification it is clear that the examiner is not using Goveas's teaching of breaking 256-bit memory operands into eight sub-operands to teach the above argued limitation.  
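	For illustration only, the arithmetic behind the interpretation above (the number of operations, and hence addresses, depends on the operand width relative to the maximum native data size) can be sketched as follows.  The function name split_operand is hypothetical and the sketch is not taken from Goveas.

```python
def split_operand(operand_bits, native_bits):
    """Return (bit_offset, width) pairs, one per operation issued."""
    pieces = []
    offset = 0
    while offset < operand_bits:
        # Each piece is at most as wide as the native data size.
        width = min(native_bits, operand_bits - offset)
        pieces.append((offset, width))
        offset += width
    return pieces
```

	Under this sketch, split_operand(256, 128) yields two 128-bit pieces, matching the quoted example, while split_operand(128, 128) yields a single piece and therefore a single address.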
	Therefore, the examiner respectfully disagrees with the applicant’s arguments above.
	Claim 19 is argued for similar reasons as claim 13 above, and therefore remains rejected for the same reasons as claim 13 above. 

	
Conclusion
14.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

15.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P CARMICHAEL-MOODY whose telephone number is (571)431-0692. The examiner can normally be reached M-F, 10am-7pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/COURTNEY P CARMICHAEL-MOODY/Primary Examiner, Art Unit 2183