DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 11, 2021 has been entered.
 
Claims 1-2, 4-6, 8-9, 11-13, 15-16, 18-20, and 22 are pending in this office action and presented for examination. Claims 1, 8, and 15 are newly amended, and claim 22 is newly added by the RCE received January 11, 2021.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-2, 4-6, 8-9, 11-13, 15-16, 18-20, and 22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 4-6, 8-9, 11-13, 15-16, and  of U.S. Patent No. 10514923 in view of Wilder et al. (Wilder) (US 20100274990) in view of Lin et al. (Lin) (US 20030023646) in view of Fox (US 5119484) in view of Dockser et al. (Dockser) (US 20070174379 A1). 
Although the claims at issue are not identical, they are not patentably distinct from each other because all the limitations of each of the aforementioned instant claims are taught by a corresponding claim of the ‘923 patent (instant claim 1 corresponds to claim 1 of the ‘923 patent, instant claim 2 corresponds to claim 2 of the ‘923 patent … instant claim 21 corresponds to claim 21 of the ‘923 patent, instant claim 22 corresponds to claim 1 of the ‘923 patent), except for the limitations “negation circuitry to negate the aforementioned first and second temporary signed quadword products to generate first and second negated signed quadword products” and “the instruction includes three operands identifying the first, second, and third register” and “a first and a second temporary register to store the first and second negated signed quadword products, respectively” and “the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products” (or a slight variant thereof) recited in each independent claim, and the limitation “one or more additional temporary signed quadword products generated responsive to execution of one additional instruction are to be subtracted from the first accumulated signed quadword result and second accumulated signed quadword result” in claim 22.
On the other hand, Wilder discloses negation circuitry to negate products to generate negated products for combining ([0111], lines 17-20, if a multiply-subtract operation is being performed the negate circuit 266 will negate the complex multiplication result produced by the multiplication circuit 262 prior to input to the adder 268). 
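For illustration only, the multiply-subtract behavior cited from Wilder (negating the product before it is input to the adder) can be sketched as follows. This is a minimal sketch and not Wilder's circuit: the function names, the scalar (non-complex) operands, and the 64-bit wraparound are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def multiply_subtract(acc, b, c):
    """Compute acc - b*c by negating the product and then adding it."""
    product = b * c                      # role of the multiplication circuit
    negated = -product                   # role of the negate circuit
    return to_signed64(acc + negated)    # role of the adder


print(multiply_subtract(10, 3, 4))  # same value as 10 - 3*4
```

The point of the sketch is that a subtraction path needs no separate subtractor: the existing adder is reused once the product has been negated.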

However, the combination thus far does not entail the instruction includes three operands identifying the first, second, and third register. The combination thus far also does not disclose a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. The combination thus far also does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for a destination register to be specified by a source operand (as taught by Lin) rather than be specified by a separate destination operand (as taught by the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent), as this modification merely entails the simple substitution of one known element for another to obtain predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Additionally, Lin’s teaching reduces complexity in the decoder (Lin, [0079], line 7). Alternatively, Lin’s teaching may save space, as the size of the instruction word may decrease from not using a separate destination operand field. Note that the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent and Wilder, when modified by the teaching of Lin which precludes use of the dest field by specifying a destination using a source field, results in the overall claim limitation that the instruction includes three operands identifying the first, second, and third register.
However, the combination thus far does not disclose a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. The combination thus far also does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.

It would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to implement the instruction of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, and Lin using temporary registers as taught by Fox, such that the overall combination entails a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. This modification entails a combination of prior art elements (the instruction of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, and Lin, and Fox’s temporary registers) according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.
However, the combination thus far does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
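On the other hand, Dockser discloses negating comprising inverting bits of first and second values and results of which are added by a binary one to generate first and second negated values ([0005], lines 1-3, the negation of any two's compliment [sic] number may be formed by bit-wise inverting the number (yielding the one's complement), and adding one).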
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, Lin, and Fox to perform negation using the specific method of negation taught by Dockser, such that the overall combination entails the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products. This modification entails a simple substitution of one known element (the negation of the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, Lin, and Fox) for another (Dockser’s specific method of negation) to obtain predictable results (the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, Lin, and Fox performing Dockser’s specific method of negation). Note that Dockser’s specific method of negating values, when applied to the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘923 patent, Wilder, Lin, and Fox wherein the values that are negated are first and second temporary signed quadword products in particular, results in the overall claimed limitation.

Claims 1-2, 4-6, 8-9, 11-13, 15-16, 18-20, and 22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 4-6, 8-9, 11-13, 15-16, and 18-20 of U.S. Patent No. 10664270 in view of AMD (AMD64 Architecture Programmer’s Manual - Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions) in view of Wilder et al. (Wilder) (US 20100274990) in view of Lin et al. (Lin) (US 20030023646) in view of Fox (US 5119484) in view of Dockser et al. (Dockser) (US 20070174379 A1). 
Although the claims at issue are not identical, they are not patentably distinct from each other because all the limitations of each of the aforementioned instant claims are taught by a corresponding claim of the ‘270 patent (instant claim 1 corresponds to claim 1 of the ‘270 patent, instant claim 2 corresponds to claim 2 of the ‘270 patent … instant claim 21 corresponds to claim 21 of the ‘270 patent, instant claim 22 corresponds to claim 1 of the ‘270 patent), except for the limitation “negation circuitry to negate the aforementioned first and second temporary signed quadword products to generate first and second negated signed quadword products” and the limitation “the instruction includes three operands identifying the first, second, and third register” and the limitation “a first and a second temporary register to store the first and second negated signed quadword products, respectively” recited in each independent claim and the limitation “signed” in each claim and “the negation circuitry is to invert bits of the first and 
On the other hand, AMD discloses signed data (page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 191, Packed Multiply Accumulate “Signed” Low Doubleword to Signed Quadword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent to operate on signed data, as this modification merely entails applying a known technique (signed data) to a known device (method, or product) ready for improvement to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. 
In addition, Wilder discloses negation circuitry to negate products to generate negated products for combining ([0111], lines 17-20, if a multiply-subtract operation is being performed the negate circuit 266 will negate the complex multiplication result produced by the multiplication circuit 262 prior to input to the adder 268). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent to support a packed multiply “subtract” signed low doubleword to signed quadword operation in order to increase processor capability and functionality. Alternatively, it would have 
In addition, Lin discloses that a same operand can identify a register as being both a source and a destination ([0079], lines 14-16, the control signal, in one embodiment, may only include SRC1 and SRC2, and that SRC1 (or SRC2) identifies the destination register).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for a destination register to be specified by a source operand (as taught by Lin) rather than be specified by a separate destination operand (as taught by the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent), as this modification merely entails the simple substitution of one known element for another to obtain predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Additionally, Lin’s teaching reduces complexity in the decoder (Lin, [0079], line 7). Alternatively, Lin’s teaching may save space, as the size of the instruction word may decrease 
In addition, Fox discloses using temporary registers to execute instructions (col. 2, lines 14-22, in addition, execution of some instructions require several cycles through the arithmetic and logic circuits, with each cycle producing intermediate results, and the temporary registers in particular may be used to store the intermediate results. After processing, the processed data is stored in the working registers prior to being transmitted to the final storage location, which may also be a general purpose register or the memory).
It would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to implement the instruction of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent, AMD, Wilder, and Lin using temporary registers as taught by Fox, such that the overall combination entails a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. This modification entails a combination of prior art elements (the instruction of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent, AMD, Wilder, and Lin, and Fox’s temporary registers) according to known methods 
However, the combination thus far does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Dockser discloses negating comprising inverting bits of first and second values and results of which are added by a binary one to generate first and second negated values ([0005], lines 1-3, the negation of any two's compliment [sic] number may be formed by bit-wise inverting the number (yielding the one's complement), and adding one).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent, AMD, Wilder, Lin, and Fox to perform negation using the specific method of negation taught by Dockser, such that the overall combination entails the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products. This modification entails a simple substitution of one known element (the negation of the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent, AMD, Wilder, Lin, and Fox) for another (Dockser’s specific method of negation) to obtain predictable results (the negation circuitry of the combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the 
Regarding the additional limitations of claim 22 relative to claim 1, Wilder further discloses one or more additional temporary products generated responsive to execution of one additional instruction are to be subtracted ([0005], lines 1-7, one type of operation which can benefit from the SIMD approach is the multiply-accumulate operation, which can take the form of A+B×C, or A-B×C. The multiplication operation B×C is typically performed multiple times for different values of B and C, with each multiplication result then being added (or subtracted) from the running accumulate value A.) It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the previously explained combination of the invention of each of the aforementioned independent claims (and dependent claims, based on dependency to the aforementioned independent claims) of the ‘270 patent, AMD, Wilder, Lin, and Fox (which entails the first accumulated signed quadword result and the second accumulated signed quadword result) to entail further subtraction of one or more additional temporary products generated responsive to execution of one additional instruction therefrom, as further taught by Wilder, in order to increase processor capability and functionality by supporting running accumulation. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the previously explained combination of the invention of each of the .

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-6, 8-9, 11-13, 15-16, 18-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over AMD (AMD64 Architecture Programmer’s Manual - Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions) in view of Wilder et al. (Wilder) (US 20100274990) in view of Lin et al. (Lin) (US 20030023646) in view of Fox (US 5119484) in view of Dockser et al. (Dockser) (US 20070174379 A1).
Consider claim 1, AMD discloses a processor comprising: an instruction including an opcode and operands identifying a first, second, and third register (page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), wherein the first register is to store a first plurality of packed signed doubleword data elements (page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source), the second register to store a second plurality of packed signed doubleword data elements (page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source), and the third register is to store a plurality of packed signed quadword data elements (page 192, src3 register having two packed 64-bit data elements; page 191, low-order 64-bit signed integer value in the third source; page 191, second 64-bit signed integer value in the third source); hardware execution circuitry to execute the instruction, the hardware execution circuitry comprising: multiplier circuitry to multiply first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary signed quadword products (page 191, multiplies the low-order 32-bit signed integer value of the first source by the low-order 32-bit signed integer value in the second source; 
However, AMD does not explicitly disclose a hardware decoder to decode the aforementioned instruction. AMD also does not disclose negation circuitry to negate the aforementioned first and second temporary signed quadword products to generate first and second negated signed quadword products for the aforementioned combining. AMD also does not disclose the instruction includes three operands identifying the first, second, and third register. AMD also does not disclose a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. AMD also 
On the other hand, Wilder discloses a hardware decoder to decode an instruction ([0063], lines 1-2, instruction decoder circuitry) and negation circuitry to negate products to generate negated products for combining ([0111], lines 17-20, if a multiply-subtract operation is being performed the negate circuit 266 will negate the complex multiplication result produced by the multiplication circuit 262 prior to input to the adder 268). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to entail the use of a hardware decoder to decode an instruction as taught by Wilder, as this modification merely entails combining prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Alternatively, the decoding of encoded instructions saves memory space relative to storing non-encoded instructions in memory.  In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to support a packed multiply “subtract” signed low doubleword to signed quadword operation in order to increase processor capability and functionality. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to support a packed multiply “subtract” signed low doubleword to signed quadword operation, as this modification merely entails applying a known technique (Wilder’s teaching of using a negate circuit to perform a subtraction) to a known device (method, or product) ready for improvement to yield predictable results, which is an exemplary rationale that 
However, the combination thus far does not entail the instruction includes three operands identifying the first, second, and third register. The combination thus far also does not disclose a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. The combination thus far also does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Lin discloses that a same operand can identify a register as being both a source and a destination ([0079], lines 14-16, the control signal, in one embodiment, may only include SRC1 and SRC2, and that SRC1 (or SRC2) identifies the destination register).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for a destination register to be specified by a source operand (as taught by Lin) rather than be specified by a separate destination operand (as taught by AMD), as this modification merely entails the simple substitution of one known element for another to obtain predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Additionally, Lin’s teaching reduces complexity in the decoder (Lin, [0079], line 7). Alternatively, Lin’s teaching may save space, as the size of the instruction word may decrease from not using a separate destination operand field. Note that the combination of AMD and Wilder (which entails dest, src1, src2, and src3 operand fields, as cited 
However, the combination thus far does not disclose a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. The combination thus far also does not disclose the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Fox discloses using temporary registers to execute instructions (col. 2, lines 14-22, in addition, execution of some instructions require several cycles through the arithmetic and logic circuits, with each cycle producing intermediate results, and the temporary registers in particular may be used to store the intermediate results. After processing, the processed data is stored in the working registers prior to being transmitted to the final storage location, which may also be a general purpose register or the memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the instruction of the combination of AMD, Wilder, and Lin using temporary registers as taught by Fox, such that the overall combination entails a first and a second temporary register to store the aforementioned first and second negated signed quadword products, respectively. This modification entails a combination of prior art elements (the instruction of the combination of AMD, Wilder, and Lin, and Fox’s temporary registers) according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.

On the other hand, Dockser discloses negating comprising inverting bits of first and second values and results of which are added by a binary one to generate first and second negated values ([0005], lines 1-3, the negation of any two's compliment [sic] number may be formed by bit-wise inverting the number (yielding the one's complement), and adding one).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the negation circuitry of the combination of AMD, Wilder, Lin, and Fox to perform negation using the specific method of negation taught by Dockser, such that the overall combination entails the negation circuitry is to invert bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products. This modification entails a simple substitution of one known element (the negation of the negation circuitry of the combination of AMD, Wilder, Lin, and Fox) for another (Dockser’s specific method of negation) to obtain predictable results (the negation circuitry of the combination of AMD, Wilder, Lin, and Fox performing Dockser’s specific method of negation). Note that Dockser’s specific method of negating values, when applied to the combination of AMD, Wilder, Lin, and Fox wherein the values that are negated are first and second temporary signed quadword products in particular, results in the overall claimed limitation.
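For illustration only, the negation method cited from Dockser (bit-wise inversion followed by adding one) can be sketched for a 64-bit quadword as follows. This is a minimal sketch under stated assumptions: the function names and the 64-bit width are chosen for the example and are not taken from Dockser.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def negate_twos_complement(value):
    """Negate a signed 64-bit value by inverting its bits and adding one."""
    bits = value & MASK64            # 64-bit two's complement encoding
    inverted = ~bits & MASK64        # bit-wise inversion (one's complement)
    negated = (inverted + 1) & MASK64  # add a binary one, discard any carry out
    return to_signed64(negated)


print(negate_twos_complement(5), negate_twos_complement(-7))
```

Note that the most negative 64-bit value has no positive counterpart at this width, so negating it by this method returns the same value.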

Consider claim 2, the overall combination discloses saturation circuitry to saturate the first and second accumulated signed quadword results prior to storage in the third register 

Consider claim 4, the overall combination discloses the first, second, and third registers comprise 128-bit registers configured to store four packed signed doubleword data elements or two packed signed quadword data elements (AMD, page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source; page 192, src3 register having two packed 64-bit data elements; page 191, low-order 64-bit signed integer value in the third source; page 191, second 64-bit signed integer value in the third source).
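For illustration only, the two views of a 128-bit register discussed for claim 4 (four packed signed doublewords or two packed signed quadwords) can be sketched as follows. The helper names and the modeling of a register as a Python integer are assumptions made for the example and do not come from the AMD manual.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack(elements, bits):
    """Pack signed elements of the given width into one register value,
    with element 0 occupying the lowest-order bits."""
    reg = 0
    for index, element in enumerate(elements):
        reg |= (element & ((1 << bits) - 1)) << (index * bits)
    return reg


def unpack(reg, bits):
    """Split a 128-bit register value into signed elements of the given width."""
    return [to_signed(reg >> (i * bits), bits) for i in range(128 // bits)]


reg = pack([1, -2, 3, -4], 32)   # four packed signed doublewords
print(unpack(reg, 32))           # doubleword view of the register
print(unpack(reg, 64))           # quadword view of the same 128 bits
```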

Consider claim 5, the overall combination discloses responsive to a first opcode (AMD, page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), the first and third packed signed doubleword data elements are to be selected from packed signed doubleword location [31:0] of the first and second registers, respectively, and the second and fourth packed signed doubleword data elements are to be selected from packed signed doubleword location [95:64] of the first and second registers, respectively (AMD, page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, 

Consider claim 6, the overall combination discloses responsive to a second opcode (AMD, page 188, the VPMACSDQH instruction requires four operands: VPMACSDQH dest, src1, src2, src3 dest = src1 * src2 + src3; page 188, opcode 9F /r /is4), the first and third packed signed doubleword data elements are to be selected from packed signed doubleword location [63:32] of the first and second registers, respectively, and the second and fourth packed signed doubleword data elements are to be selected from packed signed doubleword location [127:96] of the first and second registers, respectively (AMD, page 189, src1 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value of the first source; page 188, fourth 32-bit signed integer value of the first source; page 189, src2 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value in the second source; page 188, fourth 32-bit signed integer value in the second source; page 189, doubleword locations [63:32] and [127:96] in src1 and src2).
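
For illustration only, the opcode-dependent element selection addressed in claims 5 and 6 may be sketched as follows. The sketch models each 128-bit register as a Python integer and is not taken from the AMD reference; the helper names (pack, unpack64, select_doublewords, packed_mac) are hypothetical, and the wrap-around of the 64-bit sum is an assumption of the sketch rather than a finding of this action.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's complement integer."""
    sign = 1 << (width - 1)
    return (bits & (sign - 1)) - (bits & sign)


def pack(elements, width: int) -> int:
    """Pack signed elements into one integer, element 0 in the lowest bits."""
    mask = (1 << width) - 1
    return sum((e & mask) << (i * width) for i, e in enumerate(elements))


def unpack64(reg128: int):
    """Return the two signed quadwords held in a 128-bit value."""
    return [signed((reg128 >> (64 * i)) & MASK64, 64) for i in range(2)]


def select_doublewords(reg128: int, high: bool):
    """Select doublewords [31:0] and [95:64] (low) or [63:32] and [127:96] (high)."""
    offsets = (32, 96) if high else (0, 64)
    return [signed((reg128 >> off) & MASK32, 32) for off in offsets]


def packed_mac(src1: int, src2: int, src3: int, high: bool) -> int:
    """dest = src1 * src2 + src3 on the selected doublewords, per quadword lane."""
    a = select_doublewords(src1, high)
    b = select_doublewords(src2, high)
    acc = unpack64(src3)
    dest = 0
    for lane in range(2):
        total = (a[lane] * b[lane] + acc[lane]) & MASK64  # assumed to wrap
        dest |= total << (64 * lane)
    return dest


src1 = pack([2, 3, -4, 5], 32)
src2 = pack([10, 20, 30, 40], 32)
src3 = pack([1, -1], 64)
print(unpack64(packed_mac(src1, src2, src3, high=False)))  # low-doubleword form
print(unpack64(packed_mac(src1, src2, src3, high=True)))   # high-doubleword form
```

The `high` flag stands in for the distinction between the first opcode (low doublewords) and the second opcode (high doublewords); how an actual decoder derives that distinction is outside the scope of the sketch.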

Consider claim 8, AMD discloses a method comprising: an instruction including an opcode, and operands identifying a first, second, and third register (page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), wherein the first register is to store a first plurality of packed signed doubleword data elements (page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source), the second register is to store a second plurality of packed 
However, AMD does not explicitly disclose decoding the aforementioned instruction. AMD also does not disclose negating the aforementioned first and second temporary signed quadword products to generate first and second negated signed quadword products for the aforementioned accumulating. AMD also does not disclose the instruction includes three operands identifying the first, second, and third register. AMD also does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. AMD also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Wilder discloses decoding an instruction ([0063], lines 1-2, instruction decoder circuitry) and negating products to generate negated products for combining ([0111], lines 17-20, if a multiply-subtract operation is being performed the negate circuit 266 will negate the complex multiplication result produced by the multiplication circuit 262 prior to input to the adder 268). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to entail decoding an instruction as taught by Wilder, as this modification merely entails combining prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Alternatively, the decoding of encoded instructions saves memory space relative to storing non-encoded instructions in memory.  In 
However, the combination thus far does not entail the instruction includes three operands identifying the first, second, and third register. The combination thus far also does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. The combination thus far also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Lin discloses that a same operand can identify a register as being both a source and a destination ([0079], lines 14-16, the control signal, in one embodiment, may only include SRC1 and SRC2, and that SRC1 (or SRC2) identifies the destination register).

However, the combination thus far does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. The combination thus far also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Fox discloses using temporary registers to execute instructions (col. 2, lines 14-22, in addition, execution of some instructions require several cycles through the arithmetic and logic circuits, with each cycle producing intermediate results, and the temporary registers in particular may be used to store the intermediate results. After processing, the processed data is stored in the working registers prior to being transmitted to the final storage location, which may also be a general purpose register or the memory).

However, the combination thus far does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Dockser discloses negating comprises inverting bits of first and second values and results of which are added by a binary one to generate first and second negated values ([0005], lines 1-3, the negation of any two's compliment [sic] number may be formed by bit-wise inverting the number (yielding the one's complement), and adding one).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the negating of the combination of AMD, Wilder, Lin, and Fox to be the specific method of negation taught by Dockser, such that the overall combination entails the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products. This modification entails a simple substitution of one known element (the negating of the combination of AMD, Wilder, Lin, and Fox) for another (Dockser’s specific method of negation) to obtain predictable results (the negating of the combination of AMD, 

Consider claim 9, the overall combination discloses saturating the first and second accumulated signed quadword results prior to storage in the third register (AMD, page 200, Packed Multiply Accumulate Signed Low Doubleword to Signed Quadword with Saturation; page 200, the saturated results are written to the destination register).
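
For illustration only, saturation of an accumulated signed quadword result prior to storage may be sketched as follows. The sketch assumes that saturation means clamping to the signed 64-bit range; the names saturate64 and mac_saturating are hypothetical, and the optional negate flag is merely one way to model the multiply-subtract variant of the combination, not a description of any cited reference.

```python
INT64_MAX = (1 << 63) - 1
INT64_MIN = -(1 << 63)


def saturate64(value: int) -> int:
    """Clamp an arbitrary-precision integer to the signed 64-bit range."""
    return max(INT64_MIN, min(INT64_MAX, value))


def mac_saturating(a: int, b: int, acc: int, negate: bool = False) -> int:
    """Accumulate a (possibly negated) product into acc, then saturate."""
    product = a * b
    if negate:
        product = -product          # models the negated product of the combination
    return saturate64(acc + product)


print(mac_saturating(3, 4, 5))                 # in range, so no clamping occurs
print(mac_saturating(3, 4, 5, negate=True))    # multiply-subtract variant
print(mac_saturating(2**31 - 1, 2**31 - 1, INT64_MAX) == INT64_MAX)
```

Because Python integers do not overflow, the clamp is applied to the exact sum; a hardware implementation would instead detect overflow on a fixed-width adder.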

Consider claim 11, the overall combination discloses the first, second, and third registers comprise 128-bit registers configured to store four packed signed doubleword data elements or two packed signed quadword data elements (AMD, page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source; page 192, src3 register having two packed 64-bit data elements; page 191, low-order 64-bit signed integer value in the third source; page 191, second 64-bit signed integer value in the third source).

Consider claim 12, the overall combination discloses responsive to a first opcode (AMD, page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), the first and third packed signed 

Consider claim 13, the overall combination discloses responsive to a second opcode (AMD, page 188, the VPMACSDQH instruction requires four operands: VPMACSDQH dest, src1, src2, src3 dest = src1 * src2 + src3; page 188, opcode 9F /r /is4), the first and third packed signed doubleword data elements are to be selected from packed signed doubleword location [63:32] of the first and second registers, respectively, and the second and fourth packed signed doubleword data elements are to be selected from packed signed doubleword location [127:96] of the first and second registers, respectively (AMD, page 189, src1 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value of the first source; page 188, fourth 32-bit signed integer value of the first source; page 189, src2 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value in the second source; page 188, fourth 32-bit signed integer value in the second source; page 189, doubleword locations [63:32] and [127:96] in src1 and src2).

Consider claim 15, AMD discloses a method comprising: an instruction including an opcode, and operands identifying a first, second, and third register (page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), wherein the first register is to store a first plurality of packed signed doubleword data elements (page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source), the second register is to store a second plurality of packed signed doubleword data elements (page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source), and the third register is to store a plurality of packed signed quadword data elements (page 192, src3 register having two packed 64-bit data elements; page 191, low-order 64-bit signed integer value in the third source; page 191, second 64-bit signed integer value in the third source); and executing the instruction by: multiplying first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary signed quadword products (page 191, multiplies the low-order 32-bit signed integer value of the first source by the low-order 32-bit signed integer value in the second source; page 191, Simultaneously, multiplies the third 32-bit signed integer value of the first source by the corresponding 32-bit signed integer value in the second source), the first, second, third, and fourth signed doubleword data elements to be selected based on the opcode of the instruction (page 191, Packed Multiply Accumulate Signed “Low” Doubleword to Signed Quadword), accumulating the first signed quadword product with a first packed signed quadword value read from the third register to generate a first accumulated signed quadword result and 
However, AMD does not explicitly disclose decoding the aforementioned instruction. AMD also does not disclose negating the aforementioned first and second temporary signed quadword products to generate first and second negated signed quadword products for the aforementioned accumulating. AMD also does not explicitly disclose a non-transitory machine-readable medium, not being a signal per se, having program code stored thereon which, when executed by a machine, causes the machine to perform the aforementioned operations. AMD also does not disclose the instruction includes three operands identifying the first, second, and third register. AMD also does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. AMD also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Wilder discloses decoding an instruction ([0063], lines 1-2, instruction decoder circuitry) and negating products to generate negated products for combining ([0111], 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to entail decoding an instruction as taught by Wilder, as this modification merely entails combining prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Alternatively, the decoding of encoded instructions saves memory space relative to storing non-encoded instructions in memory.  In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to support a packed multiply “subtract” signed low doubleword to signed quadword operation in order to increase processor capability and functionality. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the invention of AMD to support a packed multiply “subtract” signed low doubleword to signed quadword operation, as this modification merely entails applying a known technique (Wilder’s teaching of using a negate circuit to perform a subtraction) to a known device (method, or product) ready for improvement to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that other rationales that may support a conclusion of obviousness, as per MPEP 2143, are also applicable. For example, this modification also merely entails simple substitution of one known element (addition) for another (subtraction, as taught by 
However, the combination thus far does not entail the instruction includes three operands identifying the first, second, and third register. The combination thus far also does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. The combination thus far also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for a destination register to be specified by a source operand (as taught by Lin) rather than be specified by a separate destination operand (as taught by AMD), as this modification merely entails the simple substitution of one known element for another to obtain predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Additionally, Lin’s teaching reduces complexity in the decoder (Lin, [0079], line 7). Alternatively, Lin’s teaching may save space, as the size of the instruction word may decrease from not using a separate destination operand field. Note that the combination of AMD and Wilder (which entails dest, src1, src2, and src3 operand fields, as cited above), when modified by the teaching of Lin which precludes use of the dest field by specifying a destination using a source field, results in the overall claim limitation that the instruction includes three operands identifying the first, second, and third register.
However, the combination thus far does not disclose storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. The combination thus far also does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Fox discloses using temporary registers to execute instructions (col. 2, lines 14-22, in addition, execution of some instructions require several cycles through the arithmetic and logic circuits, with each cycle producing intermediate results, and the temporary 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the instruction of the combination of AMD, Wilder, and Lin using temporary registers as taught by Fox, such that the overall combination entails storing the aforementioned first and second negated signed quadword products in a first and a second temporary register, respectively. This modification entails a combination of prior art elements (the instruction of the combination of AMD, Wilder, and Lin, and Fox’s temporary registers) according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.
However, the combination thus far does not disclose the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated signed quadword products.
On the other hand, Dockser discloses negating comprises inverting bits of first and second values and results of which are added by a binary one to generate first and second negated values ([0005], lines 1-3, the negation of any two's compliment [sic] number may be formed by bit-wise inverting the number (yielding the one's complement), and adding one).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the negating of the combination of AMD, Wilder, Lin, and Fox to be the specific method of negation taught by Dockser, such that the overall combination entails the negating comprises inverting bits of the first and second temporary signed quadword products and results of which are added by a binary one to generate the first and second negated 

Consider claim 16, the overall combination discloses saturating the first and second accumulated signed quadword results prior to storage in the third register (AMD, page 200, Packed Multiply Accumulate Signed Low Doubleword to Signed Quadword with Saturation; page 200, the saturated results are written to the destination register).

Consider claim 18, the overall combination discloses the first, second, and third registers comprise 128-bit registers configured to store four packed signed doubleword data elements or two packed signed quadword data elements (AMD, page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source; page 192, src3 register having two packed 64-bit data elements; page 191, low-order 64-bit signed integer value in the third source; page 191, second 64-bit signed integer value in the third source).

Consider claim 19, the overall combination discloses responsive to a first opcode (AMD, page 191, the VPMACSDQL instruction requires four operands: VPMACSDQL dest, src1, src2, src3 dest = src1 * src2 + src3; page 191, opcode 97 /r /is4), the first and third packed signed doubleword data elements are to be selected from packed signed doubleword location [31:0] of the first and second registers, respectively, and the second and fourth packed signed doubleword data elements are to be selected from packed signed doubleword location [95:64] of the first and second registers, respectively (AMD, page 192, src1 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value of the first source; page 191, third 32-bit signed integer value of the first source; page 192, src2 register having four packed 32-bit data elements; page 191, low-order 32-bit signed integer value in the second source; page 191, corresponding 32-bit signed integer value in the second source; page 192, doubleword locations [31:0] and [95:64] in src1 and src2).

Consider claim 20, the overall combination discloses responsive to a second opcode (AMD, page 188, the VPMACSDQH instruction requires four operands: VPMACSDQH dest, src1, src2, src3 dest = src1 * src2 + src3; page 188, opcode 9F /r /is4), the first and third packed signed doubleword data elements are to be selected from packed signed doubleword location [63:32] of the first and second registers, respectively, and the second and fourth packed signed doubleword data elements are to be selected from packed signed doubleword location [127:96] of the first and second registers, respectively (AMD, page 189, src1 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value of the first source; page 188, fourth 32-bit signed integer value of the first source; page 189, src2 register having four packed 32-bit data elements; page 188, second 32-bit signed integer value in the second source; page 

Consider claim 22, Wilder further discloses one or more additional temporary products generated responsive to execution of one additional instruction are to be subtracted  ([0005], lines 1-7, one type of operation which can benefit from the SIMD approach is the multiply-accumulate operation, which can take the form of A+B.times.C, or A-B.times.C. The multiplication operation B.times.C is typically performed multiple times for different values of B and C, with each multiplication result then being added (or subtracted) from the running accumulate value A.) It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the previously explained combination of AMD, Wilder, Lin, Fox, and Dockser (which entails the first accumulated signed quadword result and the second accumulated signed quadword result) to entail further subtraction of one or more additional temporary products generated responsive to execution of one additional instruction therefrom, as further taught by Wilder, in order to increase processor capability and functionality by supporting running accumulation. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the previously explained combination of AMD, Wilder, Lin, Fox, and Dockser (which entails the first accumulated signed quadword result and the second accumulated signed quadword result) to entail further subtraction of one or more additional temporary products generated responsive to execution of one additional instruction therefrom, as further taught by Wilder, as this modification merely entails applying a known technique (executing an instruction multiple times and/or performing a running difference) to a known device (method, or product) ready for 

Response to Arguments
Applicant on page 9 argues: “The Examiner has objected to the amendment filed August 27, 2019 under 35 U.S.C. 132(a) because it allegedly introduces new matter into the disclosure. The Applicant submits replacement sheets for Figure 13, 15, and 16 to overcomes the objections. Paragraph [0068] is amended to have different references for Register Index Field and Register in Figure 2C.”
In view of the aforementioned amendments, the previously presented objections to the specification are withdrawn.

Applicant on page 9 argues: “Replacement sheets are submitted accompanying this response based on the Examiner's input in Section 6 of Office Action pages 3-4. Specifically, the changes are explained following the enumeration in the Office Action: a. Figures 1A-1B are resubmitted, and the resubmitted Figures 1A-1B are updated to remove the shading; b. Figure 1A is resubmitted to remove overlapping text; and c. Figure 2C is resubmitted to remove the duplicated reference character 244. Additionally, noted in the Specification Objections Section above, figures 13, 15, and 16 are resubmitted to overcome the specification objections.”
In view of the aforementioned amendments, the previously presented objections to the drawings are withdrawn. Examiner thanks Applicant for the enumerated changes. 

Applicant on page 10 argues: “Should it still be deemed necessary by a future Office Action, the Applicant will address this double patenting rejection once the claims are allowed.”
Examiner acknowledges Applicant’s intent. 

Applicant on page 10 argues: “Claims 1-2, 4-6, 8-9, 11-13, 15-16, and 18-20 stand rejected under 35 U.S.C. § 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the Applicant regards as the invention.”
In view of the amendments made to the claims, the previously presented indefiniteness rejections are withdrawn.
Applicant on page 12 argues: “Thus, the Applicant respectfully submits that Wilder fails to teach or suggest the specific negation operations performed on the first and second temporary signed quadword products to generate first and second negated signed quadword products. Since the claimed feature is just added to claim 1, understandably no references are cited to cure the deficiency of Wilder. For at least these reasons, the Applicant respectfully submits that Wilder, Lin, and Fox fails to teach or suggest amended claim 1.” Applicant on page 12 argues: “Claims 9 and 17 are amended similarly as amended claim 1, and the Applicant respectfully submits that the foregoing reasons apply, and AMD in view of Wilder fails to teach or suggest amended claims 9 and 17.” Applicant on page 13 argues: “These claims are allowable for at least the reason that they depend directly or indirectly on allowable independent claims 1, 9, and 17.”
In view of the newly amended subject matter, Examiner is newly relying upon the Dockser reference — see the Claim Rejections - 35 USC § 103 section above. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314.  The examiner can normally be reached on Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.