DETAILED ACTION
Claims 1-24 are pending.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/30/2020 has been entered.
The office acknowledges the following papers:
Claims and remarks filed on 4/30/2020.

	Withdrawn objections and rejections
The drawing objections for claims 7-8, 15-16, and 23-24 have been withdrawn.

Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the sign/zero extending limitations from claims 1, 9, and 17 must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure 
	
New Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-24 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites “wherein for each value of i, where i represents a number of packed quadwords stored in the first source register or the second source register, the multiplier circuitry is to multiply sign or zero extended word[4*i+0] of the first source register by sign or zero extended word[0] of quadword[i] of the second source register; sign or zero extended word[4*i+1] of the first source register by sign or zero extended word[1] of quadword[i] of the second source register; sign or zero extended word[4*i+2] of the first source register by sign or zero extended word[2] of quadword[i] of the second source register; and sign or zero extended word[4*i+3] of the first source register by sign or zero extended word[3] of quadword[i] of the first source register, to generate the plurality of temporary products;” (emphasis added). Claims 9 and 17 recite similar limitations. The closest written description support for the newly claimed limitation appears to come from the code sequence within paragraph 135 of the specification. This code sequence shows a FOR loop controlled from i=0 until KL-1. The FOR loop is run through twice for 128-bit registers, four times for 256-bit registers, and eight times for 512-bit registers. Within the FOR loop, the “p0dword”, “p1dword”, “p2dword”, and “p3dword” results show a calculation similar to the claimed four calculations for each value of “i”. However, the claimed structure doesn’t require the FOR loop to start at “i”=0 and instead starts the FOR loop at “i”=1. For example, if source registers A and B store 128 bits, then “i”=2. This requires the calculation of two p0dword results in the FOR loop. The calculation for each value of i is SIGN_EXTEND (SRC1.word[4*1+0]) * SIGN_EXTEND(t.word[0]) AND SIGN_EXTEND (SRC1.word[4*2+0]) * SIGN_EXTEND(t.word[0]) (emphasis added). As can be seen, the first and second iteration calculation of “p0dword” requires sign extending the 4th and 8th word of source 1 respectively. However, the FOR loop provides written description support for sign extending the 0th and 4th word of source 1 respectively. 
Similar issues occur for “p1dword”, “p2dword”, and “p3dword” when calculating the last iteration of the first source register. Thus, the specification fails to convey to one skilled in the art that the inventor had possession of the claimed invention, as amended, at the time of filing.
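The index mismatch described above can be illustrated with a short sketch. It assumes, as the rejection does, that the claimed “i” takes the values 1 through the number of packed quadwords, while the FOR loop of paragraph 135 runs from 0 to KL-1. The function names are hypothetical and are not taken from the specification.

```python
def claimed_word_indices(num_quadwords, k=0):
    """First-source word index 4*i+k when i runs from 1 to num_quadwords
    (the reading of the claim applied in the rejection)."""
    return [4 * i + k for i in range(1, num_quadwords + 1)]


def loop_word_indices(num_quadwords, k=0):
    """First-source word index 4*i+k when i runs from 0 to num_quadwords-1
    (the reading of the FOR loop)."""
    return [4 * i + k for i in range(num_quadwords)]


# 128-bit source registers hold two quadwords.
print(claimed_word_indices(2))  # [4, 8]
print(loop_word_indices(2))     # [0, 4]
```

Under the first reading, the second calculation refers to word 8, whereas a 128-bit register holds only eight 16-bit words (indices 0 through 7).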
Claims 2-8, 10-16, and 18-24 are rejected due to their dependency.
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
Claim 1 recites “wherein for each value of i, where i represents a number of packed quadwords stored in the first source register or the second source register, the multiplier circuitry is to multiply sign or zero extended word[4*i+0] of the first source register by sign or zero extended word[0] of quadword[i] of the second source register; sign or zero extended word[4*i+1] of the first source register by sign or zero extended word[1] of quadword[i] of the second source register; sign or zero extended word[4*i+2] of the first source register by sign or zero extended word[2] of quadword[i] of the second source register; and sign or zero extended word[4*i+3] of the first source register by sign or zero extended word[3] of quadword[i] of the first source register, to generate the plurality of temporary products;” (emphasis added). Claims 9 and 17 recite similar limitations. The closest written description support for the newly claimed limitation appears to come from the code sequence within paragraph 135 of the specification. This code sequence shows a FOR loop controlled from i=0 until KL-1. The FOR loop is run through twice for 128-bit registers, four times for 256-bit registers, and eight times for 512-bit registers. Within the FOR loop, the “p0dword”, “p1dword”, “p2dword”, and “p3dword” results show a calculation similar to the claimed four calculations for each value of “i”. However, the claimed structure doesn’t require the FOR loop to start at “i”=0 and instead starts the FOR loop at “i”=1. For example, if source registers A and B store 128 bits, then “i”=2. This requires the calculation of two p0dword results in the FOR loop. The calculation for each value of i is SIGN_EXTEND (SRC1.word[4*1+0]) * SIGN_EXTEND(t.word[0]) AND SIGN_EXTEND (SRC1.word[4*2+0]) * SIGN_EXTEND(t.word[0]) (emphasis added). As can be seen, the first and second iteration calculation of “p0dword” requires sign extending the 4th and 8th word of source 1 respectively. 
However, the FOR loop provides written description support for sign extending the 0th and 4th word of source 1 respectively. Similar issues occur for “p1dword”, “p2dword”, and “p3dword” when calculating the last iteration of the first source register. For examination purposes, each value of “i” is decremented for calculation of each sign or zero extended word of the first source register.
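The interpretation adopted for examination purposes can be sketched as follows. The sketch assumes that “i” counts quadwords starting from 1 and is decremented before indexing the first source register; the helper name is hypothetical.

```python
def examined_word_indices(num_quadwords):
    """Map each claimed i (1..num_quadwords) to the four first-source
    word indices obtained after decrementing i."""
    return {
        i: [4 * (i - 1) + k for k in range(4)]
        for i in range(1, num_quadwords + 1)
    }


print(examined_word_indices(2))  # {1: [0, 1, 2, 3], 2: [4, 5, 6, 7]}
```

With the decrement applied, the indices for two quadwords cover words 0 through 7, which matches the 0th and 4th starting words that the FOR loop supports.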
Claims 2-8, 10-16, and 18-24 are rejected due to their dependency.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-24 are rejected under 35 U.S.C. 103 as being unpatentable over Matsuyama et al. (U.S. 8,271,571) in view of Takashima et al. (U.S. 2011/0099354), and further in view of Official Notice.
As per claim 1:
Matsuyama and Takashima disclosed a processor comprising: 
a decoder to decode instructions (Matsuyama: Figure 8 element 20, column 10 lines 61-66); 
a first source register to store a first plurality of packed words (Matsuyama: Figures 8 and 10A-B element 12, column 10 lines 49-53 and column 11 lines 15-19)(Register R2 of the first VCMAD and register R4 of the second VCMAD instructions each read upon a first source register storing a plurality of 16-bit data elements (i.e. packed words).); 
a second source register to store a second plurality of packed words (Matsuyama: 
a third source register to store a plurality of packed quadwords (Matsuyama: Figures 8 and 10A-B element 12, column 10 lines 53-56, column 11 lines 19-22, and column 11 lines 63-67)(Matsuyama disclosed an embodiment of using 128-bit registers to store multiple 64-bit (i.e. quadwords) execution results from parallel execution of the first and second VCMAD instructions. In addition, official notice is given that adjacent register pairs can be addressed as larger overlaid registers for the advantage of reducing encoding bits. Thus, it would have been obvious to one of ordinary skill in the art to implement registers R1/R0 as a 128-bit register storing both 64-bit data elements to be accumulated with the VCMAD.); 
execution circuitry to execute a first instruction (Matsuyama: Figure 8 element 23), the execution circuitry comprising:
multiplier circuitry to multiply each of a first plurality of doublewords with a corresponding one of a second plurality of doublewords to generate a plurality of temporary products, the first and second plurality of doublewords generated by sign-extending or zero-extending the first and second plurality of packed words, respectively (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B elements 1320-1327, column 10 lines 9-21 and column 10 lines 61-67 continued to column 11 lines 1-3)(Takashima disclosed sign or zero extending 16-bit data to 32-bit data (i.e. doublewords) prior to input of the nd rejection, the limitation is interpreted as each value of “i” is decremented for calculation of each sign or zero extended word of the first source register. Matsuyama disclosed 64-bit source registers to store a single 64-bit quadwords. Thus, i=1 in this instance. In addition, in view of the above official notice, i=1 as well.), the multiplier circuitry is to multiply sign or zero extended word[4*i+0] of the first source register by sign or zero extended word[0] of quadword[i] of the second source register (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B element 1320, column 10 lines 61-67 and column 11 lines 15-22)(In view of the 112 2nd rejection, the limitation is interpreted as each value of “i” is decremented for calculation of each sign or zero extended word of the first source register. The combination allows for the 16-bit data elements of Matsuyama to be zero or sign extended prior to multiplication. 
For a single quadword “i” value of one decremented to zero, multiplier 1320 receives the extended word[0] from R2 and the extended word[0] of quadword[0] from R3.); sign or zero extended word[4*i+1] of the first source register by sign or zero extended word[1] of quadword[i] of the second source register (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B element 1321, column 10 lines 61-67 and column 11 lines 15-22)(In view of the 112 2nd rejection, the limitation is interpreted as each value of “i” is decremented for calculation of each sign or zero extended word of the first source register. The combination allows for the 16-bit data elements of Matsuyama to be zero or sign extended prior to multiplication. For a single quadword “i” value of one decremented to zero, multiplier 1321 receives the extended word[1] from R2 and the extended word[1] of quadword[0] from R3.); sign or zero extended word[4*i+2] of the first source register by sign or zero extended word[2] of quadword[i] of the second source register (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B element 1322, column 10 lines 61-67 and column 11 lines 15-22)(In view of the 112 2nd rejection, the limitation is interpreted as each value of “i” is decremented for calculation of each sign or zero extended word of the first source register. The combination allows for the 16-bit data elements of Matsuyama to be zero or sign extended prior to multiplication. 
For a single quadword “i” value of one decremented to zero, multiplier 1322 receives the extended word[2] from R2 and the extended word[2] of quadword[0] from R3.); and sign or zero extended word[4*i+3] of the first source register by sign or zero extended word[3] of quadword[i] of the first source register, to generate the plurality of temporary products (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B element 1323, column 10 lines 61-67 and column 11 lines 15-22)(In view of the 112 2nd rejection, the limitation is interpreted as each value of “i” is decremented for calculation of each sign or zero extended word of the first source register. The combination allows for the 16-bit 
adder circuitry to add at least a first set of the temporary products to generate a first temporary sum (Matsuyama: Figures 10A-B elements 1340-1343 and 1350-1351, column 10 lines 9-21 and column 10 lines 61-67 continued to column 11 lines 1-6)(The two-stage parallel adder circuit outputs two initial outputs.); and
accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result (Matsuyama: Figures 10A-B elements 1370-1371, column 10 lines 9-21 and column 11 lines 7-9); 
a destination register to store the first accumulated quadword result in the first quadword location (Matsuyama: Figures 8 and 10A-B element 12, column 10 lines 53-56, column 11 lines 19-22, and column 11 lines 63-67)(Matsuyama disclosed two destination registers to store the accumulated execution result. In addition, Matsuyama disclosed an embodiment of using 128-bit registers to store multiple 64-bit (i.e. quadwords) execution results from parallel execution of the first and second VCMAD instructions. Lastly, in view of the above official notice, registers R1/R0 are a single overlaid 128-bit register.).
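As a rough illustration of the claim 1 data path mapped above (extension, multiplication, addition, accumulation), a minimal sketch for a single quadword lane follows. It assumes that the first set of temporary products is the four products of one lane and that the accumulator wraps at 64 bits. The function names are hypothetical and do not come from Matsuyama, Takashima, or the specification.

```python
def extend_word(word, signed):
    """Interpret a 16-bit value as signed (sign extension) or unsigned
    (zero extension)."""
    word &= 0xFFFF
    if signed and word & 0x8000:
        return word - 0x10000
    return word


def multiply_accumulate_lane(src1_words, src2_words, acc_quadword, signed=True):
    """Extend four word pairs, multiply them, sum the products, and add
    the result to a 64-bit accumulator value."""
    products = [
        extend_word(a, signed) * extend_word(b, signed)
        for a, b in zip(src1_words, src2_words)
    ]
    temp_sum = sum(products)
    # Wrapping to 64 bits is an assumption of this sketch.
    return (acc_quadword + temp_sum) & 0xFFFFFFFFFFFFFFFF


print(multiply_accumulate_lane([0xFFFF, 2, 3, 4], [1, 1, 1, 1], 10, signed=True))   # 18
print(multiply_accumulate_lane([0xFFFF, 2, 3, 4], [1, 1, 1, 1], 10, signed=False))  # 65554
```

The same 16-bit pattern 0xFFFF contributes -1 when sign extended and 65535 when zero extended, which is why the extension mode must be fixed before the multiplication.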
The advantage of sign or zero extending operand values is that storage efficiency increases as compared to storing extended operand values. Thus, it would have been 
As per claim 2:
Matsuyama and Takashima disclosed the processor of claim 1 wherein the destination register and third source register are the same register (Matsuyama: Figures 10A-B, column 10 lines 53-54)(Registers R0/R1 both store the accumulation value and the final writeback value.).
As per claim 3:
Matsuyama and Takashima disclosed the processor of claim 1 further comprising: 
saturation circuitry to saturate the first accumulated quadword result prior to storage in the destination register (Matsuyama: Figures 10A-B elements 1370-1371, column 10 lines 9-21 and column 11 lines 7-9)(Official notice is given that execution results can be saturated for the advantage of preventing overflow and underflow conditions. Thus, it would have been obvious to one of ordinary skill in the art to implement a saturation check on the final execution result.).
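The saturation noted above can be sketched as a clamp applied to the accumulated result before writeback. The sketch assumes a signed 64-bit destination element; the names are hypothetical.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def saturate_int64(value):
    """Clamp an arbitrary-precision result to the signed 64-bit range."""
    return max(INT64_MIN, min(INT64_MAX, value))


print(saturate_int64(INT64_MAX + 5) == INT64_MAX)  # True
print(saturate_int64(INT64_MIN - 5) == INT64_MIN)  # True
print(saturate_int64(-42))                         # -42
```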
As per claim 4:
Matsuyama and Takashima disclosed the processor of claim 1 wherein the first and second plurality of packed words are sign-extended when the first and second plurality of packed words are signed and zero-extended when the first and second plurality of packed words are unsigned to generate the first and second plurality of doublewords (Takashima: Figure 14 elements 131-132, paragraph 90)(Matsuyama: Figures 10A-B, column 10 lines 49-53)(Takashima disclosed sign or zero extending 16-
As per claim 5:
Matsuyama and Takashima disclosed the processor of claim 1 wherein the first, second, and third source registers comprise 128-bit, 256-bit, or 512-bit registers to store 16 words, 32 words, or 64 words, respectively and/or to store 4 quadwords, 8 quadwords, or 16 quadwords, respectively (Matsuyama: Figures 8 and 10A-B element 12, column 10 lines 53-56, column 11 lines 19-22, and column 11 lines 63-67)(Matsuyama disclosed an embodiment of using 128-bit registers to store multiple 64-bit (i.e. quadwords) execution results and multiple 16-bit input data elements for parallel execution of the first and second VCMAD instructions.).
As per claim 6:
Matsuyama and Takashima disclosed the processor of claim 1 further comprising: 
masking circuitry to evaluate a writemask comprising a plurality of bits, each bit associated with a packed data element location in the destination register (Matsuyama: Figures 10A-B elements 1370-1371, column 10 lines 9-21 and column 11 lines 7-9)(Official notice is given that writemasks can be used in vector instructions for the advantage of selectively writing execution results to destination registers. Thus, it would have been obvious to one of ordinary skill in the art to implement writemasks with the VCMAD instructions.).
As per claim 7:

As per claim 8:
Matsuyama and Takashima disclosed the processor of claim 7 wherein if a determination is made not to write the first accumulated quadword result, then either zeroes are written to the first quadword location or no update to the first quadword location is performed (Matsuyama: Figures 10A-B elements 1370-1371, column 10 lines 9-21 and column 11 lines 7-9)(In view of the above official notice, a clear writemask bit for the destination data element prevents the writing of the execution result to the destination register.).
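The writemask behavior addressed for claims 6-8 can be sketched as follows. The sketch assumes one mask bit per destination quadword, with bit j controlling element j, and models both alternatives recited in claim 8 (writing zeroes or leaving the location unchanged). The function and parameter names are hypothetical.

```python
def apply_writemask(results, destination, mask_bits, zeroing=False):
    """Write results[j] where mask bit j is set; otherwise write zero
    (zeroing) or keep the old destination[j] (no update)."""
    out = []
    for j, (res, old) in enumerate(zip(results, destination)):
        if (mask_bits >> j) & 1:
            out.append(res)
        elif zeroing:
            out.append(0)
        else:
            out.append(old)
    return out


print(apply_writemask([11, 22], [7, 8], 0b01))                # [11, 8]
print(apply_writemask([11, 22], [7, 8], 0b01, zeroing=True))  # [11, 0]
```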
As per claim 9:
Claim 9 essentially recites the same limitations as claim 1. Therefore, claim 9 is rejected for the same reasons as claim 1.
As per claim 10:
The additional limitation(s) of claim 10 basically recite the additional limitation(s) of claim 2. Therefore, claim 10 is rejected for the same reason(s) as claim 2.
As per claim 11:
The additional limitation(s) of claim 11 basically recite the additional limitation(s) of claim 3. Therefore, claim 11 is rejected for the same reason(s) as claim 3.
As per claim 12:
The additional limitation(s) of claim 12 basically recite the additional limitation(s) of claim 4. Therefore, claim 12 is rejected for the same reason(s) as claim 4.
As per claim 13:
The additional limitation(s) of claim 13 basically recite the additional limitation(s) of claim 5. Therefore, claim 13 is rejected for the same reason(s) as claim 5.
As per claim 14:
The additional limitation(s) of claim 14 basically recite the additional limitation(s) of claim 6. Therefore, claim 14 is rejected for the same reason(s) as claim 6.
As per claim 15:
The additional limitation(s) of claim 15 basically recite the additional limitation(s) of claim 7. Therefore, claim 15 is rejected for the same reason(s) as claim 7.
As per claim 16:
The additional limitation(s) of claim 16 basically recite the additional limitation(s) of claim 8. Therefore, claim 16 is rejected for the same reason(s) as claim 8.
As per claim 17:
Claim 17 essentially recites the same limitations as claim 1. Therefore, claim 17 is rejected for the same reasons as claim 1.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 2. Therefore, claim 18 is rejected for the same reason(s) as claim 2.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 3. Therefore, claim 19 is rejected for the same reason(s) as claim 3.

As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 4. Therefore, claim 20 is rejected for the same reason(s) as claim 4.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 5. Therefore, claim 21 is rejected for the same reason(s) as claim 5.
As per claim 22:
The additional limitation(s) of claim 22 basically recite the additional limitation(s) of claim 6. Therefore, claim 22 is rejected for the same reason(s) as claim 6.
As per claim 23:
The additional limitation(s) of claim 23 basically recite the additional limitation(s) of claim 7. Therefore, claim 23 is rejected for the same reason(s) as claim 7.
As per claim 24:
The additional limitation(s) of claim 24 basically recite the additional limitation(s) of claim 8. Therefore, claim 24 is rejected for the same reason(s) as claim 8.

Response to Arguments
The arguments presented by Applicant in the response received on 4/30/2020 are considered partially persuasive.
Applicant argues regarding the drawing objections:
“The figures stand objected to under 37 CFR 1.83(a) for failing to show every feature of the invention specified in the claims. Specifically, the Office 

This argument is found to be partially persuasive for the following reason. The examiner has withdrawn the drawing objections for claims 7-8, 15-16, and 23-24 because writemasks are generally understood by one of ordinary skill in the art and the specifics of set/clear writemask bits aren’t needed. However, the drawings don’t show sign/zero extending source words, as currently claimed. Drawing amendments that either show extension circuitry or show source word extension in a flowchart would result in the withdrawal of the objections.
Applicant argues regarding claims 1, 9, and 17:
“Although the cited references appear to disclose multiplying elements, the cited references, when considered alone or in combination, do not appear to disclose or suggest "for each value of i, where i represents a number of packed quadwords stored in the first source register or the second source register, the multiplier circuitry is to multiply sign or zero extended word[4*i+0] of the first source register by sign or zero extended word[0] of quadword[i] of the second source register; sign or zero extended word[4*i+1] of the first source register by sign or zero extended word[1] of quadword[i] of the second source register; sign or zero extended word[4*i+2] of the first source register by sign or zero extended word[2] of quadword[i] of the second source register; and sign or zero extended word[4*i+3] of the first source register by sign or zero extended word[3] of quadword[i] of the first source register, to generate the plurality of temporary products," as recited in claim 1, as amended. Reconsideration thereof is respectfully requested. 
Claims 9 and 17 have been amended similarly to claim 1. For similar reasons to those discussed above, Applicant respectfully submits that the cited references, when considered alone or in combination, do not appear to disclose or render obvious the embodiments of claims 9 or 17, as amended. Reconsideration thereof is respectfully requested.”  


The examiner notes potential amendments to overcome both the 112(a) and 112(b) rejections. Regarding these rejections, the limitation at issue could be changed to “for each value of i, where i represents a number of packed quadwords stored in the first source register or the second source register, the multiplier circuitry is to multiply sign or zero extended word[4*(i-1)+0] of the first source register by sign or zero extended word[0] of quadword[i] of the second source register; sign or zero extended word[4*(i-1)+1] of the first source register by sign or zero extended word[1] of quadword[i] of the second source register; sign or zero extended word[4*(i-1)+2] of the first source register by sign or zero extended word[2] of quadword[i] of the second source register; and sign or zero extended word[4*(i-1)+3] of the first source register by sign or zero extended word[3] of quadword[i] of the first source register.” By including the decrementing within the calculation of the first source register, each iteration of the FOR loop correctly accesses the appropriate word. Alternatively, an amendment could be made to change the limitation at issue to “for each value of i, starting at i=0 and iterating to i-1, where i represents a number of packed quadwords stored in the first source 
The examiner notes an amendment to overcome the 103 rejections. Matsuyama disclosed in figures 10A-B that each source register holds a single quadword of data. This allows for an i=1 value to read upon the claimed limitations. However, an amendment that states that the first and second source registers hold a plurality of quadwords or stating that “i” is greater than or equal to two would overcome the combination. 

	Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183