Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
1.  A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on April 22nd, 2021 has been entered.
 
Response to Arguments
2.  Applicant’s arguments, filed April 22nd, 2021, with respect to the rejections of claim 1 under 35 USC 102 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Moudgill et al (US 2018/0173527).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.  Claims 1, 2, 4, 6-9, 11, and 13-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lindberg et al (US 2019/0354568, herein Lindberg) in view of Moudgill et al (US 2018/0173527, herein Moudgill).

Regarding claim 1, Lindberg teaches a processor comprising:
fetch circuitry to fetch a single instruction ([0038-0039], fetch unit) having fields to specify an opcode ([0040], opcode) and locations of a first source, second source, and destination vectors ([0051-0052], input and output elements), the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate resulting products with previous contents of a corresponding single-precision element of the specified destination according to a rounding mode ([0051-0053], half precision input elements & accumulation register, multiply input elements x with filter coefficients c to accumulate result y, [0058], [0077], output of FPU, [0072], 16-bit floating point operands used in execution unit, [0071], single-precision datapath, [0072] & [0079], source operands A & B used to perform convolution operation described previously, [0076], rounding logic);
decode circuitry to decode the fetched instruction ([0038-0039], decode unit); and
the execution circuitry to respond to the decoded instruction as specified by the opcode ([0040], ALU & FPU).
Lindberg fails to teach wherein the rounding mode is fixed for the single instruction regardless of any rounding mode set in a control register.
Moudgill teaches a processor for performing floating point operations wherein a rounding mode is fixed for a single instruction regardless of any rounding mode set in a control register ([0027], [0033], floating point instruction specifies its own rounding rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lindberg and Moudgill to utilize per-instruction rounding modes.  While Lindberg discloses that the exemplary processor performs rounding on floating point results when needed, Lindberg does not explicitly disclose any particular rounding rule or the use of a control register.  However, as the use of control registers and the use of instruction fields as specifiers, as taught by Moudgill, are routine and conventional aspects of the microprocessor art, doing so would merely entail a combination of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
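For illustration only, the operation recited in claim 1 can be sketched in software as follows.  The sketch is not taken from Lindberg, Moudgill, or the application.  It assumes binary16 source elements, one product per destination element, and round-to-nearest-even as the fixed rounding mode (the default conversion behavior on typical IEEE 754 platforms).  The function names and the control_register_mode parameter are hypothetical.

```python
import struct

def round_to_binary16(x):
    """Round a Python float to the nearest binary16 (half precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_binary32(x):
    """Round a Python float to the nearest binary32 (single precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def multiply_accumulate(dest, src1, src2, control_register_mode="toward_zero"):
    """Multiply N pairs of 16-bit elements and accumulate into 32-bit elements.

    control_register_mode is a hypothetical stand-in for a rounding mode held
    in a control register. It is deliberately ignored: the rounding applied
    below is fixed for this operation.
    """
    result = []
    for acc, a, b in zip(dest, src1, src2):
        # The product of two binary16 values is exactly representable
        # in a double, so no rounding happens at this step.
        product = round_to_binary16(a) * round_to_binary16(b)
        # Accumulate with the previous destination contents, then round
        # once to single precision.
        result.append(round_to_binary32(round_to_binary32(acc) + product))
    return result

print(multiply_accumulate([1.0, 0.25], [2.0, 1.5], [3.0, 0.5]))
```

Passing a different control_register_mode value does not change the result, which is the behavior the claim attributes to the single instruction.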
Regarding claim 2, the combination of Lindberg and Moudgill teaches the processor of claim 1, wherein the locations of each of the specified source and destination vectors are either in registers or in memory (Lindberg [0072], source and destination operand registers).

Regarding claim 4, the combination of Lindberg and Moudgill teaches the processor of claim 1, wherein N is specified by the instruction and has a value of one of 4, 8, 16, and 32 (Lindberg [0084], [0098], number of elements can be any number greater than 2 according to needs of convolution kernel).

Regarding claim 6, the combination of Lindberg and Moudgill teaches the processor of claim 1, wherein the 16-bit floating-point format is either bfloat16 or binary16 (Lindberg [0046], binary16).

Regarding claim 7, the combination of Lindberg and Moudgill teaches the processor of claim 1, wherein the execution circuitry is to generate all N elements of the specified destination in parallel (Lindberg [0058], [0067]).

Claims 8, 9, 11, 13, and 14 refer to a method embodiment of the processor embodiment of claims 1, 2, 4, 6, and 7, respectively.  Therefore, the above rejections for claims 1, 2, 4, 6, and 7 are applicable to claims 8, 9, 11, 13, and 14, respectively.

Claims 15 and 16 refer to a system embodiment comprising the processor embodiment of claims 1 and 2.  Therefore, the above rejections for claims 1 and 2 are applicable to claims 15 and 16, respectively.

Regarding claim 17, the combination of Lindberg and Moudgill teaches the system of claim 15, wherein the 16-bit floating point format comprises either a 5-bit exponent or an 8-bit exponent (Lindberg [0046], half precision format uses 5 exponent bits).
Claims 18 and 19 refer to a machine readable medium embodiment of the processor embodiment of claims 1 and 4, respectively.  Therefore, the above rejections for claims 1 and 4 are applicable to claims 18 and 19, respectively.

4.  Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Lindberg and Moudgill in view of Tagliavini et al (“A Transprecision Floating-Point Platform for Ultra-Low Power Computing”, herein Tagliavini).

Regarding claim 3, the combination of Lindberg and Moudgill teaches the processor of claim 1.  Lindberg and Moudgill fail to teach wherein the 16-bit floating-point format comprises a sign bit, an 8-bit exponent, and a mantissa comprising 7 explicit bits and an eighth implicit bit.
Tagliavini teaches a processor for performing floating point operations wherein a 16-bit floating-point format comprises a sign bit, an 8-bit exponent, and a mantissa comprising 7 explicit bits and an eighth implicit bit (Fig 1 & section 3A, binary16alt format).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lindberg and Moudgill with those of Tagliavini to use the alternative 16-bit floating point format.  While Lindberg exemplifies using the standard half precision floating point format in the convolution kernel processing, Tagliavini shows that the binary16alt format with 1 sign bit, 8 exponent bits, and 7 mantissa bits reduces the overall energy consumption of the floating point processor compared to the binary16 format used by Lindberg due to the reduction in the number of operations required (Tagliavini Figs 5-7).  As reducing the energy consumption, execution time, and number of memory accesses of operations are all desirable for processor design (Tagliavini, Abstract), this would merely entail a combination of known prior art elements to achieve predictable results, and would have been obvious to one of ordinary skill in the art.
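For illustration only, the field layout discussed above (1 sign bit, 8 exponent bits, 7 explicit mantissa bits) can be sketched as follows.  The sketch assumes the 16-bit pattern uses the same exponent bias (127) as the single-precision format, so that a value can be recovered by placing the 16 bits in the upper half of a 32-bit single-precision word.  That assumption and the function names are illustrative and are not quoted from Tagliavini.

```python
import struct

def decode_fields(bits):
    """Split a 16-bit pattern into (sign, exponent, mantissa) fields."""
    sign = (bits >> 15) & 0x1       # 1 sign bit
    exponent = (bits >> 7) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7F          # 7 explicit mantissa bits
    return sign, exponent, mantissa

def decode_value(bits):
    """Interpret the pattern as the upper 16 bits of a single-precision value.

    Assumes an exponent bias of 127, as in single precision.
    """
    word = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", word))[0]

print(decode_fields(0x3FC0), decode_value(0x3FC0))
```

For normal values, the eighth mantissa bit is the implicit leading one, which is why the mantissa field 0x40 in the pattern 0x3FC0 decodes to a significand of 1.5.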


5.  Claims 5, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lindberg and Moudgill in view of Henry et al (US 2017/0103321, herein Henry).

Regarding claim 5, the combination of Lindberg and Moudgill teaches the processor of claim 1.  Lindberg and Moudgill fail to teach wherein the execution circuitry is to perform the multiplications with infinite precision without saturating and to saturate the result of the accumulation to plus or minus infinity in case of an overflow and to zero in case of any underflow.
Henry teaches a processor for performing floating point multiplication wherein execution circuitry is to perform the multiplication without saturating and to saturate the result of the accumulation to plus or minus infinity in case of an overflow and to zero in case of any underflow ([0217], [0223], multiply-accumulate operations retain full precision and saturate final result to special cases such as zero and infinity).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lindberg and Moudgill with those of Henry to use full precision calculations and saturate the final result.  While Lindberg does not explicitly state how saturation is handled in each individual multiplication operation, Lindberg also does not state that intermediate results of the multiplications are in any way saturated.  As it is known in the prior art that floating point operations may return special case results due to overflow (Henry [0223], [0244]), performing the multiplications at full precision and saturating the end result would allow for the processor to accumulate the results of many operations without any loss of precision, as disclosed by Henry (Henry [0223]).  As both Henry and Lindberg do contemplate the importance of normalizing floating point results (Lindberg [0076], Henry [0217]), this would merely entail a combination of known prior art elements to achieve predictable results, and would have been obvious to one of ordinary skill in the art.
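For illustration only, the behavior recited in claim 5 can be sketched as follows: products are formed and summed exactly, and only the final accumulated result is saturated.  The sketch is not taken from Henry or Lindberg.  It assumes finite inputs, and it assumes that "underflow" means a nonzero result smaller in magnitude than the smallest normal single-precision value.  The function names are hypothetical.  Because the final rounding passes through a double, it may differ from a single rounding step in rare cases.

```python
import math
import struct
from fractions import Fraction

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal single-precision magnitude

def saturate_to_binary32(exact):
    """Round an exact Fraction to single precision, saturating the result."""
    sign = 1.0 if exact >= 0 else -1.0
    try:
        as_double = float(exact)
        rounded = struct.unpack("<f", struct.pack("<f", as_double))[0]
    except OverflowError:
        # Overflow: saturate to plus or minus infinity.
        return math.copysign(math.inf, sign)
    if rounded != 0.0 and abs(rounded) < F32_MIN_NORMAL:
        # Underflow (assumed to mean below the normal range): flush to zero.
        return math.copysign(0.0, sign)
    return rounded

def exact_multiply_accumulate(acc, src1, src2):
    """Accumulate exact products onto acc, saturating only the final result."""
    exact = Fraction(acc)
    for a, b in zip(src1, src2):
        exact += Fraction(a) * Fraction(b)  # no rounding or saturation here
    return saturate_to_binary32(exact)

print(exact_multiply_accumulate(1.0, [2.0, 0.5], [3.0, 4.0]))
```

Using exact rational arithmetic for the intermediate steps stands in for "infinite precision"; a hardware design would instead use a sufficiently wide internal accumulator.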


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105.  The examiner can normally be reached on Monday-Friday 7:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J METZGER/             Primary Examiner, Art Unit 2182