Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  
Claims 1, 3, 5-6, 9, 10, and 12-13 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4 of copending Application No. 16836117, respectively. Although the claims at issue are not identical, they are not patentably distinct from each other because of the mapping presented below.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Present Application 16836110:
1. A system, comprising:
a memory;
a processor coupled to the memory; and
a circuitry, coupled to the memory and the processor, to execute one or more mixed-precision layers of an artificial neural network (ANN), each mixed-precision layer including high-precision weight filters and low-precision weight filters, the circuitry configured to:
receive an input feature map having a plurality of input channels (cin),
perform one or more calculations on the input feature map using the high-precision weight filters to create a high-precision output feature map having a first number of output channels (k),
perform one or more calculations on the input feature map using the low-precision weight filters to create a low-precision output feature map having a second number of output channels (cout−k),
concatenate the high-precision output feature map and the low-precision output feature map to create a unified output feature map having a plurality of output channels (cout), and
send the unified output feature map.
Application 16836117:
1. A system, comprising:
a circuitry, configured to be coupled to a memory and a processor, to execute one or more mixed-precision layers of an artificial neural network (ANN), each mixed-precision layer including high-precision weight filters and low-precision weight filters, the circuitry configured to:
receive at least a portion of an input feature map having a plurality of input channels (cin),
perform one or more calculations on the at least a portion of an input feature map using at least a portion of the high-precision weight filters to create at least a portion of a high-precision output feature map having a first number of output channels (k),
perform one or more calculations on the input feature map using at least a portion of the low-precision weight filters to create at least a portion of a low-precision output feature map having a second number of output channels (cout−k),
concatenate the at least a portion of the high-precision output feature map and the at least a portion of the low-precision output feature map to create at least a portion of a unified output feature map having a plurality of output channels (cout), and
send the at least a portion of the unified output feature map.

Present Application 16836110:
2. The system of claim 1, where each high-precision weight filter includes one or more 16-bit or greater floating point weight values.
3. The system of claim 2, where each low-precision weight filter includes:
a scaling factor and a plurality of ternary weight values, each ternary weight value being −1, 0 or 1; or
a scaling factor and a plurality of binary weight values, each binary weight value being −1 or 1.
Application 16836117:
2. The system of claim 1, where each high-precision weight filter includes one or more 16-bit or greater floating point weight values, and each low-precision weight filter includes:
a scaling factor and a plurality of ternary weight values, each ternary weight value being −1, 0 or 1; or
a scaling factor and a plurality of binary weight values, each binary weight value being −1 or 1.

Present Application 16836110:
4. The system of claim 1, where the circuitry includes at least one hardware accelerator.
5. The system of claim 4, where:
the hardware accelerator includes one or more high-precision computation (HPC) units and one or more low-precision computation (LPC) units;
the HPC unit includes one or more multiply-and-accumulate (MAC) units; and
the LPC unit includes one or more Strassen calculation (SC) units.
Application 16836117:
3. The system of claim 1, where:
the circuitry includes at least one hardware accelerator having one or more high-precision computation (HPC) units and one or more low-precision computation (LPC) units;
each HPC unit includes one or more multiply-and-accumulate (MAC) units; and
each LPC unit includes one or more Strassen calculation (SC) units.

Present Application 16836110:
6. The system of claim 4, where the hardware accelerator includes one or more mixed-precision computation (MPC) units, and each MPC unit is configured to perform a MAC operation or a Strassen operation based on a mode control signal.
Application 16836117:
4. The system of claim 1, where the circuitry includes at least one hardware accelerator having one or more mixed-precision computation (MPC) units, and each MPC unit is configured to perform a MAC operation or a Strassen operation based on a mode control signal.

Claims 9-10 and 12-13 disclose the method that is implemented by the system of claims 1 and 3-6, with substantially the same limitations, respectively. Therefore the above mapping applies to claims 9-10 and 12-13 as well.
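For orientation only, the data flow recited in claim 1 (with the ternary low-precision filters of claim 3) can be sketched as below. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, the restriction to 1x1 (pointwise) filters is an assumption made to keep the example short, and the claim itself only recites "one or more calculations".

```python
def pointwise_conv(ifm, filters):
    """Apply 1x1 filters to an input feature map.

    ifm: list of c_in channels, each a flat list of pixel values.
    filters: list of filters, each a list of c_in weights.
    Returns one output channel per filter.
    """
    n_pixels = len(ifm[0])
    return [
        [sum(w * ifm[c][p] for c, w in enumerate(f)) for p in range(n_pixels)]
        for f in filters
    ]


def mixed_precision_layer(ifm, hp_filters, lp_filters):
    """Hypothetical mixed-precision layer.

    hp_filters: k filters holding ordinary float weights.
    lp_filters: (c_out - k) pairs of (scaling factor, ternary weights).
    """
    hp_ofm = pointwise_conv(ifm, hp_filters)  # k high-precision channels
    lp_ofm = []
    for scale, ternary in lp_filters:
        assert all(t in (-1, 0, 1) for t in ternary)
        (channel,) = pointwise_conv(ifm, [ternary])
        lp_ofm.append([scale * v for v in channel])
    # Concatenate along the channel axis: k + (c_out - k) = c_out channels.
    return hp_ofm + lp_ofm


ifm = [[1.0, 2.0], [3.0, 4.0]]        # c_in = 2, two pixels per channel
hp = [[0.5, -0.25]]                   # k = 1
lp = [(0.5, [1, -1]), (2.0, [0, 1])]  # c_out - k = 2
ofm = mixed_precision_layer(ifm, hp, lp)
print(ofm)  # [[-0.25, 0.0], [-1.0, -1.0], [6.0, 8.0]]
```

The unified output has c_out = 3 channels, the first k = 1 from the high-precision filters and the remaining two from the scaled ternary filters.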

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
"the high-precision weight filters to" in claim 1.
"each MPC unit is configured to" in claim 6.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  The MPC unit is given structure in at least [¶0148] of the published instant specification.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding claims 5 and 6, “the hardware accelerator” lacks antecedent basis.  Claim 4 introduces “at least one hardware accelerator”, but if more than one hardware accelerator were used, claims 5 and 6 would be indefinite.  The phrase “the one or more hardware accelerators” is recommended. 

Claim limitation "the high-precision weight filters to" in claim 1 invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.   Paragraph 0029 of the published instant specification mentions that, in a particular embodiment, a system including a memory, a processor, and circuitry may execute an artificial neural network layer including weight filters; however, there are no further details regarding the structure of the weight filters themselves, much less the specific structure of a high-precision weight filter.  Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

The remaining claims are rejected with respect to their dependence on the rejected claims. 

The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 8 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, fourth paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  Claim 8 reads “The system of claim 8”, making it dependent on itself, which is improper.  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-14 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a system, which is a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer-implemented method of processing neural networks, which, under its broadest reasonable interpretation, is a series of mental processes and mathematical calculations.  For example, but for the generic computer component language, the above limitations in the context of this claim encompass neural network processing, including the following: 
perform one or more calculations on the input feature map using the high-precision weight filters to create a high-precision output feature map having a first number of output channels (k) (mathematical calculation),
perform one or more calculations on the input feature map using the low-precision weight filters to create a low-precision output feature map having a second number of output channels (mathematical calculation),
concatenate the high-precision output feature map and the low-precision output feature map to create a unified output feature map having a plurality of output channels (mathematical calculations and relationships).
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “a memory”, “a processor”, “a circuitry”, and “weight filters”. However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 1 also recites additional elements “receive an input feature map having a plurality of input channels (cin)” and “send the unified output feature map”, which amount to gathering and outputting data, which is considered insignificant extra-solution activity (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)).  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 9, which recites a method, as well as to dependent claims 2-8 and 10-14. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites additional insignificant extra-solution activity “where each high-precision weight filter includes one or more 16-bit or greater floating point weight values” which amounts to selection of a data type (See Intellectual Ventures I LLC v. Erie Indem. Co., 850 F.3d at 1328-29, 121 USPQ2d at 1937.)
Dependent claims 3 and 10 recite additional mathematical calculations and relationships “where each low-precision weight filter includes: a scaling factor and a plurality of ternary weight values, each ternary weight value being −1, 0 or 1; or a scaling factor and a plurality of binary weight values, each binary weight value being −1 or 1.” 
Dependent claim 4 recites an additional generic computer component, “hardware accelerator”.
Dependent claims 5 and 11-12 recite additional generic computer components “high-precision computation units” and “low-precision computation units” as well as additional mathematical calculations and relationships “the LPC unit includes one or more Strassen calculation (SC) units.”
Dependent claims 6 and 13 recite additional generic computer components “mixed-precision computation units” as well as additional mathematical calculations “each MPC unit is configured to perform a MAC operation or a Strassen operation based on a mode control signal.”
Dependent claim 7 recites additional insignificant extra-solution activity “the ANN is a convolutional neural network (CNN), the CNN includes a plurality of depth-wise separable (DS) convolutional layers, each DS convolutional layer includes a depth-wise (DW) convolutional layer and a pointwise convolutional layer, each DW convolutional layer is a high-precision layer, and each pointwise convolutional layer is a mixed-precision layer”, which amounts to selection of a data type.
Dependent claims 8 and 14 recite additional mathematical relationships “where the input feature map for each mixed-precision layer is an output feature map provided by a preceding high-precision layer.”
Therefore, when the elements are considered separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-14 are rejected under 35 U.S.C. § 101. 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


	Claims 1, 2, 4, 7-9, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Hami (“Method for Hybrid Precision Convolutional Neural Network Representation”, 2018) in view of Chen (WO2021012148A1). 

	Regarding claim 1, Al-Hami teaches a circuitry, coupled to the memory and the processor, to execute one or more mixed-precision layers of an artificial neural network (ANN), ([p. 1] "This invention addresses fixed-point representations of convolutional neural networks(CNN) in integrated circuits.")
	each mixed-precision layer including high-precision weight filters and low-precision weight filters, the circuitry configured to: ([p. 1] "different precisions (i.e. numbers of bits) and different formats (i.e. numbers of integer and fractional bits) can be assigned to each two-dimensional (2D) kernel or three-dimensional (3D) kernel of a convolutional layer.")
	receive an input feature map having a plurality of input channels (cin), ([p. 1] "different precisions and formats can be assigned to different partitions of the input and/or output feature maps." See FIG. 1 input feature maps with 2 channels.)
	perform one or more calculations on the input feature map using the high-precision weight filters to create a high-precision output feature map having a first number of output channels (k), ([p. 1] "coeff_format[1:5, 1:5, 0, 2]: <2:-4> % Example: <2:-4> -> <sgn b2 b1 b0 . b-1 b-2 b-3 b-4> -> 8b" Floating point numbers with more bits in the mantissa interpreted as higher precision, such that input feature map 1 is interpreted as having high-precision weight filters used to create a high-precision output feature map.  With regards to the example bit format in Al-Hami, the negative bits are interpreted as the abscissa while the positive bits are interpreted as mantissa. Al-Hami shows that each of the higher precision input feature maps correspond to three intermediate output feature maps from the 2D 5x5 Conv Kernels (marked in blue).)
	perform one or more calculations on the input feature map using the low-precision weight filters to create a low-precision output feature map having a second number of output channels (cout−k), ([p. 1] "coeff_format[1:5, 1:5, 0, 2]: <2:-4> % Example: <2:-4> -> <sgn b2 b1 b0 . b-1 b-2 b-3 b-4> -> 8b" Floating point numbers with fewer bits in the mantissa interpreted as lower precision, such that input feature map 0 is interpreted as having low-precision weight filters used to create a low-precision output feature map.  With regards to the example bit format in Al-Hami, the negative bits are interpreted as the abscissa while the positive bits are interpreted as mantissa. Al-Hami shows that each of the lower precision input feature maps correspond to three intermediate output feature maps from the 2D 5x5 Conv Kernels (marked in green).)
	concatenate the high-precision output feature map and the low-precision output feature map to create a unified output feature map having a plurality of output channels (cout), and (See FIG. 1 the accumulator interpreted as concatenating the low and high precision output feature maps from the 4D convolutional filter representation to create a unified output feature map having 3 output channels.)
While it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the neural network integrated circuit system mentioned at a high level in Al-Hami might contain memory and a processor, Al-Hami does not explicitly teach A system, comprising: a memory; a processor coupled to the memory; and 
	While it would be obvious to one of ordinary skill before the effective filing date of the claimed invention that the output of a particular layer in a neural network is typically sent to the following layer, Al-Hami does not explicitly teach to send the unified output feature map.  
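As a reading aid for the format string quoted from Al-Hami above, the sketch below takes "<2:-4> -> <sgn b2 b1 b0 . b-1 b-2 b-3 b-4> -> 8b" literally: one sign bit, bit weights from 2^2 down to 2^-4, eight bits in total. The function name and the two's-complement saturation behavior are assumptions made for illustration; the quoted passage does not specify how out-of-range values are handled.

```python
def quantize_fixed_point(x, msb=2, lsb=-4):
    """Round x to a signed fixed-point grid with bit weights 2**msb .. 2**lsb.

    Returns (integer code, represented value, total bit count).
    Two's-complement saturation is an assumption of this sketch.
    """
    step = 2.0 ** lsb                     # weight of the least significant bit
    total_bits = 1 + (msb - lsb + 1)      # sign bit plus magnitude bits
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    code = max(lo, min(hi, round(x / step)))
    return code, code * step, total_bits


print(quantize_fixed_point(3.3))         # (53, 3.3125, 8)   format <2:-4>
print(quantize_fixed_point(3.3, 2, -1))  # (7, 3.5, 5)       format <2:-1>
print(quantize_fixed_point(100.0))       # (127, 7.9375, 8)  saturated
```

Under this reading, assigning a format with fewer fractional bits to a kernel (for example <2:-1> instead of <2:-4>) coarsens the grid from 1/16 to 1/2, which is one way to picture the "different precisions" the reference describes.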

Chen, in the same field of endeavor, teaches A system, comprising: a memory; a processor coupled to the memory; and ([¶0072] "The on-chip memory refers to the internal memory of the processor rather than the external memory, and the on-chip memory may be on-chip memory or cache.")
	send the unified output feature map. ([¶0070] "In a deep neural network, each hidden layer processes the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer. When processing image data, the input data and output data of each hidden layer are called feature maps"). 

	Al-Hami and Chen are both directed towards mixed-precision convolutional neural networks.  Therefore, Al-Hami and Chen are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami with the teachings of Chen by performing the well-known convolutional neural network functions on a well-known generic computer system. Chen further reinforces the obviousness of using mixed-precision representations of the weights and activations of a convolutional neural network (Chen [¶0009] “Quantify the floating-point network model to obtain at least two fixed-point network models with different precisions”).  Chen also provides an additional motivation for the combination ([¶0019] "By quantifying the floating-point network model into at least two fixed-point network models with different accuracy, and selecting a fixed-point network model to process the data according to the accuracy of the fixed-point network model, it is possible to achieve as much as possible while ensuring the network accuracy It reduces the required computing power, reduces the bandwidth requirement, and balances accuracy and bandwidth, effectively solving the contradiction between network accuracy and bandwidth, and improving the performance of mobile devices to perform deep neural network operations").  This motivation for combination also applies to the remaining claims which depend on this combination.  

	Regarding claim 2, the combination of Al-Hami and Chen teaches The system of claim 1, where each high-precision weight filter includes one or more 16-bit or greater floating point weight values. (Chen [¶0097] "The weight of the deep convolution layer of the fixed-point network model is a 16-bit fixed-point number"). 

	Regarding claim 4, the combination of Al-Hami and Chen teaches The system of claim 1, where the circuitry includes at least one hardware accelerator. (Chen [¶0095] "The quantization method adopted by the quantization unit can be called mixed precision quantization" Quantization unit interpreted as synonymous with hardware accelerator.). 

	Regarding claim 7, the combination of Al-Hami and Chen teaches The system of claim 1, where the ANN is a convolutional neural network (CNN), the CNN includes a plurality of depth-wise separable (DS) convolutional layers, each DS convolutional layer includes a depth-wise (DW) convolutional layer and a pointwise convolutional layer, (Chen [¶0032] "the depthwise separable convolution of the convolutional neural network is taken as an example to describe the data processing method. The convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations. In the deep separable convolutional neural network, the convolutional layer in the hidden layer includes: two convolutional layers obtained by splitting the standard convolutional layer: deep convolutional layer and point convolutional layer, which perform deep convolutions respectively")
	each DW convolutional layer is a high-precision layer, and each pointwise convolutional layer is a mixed-precision layer. (Chen [¶0048] "The weight of the deep convolution layer of the fixed-point network model is a 16-bit fixed-point number, and the activation value is an 8-bit fixed-point number (w16a8), and the weight and activation value of the point convolutional layer are both 8-bit fixed-point number (w8a8)." [¶0051] "It is also possible to use mixed precision quantization for both the deep convolutional layer and the point convolutional layer, and the specific quantization method is similar to the quantization method of the deep convolutional layer or the point convolutional layer described above" While Chen teaches that the DW layer is mixed-precision and the pointwise layer is lower-precision, this is clearly intended to further optimize the network performance.  It would be obvious to one of ordinary skill in the art that the network accuracy could be improved by instead increasing the activation bit-width of the DW layer (w16a16) such that the DW layer was a high precision layer, and similarly increasing the bit-width of the pointwise weights (w16a8) such that the pointwise layer is a mixed precision layer.  This would lead to obvious and expected outcomes.). 

	Regarding claim 8, the combination of Al-Hami and Chen teaches The system of claim 7, where the input feature map for each mixed-precision layer is an output feature map provided by a preceding high-precision layer. (Chen [¶0048] "The weight of the deep convolution layer of the fixed-point network model is a 16-bit fixed-point number, and the activation value is an 8-bit fixed-point number (w16a8), and the weight and activation value of the point convolutional layer are both 8-bit fixed-point number (w8a8)." [¶0125] "Then the point convolution layer processes the deep convolution result of the data block to obtain the point convolution result of the data block. The data block of each input channel is convolved with the weight of the convolution kernel" While Chen teaches that the DW layer is mixed-precision and the pointwise layer is lower-precision, this is clearly intended to further optimize the network performance.  It would be obvious to one of ordinary skill in the art that the network accuracy could be improved by instead increasing the activation bit-width of the DW layer (w16a16) such that the DW layer was a high precision layer, and similarly increasing the bit-width of the pointwise weights (w16a8) such that the pointwise layer is a mixed precision layer.  This would lead to obvious and expected outcomes.). 

	Regarding claims 9 and 14, claims 9 and 14 are directed towards the method performed by the system in claims 1 and 8.  Therefore, the rejections applied to claims 1 and 8 also apply to claims 9 and 14.

	Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Al-Hami and Chen and in further view of Li (“Ternary weight networks”, 2016).

	Regarding claim 3, the combination of Al-Hami and Chen teaches The system of claim 2.
	However, the combination of Al-Hami and Chen does not explicitly teach each low-precision weight filter includes: a scaling factor and a plurality of ternary weight values, each ternary weight value being −1, 0 or 1; or 
	a scaling factor and a plurality of binary weight values, each binary weight value being −1 or 1.  

Li, in the same field of endeavor, teaches The system of claim 2, where each low-precision weight filter includes: a scaling factor and a plurality of ternary weight values, each ternary weight value being −1, 0 or 1; or ([p. 2 §2] "We address the limited storage and limited computational resources issues by introducing ternary weight networks (TWNs), which constrain the weights to be ternary-valued: +1, 0 and -1...To make the ternary weight networks perform well, we seek to minimize the Euclidian distance between the full precision weights W and the ternary-valued weights Wt along with a nonnegative scaling factor α...Here n is the size of the filter. With the approximation W ≈ αWt, a basic block of forward propagation in ternary weight networks is as follows")
	a scaling factor and a plurality of binary weight values, each binary weight value being −1 or 1. ([p. 1 §1.1] "BinaryConnect [1] uses a single sign function to binarize the weights. Binary Weight Networks [15] adopts the same binarization function but adds an extra scaling factor. The extensions of the previous methods are BinaryNet [5] and XNOR-Net [15] where both weights and activations are binary-valued."). 

	Al-Hami, Chen, and Li are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Chen, and Li are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami and Chen with the teachings of Li by using weight ternarization. Li provides as an additional motivation for combination ([p. 2 §2] "Compared with the BPWNs, TWNs own an extra 0 state. However, the 0 terms need not be accumulated for any multiple operations. Thus, the multiply-accumulate operations in TWNs keep unchanged compared with binary precision counterparts. As a result, it is also hardware-friendly for training large-scale networks with specialized DL hardware").  

	Regarding claim 10, claim 10 is directed towards the method performed by the system of claim 3.  Therefore, the rejection applied to claim 3 also applies to claim 10.  

	Claims 5 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Al-Hami and Chen and in further view of Wang (US20190325301A1) and Tschannen (“StrassenNets: Deep Learning with a Multiplication Budget”, 2018).  

	Regarding claim 5, the combination of Al-Hami and Chen teaches The system of claim 4.
	However, the combination of Al-Hami and Chen does not explicitly teach the hardware accelerator includes one or more high-precision computation (HPC) units and one or more low-precision computation (LPC) units; 
	the HPC unit includes one or more multiply-and-accumulate (MAC) units; and the LPC unit includes one or more Strassen calculation (SC) units.  

Wang, in the same field of endeavor, teaches the hardware accelerator includes one or more high-precision computation (HPC) units and one or more low-precision computation (LPC) units; ([¶0042] "the compute units in the first segment use a low-precision accumulator (small bit-width accumulator, e.g., 16-bit as compared to 32-bit accumulator), the accumulator units in the second segment can use a high-precision accumulator (large bit-width accumulator, e.g., 32 or 64-bit wide).")
	the HPC unit includes one or more multiply-and-accumulate (MAC) units; and ([¶0018] "The specialized processing circuit comprises one or more floating point computation units (FPU). The specialized processing circuit and the memory are configured into two specific components in the compute unit, namely a multiplier and an accumulator. In some cases, the multiply-add is regarded as one unit, i.e. multiplier-accumulator (MAC) unit." [¶0042] "Furthermore, an embodiment configures the bit-width of an accumulator unit in the second segment independently from the accumulators operating in the compute units in the first segment. For example, even if the compute units in the first segment use a low-precision accumulator (small bit-width accumulator, e.g., 16-bit as compared to 32-bit accumulator), the accumulator units in the second segment can use a high-precision accumulator (large bit-width accumulator, e.g., 32 or 64-bit wide)." FPU with high-precision accumulator interpreted as synonymous with HPC unit including one or more MAC units.). 

	Al-Hami, Chen, and Wang are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Chen, and Wang are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami and Chen with the teachings of Wang by utilizing both HPC and LPC units in a hardware accelerator.  Wang teaches as an additional motivation for combination ([¶0041] “an accumulator unit in the one-dimensional array of accumulators is an advantageously simpler circuit than the FPU of the first segment and is configured only for accumulation operation. In another embodiment, an accumulator unit in the one-dimensional array of accumulators is a multifunctional circuit, such as an FPU, but is repurposed only for accumulation operation, hence the operation of the multifunctional circuit is advantageously simplified.”).  This motivation for combination also applies to the remaining claims which depend on this combination.   
	However, the combination of Al-Hami, Chen, and Wang does not explicitly teach the LPC unit includes one or more Strassen calculation (SC) units.  

Tschannen, in the same field of endeavor, teaches the LPC unit includes one or more Strassen calculation (SC) units. ([p. 2 §1] "(SPN) (arithmetic circuit). The SPNs disentangle (scalar) multiplications and additions in a way similar to Strassen’s algorithm. The number of hidden units in the SPNs therefore determines the multiplication budget of the corresponding DNN layers."). 

	Al-Hami, Chen, Wang, and Tschannen are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Chen, Wang, and Tschannen are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami, Chen, and Wang with the teachings of Tschannen by using Strassen’s algorithm in an integrated circuit for convolutional neural network acceleration.  Tschannen provides as an additional motivation for combination ([p. 1 §1] "This algorithm design principle has led to many fast algorithms in linear algebra, most notably Strassen’s matrix multiplication algorithm (Strassen, 1969). Strassen’s algorithm uses 7 instead 8 multiplications to compute the product of two 2x2 matrices (and requires O(n^2.807) operations for multiplying nxn matrices).").  This motivation for combination also applies to the remaining claims which depend on this combination.  
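	For illustration only, the 7-multiplication property of Strassen's algorithm quoted above can be sketched for the 2x2 case as follows. This is the standard textbook formulation, not code from Tschannen; the function name `strassen_2x2` is an assumption for the example.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using 7 scalar multiplications instead of 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products; everything else is addition or subtraction.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

	For example, multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] this way gives [[19, 22], [43, 50]], the same result as the conventional 8-multiplication product.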

	Regarding claims 11-12, claims 11-12 are directed towards the method performed by the system of claim 5.  Therefore, the rejection applied to claim 5 also applies to claims 11-12.  

	Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Al-Hami and Chen and in further view of Culurciello (US 20180341495 A1).

	Regarding claim 6, the combination of Al-Hami and Chen teaches The system of claim 4, where the hardware accelerator includes one or more mixed-precision computation (MPC) units (Chen [¶0095] "The quantization method adopted by the quantization unit can be called mixed precision quantization").
	However, the combination of Al-Hami and Chen does not explicitly teach each MPC unit is configured to perform a MAC operation or a Strassen operation based on a mode control signal.  

Culurciello, in the same field of endeavor, teaches each MPC unit is configured to perform a MAC operation or a Strassen operation based on a mode control signal. ([¶0056] "One operating mode for the accelerators 100 and 200 is the cooperative mode. In the cooperative mode, the 16 words of a vector in a trace are split up among a group of 16 multiply-accumulate (MAC) units in each vMAC" Performing MAC operation in cooperative mode is interpreted as being based on a mode control signal.). 

	Al-Hami, Chen, and Culurciello are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Chen, and Culurciello are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami and Chen with the teachings of Culurciello by performing MAC operations based on a mode control signal.  Culurciello provides as an additional motivation for combination ([¶0057] "The benefits of the cooperative mode are twofold. First, the cooperative mode maximizes MAC utilization. In a 2D systolic grid, maximum utilization can only be achieved if the size of the grid is equal to or an integer multiple of the kernel size. In the accelerator, MAC utilization is 100% any time the number of input maps is an integer multiple of 16. This is usually the case for all but the first layer of a CNN").  

	Regarding claim 13, claim 13 is directed towards the method performed by the system of claim 6.  Therefore, the rejection applied to claim 6 also applies to claim 13.

	Claims 15-16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (“HAQ: Hardware-Aware Automated Quantization with Mixed Precision”, 2019) and in view of Al-Hami. 

	Regarding claim 15, Wang teaches A method for training an artificial neural network (ANN) having one or more mixed-precision layers, each mixed-precision layer having high-precision weight filters and low-precision weight filters, the method comprising: ([p. 3 §3] "We have three hardware environments that covers edge and cloud, spatial and temporal architectures for mixed-precision accelerator" [p. 6 §4.1.1] "For weights, the bitwidth of these types of layers are nearly the same on the edge; while on the cloud, the depthwise convolution layers got more bitwidth than the pointwise convolution layers")
	for each mixed-precision layer: receiving a value for α; ([p. 5 §4] "Both MobileNets are inspired from the depthwise separable convolutions [3] and replace the regular convolutions with the pointwise and depthwise convolutions" [p. 4 §3.2] "At the kth time step, we take the continuous action ak (which is in the range of [0; 1]), and round it into the discrete bitwidth value bk")
	determining a number of high-precision output channels based on α and a total number of output channels (cout); ([p. 4 §3.1] "If the kth layer is a convolution layer, the state Ok is Ok = (k; cin; cout; skernel; sstride; sfeat; nparams; idw; iw/a; ak-1); (1) where k is the layer index, cin is #input channels, cout is #output channels...and ak-1 is the action from the last time step." FIG. 3 shows depthwise weights may have lower precision than pointwise weights in a separable layer.)
	determining a number of low-precision output channels based on α and the total number of output channels (cout); ([p. 4 §3.1] "If the kth layer is a convolution layer, the state Ok is Ok = (k; cin; cout; skernel; sstride; sfeat; nparams; idw; iw/a; ak-1); (1) where k is the layer index, cin is #input channels, cout is #output channels...and ak-1 is the action from the last time step." FIG. 3 shows depthwise weights may have lower precision than pointwise weights in a separable layer.)
	simultaneously training the high-precision weight filters and the low-precision weight filters, based on a training feature map, to create a high-precision output feature map and a low-precision output feature map, the high-precision output feature map having the number of high-precision output channels, and the low-precision output feature map having the number of low-precision output channels; ([p. 5 §3.6] "During exploration, we finetune the quantized model for one epoch to help recover the performance (using SGD with a fixed learning rate of 10^-3 and momentum of 0.9). We randomly select 100 categories from ImageNet [5] to accelerate the model finetuning during exploration. After exploration, we quantize the model with our best policy and finetune it on the full dataset." See also FIG. 3 which shows each layer having both high-precision and low-precision weight filters.)
	determining an accuracy of the unified output feature map; and ([p. 4 §3.5] "we define our reward function R to be only related to the accuracy" See Eqn. 6.)
	adjusting the value for α based on the accuracy of the unified output feature map. ([p. 4 §3.2] "After our RL agent gives actions {ak} to all layers, we measure the amount of resources that will be used by the quantized model. The feedback is directly obtained from the hardware accelerator, which we will discuss in Section 3.3. If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied." [p. 4 §3.5] "we define our reward function R to be only related to the accuracy" See also Eqns 7 and 8.  The Q-learning is based on a reward which is based on the accuracy of the unified output feature map.  As the intent of the Q-learning is to maximize the reward based on the action, adjusting the value for the action in Wang is interpreted as being based on the accuracy of the unified output feature map.).
	However, Wang does not explicitly teach concatenating the high-precision output feature map and the low-precision output feature map to create a unified output feature map having the total number of output channels (cout);  

Al-Hami, in the same field of endeavor, teaches concatenating the high-precision output feature map and the low-precision output feature map to create a unified output feature map having the total number of output channels (cout); (See FIG. 1 the accumulator interpreted as concatenating the low and high precision output feature maps from the 4D convolutional filter representation to create a unified output feature map having 3 output channels.). 

	Wang and Al-Hami are both directed towards accelerating convolutional neural networks by using mixed-precision representations.  Therefore, Wang and Al-Hami are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Wang with the teachings of Al-Hami by concatenating the mixed-precision representations. Al-Hami provides as an additional motivation for combination (While defining precisions and formats in a hybrid fashion adds overhead for representation both in terms of storage space and facilities to decode and use the information, it can bring about a significant improvement in CNN accuracy when compared to a homogeneous quantization (i.e. 4D). Additionally, precisions for intermediate accumulated values can also be specified.).    

	Regarding claim 16,  the combination of Wang and Al-Hami teaches The method of claim 15, where: the value for α is between 0 and 1; (Wang [p. 4 §3.2] "At the kth time step, we take the continuous action ak (which is in the range of [0; 1]), and round it into the discrete bitwidth value bk:" [p. 5 §3.6] "one step means that our agent makes an action to decide the number of bits assigned to the weights or activations of a specific layer, while one episode is composed of multiple steps, where our RL agent makes actions to all layers....We apply a variant form of the Bellman’s Equation, where each transition in an episode is defined as Tk = (Ok; ak;R;Ok+1). During exploration, the Q-function is computed as [See Eqn. 7]" See Eqn. 6.)
	the number of high-precision output channels is an integer given by α*cout; and (Al-Hami [p. 1] "coeff_format[1:5, 1:5, 0, 2]: <2:-4> % Example: <2:-4> -> <sgn b2 b1 b0 . b-1 b-2 b-3 b-4> -> 8b" Floating point numbers with more bits in the mantissa interpreted as higher precision, such that input feature map 1 is interpreted as having high-precision weight filters used to create a high-precision output feature map.  With regards to the example bit format in Al-Hami, the negative bits are interpreted as the abscissa while the positive bits are interpreted as mantissa. Al-Hami shows that each of the higher precision input feature maps correspond to three intermediate output feature maps from the 2D 5x5 Conv Kernels (marked in blue). With respect to Al-Hami Cout is interpreted as 6 and alpha is interpreted as .5.)
	the number of low-precision output channels is an integer given by (1−α)*cout. (Al-Hami  Floating point numbers with fewer bits in the mantissa interpreted as lower precision, such that input feature map 0 is interpreted as having low-precision weight filters used to create a low-precision output feature map.  With regards to the example bit format in Al-Hami, the negative bits are interpreted as the abscissa while the positive bits are interpreted as mantissa. Al-Hami shows that each of the lower precision input feature maps correspond to three intermediate output feature maps from the 2D 5x5 Conv Kernels (marked in green). With respect to Al-Hami Cout is interpreted as 6 and alpha is interpreted as .5.). 
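	For illustration only, the channel split recited in claim 16 (α*cout high-precision output channels and (1−α)*cout low-precision output channels) can be sketched as follows. This is a minimal sketch, not taken from Wang or Al-Hami: the function name `split_output_channels` and the use of rounding to reach an integer count are assumptions for the example.

```python
def split_output_channels(alpha, c_out):
    """Split c_out output channels into (high_precision, low_precision) counts.

    Assumption: alpha * c_out is rounded to the nearest integer, and the
    low-precision count is the remainder so the two always sum to c_out.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    high = round(alpha * c_out)   # number of high-precision output channels
    low = c_out - high            # equals (1 - alpha) * c_out when exact
    return high, low
```

	With the values read onto Al-Hami above (cout of 6 and α of 0.5), this gives 3 high-precision and 3 low-precision output channels.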

	Regarding claim 19, the combination of Wang and Al-Hami teaches The method of claim 16, where said simultaneously training includes at least one of gradient-descent based training and knowledge distillation based training. (Wang [p. 2 §1] "During the exploration, we leverage the deep deterministic policy gradient (DDPG) [17] to supervise our RL agent."). 

	Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wang and Al-Hami and in further view of Mellempudi (“Mixed Precision Training With 8-bit Floating Point”, 2019).

	Regarding claim 17, the combination of Wang and Al-Hami teaches The method of claim 16.
	However, the combination of Wang and Al-Hami does not explicitly teach each high-precision weight filter includes one or more 16-bit or greater floating point weight values, and the method further comprises: 
	repeating said simultaneously training, concatenating, determining the accuracy and adjusting the value for α until the accuracy of the unified output feature map is less than a threshold.  

Mellempudi, in the same field of endeavor, teaches The method of claim 16, where each high-precision weight filter includes one or more 16-bit or greater floating point weight values, and the method further comprises: ([p. 6 §4] "For these convolution networks, the first convolution and the last fully-connected (FC) layers are maintained at a higher precision (16-bit)")
	repeating said simultaneously training, concatenating, determining the accuracy and adjusting the value for α until the accuracy of the unified output feature map is less than a threshold. ([p. 4 §3.1] " Figure.2b shows the loss scaling schedule that worked for GNMT – we set the minimum threshold to 8K after the first 40K iterations, then increased it to 32K at around 150K iterations." Iterating interpreted as synonymous with repeating.). 

	Wang, Al-Hami, and Mellempudi are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Wang, Al-Hami, and Mellempudi are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Wang and Al-Hami with the teachings of Mellempudi by ensuring that the high-precision elements were at least 16-bit, and further repeating the iterative training until the accuracy reached a threshold. Mellempudi provides as an additional motivation for combination ([p. 2 §1] "Our paper extends the state of the art in 8-bit floating point (FP8) training with the following key contributions: Propose enhanced loss scaling method to compensate for the reduced subnormal range of 8-bit floating point representation for improved error propagation leading to better model accuracy").  This motivation for combination also applies to the remaining claims which depend on this combination.  

	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wang, Al-Hami, and Mellempudi and in further view of Tschannen.

	Regarding claim 18, the combination of Wang, Al-Hami, and Mellempudi teaches The method of claim 17, where each high-precision [Strassen] weight matrix including one or more 16-bit or greater floating point weight values (Mellempudi [p. 6 §4] "For these convolution networks, the first convolution and the last fully-connected (FC) layers are maintained at a higher precision (16-bit)").
However, the combination of Wang, Al-Hami, and Mellempudi does not explicitly teach where said simultaneously training the high-precision weight filters and the low-precision weight filters includes: simultaneously training the high-precision weight filters and high-precision Strassen weight matrices for a first number of epochs at a first learning rate;
	quantizing the high-precision Strassen weight matrices to create low-precision Strassen weight matrices, each low-precision Strassen weight matrix including ternary weight values and a scaling factor, each ternary weight value being −1, 0 or 1; 
	simultaneously training the high-precision weight filters and the low-precision Strassen weight matrices for a second number of epochs at a second learning rate; 
	fixing the values of the low-precision Strassen weight matrices to create fixed, low-precision Strassen weight matrices; 
	simultaneously training the high-precision weight filters and the fixed, low-precision Strassen weight matrices for a third number of epochs at a third learning rate; and 
	creating the low-precision weight filters based on the fixed, low-precision Strassen weight matrices.  

Tschannen, in the same field of endeavor, teaches The method of claim 17, where said simultaneously training the high-precision weight filters and the low-precision weight filters includes: simultaneously training the high-precision weight filters and high-precision Strassen weight matrices for a first number of epochs at a first learning rate, ([Abstract] "We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data" [p. 6 §3.2.2] "We use an initial learning rate of 0.05 and mini-batch size 256, with two different learning rate schedules depending on the value of r in the convolution layers...We train for 40 epochs without quantization..." [p. 3 §2.1] "Writing matrix products in the form (1) is not specific to square matrices. Indeed, it is easy to see that r > nmk is a sufficient condition for the existence of matrices Wa;Wb;Wc with elements in K such that the product of any two matrices...including matrix-vector products (i.e., n = 1), can be written in the form (1)." Form 1 interpreted as Strassen weight matrix form such that Tschannen explicitly anticipates any matrix being formulated as a Strassen weight matrix. Tschannen teaches that the low-precision ternary weight filters are representative of high-precision data ([Abstract] "while maintaining the predictive performance of the full-precision models.").).
	quantizing the high-precision Strassen weight matrices to create low-precision Strassen weight matrices, each low-precision Strassen weight matrix including ternary weight values and a scaling factor, each ternary weight value being −1, 0 or 1; ([Abstract] "We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data" [p. 5 §2.4] "The resulting SPN realizes a convolution with r ternary whcin filters (the rows of Wb), followed by a channel-wise scaling with ~a = Wavec( ~W`), followed by convolution with a ternary 1x1xr filter for each of the cout outputs (the rows of Wc) see Fig. 1," Ternary weight values are by definition limited to {-1,0,1}.  Learning low cost ternary edge weight interpreted as synonymous with quantizing.).
	simultaneously training the high-precision weight filters and the low-precision Strassen weight matrices for a second number of epochs at a second learning rate; ([p. 6 §3.2.2] "for 70 epochs, multiplying the learning rate by 0.1 after 40 and 60 epochs" Second learning rate being .005 for 20 epochs.).
	fixing the values of the low-precision Strassen weight matrices to create fixed, low-precision Strassen weight matrices; ([p. 5 §3.1] "Before applying the proposed method to DNNs, we demonstrate that it is able to rediscover Strassen’s algorithm...A set of ternary weight matrices implementing an exact matrix multiplication, found by our method, is").
	simultaneously training the high-precision weight filters and the fixed, low-precision Strassen weight matrices for a third number of epochs at a third learning rate; and ([p. 6 §3.2.2] "for 70 epochs, multiplying the learning rate by 0.1 after 40 and 60 epochs" Third learning rate being .0005 for 10 epochs.).
	creating the low-precision weight filters based on the fixed, low-precision Strassen weight matrices. ([p. 5 §3.1] "Before applying the proposed method to DNNs, we demonstrate that it is able to rediscover Strassen’s algorithm...A set of ternary weight matrices implementing an exact matrix multiplication, found by our method, is"). 
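	For illustration only, the three-phase learning rate schedule mapped onto Tschannen above (initial rate of 0.05, multiplied by 0.1 after 40 and 60 of 70 epochs) can be sketched as follows. This is a minimal sketch, not code from Tschannen: the function names and the convention that epochs are counted from 0 are assumptions for the example.

```python
def learning_rate(epoch, base_lr=0.05, milestones=(40, 60), gamma=0.1):
    """Learning rate for a 0-indexed epoch under a step schedule.

    Assumption: the rate is multiplied by gamma once for every milestone
    that has already been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)


def phase_lengths(total_epochs=70, milestones=(40, 60)):
    """Number of epochs spent at each successive learning rate."""
    bounds = [0, *milestones, total_epochs]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

	Under these assumptions the schedule has three phases of 40, 20 and 10 epochs at rates of about 0.05, 0.005 and 0.0005, which matches the first, second and third learning rates identified in the mapping above.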

	Al-Hami, Mellempudi, Wang, and Tschannen are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Mellempudi, Wang, and Tschannen are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami, Mellempudi, and Wang with the teachings of Tschannen by using ternary weights and Strassen’s algorithm in an integrated circuit for convolutional neural network acceleration.  Tschannen provides as an additional motivation for combination ([p. 1 §1] "This algorithm design principle has led to many fast algorithms in linear algebra, most notably Strassen’s matrix multiplication algorithm (Strassen, 1969). Strassen’s algorithm uses 7 instead 8 multiplications to compute the product of two 2x2 matrices (and requires O(n^2.807) operations for multiplying nxn matrices).").  This motivation for combination also applies to the remaining claims which depend on this combination.  

	Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wang, Al-Hami, and Chen and in further view of Tschannen.

	Regarding claim 20, the combination of Wang and Al-Hami teaches The method of claim 16, where: the ANN is a convolutional neural network (CNN); the CNN includes a plurality of depth-wise separable (DS) convolutional layers; each DS convolutional layer includes a depth-wise (DW) convolutional layer and a pointwise convolutional layer; (Wang [p. 5 §4] "Both MobileNets are inspired from the depthwise separable convolutions [3] and replace the regular convolutions with the pointwise and depthwise convolutions")
	each DW convolutional layer is a high-precision layer; (Wang [p. 6 §4.1.1] "For weights...while on the cloud, the depthwise convolution layers got more bitwidth than the pointwise convolution layers.")
	However, the combination of Wang and Al-Hami does not explicitly teach each pointwise convolutional layer is a mixed-precision layer; 
and each low-precision output channel is generated by a strassenified convolution.  
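
	For illustration only, the layer structure recited above (a depth-wise convolution followed by a pointwise convolution) can be sketched as follows. This is a minimal sketch assuming stride 1, no padding, and plain nested lists indexed as [channel][row][column]; the function names are hypothetical and are not taken from Wang, Al-Hami, or the claims.

```python
def depthwise_conv(x, kernels):
    """Filter each input channel with its own k x k kernel (no channel mixing)."""
    h, w, k = len(x[0]), len(x[0][0]), len(kernels[0])
    out = []
    for c in range(len(x)):
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                row.append(sum(x[c][i + di][j + dj] * kernels[c][di][dj]
                               for di in range(k) for dj in range(k)))
            plane.append(row)
        out.append(plane)
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for wt in weights]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depth-wise stage followed by pointwise stage."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)
```

	In this sketch the two stages hold separate weight sets (dw_kernels and pw_weights), so each stage could in principle be stored at its own precision.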

Chen, in the same field of endeavor, teaches each pointwise convolutional layer is a mixed-precision layer ([¶0048] "The weight of the deep convolution layer of the fixed-point network model is a 16-bit fixed-point number, and the activation value is an 8-bit fixed-point number (w16a8), and the weight and activation value of the point convolutional layer are both 8-bit fixed-point number (w8a8)." [¶0051] "It is also possible to use mixed precision quantization for both the deep convolutional layer and the point convolutional layer, and the specific quantization method is similar to the quantization method of the deep convolutional layer or the point convolutional layer described above").  While Chen teaches that the DW layer is mixed-precision and the pointwise layer is lower-precision, this is clearly intended to further optimize the network performance.  It would have been obvious to one of ordinary skill in the art that the network accuracy could be improved by instead increasing the activation bit-width of the DW layer (w16a16) such that the DW layer is a high-precision layer, and similarly increasing the bit-width of the pointwise weights (w16a8) such that the pointwise layer is a mixed-precision layer.  This would lead to obvious and expected outcomes.
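
	For illustration only, the w16a8 and w8a8 notation quoted from Chen can be read as the weight and activation bit-widths of a layer. The sketch below assumes a symmetric uniform quantizer, which Chen does not necessarily use; the functions quantize and dequantize and the sample values are hypothetical.

```python
def quantize(values, bits):
    """Map floats to signed `bits`-bit integers with one shared scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = peak / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


def max_error(values, bits):
    """Largest absolute round-trip error at the given bit-width."""
    q, scale = quantize(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


weights = [0.5, -0.25, 0.124, 1.0]   # hypothetical weight values
err8 = max_error(weights, 8)         # e.g. the "w8" case
err16 = max_error(weights, 16)       # e.g. the "w16" case
print(err16 < err8)                  # True
```

	Under these assumptions, raising a bit-width from 8 to 16 reduces the rounding error of the quantized values at the cost of storage and bandwidth, which is the trade-off discussed in the quoted passages.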

	Wang, Al-Hami, and Chen are all directed towards mixed-precision convolutional neural networks.  Therefore, Wang, Al-Hami, and Chen are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Wang and Al-Hami with the teachings of Chen by performing the well-known convolutional neural network functions on a well-known generic computer system. Chen further reinforces the obviousness of using mixed precision representations of weights and activations of a convolutional neural network (Chen [¶0009] “Quantify the floating-point network model to obtain at least two fixed-point network models with different precisions”).  Chen provides as an additional motivation for combination ([¶0019] "By quantifying the floating-point network model into at least two fixed-point network models with different accuracy, and selecting a fixed-point network model to process the data according to the accuracy of the fixed-point network model, it is possible to achieve as much as possible while ensuring the network accuracy It reduces the required computing power, reduces the bandwidth requirement, and balances accuracy and bandwidth, effectively solving the contradiction between network accuracy and bandwidth, and improving the performance of mobile devices to perform deep neural network operations").  
	However, the combination of Wang, Al-Hami, and Chen does not explicitly teach each low-precision output channel is generated by a strassenified convolution.  

Tschannen, in the same field of endeavor, teaches each low-precision output channel is generated by a strassenified convolution. ([p. 2 §1] "We then learn the addition and multiplication operations for all layers jointly from data by learning the edges of the SPNs, encoded as ternary {-1, 0, 1} matrices. As the transforms realized by the SPNs are approximate and adapted to the weight matrices and distribution of the activation tensors in the DNN, this allows us to reduce the number of multiplications much more drastically than hand engineered transforms like Strassen’s algorithm or the more specialized Winograd filter-based convolution"). 

	Al-Hami, Chen, Wang, and Tschannen are all directed towards accelerating convolutional neural networks by reducing precision for at least some of the weights or activations.  Therefore, Al-Hami, Chen, Wang, and Tschannen are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Al-Hami, Chen, and Wang with the teachings of Tschannen by using ternary weights and Strassen’s algorithm in an integrated circuit for convolutional neural network acceleration.  Tschannen provides as an additional motivation for combination ([p. 1 §1] "This algorithm design principle has led to many fast algorithms in linear algebra, most notably Strassen’s matrix multiplication algorithm (Strassen, 1969). Strassen’s algorithm uses 7 instead of 8 multiplications to compute the product of two 2x2 matrices (and requires O(n^2.807) operations for multiplying nxn matrices).").  This motivation for combination also applies to the remaining claims which depend on this combination.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Srinivasan (“High Performance Scalable FPGA Accelerator for Deep Neural Networks”, 2019) is directed towards a mixed-precision convolutional neural network accelerator implemented using ternary weights and MAC units. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124