Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Remarks
This Office Action is responsive to Applicants' Amendment filed on September 23, 2022, in which claims 1-2, 4, 6, 8-10, 12-15 and 17-19 are currently amended. Claims 1-19 are currently pending.

Response to Arguments
The objections to claims 13 and 19 are hereby withdrawn, as necessitated by applicant’s amendments and remarks made to the objections.
The interpretation of claims 9 and 14-17 under 35 U.S.C. § 112(f) has been maintained without traverse.
The rejections to claims 4, 8, and 19 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to rejection of claims 1-19 under 35 U.S.C. 101 based on amendment have been considered but have not been deemed persuasive.
With respect to Applicant's arguments that the 101 rejection is contradictory, Examiner respectfully disagrees.  While the claims may be directed towards a statutory category such as a method, that does not exempt the claims from being considered non-statutory subject matter in view of being directed towards a judicial exception.  The detailed 101 rejection explains that the substance of the methods is directed towards mathematical calculations and mental processes, which are judicial exceptions to the four statutory categories of invention (See MPEP 2106).
With respect to Applicant's arguments that the Office Action does not provide factual evidence that storing and retrieving information from memory is well-understood, routine, and conventional, Examiner respectfully disagrees.  The analysis under Step 2B in the Office Action mailed 7/7/2022 explicitly cites examples under MPEP 2106.05(d).
With respect to Applicant’s arguments that the claims are not directed towards a judicial exception, Examiner respectfully disagrees.  The claims are directed entirely to performing mathematical calculations on generic computer components.  The “compression” and “decompression” described in the claims amounts to performing a binary XOR operation on data, which is a mathematical operation that could readily and practically be performed in the mind.  The claim limitations do not amount to significantly more than the judicial exceptions and Examiner asserts that the mathematical operations being performed are well-understood, routine, and conventional and that it would therefore be inappropriate to consider them to be significantly more than the judicial exceptions.  For these reasons, Examiner asserts that it is appropriate to maintain the rejection.
Applicant’s arguments with respect to rejection of claims 1-19 under 35 U.S.C. 102/103 based on amendment have been considered.
With respect to Applicant's arguments that Chen does not teach when and how to use an inverse quantization operation, Examiner respectfully disagrees.  One of ordinary skill in the art would recognize before the effective filing date of the claimed invention that an inverse quantization operation is a form of quantization operation and Chen explicitly teaches that the quantization and activation operations both occur after obtaining and before storing intermediate values [¶0086] or before final accumulation [¶0206].
With respect to Applicant's arguments that Chen teaches away from the claims as drafted due to the mention of off-chip memory, Examiner respectfully disagrees.  One of ordinary skill in the art would recognize that DDR is a typical example of internal memory, which typically refers to chips, as opposed to external memory, which refers to hard-disks and optical drives.  Examiner asserts that Applicant's interpretation of internal memory is not common or explicitly supported by the instant specification.  Chen also teaches at [¶0140] "and storing the results in the register or the on-chip SRAM or the off-chip DDR", where a register is interpreted as also being a form of second internal memory, such that Chen clearly anticipates the claims under either interpretation. Applicant is reminded that cited prior art references must be considered in their entirety and not only the cited sections [MPEP 2141.02(VI)].
With respect to Applicant's arguments regarding the "input image" disclosed in the claims, Examiner respectfully notes that claim 2 clearly teaches that the first input image is the result of a convolution operation being performed on a second input image and, as is well-known in the art, the output feature map is generated as a result of depthwise and pointwise convolutions on the input feature map.  This is demonstrated in FIG. 12 of Chen, which shows input and output feature maps being divided horizontally.
With respect to Applicant's arguments regarding the difference between convolution layers and convolution channels, Examiner agrees and notes that the disclosure of Chen teaching performing the steps for each of the convolution channels is synonymous with performing the steps on each element of a particular convolution layer and that this process is performed on each of the convolution layers.  Examiner asserts that it would be improper to interpret the more narrow and detailed disclosure of Chen as somehow being different than the claimed invention as one of ordinary skill in the art would recognize that iteratively performing convolution on an input feature map is synonymous with performing the operations of a convolution layer as is taught in Chen. 
With respect to Applicant's arguments regarding the difference between the filter in Cooper and the reference row in the claimed invention, Examiner respectfully disagrees.  While the claim language does not make any distinction that would support this argument, Examiner asserts that Cooper still teaches Applicant's interpretation.  [p. 239 §1] of Cooper clearly teaches "In grayscale images, each byte value of a pixel is XORed by the byte value of the pixel above." such that the "filter" as described by the Applicant is clearly also image data synonymous with the claimed invention.  Similarly, with respect to Applicant's argument regarding the combination of Cooper and Chen requiring undue experimentation, Examiner respectfully disagrees. Both Chen and Cooper operate explicitly on image data, and it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the disclosure of Cooper could be performed on the matrix image data in Chen.  Cooper explicitly states with regard to 3D matrices ([p. 239 §1] "RGB images are simply processed as 3 separate color planes") and it would be obvious to apply this to convolution kernels and/or input/output feature maps which are all 3D matrices.
With regard to Applicant's arguments regarding claims 8 and 17, these arguments are seen as arguing from the specification and not from the claims themselves.
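For illustration only, the row-wise XOR operation quoted from Cooper above ("each byte value of a pixel is XORed by the byte value of the pixel above") may be sketched as follows. This is a minimal sketch and is not taken from Cooper or from the instant application: the function names xor_filter_rows and xor_unfilter_rows are hypothetical, and it is assumed that the first row, which has no row above it, is kept unchanged.

```python
def xor_filter_rows(image):
    """XOR every row with the row above it.

    Assumption: the first row has no row above and is kept unchanged.
    """
    if not image:
        return []
    filtered = [list(image[0])]
    for r in range(1, len(image)):
        filtered.append([cur ^ above for cur, above in zip(image[r], image[r - 1])])
    return filtered


def xor_unfilter_rows(filtered):
    """Invert xor_filter_rows by XORing each row with the restored row above."""
    if not filtered:
        return []
    restored = [list(filtered[0])]
    for r in range(1, len(filtered)):
        restored.append([f ^ above for f, above in zip(filtered[r], restored[r - 1])])
    return restored


# Small grayscale example: identical neighboring rows become rows of zeros.
image = [
    [10, 10, 200],
    [10, 10, 200],
    [10, 12, 200],
]
filtered = xor_filter_rows(image)
restored = xor_unfilter_rows(filtered)
```

In this example, repeated values between vertically neighboring pixels become zeros in the filtered data, and the original rows are recovered by applying the same XOR from top to bottom.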
The remaining arguments are moot in view of a new ground of rejection set forth below.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
“the compressor…configured to” in claims 9, 14, and 15
“the decompressor…configured to” in claims 9, 16, and 17
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  With respect to the compressor and decompressor, structure is provided in at least [¶0074] of the instant specification “In an alternative embodiment, the DSPs 140, the MAC circuits 111, the neural function unit 112, the compressor 113 and the de-compressor 114 are implemented with a general-purpose processor and a program memory”.  
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-19 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
decompressing a first compressed segment associated with a current cuboid of a first input image (mathematical calculation)
performing cuboid convolution over the decompressed data to generate a 3D pointwise output array (mathematical calculation)
compressing the 3D pointwise output array into a second compressed segment  by removing repetitive data between neighboring elements (mathematical calculations and relationships)
the first input image is fed to any one of the convolution layers and horizontally divided into a plurality of cuboids of the same dimension (mathematical calculation)
the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution (mathematical calculation)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “first internal memory”, “second internal memory”, and “convolution layer”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 1 also recites additional elements “outputted from the first internal memory to store decompressed data in the second internal memory” which amounts to gathering and outputting data, which is insignificant extra-solution activity (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)).  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.  Claim 1 also recites additional elements “repeating steps (a) to (c) until all the cuboids associated with a target convolution layer are processed” and “repeating steps (a) to (d) until all of all the multiple convolution layers of the DNN are completed”, which are considered to be well-understood, routine, and conventional in the art (See MPEP § 2106.05(d); Bancorp Services v. Sun Life, 687 F.3d 1266, 1278, 103 USPQ2d 1425, 1433 (Fed. Cir. 2012) ("The computer required by some of Bancorp’s claims is employed only for its most basic function, the performance of repetitive calculations, and as such does not impose meaningful limits on the scope of those claims.")), as well as “to store it in the first internal memory”, which is well-understood, routine, and conventional (See MPEP 2106.05(d): Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. The rejection applies as well to dependent claims 2-8. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites additional mathematical calculations “applying a regular convolution on a second input image in the second internal memory with regular convolution filters for a layer that precedes the multiple convolution layers of the DNN to generate the first input image” and “compressing the first input image into multiple first compressed segments on a cuboid by cuboid basis” and “the step of applying the regular convolution and the step (b) are performed in a pipelined manner.” [arithmetic pipelining] as well as additional elements “to store them in the first internal memory after the step of applying and before the steps.” which is well-understood, routine, and conventional.
Dependent claim 3 recites additional insignificant extra-solution activity “wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.” which amounts to selection of a data-type.
Dependent claim 4 recites additional mathematical calculations “performing the depthwise convolution over the decompressed data with depthwise convolution filters to generate a 3D depthwise output array” and “performing the pointwise convolution over the 3D depthwise output array with pointwise convolution filters to generate the 3D pointwise output array”.
Dependent claim 5 recites additional mathematical calculations “compressing the 3D pointwise output array into the second compressed segment according to a row repetitive value compression (RRVC) scheme.”
Dependent claim 6 recites additional mathematical calculations “dividing a target channel of the 3D pointwise output array into multiple subarrays”, “performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray” as well as additional observation, evaluation, and judgement “forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray”, and “replacing non-zero (NZ) values of elements having multiple bits in the result map with 1 and combining their corresponding original values from the target subarray to form a portion of the second compressed segment”
Dependent claim 7 recites additional mathematical calculations “decompressing the first compressed segment for the current cuboid to generate the decompression data according to a row repetitive value decompression scheme.”
Dependent claim 8 recites additional observation, evaluation, and judgement “Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, the bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and the bitwise XOR operations over any two adjacent rows of the target restored subarray”, “forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray” and “restoring NZ values of elements having multiple bits in the target restored subarray according to the NZ bitmap and its corresponding original values” as well as additional insignificant extra-solution activity of gathering and outputting data “fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid” and “writing zeros in a restored result map according to the locations of zeros in the NZ bitmap”
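For illustration only, the XOR result map, NZ bitmap, and original values recited in dependent claims 6 and 8 may be sketched as follows. This is a simplified sketch and not the claimed row repetitive value compression scheme itself: the function names compress_subarray and decompress_subarray are hypothetical, and the reference row is assumed to be all zeros, so the "reference phase" recited in the claims is not modelled.

```python
def compress_subarray(subarray):
    """Return (nz_bitmap, values) for a 2D subarray of non-negative integers.

    A bitmap entry is 1 where the XOR of an element with the element above
    is non-zero; only those elements' original values are kept.
    Assumption: the reference row above row 1 is all zeros.
    """
    width = len(subarray[0])
    previous = [0] * width  # assumed all-zero reference row (simplification)
    nz_bitmap, values = [], []
    for row in subarray:
        bits = [1 if (cur ^ prev) != 0 else 0 for cur, prev in zip(row, previous)]
        nz_bitmap.append(bits)
        values.extend(cur for cur, bit in zip(row, bits) if bit)
        previous = row
    return nz_bitmap, values


def decompress_subarray(nz_bitmap, values):
    """Rebuild the subarray row by row from the bitmap and the stored values."""
    width = len(nz_bitmap[0])
    previous = [0] * width  # same assumed all-zero reference row
    stored = iter(values)
    restored = []
    for bits in nz_bitmap:
        # A 0 bit means "same as the element above"; a 1 bit consumes a stored value.
        row = [next(stored) if bit else prev for bit, prev in zip(bits, previous)]
        restored.append(row)
        previous = row
    return restored


subarray = [
    [0, 5, 5, 0],
    [0, 5, 5, 0],
    [0, 5, 7, 0],
]
nz_bitmap, values = compress_subarray(subarray)
restored = decompress_subarray(nz_bitmap, values)
```

Under these assumptions, the twelve elements of the example are represented by a twelve-entry bitmap and three stored values, and the subarray is restored by filling in rows from top to bottom.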

Regarding Claim 9:  Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a circuit which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 9 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
to perform a cuboid convolution over decompression data for each cuboid of a first input image fed to any one of multiple convolution layers of the DNN (mathematical calculation),
performing multiplication and accumulation operations associated with the cuboid convolution to output a 3D pointwise output array (mathematical calculation)
to compress the 3D pointwise output array into one compressed segment by removing repetitive data between neighboring elements (mathematical calculation)
decompress the compressed segments from the second internal memory on a compressed segment by compressed segment basis (mathematical calculation)
the first input image is horizontally divided into a plurality of cuboids of the same dimension, with an overlap of at least one row for each channel between any two adjacent cuboids (mathematical calculation)
the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution (mathematical calculation)
Therefore, claim 9 recites an abstract idea which is a judicial exception.
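For illustration only, the operations listed above (horizontal division into overlapping cuboids, and a depthwise convolution followed by a pointwise convolution) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the instant application or of Chen: all function names are hypothetical, data is laid out as x[channel][row][column], stride is 1 with no padding, and the image height is assumed to divide evenly into cuboids.

```python
def depthwise_conv(x, dw_filters):
    """Apply one K x K filter per input channel (stride 1, no padding)."""
    out = []
    for chan, filt in zip(x, dw_filters):
        k = len(filt)
        h_out = len(chan) - k + 1
        w_out = len(chan[0]) - k + 1
        out.append([
            [sum(chan[i + di][j + dj] * filt[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w_out)]
            for i in range(h_out)
        ])
    return out


def pointwise_conv(x, pw_filters):
    """Apply 1 x 1 filters that mix channels; pw_filters[o][c] is a scalar weight."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(weights[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for weights in pw_filters
    ]


def cuboid_convolution(cuboid, dw_filters, pw_filters):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv(depthwise_conv(cuboid, dw_filters), pw_filters)


def split_into_cuboids(x, rows_per_cuboid, overlap):
    """Divide x horizontally into row bands; adjacent bands share `overlap` rows."""
    step = rows_per_cuboid - overlap
    if step <= 0:
        raise ValueError("rows_per_cuboid must exceed overlap")
    height = len(x[0])
    return [[chan[top:top + rows_per_cuboid] for chan in x]
            for top in range(0, height - overlap, step)]


# Hypothetical 2-channel, 6-row, 3-column input and hypothetical filters.
x = [[[(c + 1) * (i * 3 + j) for j in range(3)] for i in range(6)] for c in range(2)]
dw = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]],
      [[0, 0, 0], [0, 1, 0], [0, 0, 0]]]
pw = [[1, 1], [1, -1]]

full = cuboid_convolution(x, dw, pw)

# Process cuboid by cuboid; a 3 x 3 depthwise filter needs an overlap of 2 rows.
tiled = [[] for _ in pw]
for cuboid in split_into_cuboids(x, rows_per_cuboid=4, overlap=2):
    for o, chan in enumerate(cuboid_convolution(cuboid, dw, pw)):
        tiled[o].extend(chan)
```

In this sketch, processing the two overlapping cuboids separately and stacking their outputs gives the same result as convolving the whole input at once, which is why only one cuboid of decompressed data needs to be held at a time.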
Step 2A Prong Two Analysis:  Claim 9 recites additional elements “first internal memory”, “second internal memory”, “MAC circuit”, “compressor”, and “decompressor”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 9 is directed to a judicial exception.
Step 2B Analysis:  Claim 9 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 9 amount to no more than mere instructions to apply the judicial exception using a generic computer component.  Claim 9 also recites additional elements “to store it in the second internal memory”, “to store the decompression data for a single cuboid in the first internal memory” and “storing multiple compressed segments only”, which are well-understood, routine, and conventional (See MPEP 2106.05(d): Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).
For the reasons above, claim 9 is rejected as being directed to non-patentable subject matter under §101. The rejection applies as well to dependent claims 10-19. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 10 recites additional mathematical calculations “to perform a regular convolution over a second input image with regular convolution filters for a layer that precedes the multiple convolution layers of the DNN” and “wherein the at least one processor is further configured to perform the regular convolution and the cuboid convolution in a pipelined manner.” [arithmetic pipelining] as well as additional insignificant extra-solution activity “cause the at least one MAC circuit to generate the first input image having multiple second convoluted cuboids” and “to store the second input image” which is well-understood, routine, and conventional.
Dependent claim 11 recites additional insignificant extra-solution activity “wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.” which amounts to selection of a data-type.
Dependent claim 12 recites additional mathematical calculations “to perform the depthwise convolution over the decompressed data with depthwise convolution filters and cause the at least one MAC circuit to generate a 3D depthwise output array” and “perform the pointwise convolution over the 3D depthwise output array with pointwise filters and cause the at least one MAC circuit to generate the 3D pointwise output array”.
Dependent claim 13 recites additional insignificant extra-solution activity “the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution” that amounts to gathering and outputting data.  Claim 13 also recites additional generic computer components “flash memory”
Dependent claim 14 recites additional mathematical calculations “to compress each of the 3D pointwise output array and each cuboid of the first input image as a target cuboid to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme.”
Dependent claim 15 recites additional mathematical calculations “dividing a target channel of the 3D pointwise output array into multiple subarrays”, “performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray” as well as additional observation, evaluation, and judgement “forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray”, and “replacing non-zero (NZ) values of elements having multiple bits in the result map with 1 and fetching their corresponding original values from the target subarray to form a portion of the second compressed segment”
Dependent claim 16 recites additional mathematical calculations “the decompressor is further configured to decompress each compressed segment from the second internal memory according to a row repetitive value decompression scheme.”
Dependent claim 17 recites additional observation, evaluation, and judgement “Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, the bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and multiple bitwise XOR operations over any two adjacent rows of the target restored subarray”, “forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray” and “restoring NZ values of elements having multiple bits in the target restored subarray according to the NZ bitmap and its corresponding original values” as well as additional insignificant extra-solution activity of gathering and outputting data “fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid” and “writing zeros in a restored result map according to the locations of zeros in the NZ bitmap”
Dependent claim 18 recites “to apply a selected activation function to each element outputted from the at least one MAC circuit.”, which is seen as generally linking the judicial exception to a particular field or technology.
Dependent claim 19 recites additional generic computer components “lookup tables”, “multiplexer”, and “adder” as well as additional insignificant extra-solution activity “outputting Q activation values according to the biased element” and “selecting one of the Q activation values as an output element” which amounts to gathering and outputting data.

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-19 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4, 9, 10, 12, and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen and Gao (“DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator”, 2018).

	 Regarding claim 1, Chen teaches A method for convolution calculation applied in an integrated circuit for use in a deep neural network (DNN), the integrated circuit comprising a first internal memory and a second internal memory, the method comprising:([¶0001] "The present disclosure relates to the technical field of neural network, and more particularly to a method and an electronic device for convolution calculation in a neural network" [¶0049] "the memory may be an on-chip random access memory (SRAM) to achieve faster access speed and avoid occupying data transmission bandwidth. However, the present disclosure is not limited thereto. For example, the memory may be other memories, such as an off chip memory (DDR). The available space in the memory may be used to buffer intermediate output results of depthwise convolution operations.")
	(a) decompressing a first compressed segment associated with a current cuboid of a first input image and outputted from the first internal memory ([¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data." Inverse quantization interpreted as synonymous with decompressing a first compressed segment.)
	to store decompressed data in the second internal memory;([¶0005] "when the data amount of input and output is relatively large, a larger on-chip random access memory (SRAM) is required for buffering intermediate results. However, a size of on-chip SRAM is fixed. If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR)" SRAM interpreted as synonymous with first internal memory.  DDR interpreted as synonymous with second internal memory.)
	(b) performing cuboid convolution over the decompressed data to generate a 3D pointwise output array;([¶0065] "the pointwise convolution calculations are performed according to the intermediate feature values of the first predetermined number p of points on all depthwise convolution output channels and pointwise convolution kernels, to obtain output feature values of the first predetermined number p of points on all pointwise convolution output channels.")
	(d) repeating steps (a) to (c) until all second compressed segments for all the cuboids associated with a target convolution layer of multiple convolution layers of the DNN are produced; and([¶0032] "A convolutional neural network may generally include multiple convolutional layers" [¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points")
	(e) repeating steps (a) to (d) until all of the multiple convolution layers of the DNN are completed;([¶0094] "on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." All convolution channels interpreted as synonymous with output layer.  Chen teaches that these steps are performed for each of the convolution layers in the DNN.)
	wherein the first input image is fed to any one of the convolution layers ([¶0032] "a convolution kernel of the layer is used to perform convolution operations on the input feature map (also known as input feature data or input feature value) of the layer")
	and horizontally divided into a plurality of cuboids of the same dimension, ([¶0038] "The depthwise convolution in FIG. 2A may be regarded as splitting M channels of one convolution kernel in the conventional convolution into M depthwise convolution kernels, each of which has R rows and S columns, and only 1 channel." [¶0039] "The pointwise convolution in FIG. 2B is exactly the same as the conventional convolution operation, except that the size of the convolution kernel is 1 row and 1 column, with a total of M channels and N of such convolution kernels. The N depthwise convolution kernels respectively convolve with the input feature map to obtain output results of the N channels" Splitting filter horizontally interpreted as synonymous with pointwise convolution as shown in FIG. 2B where each of the cuboids has the same dimension.)
	with an overlap of at least one row for each channel between any two adjacent cuboids; and([¶0078] "Depending on the reading stride, an overlapping portion may be located between every two adjacent groups of points in the p groups of points")
	wherein the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution.([¶0004] "it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations").
	However, Chen does not explicitly teach (c) compressing the [3D pointwise] output array into a second compressed segment by removing repetitive data between neighboring elements to store it in the first internal memory.

	Gao, in the same field of endeavor, teaches (c) compressing the [3D pointwise] output array into a second compressed segment by removing repetitive data between neighboring elements to store it in the first internal memory;([p. 23 §2] "Figure 3: Skipping neuron updates save multiplications between input vectors and columns that correspond to zero ∆x(t) (also the behavior of Matrix-Vector Multiplication Channel discussed in Section 3.2.2)" Skipping interpreted as synonymous with removing.).

	Chen as well as Gao are directed towards accelerating neural networks.  Therefore, Chen as well as Gao are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Chen with the teachings of Gao by implementing delta-encoding in a convolutional neural-network accelerator.  Gao provides as additional motivation for combination ([p. 29 §4.4] "For GPU, the model is run in Theano with CUDA 9 and cuDNN 7 while the GPU power is measured using the nvidiasmi utility. With batch size = 128, the CPU/GPU finished the task in 295.87/45.05 ms with 35.6/95.9 W average power. On this task, DRNN is 63.8x/9.7x faster and 312.7x/128.5x more power efficient than CPU/GPU." [p. 29 §5] "In this paper, we illustrate how high power efficiency can be achieved for RNN inference by combining the DN algorithm, of which the training process is already integrated in Lasagne powered by Theano, with our proposed DRNN hardware architecture").  This motivation for combination also applies to the remaining claims which depend on this combination.  
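	For illustration only, the delta-encoding concept relied upon in the above combination can be sketched as follows. The sketch shows one lossless reading of "removing repetitive data between neighboring elements": an element is stored only when it differs from its predecessor. It is not code from Chen or Gao; the function names delta_encode and delta_decode and the (index, delta) storage format are assumptions made for this example.

```python
def delta_encode(values):
    """Keep only (index, delta) pairs where an element differs from its predecessor.

    Hypothetical helper: unchanged neighbors produce a zero delta and are skipped.
    """
    encoded = []
    prev = 0
    for i, v in enumerate(values):
        delta = v - prev
        if delta != 0:
            encoded.append((i, delta))
            prev = v
    return encoded


def delta_decode(encoded, length):
    """Rebuild the original sequence from the stored (index, delta) pairs."""
    deltas = dict(encoded)
    out = []
    prev = 0
    for i in range(length):
        prev += deltas.get(i, 0)  # a missing index means "same as previous element"
        out.append(prev)
    return out


row = [5, 5, 5, 7, 7, 0]
packed = delta_encode(row)
restored = delta_decode(packed, len(row))
```

	In this sketch, runs of identical neighboring values cost nothing to store, which is the property the combination relies upon when storing the output array.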

	 Regarding claim 2, the combination of Chen and Gao teaches The method according to claim 1, further comprising: applying a regular convolution on a second input image in the second internal memory with regular convolution filters for a layer that precedes the multiple convolution layers of the DNN to generate the first input image prior to steps (a) to (e); and(Chen [¶0003] "A Mobile network (i.e. MobileNet) is a latest special convolutional neural network, which reduces the calculation amount by decomposing the traditional three-dimensional convolution operation into two convolution operations, i.e. depthwise convolution and pointwise convolution, while the calculation accuracy is little different from that of the traditional convolution." [¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." Intermediate feature values interpreted as synonymous with first input image obtained from applying a regular convolution operation on a second input image.  Chen explicitly teaches that intermediate results may be quantized (compressed) and stored in second memory [¶0049].)
	wherein the step of applying the regular convolution and the step (b) are performed in a pipelined manner.(Gao [p. 25 §3.1.2] "The MxV channel is fully pipelined to fetch operands in every following clock cycle once launched")
	compressing the first input image into multiple first compressed segments on a cuboid by cuboid basis to store them in the first internal memory after the step of applying and before the steps (a) to (e).(Chen [¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.).
	
	 Regarding claim 4, the combination of Chen, and Gao teaches The method according to claim 1, wherein step (b) comprises: performing the depthwise convolution over the decompressed data with depthwise convolution filters to generate a 3D depthwise output array; and(Chen [¶0009] "performing the depthwise convolution calculations according to the input feature map and the depthwise convolution kernels, to obtain intermediate feature values of the first predetermined number p of points on a second predetermined number m of depthwise convolution output channels;.")
	performing the pointwise convolution over the 3D depthwise output array with pointwise convolution filters to generate the 3D pointwise output array.(Chen [¶0009] "performing the pointwise convolution calculations according to the intermediate feature values of the first predetermined number p of points on the second predetermined number m of depthwise convolution output channels and the pointwise convolution kernels, to obtain a current pointwise convolution partial sums of the first predetermined number p of points on all the pointwise convolution output channels;").
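	For illustration only, the two-stage operation mapped above (a depthwise convolution producing a 3D depthwise output array, followed by a pointwise convolution producing a 3D pointwise output array) can be sketched as follows. The sketch assumes stride 1, no padding, and nested Python lists indexed [channel][row][column]; the function names are hypothetical and are not taken from Chen.

```python
def depthwise_conv(x, dw_kernels):
    """x: [M][H][W]; dw_kernels: [M][R][S], one single-channel kernel per input channel.

    Returns [M][H-R+1][W-S+1] (stride 1, no padding; an assumption of this sketch).
    """
    out = []
    for ch, k in zip(x, dw_kernels):
        R, S = len(k), len(k[0])
        H, W = len(ch), len(ch[0])
        plane = [
            [
                sum(ch[i + r][j + s] * k[r][s] for r in range(R) for s in range(S))
                for j in range(W - S + 1)
            ]
            for i in range(H - R + 1)
        ]
        out.append(plane)
    return out


def pointwise_conv(x, pw_kernels):
    """x: [M][H][W]; pw_kernels: [N][M], i.e. N kernels of size 1x1 with M channels.

    Returns [N][H][W]: each output point mixes the M channel values at that point.
    """
    M, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w[m] * x[m][i][j] for m in range(M)) for j in range(W)] for i in range(H)]
        for w in pw_kernels
    ]


# Two 3x3 input channels, two 2x2 depthwise kernels, two 1x1 pointwise kernels.
x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]
dw = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
pw = [[1, 1], [1, -1]]
depthwise_out = depthwise_conv(x, dw)
pointwise_out = pointwise_conv(depthwise_out, pw)
```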
	
	 Regarding claim 9, Chen teaches An integrated circuit for convolution calculation applied in a deep neural network (DNN), comprising: at least one processor configured to perform a cuboid convolution over decompression data for each cuboid of a first input image fed to any one of multiple convolution layers of the DNN;([¶0002] "Deep learning technology based on convolutional neural network may be used for image recognition and detection" [¶0004] "Regarding the existing implementation solution of MobileNet, whether it is based on a general purpose processor (CPU), a dedicated graphics processor (GPU), or a dedicated processing chip, it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations" Depthwise convolution interpreted as synonymous with cuboid convolution.)
	a first internal memory coupled to the at least one processor;([¶0010] "disclosed is an electronic device comprising a processor, and a memory having computer program instructions stored therein, when executed by the processor, making the processor to perform a method for convolution calculation in a neural network" [¶0002] "Deep learning technology based on convolutional neural network may be used for image recognition and detection" [¶0004] "Regarding the existing implementation solution of MobileNet, whether it is based on a general purpose processor (CPU), a dedicated graphics processor (GPU), or a dedicated processing chip, it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations")
	at least one multiply-accumulator (MAC) circuit coupled to the at least one processor and the first internal memory for performing multiplication and accumulation operations associated with the cuboid convolution to output a 3D pointwise output array;([¶0129] "For depthwise convolution, firstly calculating multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and here p*m multiply-accumulate (MAC) units being shared, and p*m multiply-accumulate results being obtained.")
	a second internal memory for storing multiple compressed segments only;([¶0005] " If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR), and then read these results out of DDR in batches and perform pointwise convolution calculations." Chen explicitly teaches that the compression step is part of the calculation step and that the operation is split into segments which are stored in the second memory.)
	a decompressor coupled to the at least one processor, the first and the second internal memories and configured to decompress the compressed segments from the second internal memory on a compressed segment by compressed segment basis to store the decompression data for a single cuboid in the first internal memory;([¶0005] "when the data amount of input and output is relatively large, a larger on-chip random access memory (SRAM) is required for buffering intermediate results. However, a size of on-chip SRAM is fixed. If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR)" [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data." SRAM interpreted as synonymous with first internal memory.  DDR interpreted as synonymous with second internal memory. Inverse quantization interpreted as synonymous with decompressing a first compressed segment. With respect to the instant specification a compressor is interpreted as a software device run on the processor.)
	wherein the first input image is horizontally divided into a plurality of cuboids of the same dimension, with an overlap of at least one row for each channel between any two adjacent cuboids; and([¶0038] "The depthwise convolution in FIG. 2A may be regarded as splitting M channels of one convolution kernel in the conventional convolution into M depthwise convolution kernels, each of which has R rows and S columns, and only 1 channel." [¶0039] "The pointwise convolution in FIG. 2B is exactly the same as the conventional convolution operation, except that the size of the convolution kernel is 1 row and 1 column, with a total of M channels and N of such convolution kernels. The N depthwise convolution kernels respectively convolve with the input feature map to obtain output results of the N channels" Splitting filter horizontally interpreted as synonymous with pointwise convolution as shown in FIG. 2B where each of the cuboids has the same dimension.)
	wherein the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution.([¶0004] "it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations").
	However, Chen does not explicitly teach a compressor coupled to the at least one processor, the at least one MAC circuit and the [first and the second] internal memories and configured to compress the [3D pointwise] output array into one compressed segment by removing repetitive data between neighboring elements to store it in the second internal memory.

	Gao, in the same field of endeavor, teaches a compressor coupled to the at least one processor, the at least one MAC circuit and the [first and the second] internal memories and configured to compress the [3D pointwise] output array into one compressed segment by removing repetitive data between neighboring elements to store it in the second internal memory; and([p. 22 §1] "The system was implemented on an Xilinx Zynq-7100 FPGA controlled by a dual ARM Cortex-A9 CPU. To provide sufficient bandwidth for arithmetic units and less external memory access" [p. 23 §2] "Figure 3: Skipping neuron updates save multiplications between input vectors and columns that correspond to zero ∆x(t) (also the behavior of Matrix-Vector Multiplication Channel discussed in Section 3.2.2)" [p. 25 §3.1] "Each channel has 128 clusters of multipliers (MUL) and 128 clusters of summation adders (ADD) both with two instances per cluster. This is equivalent to 256 MAC units per channel. Multipliers are instantiated using DSP blocks to perform multiplications on 16-bit signed integers" Skipping interpreted as synonymous with removing. See also FIG. 8).

Chen as well as Gao are directed towards accelerating neural networks.  Therefore, Chen as well as Gao are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Chen with the teachings of Gao by implementing delta-encoding in a convolutional neural-network accelerator.  Gao provides as additional motivation for combination ([p. 29 §4.4] "For GPU, the model is run in Theano with CUDA 9 and cuDNN 7 while the GPU power is measured using the nvidiasmi utility. With batch size = 128, the CPU/GPU finished the task in 295.87/45.05 ms with 35.6/95.9 W average power. On this task, DRNN is 63.8x/9.7x faster and 312.7x/128.5x more power efficient than CPU/GPU." [p. 29 §5] "In this paper, we illustrate how high power efficiency can be achieved for RNN inference by combining the DN algorithm, of which the training process is already integrated in Lasagne powered by Theano, with our proposed DRNN hardware architecture").  This motivation for combination also applies to the remaining claims which depend on this combination.  	 
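	For illustration only, the claimed horizontal division of the first input image into cuboids of the same dimension, with an overlap of at least one row between adjacent cuboids, can be sketched as a computation of row ranges. Each range applies to every channel and to the full width of the image. The function name split_into_cuboids and its parameters are hypothetical; the sketch is not taken from Chen or Gao and assumes that the image height divides evenly into equal overlapping cuboids.

```python
def split_into_cuboids(height, rows_per_cuboid, overlap=1):
    """Return (start_row, end_row) ranges, end exclusive, for equal-height cuboids.

    Adjacent ranges share `overlap` rows. Raises ValueError when the height
    cannot be covered by cuboids of identical height (an assumption of this sketch).
    """
    step = rows_per_cuboid - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than rows_per_cuboid")
    if height < rows_per_cuboid or (height - rows_per_cuboid) % step != 0:
        raise ValueError("height does not divide into equal overlapping cuboids")
    return [
        (start, start + rows_per_cuboid)
        for start in range(0, height - rows_per_cuboid + 1, step)
    ]


# A 10-row image split into 4-row cuboids that overlap by one row.
ranges = split_into_cuboids(10, 4, overlap=1)
```

	With these example values, rows 3 and 6 each appear in two adjacent cuboids, which is the overlapping portion between neighboring groups that the mapping above addresses.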

	 Regarding claim 10, the combination of Chen and Gao teaches The integrated circuit according to claim 9, wherein the at least one processor is further configured to perform a regular convolution over a second input image with regular convolution filters for a layer that precedes the multiple convolution layers of the DNN(Chen [¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." Intermediate feature values interpreted as synonymous with first input image obtained from applying a regular convolution operation on a second input image.  Input layer interpreted as preceding the convolution layers. Chen explicitly teaches that intermediate results may be quantized (compressed) and stored in second memory [¶0049].)
	and cause the at least one MAC circuit to generate the first input image, and wherein the first internal memory is used to store the second input image.(Chen [¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." [¶0129] " For depthwise convolution, firstly calculating multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and here p*m multiply-accumulate (MAC) units being shared, and p*m multiply-accumulate results being obtained." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.)
	and wherein the at least one processor is further configured to perform [the regular convolution and the cuboid convolution] in a pipelined manner(Gao [p. 25 §3.1.2] "The MxV channel is fully pipelined to fetch operands in every following clock cycle once launched" It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chen and Gao by performing the convolution operations in Chen in a pipelined manner as suggested by Gao.  This would amount to a simple substitution of the neural network layer type in Gao.).
	
	 Regarding claim 12, the combination of Chen, and Gao teaches The integrated circuit according to claim 10, wherein the at least one processor is further configured to perform the depthwise convolution over the decompressed data with depthwise convolution filters and cause the at least one MAC circuit to generate a 3D depthwise output array, (Chen [¶0009] "performing the depthwise convolution calculations according to the input feature map and the depthwise convolution kernels, to obtain intermediate feature values of the first predetermined number p of points on a second predetermined number m of depthwise convolution output channels;.")
	and then perform the pointwise convolution over the 3D depthwise output array with pointwise convolution filters and cause the at least one MAC circuit to generate the 3D pointwise output array.(Chen [¶0104] "reading intermediate feature values (as intermediate feature values shown in the intermediate feature map (1) in FIG. 8) of the first predetermined number p of points on a third predetermined number of depthwise convolution output channels" [¶0106] "for different hardware designs, the number of pointwise convolution calculation units MAC′ may or may not be equal to the number of depthwise convolution calculation unit MAC.").
	
	 Regarding claim 18, the combination of Chen, and Gao teaches The integrated circuit according to claim 9, further comprising: a neural function unit coupled among the at least one processor, the at least one MAC circuit and the compressor and configured to apply a selected activation function to each element outputted from the at least one MAC circuit.(Chen [¶0209] "1. For the depthwise convolution, firstly, calculating the multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and p*m multiply-accumulate (MAC) units being shared here, and p*m multiply-accumulate results being obtained." [¶0211] "2. Performing an optional activation operation on the results of abovementioned step 1" [¶0211] "3. Performing an optional quantization operation on the results of the abovementioned step 2" Step 1-3 of Chen explicitly teaches a MAC unit configured to apply an activation function and a quantization (compression) in series.).
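	For illustration only, the series of operations cited from Chen (multiply-accumulate, then an optional activation, then an optional quantization) can be sketched as follows. The choice of ReLU as the activation, the right-shift quantization, the 8-bit output range, and all function names are assumptions made for this example and are not stated in Chen in this form.

```python
def mac(inputs, weights):
    """Multiply-accumulate: sum of element-wise products."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def relu(v):
    """Assumed activation function for this sketch."""
    return v if v > 0 else 0


def quantize_by_shift(v, shift=4, bits=8):
    """Reduce precision by an arithmetic right shift, then clamp to a signed `bits`-bit range."""
    q = v >> shift
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))


def mac_activate_quantize(inputs, weights, shift=4):
    """Apply the three steps in series: MAC, activation, quantization."""
    return quantize_by_shift(relu(mac(inputs, weights)), shift)


result = mac_activate_quantize([10, 20, 30], [1, 2, 3])
```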
	
	Claims 3 and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen and Gao, and further in view of Price (US10223611B1).

	 Regarding claim 3, the combination of Chen and Gao teaches The method according to claim 2.
	However, the combination of Chen and Gao does not explicitly teach wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.

	Price, in the same field of endeavor, teaches the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.([Abstract] "The method comprises receiving image data for an image in a system comprising a convolutional neural network (CNN), the CNN comprising a first convolutional layer, a last convolutional layer, and a fully connected layer; providing the image data to an input of the first convolutional layer; extracting multi-channel data from the output of the last convolutional layer" [Col. 4 l. 19-27] "an image taken with a digital camera may have a width of 640 pixels, a height of 960 pixels, and three (3) channels (red, green, and blue) or one (1) channel (greyscale)" Price explicitly teaches that the general image may have multiple channels which may be extracted through the CNN.).

	The combination of Chen and Gao as well as Price are directed towards image analysis.  Therefore, the combination of Chen and Gao as well as Price are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen and Gao with the teachings of Price by using a general image as input to the convolutional neural network. It would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB).  Such image formats are well known in the art.  Price, however, teaches that the object detection can be performed in a single pass which would add a level of efficiency improving upon the disclosure of Chen.

	 Regarding claim 11, the combination of Chen and Gao teaches The integrated circuit according to claim 10.
	However, the combination of Chen and Gao does not explicitly teach wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.

	Price, in the same field of endeavor, teaches The integrated circuit according to claim 10, wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.([Abstract] "The method comprises receiving image data for an image in a system comprising a convolutional neural network (CNN), the CNN comprising a first convolutional layer, a last convolutional layer, and a fully connected layer; providing the image data to an input of the first convolutional layer; extracting multi-channel data from the output of the last convolutional layer" Price explicitly teaches that the general image may have multiple channels which may be extracted through the CNN.).

	The combination of Chen and Gao as well as Price are directed towards image analysis.  Therefore, the combination of Chen and Gao as well as Price are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen and Gao with the teachings of Price by using a general image as input to the convolutional neural network. It would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB).  Such image formats are well known in the art.  Price, however, teaches that the object detection can be performed in a single pass which would add a level of efficiency improving upon the disclosure of Chen.

	Claims 5-7 and 14-16 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen and Gao, and further in view of Cooper (“Huffman Coding Analysis of XOR Filtered Images”, 2015).

	 Regarding claim 5, the combination of Chen, and Gao teaches The method according to claim 1, wherein step (c) further comprises: (c1) compressing the 3D pointwise output array into the second compressed segment (Chen [¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.).
	However, the combination of Chen and Gao does not explicitly teach compressing the [3D pointwise] output array according to a row repetitive value compression (RRVC) scheme.

	Cooper, in the same field of endeavor, teaches compressing the [3D pointwise] output array according to a row repetitive value compression (RRVC) scheme.([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter" decorrelating row using a XOR filter interpreted as synonymous with row repetitive value compression.).

	The combination of Chen and Gao as well as Cooper are directed towards the field of image processing.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen and Gao with the teachings of Cooper by implementing the XOR-shift based encoding in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  
This motivation for combination also applies to the remaining claims which depend on this combination. 

	 Regarding claim 6, the combination of Chen, Gao, and Cooper teaches The method according to claim 5, wherein step (c1) further comprises: (1) dividing a target channel of the [3D pointwise] output array into multiple subarrays;(Cooper [p. 239 §IIA] "Fig. 1 : Illustration of the exclusive-or filter window for a sample image. The filter window is a height of 1 row and a width of m columns.  Figure 1 illustrates filter window, w, that is 1 row x m columns." Window interpreted as synonymous with subarray.)
	(2) forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray; (3) performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray;(Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi")
	(4) replacing non-zero (NZ) values of elements having multiple bits in the result map with 1 (Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi" The only possible value in a binary XOR operation other than zero is one.)
	and combining with their corresponding original values from the target subarray to form a portion of the second compressed segment;(Cooper [p. 239 §IIA] "The value of the filter window is then updated to be the value of the first row of the original image, w = xi")
	(5) repeating steps (2) to (4) until all the subarrays for the target channel are processed; and (6) repeating steps (1) to (5) until all the channels of the [3D pointwise] output array are processed to form the second compressed segment.(Cooper [p. 239 §IIA] "The process repeats for each row of the image.").
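
The row-wise XOR filter described in the passages quoted from Cooper [p. 239 §IIA] can be sketched as follows. This is a minimal sketch, assuming the image is a list of equal-length rows of 8-bit integers; the function name `xor_filter` and the sample values are hypothetical and are not taken from Cooper or from the claims.

```python
from typing import List


def xor_filter(image: List[List[int]], c: int = 0) -> List[List[int]]:
    """Row-wise XOR decorrelation following the quoted description.

    The filter window starts as a row of the constant byte c, is XORed
    with each image row in turn, and is then updated to the original
    (unfiltered) row that was just processed.
    """
    if not image:
        return []
    window = [c] * len(image[0])   # 1 row x m columns, initialised to c
    filtered = []
    for row in image:
        filtered.append([w ^ x for w, x in zip(window, row)])  # y_i = w XOR x_i
        window = row               # w = x_i
    return filtered


# Hypothetical 3 x 4 sample with nearly identical rows.
sample = [
    [179, 179, 176, 176],
    [179, 179, 176, 177],
    [179, 178, 176, 177],
]
print(xor_filter(sample, c=0))
```

Rows that repeat the row above them become rows of mostly zeros, which is the property relied upon in the mapping to the claimed subarray processing.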
	
	 Regarding claim 7, the combination of Chen, and Gao teaches The method according to claim 1, wherein step (a) further comprises: (a1) decompressing the first compressed segment for the current cuboid to generate the decompression data (Chen [¶0089] "According to the current design parameters of the convolutional layer, at least one of the following operations may be performed for each intermediate feature value after obtaining but before storing each intermediate feature value: activation operation and quantization operation." [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data. ").
	However, the combination of Chen and Gao does not explicitly teach to generate the decompression data according to a row repetitive value decompression scheme.

	Cooper, in the same field of endeavor, teaches to generate the decompression data according to a row repetitive value decompression scheme. ([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter...The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi" Decorrelating a row using an XOR filter is interpreted as synonymous with row repetitive value compression.  The XOR unfilter algorithm shown in FIG. 3 is interpreted as the row repetitive value decompression scheme.).
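
The reversal described in the quoted passage can be sketched as a round trip. This is a minimal sketch, assuming rows of 8-bit integers and the same constant byte c on both sides; the function names and sample values are hypothetical, and the code is not a reproduction of the FIG. 3 algorithm of Cooper.

```python
from typing import List


def xor_filter_rows(image: List[List[int]], c: int) -> List[List[int]]:
    """Forward filter: XOR each row with the original row above it."""
    window = [c] * len(image[0]) if image else []
    filtered = []
    for row in image:
        filtered.append([w ^ x for w, x in zip(window, row)])
        window = row                       # window follows the original rows
    return filtered


def xor_unfilter_rows(filtered: List[List[int]], c: int) -> List[List[int]]:
    """Reverse filter: XOR each filtered row with the row just recovered."""
    window = [c] * len(filtered[0]) if filtered else []
    recovered = []
    for row in filtered:
        original = [w ^ y for w, y in zip(window, row)]
        recovered.append(original)
        window = original                  # window follows the recovered rows
    return recovered


# Hypothetical sample; any constant byte works as long as both sides agree.
rows = [[12, 12, 200], [12, 13, 200], [255, 13, 0]]
C = 0x5A
round_trip = xor_unfilter_rows(xor_filter_rows(rows, C), C)
print(round_trip == rows)
```

The round trip recovers the input exactly because XOR is its own inverse: (w XOR x) XOR w equals x.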

	The combination of Chen and Gao as well as Cooper are directed towards the field of image processing.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen, and Gao with the teachings of Cooper by implementing the XOR-shift based encoding in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  
This motivation for combination also applies to the remaining claims which depend on this combination. 

	 Regarding claim 14, the combination of Chen, and Gao teaches The integrated circuit according to claim 10, wherein the compressor is further configured to compress each of the 3D pointwise output array and each cuboid of the first input image as a target cuboid to generate a corresponding compressed segment (Chen [¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.).
	However, the combination of Chen and Gao does not explicitly teach to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme.

	Cooper, in the same field of endeavor, teaches to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme.([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter" decorrelating row using a XOR filter interpreted as synonymous with row repetitive value compression.).

	The combination of Chen and Gao as well as Cooper are directed towards the field of image processing.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen, and Gao with the teachings of Cooper by implementing the XOR-shift based encoding in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  
This motivation for combination also applies to the remaining claims which depend on this combination. 

	 Regarding claim 15, the combination of Chen, Gao, and Cooper teaches The integrated circuit according to claim 14, wherein according to the RRVC scheme, the compressor is further configured to (1) divide a target channel of the target cuboid into multiple subarrays;(Cooper [p. 239 §IIA] "Fig. 1 : Illustration of the exclusive-or filter window for a sample image. The filter window is a height of 1 row and a width of m columns.  Figure 1 illustrates filter window, w, that is 1 row x m columns." Window interpreted as synonymous with subarray.)
	(2) form a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray;(Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi")
	(3) perform bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray;(Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi" The only possible value in a binary XOR operation other than zero is one.)
	(4) replace non-zero values of elements having multiple bits in the result map with 1 and combine with their corresponding original values from the target subarray to form a portion of the corresponding compressed segment;(Cooper [p. 239 §IIA] "The value of the filter window is then updated to be the value of the first row of the original image, w = xi")
	(5) repeat steps (2) to (4) until all the subarrays for the target channel are processed; and (6) repeat steps (1) to (5) until all the channels of the target cuboid are processed to form the corresponding compressed segment.(Cooper [p. 239 §IIA] "The process repeats for each row of the image.").
	
	 Regarding claim 16, the combination of Chen, and Gao teaches The integrated circuit according to claim 9, wherein the decompressor is further configured to decompress each compressed segment from the second internal memory (Chen [¶0089] "According to the current design parameters of the convolutional layer, at least one of the following operations may be performed for each intermediate feature value after obtaining but before storing each intermediate feature value: activation operation and quantization operation." [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data. ").
	However, the combination of Chen and Gao does not explicitly teach to decompress each compressed segment from the [second internal] memory according to a row repetitive value decompression scheme.

	Cooper, in the same field of endeavor, teaches to decompress each compressed segment from the [second internal] memory according to a row repetitive value decompression scheme.([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter...The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi" decorrelating row using a XOR filter interpreted as synonymous with row repetitive value compression.  XOR unfiltered algorithm shown in FIG. 3 interpreted as row repetitive value decompression scheme.).

		The combination of Chen and Gao as well as Cooper are directed towards the field of image processing.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen, and Gao with the teachings of Cooper by implementing the XOR-shift based encoding in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  
This motivation for combination also applies to the remaining claims which depend on this combination. 

	Claims 8 and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen, Gao, Cooper, and Juefei-Xu (“Local Binary Convolutional Neural Networks”, 2017).

	 Regarding claim 8, the combination of Chen, Gao, and Cooper teaches (2) restoring NZ values of elements having multiple bits in the target restored subarray according to the NZ bitmap and its corresponding original values;(Cooper [p. 237 §I] "A relationship that can be captured through decorrelation is the difference between neighboring pixels. A sample data set is [179 179 176 176 179 179 181 180]. Using the subtraction operator between pixels is low complexity and reversible with output of small integer values. The values in the sample data set as extracted through subtraction operator are [-1 0 -3 0 3 0 2 -1] assuming a boundary value of 180. The Boolean exclusive-or operator can be used to identify differences between pixels and also result in a dataset of small values. This operator is not only reversible, it only returns positive integers. Table I illustrates the exclusive-or operator of the same sample data and the output values are [7 0 3 0 3 0 6 1]." [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set" Recovered image row interpreted as synonymous with restored NZ bitmap subarray. See also FIG. 3 algorithm.  Each row in Cooper shows that each of the pixel values in each of the rows has multiple bits.)
	(3) forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray;(Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi." See also FIG. 3 algorithm.)
	(4) writing zeros in a restored result map according to the locations of zeros in the [NZ bitmap];(Cooper [p. 238 §I] "The authors of [6] aim to produce long runs of zeros and ones by applying neighboring bit-wise exclusive-or logic operations to the original data set as a transform." Original data set interpreted as restored result map.  Performing the inverse decorrelation filter interpreted as synonymous with restoring the result map.)
	(5) Filling in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, (Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi. The reversal process repeats for each row of the image to recover the original data set." Cooper teaches reading and writing the values of the target restored subarray in place.  It would be obvious to one of ordinary skill in the art to use a secondary blank array to store the values of the XOR operation and would lead to obvious and expected results.)
	multiple bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and the multiple bitwise XOR operations over any two adjacent rows of the target restored subarray;(Cooper [p. 239 §II] "Figure 1 illustrates filter window, w, that is 1 row x m columns. The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi" See FIG. 2 algorithm. All corresponding ith elements interpreted as adjacent row elements.)
	(6) repeating steps (1) to (5) until all the restored subarrays for the target channel are processed; and (7) repeating steps (1) to (6) until all the channels for the first compressed segment are processed to form the decompressed data.(Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set." See also FIG. 3 algorithm.).
	However, the combination of Chen, Gao, and Cooper does not explicitly teach The method according to claim 7, wherein step (a1) further comprises: (1) fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid.

	Juefei-Xu, in the same field of endeavor, teaches The method according to claim 7, wherein step (a1) further comprises: (1) fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid;([p. 21 §3.1] "The input image xl is filtered by these LBC filters to generate m difference maps that are then activated through a non-linear activation function, resulting in m bit maps." Target restored subarray is an image row which is interpreted as being associated with the NZ bitmap.  Resulting in binary image bit map interpreted as synonymous with fetching a NZ bitmap.  See also FIG. 1 and 2.).

	The combination of Chen, Gao, and Cooper as well as Juefei-Xu are directed towards a convolutional neural network accelerator.  Therefore, the combination of Chen, Gao, and Cooper as well as Juefei-Xu are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen, Gao, and Cooper with the teachings of Juefei-Xu by using a binary image. Juefei-Xu teaches as motivation for combination ([p. 20 §I] “The idea of using binary filters for convolutional layers is not new… BinaryConnect alternates between binarized and real-valued weights during the network training process… These approaches lead to drastic improvement in run-time efficiency by replacing most 32-bit floating point multiply-accumulations by 1-bit XNOR-count operations”).  The usage of XNOR operations in Juefei-Xu is also seen as strengthening the combination of Chen and Cooper as the disclosure of Juefei-Xu is highly analogous to both arts.
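
The sample data set quoted from Cooper [p. 237 §I] in the mapping of claim 8 can be reproduced with the following minimal sketch. It assumes each pixel is compared with its left neighbor and that the boundary value of 180 stated for the subtraction case is also used for the exclusive-or case; the second assumption is inferred from the quoted output values and is not stated in the quoted passage. Variable names are hypothetical.

```python
pixels = [179, 179, 176, 176, 179, 179, 181, 180]
BOUNDARY = 180  # boundary value stated in the quoted passage

# Each pixel is paired with the value that precedes it.
previous = [BOUNDARY] + pixels[:-1]

# Subtraction operator: small values, but possibly negative.
differences = [p - q for p, q in zip(pixels, previous)]

# Exclusive-or operator: small values, never negative.
xor_values = [p ^ q for p, q in zip(pixels, previous)]

# Reversal: carry the recovered pixel forward and XOR it with each coded value.
recovered = []
running = BOUNDARY
for value in xor_values:
    running ^= value
    recovered.append(running)

print("subtraction: ", differences)
print("exclusive-or:", xor_values)
print("recovered:   ", recovered)
```

Under these assumptions the sketch yields the subtraction values [-1 0 -3 0 3 0 2 -1] and the exclusive-or values [7 0 3 0 3 0 6 1] recited in the quotation, and the reversal returns the original sample data set.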

	 Regarding claim 17, the combination of Chen, Gao, and Cooper teaches (1) fetch a non-zero (NZ) bitmap and its corresponding original values associated with a target restored subarray of a target channel in one compressed segment for a target cuboid;(Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set" Recovered image row interpreted as synonymous with restored NZ bitmap subarray. See also FIG. 3 algorithm.)
	(2) restore NZ values of elements having multiple bits in the target restored subarray according to the NZ bitmap and its corresponding original values;(Cooper [p. 237 §I] "A relationship that can be captured through decorrelation is the difference between neighboring pixels. A sample data set is [179 179 176 176 179 179 181 180]. Using the subtraction operator between pixels is low complexity and reversible with output of small integer values. The values in the sample data set as extracted through subtraction operator are [-1 0 -3 0 3 0 2 -1] assuming a boundary value of 180. The Boolean exclusive-or operator can be used to identify differences between pixels and also result in a dataset of small values. This operator is not only reversible, it only returns positive integers. Table I illustrates the exclusive-or operator of the same sample data and the output values are [7 0 3 0 3 0 6 1]." [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi." See also FIG. 3 algorithm.  Cooper shows that each of the pixel values in each of the image rows have multiple bits.)
	(3) form a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray;(Cooper [p. 238 §I] "The authors of [6] aim to produce long runs of zeros and ones by applying neighboring bit-wise exclusive-or logic operations to the original data set as a transform." Original data set interpreted as restored result map.  Performing the inverse decorrelation filter interpreted as synonymous with restoring the result map.)
	(4) write zeros in a restored result map according to the locations of zeros in the NZ bitmap;(Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi. The reversal process repeats for each row of the image to recover the original data set." Cooper teaches reading and writing the values of the target restored subarray in place.  It would be obvious to one of ordinary skill in the art to use a secondary blank array to store the values of the XOR operation and would lead to obvious and expected results.)
	(5) fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, multiple bitwise XOR operations over the restored reference row and row 1 of the target restored subarray, and the multiple bitwise XOR operations over any two adjacent rows of the target restored subarray;(Cooper [p. 239 §II] "Figure 1 illustrates filter window, w, that is 1 row x m columns. The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi" See FIG. 2 algorithm. All corresponding ith elements interpreted as adjacent row elements.)
	(6) repeat steps (1) to (5) until all the restored subarrays for the target channel are processed; and (7) repeat steps (1) to (6) until all the channels of the compressed segment for the target cuboid are processed to form the decompressed data.(Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set." See also FIG. 3 algorithm.).
	However, the combination of Chen, Gao, and Cooper does not explicitly teach The integrated circuit according to claim 16, wherein according to the row repetitive value decompression scheme, the decompressor is further configured to perform the recited steps.

	Juefei-Xu, in the same field of endeavor, teaches The integrated circuit according to claim 16, wherein according to the row repetitive value decompression scheme, the decompressor is further configured to:([p. 21 §3.1] "The input image xl is filtered by these LBC filters to generate m difference maps that are then activated through a non-linear activation function, resulting in m bit maps." Target restored subarray is an image row which is interpreted as being associated with the NZ bitmap.  Resulting in binary image bit map interpreted as synonymous with fetching a NZ bitmap.  See also FIG. 1 and 2.).

	The combination of Chen, Gao, and Cooper as well as Juefei-Xu are directed towards a convolutional neural network accelerator.  Therefore, the combination of Chen, Gao, and Cooper as well as Juefei-Xu are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen, Gao, and Cooper with the teachings of Juefei-Xu by using a binary image. Juefei-Xu teaches as motivation for combination ([p. 20 §I] “The idea of using binary filters for convolutional layers is not new… BinaryConnect alternates between binarized and real-valued weights during the network training process… These approaches lead to drastic improvement in run-time efficiency by replacing most 32-bit floating point multiply-accumulations by 1-bit XNOR-count operations”).  The usage of XNOR operations in Juefei-Xu is also seen as strengthening the combination of Chen and Cooper as the disclosure of Juefei-Xu is highly analogous to both arts.
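
The "1-bit XNOR-count operations" referred to in the quoted passage can be illustrated with a minimal sketch of the commonly used formulation, in which vectors of +1/-1 values are packed one bit per element and their dot product is obtained from a bitwise XNOR followed by a population count. The packing convention (bit 1 for +1, bit 0 for -1) and the function names are assumptions made for illustration and are not taken from Juefei-Xu or from the claims.

```python
from typing import List


def pack_signs(values: List[int]) -> int:
    """Pack a vector of +1/-1 values into an integer, one bit per element."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i        # bit 1 encodes +1, bit 0 encodes -1
    return word


def xnor_count_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the signs match
    matches = bin(agree).count("1")        # population count
    return 2 * matches - n                 # matches minus mismatches


a = [1, -1, -1, 1, 1]
b = [1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = xnor_count_dot(pack_signs(a), pack_signs(b), len(a))
print(reference, fast)
```

Each matching pair of signs contributes +1 and each mismatch contributes -1, so the result equals the ordinary multiply-accumulate over the unpacked values.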

	Claim 13 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen, Gao, and Raju (WO2018217965A1).

	 Regarding claim 13, the combination of Chen, and Gao teaches The integrated circuit according to claim 12, further comprising: a flash memory (Chen [¶0228] "The memory 12 may comprise one or more computer program products which may comprise various forms of computer readable and writable storage media, such as a volatile memory and/or a non-volatile memory. The volatile memory may comprise, for example, a random access memory (RAM) and/or a cache, etc. The non-volatile memory may comprise, for example, a read only memory (ROM), a hard disk, a flash memory, etc").
	However, the combination of Chen and Gao does not explicitly teach a flash memory for storing coefficients forming the regular convolution filters, the depthwise convolution filters and the pointwise convolution filters;
	wherein the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution.

	Raju, in the same field of endeavor, teaches a flash memory for storing coefficients forming the regular convolution filters, the depthwise convolution filters and the pointwise convolution filters;([¶0021]"filter coefficients (or weights) that may be used on the CNN algorithm may be encrypted and stored at memories that are external to the SoC device 104, such as external flash 208 and/or external memory 210")
	wherein the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution.([¶0024] "At any time during the signal processing, the decrypted weights, the decrypted inputs, and the encrypted outputs may not be available to the external memories (i.e., external flash 208 and external memory 210), in order to prevent exposure to malicious attacks" [¶0025] " the CNN HW engine 200 may implement parallel execution of convolutions of the decrypted inputs and weights, and supply the output back to the secure IP block 202 to form an encrypted output." Raju explicitly teaches that the weights may only be stored on flash prior to decryption and convolution.).

	The combination of Chen and Gao as well as Raju are directed towards accelerating a convolutional neural network.  Therefore, the combination of Chen and Gao as well as Raju are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen and Gao with the teachings of Raju by using a flash memory to store the filter weights. Raju explicitly teaches as a motivation for combination that the neural network filter coefficients may be encrypted and stored on an external memory device such as flash, and then decrypted on a secure local system.  Raju teaches as a motivation for combination ([¶0025] “Accordingly, the CNN HW engine 200 may be configured to retrieve and use directly the decrypted weights and decrypted input through a hardware concurrent parallel execution of security engines for hidden layers during the signal processing.”).

	Claim 19 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Chen, Gao, and Coenen (US20190340493A1).

	 Regarding claim 19, the combination of Chen, and Gao teaches The integrated circuit according to claim 18.
	However, the combination of Chen and Gao does not explicitly teach the neural function unit comprises: an adder for adding each element from the at least one MAC circuit with a bias value to generate a biased element;
	a number Q of activation function lookup tables coupled to the adder for outputting Q activation values according to the biased element; and
	a multiplexer coupled to the compressor and Q output terminals of the number Q of activation function lookup tables for selecting one of the Q activation values as an output element.

	Coenen, in the same field of endeavor, teaches The integrated circuit according to claim 18, wherein the neural function unit comprises: an adder for adding each element from the at least one MAC circuit with a bias value to generate a biased element;([¶0041] "there is a look up table (LUT) 601 that implements the activation function and an adder 602 for computing the bias." See FIG. 6 which shows neuron unit comprising MAC units.)
	a number Q of activation function lookup tables coupled to the adder for outputting Q activation values according to the biased element; and([¶0041] "there is a look up table (LUT) 601 that implements the activation function and an adder 602 for computing the bias.")
	a multiplexer coupled to the compressor and Q output terminals of the number Q of activation function lookup tables for selecting one of the Q activation values as an output element.(See FIG. 6 showing the compressor connected to the activation function unit which is explicitly implemented with a LUT. [¶0041] " The application of the activation function is configurable by selecting on of the inputs to a multiplexor 610").

	The combination of Chen and Gao, as well as Coenen, are directed towards neural network accelerators.  Therefore, the combination of Chen and Gao, as well as Coenen, are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Chen and Gao with the teachings of Coenen by using a bias as well as a lookup table for the activation functions.  Coenen teaches as a motivation for combination of the lookup table circuit ([¶0041] “This implementation of the accelerator allows for software to control the neural network processing and either hardware or software to apply the activation function.”).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124