Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 13 and 19 are objected to because of the following informalities:
Regarding claim 13, “pre-storing” is not a term widely recognized in the art, and the instant specification does not explicitly distinguish pre-storing from storing. In the interest of further examination, “pre-storing” is interpreted as “storing.”
Regarding claim 19, “the at least MAC circuit” is interpreted as a typographical error for “the at least one MAC circuit.”

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“the compressor…configured to” in claims 9, 14, and 15
“the decompressor…configured to” in claims 9, 16, and 17
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. With respect to the compressor and the decompressor, corresponding structure is disclosed in at least [¶0074] of the instant specification: “In an alternative embodiment, the DSPs 140, the MAC circuits 111, the neural function unit 112, the compressor 113 and the de-compressor 114 are implemented with a general-purpose processor and a program memory.”
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4, 8, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 4, “second filters” is indefinite. Claim 1, from which claim 4 depends, does not introduce any filters or “first filters,” so it would not be clear to one of ordinary skill in the art which filters the “second filters” are being distinguished from. The instant specification does not describe “second filters” in a way that cures this deficiency. A “second filter” could be a filter in a second layer, a copy of a first filter in the second memory, or any of a number of other interpretations that may contradict each other. In any of these cases, the first filters must be introduced in a claim from which the claim reciting the second filters depends. In the interest of further examination, “second filters” is interpreted as merely “filters.”

Regarding claim 8, “the bitwise XOR operations” lacks antecedent basis. The recitation “a bitwise XOR operation” is recommended instead.

Regarding claim 19, “the output terminals” lacks antecedent basis. The recitation “an output terminal” is recommended instead.

	Any claims depending from a rejected claim are also rejected because they incorporate the indefinite limitation of the claim from which they depend.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, i.e., a process, which is one of the statutory categories.
Step 2A Prong One Analysis: Claim 1 recites a computer-implemented method of processing a neural network which, under its broadest reasonable interpretation, is a series of mathematical calculations and mental processes. For example, but for the generic computer component language, the following limitations, in the context of this claim, encompass neural network processing:
decompressing a first compressed segment associated with a current cuboid of a first input image (mathematical calculation)
performing cuboid convolution over the decompressed data to generate a 3D pointwise output array (mathematical calculation)
compressing the 3D pointwise output array into a second compressed segment (mathematical calculation)
the first input image is fed to any one of the convolution layers and horizontally divided into a plurality of cuboids of the same dimension (mathematical calculation)
the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution (mathematical calculation)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
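For illustration only, the cuboid convolution recited in claim 1 (a depthwise convolution followed by a pointwise convolution) can be sketched in plain Python. This is a minimal sketch under stated assumptions: a [channel][row][column] array layout, valid (unpadded) stride-1 convolution computed as cross-correlation as is conventional in deep-learning practice, and the hypothetical function names `depthwise` and `pointwise`. It is not taken from the application or from Chen.

```python
from typing import List

Array3D = List[List[List[float]]]  # [channel][row][column]


def depthwise(x: Array3D, kernels: List[List[List[float]]]) -> Array3D:
    """Hypothetical helper: one R x S kernel per input channel, valid, stride 1."""
    out = []
    for ch, k in zip(x, kernels):
        r, s = len(k), len(k[0])
        h, w = len(ch), len(ch[0])
        # Each channel is filtered independently; no mixing across channels.
        out.append([[sum(ch[i + a][j + b] * k[a][b] for a in range(r) for b in range(s))
                     for j in range(w - s + 1)]
                    for i in range(h - r + 1)])
    return out


def pointwise(x: Array3D, kernels: List[List[float]]) -> Array3D:
    """Hypothetical helper: N 1 x 1 kernels, each holding one weight per input channel."""
    h, w = len(x[0]), len(x[0][0])
    # Each output channel is a weighted sum across all input channels at (i, j).
    return [[[sum(wt * x[c][i][j] for c, wt in enumerate(k)) for j in range(w)]
             for i in range(h)]
            for k in kernels]
```

For example, a two-channel 3 x 3 input, one 2 x 2 kernel per channel, and two 1 x 1 kernels yield a 2 x 2 x 2 output, i.e., a 3D pointwise output array.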
Step 2A Prong Two Analysis: Claim 1 recites the additional elements “first internal memory”, “second internal memory”, and “convolution layer”. However, these additional elements are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea does not integrate the judicial exception into a practical application. Claim 1 also recites the additional element “outputted from the first internal memory to store decompressed data in the second internal memory”, which amounts to gathering and outputting data, i.e., insignificant extra-solution activity (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)). Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component. Claim 1 also recites the additional elements “repeating steps (a) to (c) until all the cuboids associated with a target convolution layer are processed” and “repeating steps (a) to (d) until all of multiple convolution layers are completed”, which are well-understood, routine, and conventional in the art (See MPEP § 2106.05(d); Bancorp Services v. Sun Life, 687 F.3d 1266, 1278, 103 USPQ2d 1425, 1433 (Fed. Cir. 2012) ("The computer required by some of Bancorp’s claims is employed only for its most basic function, the performance of repetitive calculations, and as such does not impose meaningful limits on the scope of those claims.")), as well as “to store it in the first internal memory”, which is likewise well-understood, routine, and conventional (See MPEP § 2106.05(d): storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).
For the reasons above, claim 1 is rejected as being directed to patent-ineligible subject matter under § 101. The rejection also applies to dependent claims 2-8. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites the additional mathematical calculations “applying a regular convolution on a second input image in the second internal memory with first filters for a layer that precedes the convolution layers to generate the first input image” and “compressing the first input image into multiple first compressed segments on a cuboid by cuboid basis”, as well as the additional element “to store them in the first internal memory after the step of applying and before the steps (a) to (e)”, which is well-understood, routine, and conventional (storing and retrieving information in memory; See MPEP § 2106.05(d)).
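For illustration only, the control flow of steps (a) through (e) of claim 1, which the analysis above treats as repetitive calculation, can be sketched as two nested loops. This is a minimal sketch under stated assumptions: the zlib/pickle codec is a stand-in and is not the claimed compression scheme, each layer's cuboid convolution is modeled as an arbitrary Python callable, and the names `run_layers`, `compress_segment`, and `decompress_segment` are hypothetical.

```python
import pickle
import zlib
from typing import Callable, List

Cuboid = List[List[List[int]]]  # [channel][row][column]


def compress_segment(cuboid: Cuboid) -> bytes:
    # Hypothetical stand-in codec; the claims recite their own compression scheme.
    return zlib.compress(pickle.dumps(cuboid))


def decompress_segment(segment: bytes) -> Cuboid:
    return pickle.loads(zlib.decompress(segment))


def run_layers(first_memory: List[bytes],
               layers: List[Callable[[Cuboid], Cuboid]]) -> List[bytes]:
    """Hypothetical driver: steps (a)-(e), one cuboid at a time, layer by layer."""
    for cuboid_conv in layers:                              # (e) repeat for each layer
        next_segments: List[bytes] = []
        for segment in first_memory:                        # (d) repeat for each cuboid
            second_memory = decompress_segment(segment)     # (a) decompress one cuboid
            output = cuboid_conv(second_memory)             # (b) cuboid convolution
            next_segments.append(compress_segment(output))  # (c) compress and store
        # Kept separate until the layer finishes so inputs are not overwritten mid-layer.
        first_memory = next_segments
    return first_memory
```

In this sketch only one decompressed cuboid is held at a time, while every other cuboid stays in compressed form.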
Dependent claim 3 recites the additional insignificant extra-solution activity “wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal”, which amounts to selecting a particular type of data.
Dependent claim 4 recites the additional mathematical calculations “performing the depthwise convolution over the decompressed data with second filters to generate a 3D depthwise output array” and “performing the pointwise convolution over the 3D depthwise output array with third filters to generate the 3D pointwise output array”.
Dependent claim 5 recites additional mathematical calculations “compressing the 3D pointwise output array into the second compressed segment according to a row repetitive value compression (RRVC) scheme.”
Dependent claim 6 recites the additional mathematical calculations “dividing a target channel of the 3D pointwise output array into multiple subarrays” and “performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray”, as well as the additional mental processes of observation, evaluation, and judgment “forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray” and “replacing non-zero (NZ) values in the result map with 1 and fetching their corresponding original values from the target subarray to form a portion of the second compressed segment”.
Dependent claim 7 recites additional mathematical calculations “decompressing the first compressed segment for the current cuboid to generate the decompression data according to a row repetitive value decompression scheme.”
Dependent claim 8 recites the additional mental processes of observation, evaluation, and judgment “Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, the bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and the bitwise XOR operations over any two adjacent rows of the target restored subarray”, “forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray”, and “restoring NZ values in the target restored subarray according to the NZ bitmap and its corresponding original values”, as well as the additional insignificant extra-solution activity of gathering and outputting data “fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid” and “writing zeros in a restored result map according to the locations of zeros in the NZ bitmap”.
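For illustration only, the XOR-based compression steps of claim 6 and the row-by-row restoration steps of claim 8 can be sketched as an encoder/decoder pair. This is a hypothetical reading, not the application's RRVC scheme: the reference row is assumed to be row 1 shifted right by `phase` positions with zero fill (standing in for the unspecified "reference phase", and the same phase is assumed for compression and decompression), and the names `rrvc_compress` and `rrvc_decompress` are invented for this sketch.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rrvc_compress(sub: Grid, phase: int = 1) -> Tuple[Grid, List[int]]:
    """Hypothetical RRVC-style encoder for one subarray of one channel."""
    width = len(sub[0])
    if not 1 <= phase < width:
        raise ValueError("phase must satisfy 1 <= phase < width")
    # Assumed reference row: row 1 shifted right by `phase`, zero-filled.
    ref = [0] * phase + sub[0][:width - phase]
    # Result map: row 1 XOR reference row, then each row XOR the row above it.
    result = [[a ^ b for a, b in zip(sub[0], ref)]]
    result += [[a ^ b for a, b in zip(sub[i], sub[i - 1])] for i in range(1, len(sub))]
    # Non-zero entries of the result map become 1 in the NZ bitmap ...
    bitmap = [[1 if v else 0 for v in row] for row in result]
    # ... and the original values at those positions are kept in row-major order.
    values = [sub[i][j] for i, row in enumerate(bitmap) for j, bit in enumerate(row) if bit]
    return bitmap, values


def rrvc_decompress(bitmap: Grid, values: List[int], phase: int = 1) -> Grid:
    """Hypothetical inverse of rrvc_compress: fill in blanks row by row."""
    if not 1 <= phase < len(bitmap[0]):
        raise ValueError("phase must satisfy 1 <= phase < width")
    stored = iter(values)
    out: Grid = []
    for i, row in enumerate(bitmap):
        restored: List[int] = []
        for j, bit in enumerate(row):
            if bit:
                restored.append(next(stored))  # stored original value
            elif i == 0:
                # XOR result was 0: element equals the restored reference-row entry.
                restored.append(restored[j - phase] if j >= phase else 0)
            else:
                restored.append(out[i - 1][j])  # repeats the value in the row above
        out.append(restored)
    return out
```

For example, rows repeated from the row above cost only zero bits in the bitmap, which is where a scheme of this kind would save storage.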

Regarding Claim 9:  Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to an apparatus, i.e., a machine, which is one of the statutory categories.
Step 2A Prong One Analysis: Claim 9 recites an apparatus for processing a neural network whose recited functions, under their broadest reasonable interpretation, are a series of mathematical calculations and mental processes. For example, but for the generic computer component language, the following limitations, in the context of this claim, encompass neural network processing:
to perform a cuboid convolution over decompression data for each cuboid of a first input image fed to any one of multiple convolution layers (mathematical calculation),
performing multiplication and accumulation operations associated with the cuboid convolution to output a first convoluted cuboid (mathematical calculation)
to compress the first convoluted cuboid into one compressed segment (mathematical calculation)
decompress the compressed segments from the second internal memory on a compressed segment by compressed segment basis (mathematical calculation)
the first input image is horizontally divided into a plurality of cuboids of the same dimension, with an overlap of at least one row for each channel between any two adjacent cuboids (mathematical calculation)
the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution (mathematical calculation)
Therefore, claim 9 recites an abstract idea which is a judicial exception.
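For illustration only, the horizontal division recited in claim 9 (equal-size cuboids, with adjacent cuboids overlapping by at least one row in each channel) can be expressed as row ranges. This minimal sketch assumes the image height can be tiled exactly; the function name `split_rows` and the example sizes are hypothetical. As a general property of valid (unpadded) convolution, not a limitation of the claims, a kernel with K rows needs an overlap of K - 1 rows for the per-cuboid outputs to tile the full output without gaps.

```python
from typing import List, Tuple


def split_rows(height: int, cuboid_rows: int, overlap: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: [start, end) row ranges of equal-height horizontal cuboids.

    Adjacent cuboids share `overlap` rows (at least one), so each cuboid can be
    convolved on its own without fetching rows from its neighbours.
    """
    if not 1 <= overlap < cuboid_rows <= height:
        raise ValueError("need 1 <= overlap < cuboid_rows <= height")
    step = cuboid_rows - overlap
    if (height - cuboid_rows) % step:
        raise ValueError("height cannot be tiled by equal-size cuboids with this overlap")
    return [(start, start + cuboid_rows)
            for start in range(0, height - cuboid_rows + 1, step)]
```

For example, `split_rows(10, 4, 2)` gives four 4-row cuboids starting at rows 0, 2, 4, and 6, which suits a 3-row kernel.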
Step 2A Prong Two Analysis: Claim 9 recites the additional elements “first internal memory”, “second internal memory”, “MAC circuit”, “compressor”, and “decompressor”. However, these additional elements are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea does not integrate the judicial exception into a practical application. Therefore, claim 9 is directed to a judicial exception.
Step 2B Analysis: Claim 9 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 9 amount to no more than mere instructions to apply the judicial exception using a generic computer component. Claim 9 also recites the additional elements “to store it in the second internal memory”, “to store the decompression data for a single cuboid in the first internal memory”, and “storing multiple compressed segments only”, which are well-understood, routine, and conventional (See MPEP § 2106.05(d): storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).
For the reasons above, claim 9 is rejected as being directed to patent-ineligible subject matter under § 101. The rejection also applies to dependent claims 10-19. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 10 recites the additional mathematical calculation “to perform a regular convolution over a second input image with first filters for a layer that precedes the convolution layers”, as well as the additional insignificant extra-solution activity “cause the at least one MAC circuit to generate the first input image having multiple second convoluted cuboids” and “to store the second input image”, which is well-understood, routine, and conventional.
Dependent claim 11 recites the additional insignificant extra-solution activity “wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal”, which amounts to selecting a particular type of data.
Dependent claim 12 recites additional mathematical calculations “to perform the depthwise convolution over the decompressed data with second filters and cause the at least one MAC circuit to generate a 3D depthwise output array” and “perform the pointwise convolution over the 3D depthwise output array with third filters and cause the at least one MAC circuit to generate the first convoluted cuboid”.
Dependent claim 13 recites the additional insignificant extra-solution activity “the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution”, which amounts to gathering and outputting data. Claim 13 also recites the additional generic computer component “flash memory”.
Dependent claim 14 recites additional mathematical calculations “to compress each of the first and the second convoluted cuboids as a target cuboid to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme.”
Dependent claim 15 recites the additional mathematical calculations “dividing a target channel of the 3D pointwise output array into multiple subarrays” and “performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray”, as well as the additional mental processes of observation, evaluation, and judgment “forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray” and “replacing non-zero (NZ) values in the result map with 1 and fetching their corresponding original values from the target subarray to form a portion of the second compressed segment”.
Dependent claim 16 recites additional mathematical calculations “the decompressor is further configured to decompress each compressed segment from the second internal memory according to a row repetitive value decompression scheme.”
Dependent claim 17 recites the additional mental processes of observation, evaluation, and judgment “Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, the bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and the bitwise XOR operations over any two adjacent rows of the target restored subarray”, “forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray”, and “restoring NZ values in the target restored subarray according to the NZ bitmap and its corresponding original values”, as well as the additional insignificant extra-solution activity of gathering and outputting data “fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid” and “writing zeros in a restored result map according to the locations of zeros in the NZ bitmap”.
Dependent claim 18 recites “to apply a selected activation function to each element outputted from the at least MAC circuit,” which is viewed as generally linking the judicial exception to a particular technological environment or field of use (See MPEP § 2106.05(h)).
Dependent claim 19 recites the additional generic computer components “lookup tables”, “multiplexer”, and “adder”, as well as the additional insignificant extra-solution activity “outputting Q activation values according to the biased element” and “selecting one of the Q activation values as an output element”, which amounts to gathering and outputting data.
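For illustration only, the datapath recited in claims 18 and 19 (an adder producing a biased element, lookup tables producing Q activation values, and a multiplexer selecting one of them as the output element) can be modeled in software. The integer input range, the saturation at the table bounds, the choice of Q = 3 activation functions, and the names `build_lut`, `LUTS`, and `activate` are all assumptions for this sketch, not details from the claims or the specification.

```python
from typing import Callable, List

LO, HI = -8, 8  # hypothetical integer input range covered by every lookup table


def build_lut(fn: Callable[[int], int]) -> List[int]:
    """Precompute fn over [LO, HI]; index k holds fn(LO + k)."""
    return [fn(x) for x in range(LO, HI + 1)]


# Q = 3 hypothetical activation functions, one lookup table each.
LUTS = [
    build_lut(lambda x: max(0, x)),          # ReLU
    build_lut(lambda x: min(max(0, x), 6)),  # ReLU6
    build_lut(lambda x: x),                  # identity (no activation)
]


def activate(element: int, bias: int, select: int) -> int:
    """Hypothetical model: adder, Q parallel table lookups, then a multiplexer."""
    biased = max(LO, min(HI, element + bias))         # adder, saturated to table range
    candidates = [lut[biased - LO] for lut in LUTS]   # Q activation values
    return candidates[select]                         # multiplexer picks one
```

For example, `activate(10, 0, 1)` saturates the biased element to 8 and returns the ReLU6 value 6.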

Therefore, when considered separately and in combination, the additional elements do not amount to significantly more than the judicial exception. Accordingly, claims 1-19 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 4, 9, 10, 12, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen (US20190188237A1).

	Regarding claim 1, Chen teaches A method applied in an integrated circuit for use in a deep neural network, the integrated circuit comprising a first internal memory and a second internal memory, the method comprising: ([¶0001] "The present disclosure relates to the technical field of neural network, and more particularly to a method and an electronic device for convolution calculation in a neural network" [¶0049] "the memory may be an on-chip random access memory (SRAM) to achieve faster access speed and avoid occupying data transmission bandwidth. However, the present disclosure is not limited thereto. For example, the memory may be other memories, such as an off chip memory (DDR). The available space in the memory may be used to buffer intermediate output results of depthwise convolution operations.")
	(a) decompressing a first compressed segment associated with a current cuboid of a first input image and outputted from the first internal memory ([¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data." Inverse quantization is interpreted as synonymous with decompressing a first compressed segment.)
	to store decompressed data in the second internal memory; ([¶0005] "when the data amount of input and output is relatively large, a larger on-chip random access memory (SRAM) is required for buffering intermediate results. However, a size of on-chip SRAM is fixed. If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR)." SRAM is interpreted as synonymous with the first internal memory, and DDR is interpreted as synonymous with the second internal memory. Chen explicitly teaches that quantization and inverse quantization may be introduced to the calculation data, and that the calculation data may be stored in the second internal memory.)
	(b) performing cuboid convolution over the decompressed data to generate a 3D pointwise output array; ([¶0065] "the pointwise convolution calculations are performed according to the intermediate feature values of the first predetermined number p of points on all depthwise convolution output channels and pointwise convolution kernels, to obtain output feature values of the first predetermined number p of points on all pointwise convolution output channels." As shown in FIG. 1, the convolution kernels are cuboids, such that performing all of the depthwise and pointwise convolutions is interpreted as synonymous with performing cuboid convolutions to generate a 3D depthwise/pointwise output array.)
	(c) compressing the 3D pointwise output array into a second compressed segment to store it in the first internal memory; ([¶0091] " high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.")
	(d) repeating steps (a) to (c) until all the cuboids associated with a target convolution layer are processed; and ([¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points")
	(e) repeating steps (a) to (d) until all of multiple convolution layers are completed; ([¶0094] "on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." Convolution channel is interpreted as synonymous with output layer.)
	wherein the first input image is fed to any one of the convolution layers ([¶0032] "a convolution kernel of the layer is used to perform convolution operations on the input feature map (also known as input feature data or input feature value) of the layer")
	and horizontally divided into a plurality of cuboids of the same dimension, ([¶0038] "The depthwise convolution in FIG. 2A may be regarded as splitting M channels of one convolution kernel in the conventional convolution into M depthwise convolution kernels, each of which has R rows and S columns, and only 1 channel." [¶0039] "The pointwise convolution in FIG. 2B is exactly the same as the conventional convolution operation, except that the size of the convolution kernel is 1 row and 1 column, with a total of M channels and N of such convolution kernels. The N depthwise convolution kernels respectively convolve with the input feature map to obtain output results of the N channels" Splitting a filter horizontally is interpreted as synonymous with the pointwise convolution shown in FIG. 2B, in which each of the cuboids has the same dimension and is divided horizontally.)
	with an overlap of at least one row for each channel between any two adjacent cuboids; and ([¶0078] "Depending on the reading stride, an overlapping portion may be located between every two adjacent groups of points in the p groups of points").
	wherein the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution. ([¶0004] "it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations"). 
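For illustration only, Chen's [¶0091] describes compressing high-precision output data into low-precision output data "by shifting or multiplication and division," and the mapping above relies on the corresponding quantization and inverse quantization. A minimal sketch of shift-based quantization and its inverse follows; the 8-bit signed target range, the shift amount, the saturation behavior, and the names `quantize` and `dequantize` are assumptions, not details taken from Chen.

```python
from typing import List


def quantize(values: List[int], shift: int, bits: int = 8) -> List[int]:
    """Hypothetical helper: reduce high-precision integers to signed `bits`-bit values."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Arithmetic right shift divides by 2**shift (rounding toward negative
    # infinity); results outside the target range saturate at its limits.
    return [max(lo, min(hi, v >> shift)) for v in values]


def dequantize(values: List[int], shift: int) -> List[int]:
    """Hypothetical helper: restore the original scale by a left shift."""
    # Low-order bits discarded by quantize() are not recovered.
    return [v << shift for v in values]
```

For example, with a shift of 4, the value 1000 is stored as 62 and restored as 992.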

	Regarding claim 2, Chen teaches The method according to claim 1, further comprising: applying a regular convolution on a second input image in the second internal memory with first filters for a layer that precedes the convolution layers to generate the first input image prior to steps (a) to (e); and ([¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." Intermediate feature values are interpreted as synonymous with the first input image obtained from applying a regular convolution operation on a second input image. Chen explicitly teaches that intermediate results may be quantized (compressed) and stored in the second memory [¶0049].)
	compressing the first input image into multiple first compressed segments on a cuboid by cuboid basis to store them in the first internal memory after the step of applying and before the steps (a) to (e). ([¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis. Chen shows that the results are quantized (compressed) before being stored and passed to the next layer, where steps (a) through (e) are performed.)

	Regarding claim 4, Chen teaches The method according to claim 1, wherein step (b) comprises: performing the depthwise convolution over the decompressed data with second filters to generate a 3D depthwise output array; and ([¶0009] "performing the depthwise convolution calculations according to the input feature map and the depthwise convolution kernels, to obtain intermediate feature values of the first predetermined number p of points on a second predetermined number m of depthwise convolution output channels;")
	performing the pointwise convolution over the 3D depthwise output array with third filters to generate the 3D pointwise output array. ([¶0009] "performing the pointwise convolution calculations according to the intermediate feature values of the first predetermined number p of points on the second predetermined number m of depthwise convolution output channels and the pointwise convolution kernels, to obtain a current pointwise convolution partial sums of the first predetermined number p of points on all the pointwise convolution output channels;")
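	For illustration only, the depthwise-then-pointwise sequence recited in claim 4 and mapped to Chen [¶0009] above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Chen's hardware implementation: the function names are hypothetical, arrays are nested lists indexed [channel][row][column], and stride 1 with no padding is assumed. Chen itself computes depthwise results in blocks of p points and m channels rather than over the whole array at once.

```python
from typing import List

Tensor3D = List[List[List[int]]]  # indexed [channel][row][column]
Kernel2D = List[List[int]]        # one R x S depthwise kernel


def depthwise_conv(x: Tensor3D, kernels: List[Kernel2D]) -> Tensor3D:
    """Convolve each input channel with its own single-channel R x S kernel
    (stride 1, no padding), producing one output channel per input channel."""
    out: Tensor3D = []
    for channel, k in zip(x, kernels):
        r, s = len(k), len(k[0])
        h, w = len(channel), len(channel[0])
        out.append([
            [sum(channel[i + a][j + b] * k[a][b] for a in range(r) for b in range(s))
             for j in range(w - s + 1)]
            for i in range(h - r + 1)
        ])
    return out


def pointwise_conv(x: Tensor3D, weights: List[List[int]]) -> Tensor3D:
    """Apply N kernels of size 1 x 1 x M: weights[n][m] scales input channel m
    when forming output channel n at every spatial position."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wn[m] * x[m][i][j] for m in range(len(x))) for j in range(w)]
         for i in range(h)]
        for wn in weights
    ]


# Two 3 x 3 input channels, 2 x 2 depthwise kernels, two pointwise kernels.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = depthwise_conv(x, [[[1, 1], [1, 1]], [[1, 1], [1, 1]]])
pw = pointwise_conv(dw, [[1, 1], [1, -1]])
print(dw)  # [[[12, 16], [24, 28]], [[4, 4], [4, 4]]]
print(pw)  # [[[16, 20], [28, 32]], [[8, 12], [20, 24]]]
```

	Each depthwise kernel touches only its own channel, while each pointwise kernel mixes all channels at a single spatial position, which is why the 3D depthwise output array can be fed directly to the pointwise stage.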

Regarding claim 9, Chen teaches ([¶0002] "Deep learning technology based on convolutional neural network may be used for image recognition and detection" [¶0004] "Regarding the existing implementation solution of MobileNet, whether it is based on a general purpose processor (CPU), a dedicated graphics processor (GPU), or a dedicated processing chip, it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations" Depthwise convolution interpreted as synonymous with cuboid convolution.)
	a first internal memory coupled to the at least one processor; ([¶0010] "disclosed is an electronic device comprising a processor, and a memory having computer program instructions stored therein, when executed by the processor, making the processor to perform a method for convolution calculation in a neural network" [¶0002] "Deep learning technology based on convolutional neural network may be used for image recognition and detection" [¶0004] "Regarding the existing implementation solution of MobileNet, whether it is based on a general purpose processor (CPU), a dedicated graphics processor (GPU), or a dedicated processing chip, it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations")
	at least one multiply-accumulator (MAC) circuit coupled to the at least one processor and the first internal memory for performing multiplication and accumulation operations associated with the cuboid convolution to output a first convoluted cuboid; ([¶0129] "For depthwise convolution, firstly calculating multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and here p*m multiply-accumulate (MAC) units being shared, and p*m multiply-accumulate results being obtained.")
	a second internal memory for storing multiple compressed segments only; ([¶0005] "If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR), and then read these results out of DDR in batches and perform pointwise convolution calculations." Chen explicitly teaches that the compression step is part of the calculation step and that the operation is split into segments which are stored in the second memory.)
	a compressor coupled to the at least one processor, the at least one MAC circuit and the first and the second internal memories and configured to compress the first convoluted cuboid into one compressed segment to store it in the second internal memory; and ([¶0005] "when the data amount of input and output is relatively large, a larger on-chip random access memory (SRAM) is required for buffering intermediate results. However, a size of on-chip SRAM is fixed. If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR)" [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data." SRAM interpreted as synonymous with first internal memory.  DDR interpreted as synonymous with second internal memory. Inverse quantization interpreted as synonymous with decompressing a first compressed segment. With respect to the instant specification, a compressor is interpreted as a software device run on the processor.)
	a decompressor coupled to the at least one processor, the first and the second internal memories and configured to decompress the compressed segments from the second internal memory on a compressed segment by compressed segment basis to store the decompression data for a single cuboid in the first internal memory; ([¶0005] "when the data amount of input and output is relatively large, a larger on-chip random access memory (SRAM) is required for buffering intermediate results. However, a size of on-chip SRAM is fixed. If the size of on-chip SRAM is insufficient to buffer the intermediate results, it is necessary to split the depthwise convolution operation into multiple calculations and write each calculation result into off-chip memory (DDR) until the calculation results of the depthwise convolution operation are completely calculated and written into the off-chip memory (DDR)" [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data." SRAM interpreted as synonymous with first internal memory.  DDR interpreted as synonymous with second internal memory. Inverse quantization interpreted as synonymous with decompressing a first compressed segment. With respect to the instant specification, a decompressor is interpreted as a software device run on the processor. Chen explicitly teaches that the quantization and inverse quantization may be introduced to the calculation data, and that the calculation data may be stored in the second internal memory.)
	wherein the first input image is horizontally divided into a plurality of cuboids of the same dimension, with an overlap of at least one row for each channel between any two adjacent cuboids ([¶0038] "The depthwise convolution in FIG. 2A may be regarded as splitting M channels of one convolution kernel in the conventional convolution into M depthwise convolution kernels, each of which has R rows and S columns, and only 1 channel." [¶0039] "The pointwise convolution in FIG. 2B is exactly the same as the conventional convolution operation, except that the size of the convolution kernel is 1 row and 1 column, with a total of M channels and N of such convolution kernels. The N depthwise convolution kernels respectively convolve with the input feature map to obtain output results of the N channels" Splitting the filter horizontally interpreted as synonymous with the pointwise convolution shown in FIG. 2B, in which each of the cuboids has the same dimension.).
	wherein the cuboid convolution comprises a depthwise convolution followed by a pointwise convolution. ([¶0004] "it is necessary to firstly calculate the output of depthwise convolution operation, and then take them as input data of the pointwise convolution operation, and then perform calculations"). 
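	The storage arrangement recited in claim 9 (cuboids of the same dimension overlapping by at least one row, each compressed into one segment held in the second internal memory and decompressed segment by segment into the first internal memory) may be easier to follow with a small sketch. The Python below is illustrative only: all names are hypothetical, zlib over JSON is used purely as a stand-in codec (it is neither Chen's quantization nor the claimed RRVC scheme), and the two internal memories are modeled as ordinary Python variables.

```python
import json
import zlib
from typing import List

Cuboid = List[List[List[int]]]  # indexed [channel][row][column]


def split_into_cuboids(image: Cuboid, rows_per_cuboid: int, overlap: int) -> List[Cuboid]:
    """Divide an image horizontally into equal-size cuboids; adjacent cuboids
    share `overlap` rows in every channel."""
    if not 1 <= overlap < rows_per_cuboid:
        raise ValueError("overlap must be at least 1 row and less than the cuboid height")
    height = len(image[0])
    step = rows_per_cuboid - overlap
    if height < rows_per_cuboid or (height - rows_per_cuboid) % step != 0:
        raise ValueError("image height does not tile into equal-size cuboids")
    return [[channel[top:top + rows_per_cuboid] for channel in image]
            for top in range(0, height - rows_per_cuboid + 1, step)]


def compress_segment(cuboid: Cuboid) -> bytes:
    # Stand-in codec for illustration only.
    return zlib.compress(json.dumps(cuboid).encode("utf-8"))


def decompress_segment(segment: bytes) -> Cuboid:
    return json.loads(zlib.decompress(segment).decode("utf-8"))


# One channel, 7 rows, 4 columns; cuboids of 3 rows overlapping by 1 row.
image = [[[r * 10 + c for c in range(4)] for r in range(7)]]
cuboids = split_into_cuboids(image, rows_per_cuboid=3, overlap=1)

# Hypothetical "second internal memory": holds compressed segments only.
second_memory = [compress_segment(cb) for cb in cuboids]

# Decompress one segment at a time into a hypothetical "first internal memory".
for segment in second_memory:
    first_memory = decompress_segment(segment)  # data for a single cuboid only
```

	With a kernel of R rows and stride 1, one common reason for such an overlap is that each cuboid needs R - 1 rows from its neighbor to produce complete output rows at its boundary. That rationale is an assumption for illustration; claim 9 only requires an overlap of at least one row.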

Regarding claim 10, Chen teaches The integrated circuit according to claim 9, wherein the at least one processor is further configured to perform a regular convolution over a second input image with first filters for a layer that precedes the convolution layers ([¶0094] "In step S212, the above operations are repeated (i.e. step S211), and the depthwise convolution calculations are performed according to the input feature maps and the depthwise convolution kernels, to obtain the intermediate feature values of the first predetermined number p of points on a next second predetermined number m of depthwise convolution output channels, and correspondingly performing subsequent operations, until the intermediate feature values of the first predetermined number p of points on all depthwise convolutional output channels are obtained." Intermediate feature values interpreted as synonymous with the first input image, obtained from applying a regular convolution operation on a second input image.  Input layer interpreted as preceding the convolution layers. Chen explicitly teaches that intermediate results may be quantized (compressed) and stored in the second memory [¶0049].)
	and cause the at least one MAC circuit to generate the first input image having multiple second convoluted cuboids, and wherein the first internal memory is used to store the second input image. ([¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." [¶0129] "For depthwise convolution, firstly calculating multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and here p*m multiply-accumulate (MAC) units being shared, and p*m multiply-accumulate results being obtained." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.)

	Regarding claim 12, Chen teaches The integrated circuit according to claim 10, wherein the at least one processor is further configured to perform the depthwise convolution over the decompressed data with second filters and cause the at least one MAC circuit to generate a 3D depthwise output array, ([¶0009] "performing the depthwise convolution calculations according to the input feature map and the depthwise convolution kernels, to obtain intermediate feature values of the first predetermined number p of points on a second predetermined number m of depthwise convolution output channels")
	and then perform the pointwise convolution over the 3D depthwise output array with third filters and cause the at least one MAC circuit to generate the first convoluted cuboid. ([¶0104] "reading intermediate feature values (as intermediate feature values shown in the intermediate feature map (1) in FIG. 8) of the first predetermined number p of points on a third predetermined number of depthwise convolution output channels" [¶0106] "for different hardware designs, the number of pointwise convolution calculation units MAC′ may or may not be equal to the number of depthwise convolution calculation unit MAC."). 

Regarding claim 18, Chen teaches The integrated circuit according to claim 9, further comprising: a neural function unit coupled among the at least one processor, the at least one MAC circuit and the compressor and configured to apply a selected activation function to each element outputted from the at least MAC circuit. ([¶0209] "1. For the depthwise convolution, firstly, calculating the multiplication and accumulation results of p(p<=H*W) points and m(m<=M) channels, the accumulation here being the accumulation performed in the direction of the length and width of the convolution kernel, as R and S shown in FIG. 6, and p*m multiply-accumulate (MAC) units being shared here, and p*m multiply-accumulate results being obtained." [¶0211] "2. Performing an optional activation operation on the results of abovementioned step 1" [¶0211] "3. Performing an optional quantization operation on the results of the abovementioned step 2" Steps 1-3 of Chen explicitly teach a MAC unit configured to apply an activation function and a quantization (compression) in series.).
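	The serial order relied on from Chen [¶0209]-[¶0211] (multiply-accumulate, then an optional activation, then an optional quantization) can be illustrated with the following minimal Python sketch. The use of ReLU as the selected activation function, the use of a saturating right shift as the quantization, and all function names are assumptions for illustration only.

```python
from typing import Sequence


def mac(inputs: Sequence[int], weights: Sequence[int]) -> int:
    """Multiply-accumulate: add one product to the running sum per step."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def relu(value: int) -> int:
    """Assumed activation function (ReLU)."""
    return max(0, value)


def quantize_shift(value: int, shift: int, bits: int = 8) -> int:
    """Assumed quantization: arithmetic right shift, then saturate to a
    signed `bits`-bit range."""
    q = value >> shift
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))


def mac_activate_quantize(inputs: Sequence[int], weights: Sequence[int], shift: int) -> int:
    # Steps 1-3 applied in series to a single output element.
    return quantize_shift(relu(mac(inputs, weights)), shift)


print(mac_activate_quantize([10, 20, 30], [4, 5, 6], shift=2))  # 80
```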

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Price (US10223611B1). 

	Regarding claim 3, Chen teaches The method according to claim 2.
	While Chen explicitly teaches that the network may be used for image recognition and detection, and while it would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB), Chen does not explicitly teach that the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.  

Price, in the same field of endeavor, teaches The method according to claim 2, wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal. ([Abstract] "The method comprises receiving image data for an image in a system comprising a convolutional neural network (CNN), the CNN comprising a first convolutional layer, a last convolutional layer, and a fully connected layer; providing the image data to an input of the first convolutional layer; extracting multi-channel data from the output of the last convolutional layer" Price explicitly teaches that the general image may have multiple channels which may be extracted through the CNN.). 

	Chen and Price are both directed towards using convolutional neural networks for image analysis.  Therefore, Chen and Price are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen with the teachings of Price by using a general image as input to the convolutional neural network. It would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB).  Such image formats are well known in the art.  Moreover, Price teaches that object detection can be performed in a single pass, which would add a level of efficiency improving upon the disclosure of Chen. 

Regarding claim 11, Chen teaches The integrated circuit according to claim 10.
	While Chen explicitly teaches that the network may be used for image recognition and detection, and while it would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB), Chen does not explicitly teach that the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal.  

Price, in the same field of endeavor, teaches The integrated circuit according to claim 10, wherein the second input image is one of a general image with multiple channels and a spectrogram with a single channel derived from an audio signal. ([Abstract] "The method comprises receiving image data for an image in a system comprising a convolutional neural network (CNN), the CNN comprising a first convolutional layer, a last convolutional layer, and a fully connected layer; providing the image data to an input of the first convolutional layer; extracting multi-channel data from the output of the last convolutional layer" Price explicitly teaches that the general image may have multiple channels which may be extracted through the CNN.). 

	Chen and Price are both directed towards using convolutional neural networks for image analysis.  Therefore, Chen and Price are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen with the teachings of Price by using a general image as input to the convolutional neural network. It would be obvious to one of ordinary skill in the art that said image would be a standard format image with multiple channels (such as RGB).  Such image formats are well known in the art.  Moreover, Price teaches that object detection can be performed in a single pass, which would add a level of efficiency improving upon the disclosure of Chen.

	Claims 5-7 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Cooper (“Huffman Coding Analysis of XOR Filtered Images”, 2015). 

	Regarding claim 5, Chen teaches The method according to claim 1, wherein step (c) further comprises: (c1) compressing the 3D pointwise output array into the second compressed segment ([¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.).
	However, Chen does not explicitly teach compressing the [3D pointwise] output array according to a row repetitive value compression (RRVC) scheme.  

Cooper, in the same field of endeavor, teaches compressing the [3D pointwise] output array according to a row repetitive value compression (RRVC) scheme. ([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter" Decorrelating rows using an XOR filter interpreted as synonymous with row repetitive value compression.). 

	Chen and Cooper are both directed towards the field of image processing.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network of Chen with the image processing methods in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention, which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason, lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  
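	The quantization-loss reasoning above can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the shift-based compression of Chen [¶0092] is a right shift by 4 bits and that stored values are restored by the inverse left shift; the function names and sample values are arbitrary.

```python
from typing import List


def quantize(values: List[int], shift: int) -> List[int]:
    """Compress by discarding the low `shift` bits (right shift)."""
    return [v >> shift for v in values]


def dequantize(codes: List[int], shift: int) -> List[int]:
    """Best-effort inverse: the discarded low bits cannot be recovered."""
    return [c << shift for c in codes]


original = [1000, 1003, 1017, 998]
restored = dequantize(quantize(original, 4), 4)
print(restored)  # [992, 992, 1008, 992]
```

	Every restored value differs from its original, whereas a reversible decorrelation followed by lossless entropy coding returns the input exactly.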

	Regarding claim 6, the combination of Chen and Cooper teaches The method according to claim 5, wherein step (c1) further comprises: (1) dividing a target channel of the [3D pointwise] output array into multiple subarrays; (Cooper [p. 239 §IIA] "Fig. 1 : Illustration of the exclusive-or filter window for a sample image. The filter window is a height of 1 row and a width of m columns.  Figure 1 illustrates filter window, w, that is 1 row x m columns." Window interpreted as synonymous with subarray.)
	(2) forming a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray; (3) performing bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray; (Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi")
	(4) replacing non-zero (NZ) values in the result map with 1 (Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi" In a binary (single-bit) XOR operation, the only possible value other than zero is one.)
	and fetching their corresponding original values from the target subarray to form a portion of the second compressed segment; (Cooper [p. 239 §IIA] "The value of the filter window is then updated to be the value of the first row of the original image, w = xi")
	(5) repeating steps (2) to (4) until all the subarrays for the target channel are processed; and (6) repeating steps (1) to (5) until all the channels of the [3D pointwise] output array are processed to form the second compressed segment. (Cooper [p. 239 §IIA] "The process repeats for each row of the image."). 
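	The XOR filter quoted from Cooper [p. 239 §IIA] can be expressed as a short Python sketch: the filter window starts as a row of a constant byte c, each image row is XORed with the window, and the window is then replaced by the original row, so that each filtered row after the first equals the previous original row XOR the current one. The function name and the default c = 0 are assumptions, and Cooper's subsequent Huffman coding stage is not shown.

```python
from typing import List


def xor_filter(image: List[List[int]], c: int = 0) -> List[List[int]]:
    """Row-wise XOR decorrelation of an image given as rows of byte values."""
    width = len(image[0])
    window = [c] * width  # initial window: a row of the constant byte c
    filtered = []
    for row in image:
        filtered.append([w ^ x for w, x in zip(window, row)])  # bitwise XOR with the window
        window = list(row)  # the window becomes the original (unfiltered) row
    return filtered


image = [[5, 5, 7],
         [5, 5, 7],
         [5, 6, 7]]
print(xor_filter(image))  # [[5, 5, 7], [0, 0, 0], [0, 3, 0]]
```

	Repeated rows become all zeros, which is what makes the filtered image more compressible by a subsequent entropy coder.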

	Regarding claim 7, Chen teaches The method according to claim 1, wherein step (a) further comprises: (a1) decompressing the first compressed segment for the current cuboid to generate the decompression data ([¶0089] "According to the current design parameters of the convolutional layer, at least one of the following operations may be performed for each intermediate feature value after obtaining but before storing each intermediate feature value: activation operation and quantization operation." [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data.").
	However, Chen does not explicitly teach to generate the decompression data according to a row repetitive value decompression scheme.  

Cooper, in the same field of endeavor, teaches to generate the decompression data according to a row repetitive value decompression scheme. ([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter...The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi" Decorrelating rows using an XOR filter interpreted as synonymous with row repetitive value compression.  The XOR unfiltered algorithm shown in FIG. 3 interpreted as a row repetitive value decompression scheme.). 
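	The reversal quoted from Cooper can be sketched in the same way: starting from a row of the same constant byte c, each filtered row is XORed with the window to recover the original row, which then becomes the window for the next row. This is a minimal illustration; the function name, the default c = 0 and the sample data are assumptions.

```python
from typing import List


def xor_unfilter(filtered: List[List[int]], c: int = 0) -> List[List[int]]:
    """Invert row-wise XOR decorrelation, recovering the original rows exactly."""
    width = len(filtered[0])
    window = [c] * width  # same constant byte c as used when filtering
    restored = []
    for row in filtered:
        original_row = [w ^ y for w, y in zip(window, row)]  # undo the XOR
        restored.append(original_row)
        window = original_row  # the next row was filtered against this original row
    return restored


filtered = [[5, 5, 7], [0, 0, 0], [0, 3, 0]]
print(xor_unfilter(filtered))  # [[5, 5, 7], [5, 5, 7], [5, 6, 7]]
```

	Because XOR is its own inverse, the reconstruction is exact, which is the lossless property relied on in the combination with Chen.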

	Chen and Cooper are both directed towards the field of image processing.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network of Chen with the image processing methods in Cooper by substituting the quantization compression technique in Chen with the exclusive-or decorrelated compression technique in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention, which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason, lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  The combination merely amounts to substituting a known compression technique with another. 

Regarding claim 14, Chen teaches The integrated circuit according to claim 10, wherein the compressor is further configured to compress each of the first and the second convoluted cuboids as a target cuboid to generate a corresponding compressed segment ([¶0122] "For example, at least one of an activation operation and a quantization operation may be performed on each output feature value before the final accumulation calculation results of the first predetermined number p of points are stored in the memory as output feature values of the first predetermined number p of points on the fourth predetermined number n of pointwise convolution output channels corresponding to the fourth predetermined number n of pointwise convolution kernels." See FIG. 8 and [¶0049] for how intermediate results are stored on a cuboid by cuboid basis.  Chen shows that the results are quantized (compressed) before being stored and passed to the next layer where steps (a) through (e) are performed.).
	However, Chen does not explicitly teach to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme.  

Cooper, in the same field of endeavor, teaches to generate a corresponding compressed segment according to a row repetitive value compression (RRVC) scheme. ([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter" Decorrelating rows using an XOR filter interpreted as synonymous with row repetitive value compression.). 

	Chen and Cooper are both directed towards the field of image processing.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network of Chen with the image processing methods in Cooper by substituting the quantization compression technique in Chen with the exclusive-or decorrelated compression technique in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention, which is expected in a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would be obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to loss of accuracy in neural network systems, and there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason, lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression, the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  As the output channels of the convolution layer are 2D matrices like the input image, it is obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  The combination merely amounts to substituting a known compression technique with another. 

	Regarding claim 15, the combination of Chen and Cooper teaches The integrated circuit according to claim 14, wherein according to the RRVC scheme, the compressor is further configured to (1) divide a target channel of the target cuboid into multiple subarrays; (Cooper [p. 239 §IIA] "Fig. 1 : Illustration of the exclusive-or filter window for a sample image. The filter window is a height of 1 row and a width of m columns.  Figure 1 illustrates filter window, w, that is 1 row x m columns." Window interpreted as synonymous with subarray.).
	(2) form a reference row for a target subarray according to a first reference phase and multiple elements in row 1 of the target subarray; (Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi")
	(3) perform bitwise exclusive-OR (XOR) operations to generate a result map based on the reference row and the target subarray; (Cooper [p. 239 §IIA] "The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi. The results of this operation are stored as the first row of the filtered image, yi")
	(4) replace non-zero values in the result map with 1 and fetching their corresponding original values from the target subarray to form a portion of the corresponding compressed segment; (Cooper [p. 239 §IIA] "The value of the filter window is then updated to be the value of the first row of the original image, w = xi" In a binary (single-bit) XOR operation, the only possible value other than zero is one.)
	(5) repeat steps (2) to (4) until all the subarrays for the target channel are processed; and (6) repeat steps (1) to (5) until all the channels of the target cuboid are processed to form the corresponding compressed segment. (Cooper [p. 239 §IIA] "The process repeats for each row of the image."). 
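The filter described in the quoted Cooper passages can be sketched as follows. This is a minimal Python sketch, assuming each row is a list of 8-bit integers and that the constant byte c defaults to 0; the function names are hypothetical. The second helper is only an illustrative reading of claim 15 step (4) (marking non-zero positions of the result map with 1 and collecting the original subarray values at those positions), not a reproduction of Cooper or of the applicant's implementation.

```python
from typing import List, Tuple

Row = List[int]

def xor_filter(image: List[Row], c: int = 0) -> List[Row]:
    """Row-wise XOR decorrelation following the quoted Cooper passages.

    The filter window starts as a row of the constant byte c. Each image row is
    XORed with the window, the result is stored as the filtered row, and the
    window is then set to the original (unfiltered) row.
    """
    window = [c] * len(image[0])
    filtered: List[Row] = []
    for row in image:
        filtered.append([w ^ x for w, x in zip(window, row)])
        window = list(row)
    return filtered

def nz_bitmap_and_values(subarray: List[Row], result_map: List[Row]) -> Tuple[List[Row], List[int]]:
    """Hypothetical reading of claim 15 step (4): 1 where the result map is
    non-zero, plus the original subarray values at those positions (row-major)."""
    bitmap = [[1 if v != 0 else 0 for v in row] for row in result_map]
    values = [x for xrow, brow in zip(subarray, bitmap)
              for x, b in zip(xrow, brow) if b]
    return bitmap, values

image = [[5, 5, 7], [5, 5, 7], [5, 6, 7]]
result_map = xor_filter(image)
print(result_map)                               # [[5, 5, 7], [0, 0, 0], [0, 3, 0]]
print(nz_bitmap_and_values(image, result_map))  # ([[1, 1, 1], [0, 0, 0], [0, 1, 0]], [5, 5, 7, 6])
```

Repeated rows produce all-zero filtered rows, which is why the decorrelated output tends to contain long zero runs that a subsequent entropy coder can exploit.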

	Regarding claim 16, Chen teaches The integrated circuit according to claim 9, wherein the decompressor is further configured to decompress each compressed segment from the second internal memory ([¶0089] "According to the current design parameters of the convolutional layer, at least one of the following operations may be performed for each intermediate feature value after obtaining but before storing each intermediate feature value: activation operation and quantization operation." [¶0091] "Further, the quantization operation and inverse quantization operation may also be introduced to the calculation data. ")
	However, Chen does not explicitly teach to decompress each compressed segment from the [second internal] memory according to a row repetitive value decompression scheme.  

Cooper, in the same field of endeavor, teaches to decompress each compressed segment from the [second internal] memory according to a row repetitive value decompression scheme. ([p. 239 §IIA] "the decorrelation method, applies Huffman coding to an image after it has been de-correlated using the XOR filter...The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi" Decorrelating rows using an XOR filter is interpreted as synonymous with row repetitive value compression.  The XOR unfiltering algorithm shown in FIG. 3 is interpreted as the row repetitive value decompression scheme.). 
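The reversal quoted from Cooper can be sketched in a few lines of Python. This minimal sketch assumes rows are lists of 8-bit integers and that the same constant byte c (default 0) used during filtering is known to the decoder; the function name is hypothetical.

```python
from typing import List

Row = List[int]

def xor_unfilter(filtered: List[Row], c: int = 0) -> List[Row]:
    """Invert the row-wise XOR filter.

    The window starts as a row of the same constant byte c; each filtered row is
    XORed with the window to recover the original row, and the window is then set
    to that newly recovered row.
    """
    window = [c] * len(filtered[0])
    recovered: List[Row] = []
    for row in filtered:
        original = [w ^ y for w, y in zip(window, row)]
        recovered.append(original)
        window = original
    return recovered

print(xor_unfilter([[5, 5, 7], [0, 0, 0], [0, 3, 0]]))
# [[5, 5, 7], [5, 5, 7], [5, 6, 7]]
```

Because XOR is its own inverse, the window update uses the recovered row rather than the filtered one, which is what makes the reversal exact.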

	Chen and Cooper are both directed towards the field of image processing.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network of Chen with the image processing methods in Cooper by substituting the quantization compression technique in Chen with the exclusive-or decorrelated compression technique in Cooper.  Chen explicitly teaches that image recognition and detection is the primary focus of the invention, as is expected of a convolutional neural network accelerator.  Chen further teaches a quantization step as a fundamental step of the convolution process.  Therefore, it would have been obvious to one of ordinary skill in the art that known image compression techniques could be advantageous in the neural network accelerator disclosed by Chen.  Chen teaches ([¶0092] “high-precision output data may be compressed into low precision output data by shifting or multiplication and division, such that the storage space occupied by each data in the memory is reduced and the access speed is fully improved.”).  One of ordinary skill in the art would recognize that quantization loss generally leads to a loss of accuracy in neural network systems, and that there is a fine balance between increasing performance through quantization and maintaining accuracy.  For this reason, lossless compression offers an ideal solution.  Cooper teaches, as a motivation for combination, that in the field of image compression the well-known Huffman coding technique can be improved by performing exclusive-or operations row-wise on the input image ([p. 238 §I] “Research shows that the performance of lossless image compression algorithms can be improved by including reversible decorrelation methods in preprocessing”).  
As the output channels of a convolution layer are 2D matrices like the input image, it would have been obvious that the same compression technique could be beneficial at each step of the convolutional neural network.  The combination merely amounts to substituting one known compression technique for another. 

	Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen and Cooper, and in further view of Juefei-Xu (“Local Binary Convolutional Neural Networks”, 2017).

	Regarding claim 8, the combination of Chen and Cooper teaches The method according to claim 7, (2) restoring NZ values in the target restored subarray according to the NZ bitmap and its corresponding original values; (Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set" The recovered image row is interpreted as synonymous with the restored NZ bitmap subarray. See also FIG. 3 algorithm.)
	(3) forming a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray; (Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi." See also FIG. 3 algorithm.)
	(4) writing zeros in a restored result map according to the locations of zeros in the [NZ bitmap]; (Cooper [p. 238 §I] "The authors of [6] aim to produce long runs of zeros and ones by applying neighboring bit-wise exclusive-or logic operations to the original data set as a transform." The original data set is interpreted as the restored result map.  Performing the inverse decorrelation filter is interpreted as synonymous with restoring the result map.)
	(5) Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, (Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi. The reversal process repeats for each row of the image to recover the original data set." Cooper teaches reading and writing the values of the target restored subarray in place.  It would have been obvious to one of ordinary skill in the art to use a secondary blank array to store the results of the XOR operation, and doing so would lead to predictable and expected results.)
	the bitwise XOR operations over the target restored reference row and row 1 of the target restored subarray, and the bitwise XOR operations over any two adjacent rows of the target restored subarray; (Cooper [p. 239 §II] "Figure 1 illustrates filter window, w, that is 1 row x m columns. The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi" See FIG. 2 algorithm. All corresponding ith elements are interpreted as adjacent row elements.)
	(6) repeating steps (1) to (5) until all the restored subarrays for the target channel are processed; and (7) repeating steps (1) to (6) until all the channels for the first compressed segment are processed to form the decompressed data. (Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set." See also FIG. 3 algorithm.). 
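A minimal Python sketch of one reading of the fill-in logic in claim 8 steps (2) through (5) follows, offered only to make the mapping concrete. It assumes the restored reference row is Cooper's constant row (here all zeros) and that the stored original values are in row-major order; the function name and data layout are hypothetical and are not taken from Cooper or from the applicant's disclosure. It relies on the XOR property that a zero result means the two operands were equal.

```python
from typing import List

Row = List[int]

def restore_subarray(bitmap: List[Row], values: List[int], ref_row: Row) -> List[Row]:
    """Rebuild a subarray from an NZ bitmap and its stored original values.

    A 1 in the bitmap means the result map was non-zero there, so the next stored
    original value (row-major order) is written back. A 0 means the XOR with the
    element above (or with the reference row, for row 1) was zero, so the blank is
    filled with that element. Rows are filled top to bottom.
    """
    stored = iter(values)
    restored: List[Row] = []
    above = ref_row  # row 1 is compared against the restored reference row
    for bits in bitmap:
        row = [next(stored) if b else above[j] for j, b in enumerate(bits)]
        restored.append(row)
        above = row
    return restored

bitmap = [[1, 1, 1], [0, 0, 0], [0, 1, 0]]
values = [5, 5, 7, 6]
print(restore_subarray(bitmap, values, ref_row=[0, 0, 0]))
# [[5, 5, 7], [5, 5, 7], [5, 6, 7]]
```

Under these assumptions, only the bitmap and the non-zero-position originals are needed; every blank is resolved from an already restored neighbor in the row above.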

	However, the combination of Chen and Cooper does not explicitly teach (1) fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid;  

Juefei-Xu, in the same field of endeavor, teaches (1) fetching a NZ bitmap and its corresponding original values associated with a target restored subarray of a target channel in the first compressed segment for the current cuboid; ([p. 21 §3.1] "The input image xl is filtered by these LBC filters to generate m difference maps that are then activated through a non-linear activation function, resulting in m bit maps." The target restored subarray is an image row, which is interpreted as being associated with the NZ bitmap.  The resulting binary bit maps are interpreted as synonymous with fetching an NZ bitmap.  See also FIGS. 1 and 2.). 

	The combination of Chen and Cooper and the disclosure of Juefei-Xu are each directed towards a convolutional neural network accelerator.  Therefore, the combination of Chen and Cooper and the disclosure of Juefei-Xu are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen and Cooper with the teachings of Juefei-Xu by using a binary image. Juefei-Xu teaches as motivation for combination ([p. 20 §I] “The idea of using binary filters for convolutional layers is not new… BinaryConnect alternates between binarized and real-valued weights during the network training process… These approaches lead to drastic improvement in run-time efficiency by replacing most 32-bit floating point multiply-accumulations by 1-bit XNOR-count operations”).  The use of XNOR operations in Juefei-Xu is also seen as strengthening the combination of Chen and Cooper, as the disclosure of Juefei-Xu is highly analogous to both references.

	Regarding claim 17, the combination of Chen and Cooper teaches The integrated circuit according to claim 16, (1) [fetch a non-zero (NZ) bitmap] and its corresponding original values associated with a target restored subarray of a target channel in one compressed segment for a target cuboid; (Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set" The recovered image row is interpreted as synonymous with the restored NZ bitmap subarray. See also FIG. 3 algorithm.)
	(2) restore NZ values in the target restored subarray according to its corresponding original values; (Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi." See also FIG. 3 algorithm.)
	(3) form a restored reference row according to a second reference phase and multiple elements of row 1 in the target restored subarray; (Cooper [p. 238 §I] "The authors of [6] aim to produce long runs of zeros and ones by applying neighboring bit-wise exclusive-or logic operations to the original data set as a transform." The original data set is interpreted as the restored result map.  Performing the inverse decorrelation filter is interpreted as synonymous with restoring the result map.)
	(4) write zeros in a restored result map according to the locations of zeros in the NZ bitmap; (Cooper [p. 239 §II] "The XOR filter is easily reversed without data loss using the same variables. The initial value of the filter window in reversal is a row of the same constant byte c. This row is bitwise XORed with the first row of the filtered image, yi. The value of the filter window is then updated to be the value of the newly recovered values of the current row, xi. The reversal process repeats for each row of the image to recover the original data set." Cooper teaches reading and writing the values of the target restored subarray in place.  It would have been obvious to one of ordinary skill in the art to use a secondary blank array to store the results of the XOR operation, and doing so would lead to predictable and expected results.)
	(5) Fill in blanks row-by-row in the target restored subarray according to known elements in the restored reference row, in the target restored subarray and in the restored result map, the bitwise XOR operations over the restored reference row and row 1 of the target restored subarray, and the bitwise XOR operations over any two adjacent rows of the target restored subarray; (Cooper [p. 239 §II] "Figure 1 illustrates filter window, w, that is 1 row x m columns. The initial value of the filter window is equal to a row of a constant byte c. This row is bitwise XORed with the first row of the image, xi" See FIG. 2 algorithm. All corresponding ith elements are interpreted as adjacent row elements.)
	(6) repeat steps (1) to (5) until all the restored subarrays for the target channel are processed; and (7) repeat steps (1) to (6) until all the channels of the compressed segment for the target cuboid are processed to form the decompressed data. (Cooper [p. 239 §II] "The reversal process repeats for each row of the image to recover the original data set." See also FIG. 3 algorithm.). 
	However, the combination of Chen and Cooper does not explicitly teach (1) [fetch a non-zero (NZ) bitmap] and its corresponding original values associated with a target restored subarray of a target channel in one compressed segment for a target cuboid;

Juefei-Xu, in the same field of endeavor, teaches (1) [fetch a non-zero (NZ) bitmap] and its corresponding original values associated with a target restored subarray of a target channel in one compressed segment for a target cuboid ([p. 21 §3.1] "The input image xl is filtered by these LBC filters to generate m difference maps that are then activated through a non-linear activation function, resulting in m bit maps." The target restored subarray is an image row, which is interpreted as being associated with the NZ bitmap.  The resulting binary bit maps are interpreted as synonymous with fetching an NZ bitmap.  See also FIGS. 1 and 2.). 

	The combination of Chen and Cooper and the disclosure of Juefei-Xu are each directed towards a convolutional neural network accelerator.  Therefore, the combination of Chen and Cooper and the disclosure of Juefei-Xu are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen and Cooper with the teachings of Juefei-Xu by using a binary image. Juefei-Xu teaches as motivation for combination ([p. 20 §I] “The idea of using binary filters for convolutional layers is not new… BinaryConnect alternates between binarized and real-valued weights during the network training process… These approaches lead to drastic improvement in run-time efficiency by replacing most 32-bit floating point multiply-accumulations by 1-bit XNOR-count operations”).  The use of XNOR operations in Juefei-Xu is also seen as strengthening the combination of Chen and Cooper, as the disclosure of Juefei-Xu is highly analogous to both references.

	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Raju (WO2018217965A1). 

	Regarding claim 13, Chen teaches The integrated circuit according to claim 12, further comprising: a flash memory ([¶0228] "The memory 12 may comprise one or more computer program products which may comprise various forms of computer readable and writable storage media, such as a volatile memory and/or a non-volatile memory. The volatile memory may comprise, for example, a random access memory (RAM) and/or a cache, etc. The non-volatile memory may comprise, for example, a read only memory (ROM), a hard disk, a flash memory, etc.").
	However, Chen does not explicitly teach a flash memory for pre-storing coefficients forming the first, the second and the third filters; 
	wherein the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution.  

Raju, in the same field of endeavor, teaches a flash memory for pre-storing coefficients forming the first, the second and the third filters; ([¶0021] "filter coefficients (or weights) that may be used on the CNN algorithm may be encrypted and stored at memories that are external to the SoC device 104, such as external flash 208 and/or external memory 210")
	wherein the at least one processor is further configured to read corresponding coefficients from the flash memory and temporarily store them in the first internal memory prior to the regular convolution and the cuboid convolution. ([¶0024] "At any time during the signal processing, the decrypted weights, the decrypted inputs, and the encrypted outputs may not be available to the external memories (i.e., external flash 208 and external memory 210), in order to prevent exposure to malicious attacks" [¶0025] " the CNN HW engine 200 may implement parallel execution of convolutions of the decrypted inputs and weights, and supply the output back to the secure IP block 202 to form an encrypted output." Raju explicitly teaches that the weights reside in the external flash only prior to decryption and convolution, i.e., the weights are read from flash and decrypted before the convolution is performed.). 

	Chen and Raju are both directed towards convolutional neural network accelerators.  Therefore, Chen and Raju are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen with the teachings of Raju by using a flash memory to store the filter weights. Raju explicitly teaches that the neural network filter coefficients may be encrypted and stored on an external memory device such as flash, and then decrypted on a secure local system.  Raju further teaches, as a motivation for combination, ([¶0025] “Accordingly, the CNN HW engine 200 may be configured to retrieve and use directly the decrypted weights and decrypted input through a hardware concurrent parallel execution of security engines for hidden layers during the signal processing.”).   

	Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Coenen (US20190340493A1). 

	Regarding claim 19, Chen teaches The integrated circuit according to claim 18.
	However, Chen does not explicitly teach wherein the neural function unit comprises: an adder for adding each element from the at least MAC circuit with a bias value to generate a biased element; 
	a number Q of activation function lookup tables coupled to the adder for outputting Q activation values according to the biased element; and 
	a multiplexer coupled to the compressor and the output terminals of the number Q of activation function lookup tables for selecting one of the Q activation values as an output element.  

Coenen, in the same field of endeavor, teaches The integrated circuit according to claim 18, wherein the neural function unit comprises: an adder for adding each element from the at least MAC circuit with a bias value to generate a biased element; ([¶0041] "there is a look up table (LUT) 601 that implements the activation function and an adder 602 for computing the bias." See FIG. 6, which shows a neuron unit comprising MAC units.)
	a number Q of activation function lookup tables coupled to the adder for outputting Q activation values according to the biased element; and ([¶0041] "there is a look up table (LUT) 601 that implements the activation function and an adder 602 for computing the bias.")
	a multiplexer coupled to the compressor and the output terminals of the number Q of activation function lookup tables for selecting one of the Q activation values as an output element. (See FIG. 6, which shows the compressor connected to the activation function unit, which is explicitly implemented with a LUT. [¶0041] " The application of the activation function is configurable by selecting on of the inputs to a multiplexor 610"). 
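The arrangement mapped above (an adder, Q activation lookup tables read in parallel, and a multiplexer that selects one of their outputs) can be sketched behaviorally in Python. The integer address range, the clamping of out-of-range inputs, the two example activation functions, and all function names are assumptions made for illustration; neither the claim language nor Coenen [¶0041] as quoted specifies them.

```python
from typing import Callable, Dict, List

def build_lut(fn: Callable[[int], int], lo: int, hi: int) -> Dict[int, int]:
    """Precompute fn over the integer range [lo, hi] (hypothetical LUT contents)."""
    return {x: fn(x) for x in range(lo, hi + 1)}

def neural_function_unit(mac_out: int, bias: int, luts: List[Dict[int, int]],
                         select: int, lo: int, hi: int) -> int:
    """Behavioral model: adder, then Q parallel activation LUTs, then a Q-to-1 mux."""
    biased = mac_out + bias                    # adder: MAC element plus bias value
    idx = min(max(biased, lo), hi)             # clamp to the assumed LUT address range
    activations = [lut[idx] for lut in luts]   # all Q tables output in parallel
    return activations[select]                 # multiplexer selects one of the Q values

LO, HI = -8, 7
luts = [build_lut(lambda x: x, LO, HI),          # Q = 2: identity
        build_lut(lambda x: max(0, x), LO, HI)]  # and ReLU
print(neural_function_unit(-5, 2, luts, 0, LO, HI))  # -3
print(neural_function_unit(-5, 2, luts, 1, LO, HI))  # 0
print(neural_function_unit(10, 3, luts, 1, LO, HI))  # 7 (13 clamped to 7)
```

Reading all Q tables and selecting afterwards mirrors a hardware layout in which the select signal, not the datapath, determines which activation function is applied.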

	Chen and Coenen are both directed towards neural network accelerators.  Therefore, Chen and Coenen are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Chen with the teachings of Coenen by using a bias as well as a lookup table for the activation functions. Coenen teaches, as a motivation for combining the lookup table circuit, ([¶0041] “This implementation of the accelerator allows for software to control the neural network processing and either hardware or software to apply the activation function.”).   

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Howard (US20180137406A1) is directed towards a convolutional neural network accelerator focusing predominantly on 3D CNNs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124


/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126