DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Regarding applicant’s remarks on pages 18-19 regarding independent claim 1:
	The rejection acknowledged that Zhang does not teach or suggest the claimed method steps A) and B).
Firstly, the rejection contended that page 2, left column, paragraph 2 of Simpson teaches the claimed method step A). Applicant respectfully disagrees. Page 2, left column, paragraph 2 of Simpson discloses Parallel dither and dropout. 
That is, the operations in Simpson involve applying a dithering algorithm/technique to 100 duplicate datasets independently and then averaging the results. In contrast, the claimed method step A) recites, in part, generating a number of dithering algorithms from the set of the number of dithering algorithms. Simpson does not teach or suggest generating a number of dithering algorithms from a set of dithering algorithms.
Examiner’s response:
	Under the broadest reasonable interpretation, generating a number of dithering algorithms is not interpreted as the literal creation or generation of new algorithms, but rather as selecting a predetermined number of algorithms from a pool of available or possible dithering algorithms.
Regarding applicant’s remarks on page 19 regarding independent claim 1 step A:
Without conceding the correctness of the contention, to facilitate the prosecution, claim 1 is amended to recite that the number (Z) is equal to [formula image 1].
That is, the claimed method step A) recites, in part, generating a number ([formula image 1]) of Y-combinations of dithering algorithms from the set of the number (X) of dithering algorithms, each of the Y-combinations including a number (Y) of dithering algorithms, where 1 <= Y <= (X-1). Support can be found in, for example, paragraph [0040] of the published application. That is, when X=2 and Y=1, Z equals 2 and two Y-combinations of dithering algorithms are generated. Simpson does not teach or suggest the claimed method step A).
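For illustration only, the Y-combinations recited in step A) can be enumerated mechanically. The Python sketch below assumes that Z counts the unordered Y-element subsets of the X algorithms (the xCy reading adopted for examination); the function name and the algorithm names are hypothetical placeholders, not algorithms named in the claims.

```python
from itertools import combinations
from math import comb


def y_combinations(algorithms, y):
    """Return every Y-combination (unordered subset of size y) of the algorithm set."""
    x = len(algorithms)
    if not 1 <= y <= x - 1:
        raise ValueError("claim 1 requires 1 <= Y <= X-1")
    return list(combinations(algorithms, y))


# Hypothetical algorithm names; X = 2 and Y = 1 as in Applicant's example.
algos = ["floyd_steinberg", "ordered_bayer"]
combos = y_combinations(algos, 1)
print(combos)                   # [('floyd_steinberg',), ('ordered_bayer',)]
print(len(combos), comb(2, 1))  # 2 2
```

With X = 2 and Y = 1, the enumeration yields exactly two single-algorithm combinations, matching the count Applicant gives.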
Examiner’s response:
	Applicant’s arguments regarding step A and the combination [formula image 1] are moot in view of the new ground of rejection necessitated by the amendment.
Regarding applicant’s remarks on page 20 regarding independent claim 1 step B:
The rejection contended that "any operation that reduces the size of data will inherently reduce the number of bits it takes to represent the data". See the second paragraph on page 6 of the Office Action. However, since the number (Z) is equal to [formula image 1], the method step B) recites, in part, for each of the number ([formula image 1]) of Y-combinations of dithering algorithms ... so as to obtain ... a number ([formula image 1]) of size-reduced data groups .... As such, even if the contention is true, which Applicant does not concede, Simpson does not teach or suggest the features in claimed method step B).
Examiner’s response: 
Applicant’s arguments regarding step B and the combination [formula image 1] are moot in view of the new ground of rejection necessitated by the amendment.
Regarding applicant’s remarks on pages 20-21 regarding independent claim 1 step C:
	The rejection contended that under the broadest reasonable interpretation, the claims are describing training a neural network. Applicant respectfully disagrees. First of all, the above paragraph of Zhang does not even teach or suggest training a neural network. In addition, the claimed method step C) does not describe the general concept of training a neural network, but specifically recites detailed steps of performing a number (Z) of training operations on the DNN using the number (Z) of size-reduced data groups, respectively, so as to generate, for each of the number (Z) of training operations, a DNN model, a training result of the training operation, and a steady deviation between the training result and a predetermined expectation.
Examiner’s response:
	Examiner notes that, prior to the amendment reciting Z = [formula image 1], step C was seen, under the broadest reasonable interpretation, as general training of a neural network. Furthermore, Zhang teaches that the parameters of neural networks are investigated and modified by altering the number of hidden layers, the number of neurons, and the size of the training sets until optimal parameters are found (Zhang, page 252, left column, paragraphs 1-3, starting with “In this section…”). The neural networks are modified until an optimal network is found for the given training sets. Examiner further notes that any implications of the number Z = [formula image 1] are moot in view of the new ground of rejection necessitated by the amendment.
Regarding applicant’s remarks on page 21 regarding independent claim 1 step C:
Furthermore, since the number (Z) is equal to [formula image 1], the method step C) recites, in part, performing a number ([formula image 1]) of training operations ... using the number [formula image 1] of size-reduced data groups ... to generate, for each of the number ([formula image 1]) of training operations ...
Examiner’s response:
Applicant’s arguments regarding step C and the number [formula image 1] are moot in view of the new ground of rejection necessitated by the amendment.
Regarding applicant’s remarks on pages 21-22 regarding independent claim 1 step D:
That is, the above paragraphs of Zhang disclose selecting one of the networks with a best performance. However, nowhere do Zhang or the above paragraphs of Zhang teach or suggest the claimed method step D) selecting one of the number (Z) of Y-combinations of dithering algorithms corresponding to the size-reduced data group that results in the training result with the smallest steady deviation as a filter module, and selecting the corresponding DNN model as the data-recognition model, let alone that the selection is from one of the [formula image 1] (the number Z) of Y-combinations of dithering algorithms.
Examiner’s response:
	Examiner notes that, under the broadest reasonable interpretation, having the smallest steady deviation is one possible way to determine consistency. While Zhang uses the term “performance” in section 5.2, page 252, which could be construed to mean time efficiency, etc., it is clear from section 5.2, pages 252-253, right column, paragraph 2 (starting with “The size…”) and cited Figure 10 that Zhang uses accuracy to determine performance. In particular, as seen for PCCR on different training sets, deviations decrease as performance (i.e., accuracy) is maximized.
	Examiner further notes that arguments regarding the combination [formula image 1] are moot in view of the new ground of rejection necessitated by the amendment.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999). The term “[formula image 1]” in claims 1 and 17 uses an unconventional notation to define a “combination,” which is typically denoted C(x,y), xCy, or [formula image 2]. The term [formula image 1] is indefinite because the specification does not clearly define it. While paragraph [0040] of the specification gives the example [formula image 3], it is not clear what formula or calculation was carried out to arrive at 36. For purposes of examination, [formula image 1] is taken to mean the mathematical combination xCy or any equivalent.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 4, 17, 18, 19, 20, 37, 38 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Yan, Erhu Zhang, and Wanjun Chen. "Deep neural network for halftone image classification based on sparse auto-encoder." Engineering Applications of Artificial Intelligence 50 (2016): 245-255 [hereinafter Zhang] in view of Simpson, Andrew JR. "Parallel Dither and Dropout for Regularising Deep Neural Networks." arXiv preprint arXiv:1508.07130 (2015) [hereinafter Simpson] further in view of tecmath, “Combinations made easy” YouTube (2017).
Regarding claim 1, Zhang teaches a method for establishing a data-recognition model, the method being implemented by a computer system that stores a deep neural network (DNN) (Zhang; 1 Introduction, page 245, left column paragraph 1, starting with “Digital halftoning…”; The technique taught by Zhang is widely used in computer systems such as the human visual system, desktop publishing systems, etc.) and
C) performing a number (Z) of training operations on the DNN using the number (Z) of size-reduced data groups, respectively, so as to generate, for each of the number (Z) of training operations, a DNN model, a training result of the training operation, and a steady deviation between the training result and a predetermined expectation; (Zhang; 5 Experimental Results, page 251, left column, paragraph 4, starting with “The performance of the…”; The performance of the proposed algorithms is examined through extensive experiments, including changing parameter settings, classification accuracy, and time cost of the deep neural network.) and
Examiner notes that under the broadest reasonable interpretation, the claims are describing training a neural network.
D) selecting one of the number (Z) of Y-combinations of dithering algorithms corresponding to the size-reduced data group that results in the training result with the smallest steady deviation as a filter module, and selecting the corresponding DNN model as the data-recognition model (Zhang; 5.2 Deep neural network parameter decision, page 252, left column paragraph 2-right column, paragraph 1, starting with “The number of layers…”; After D_net1 has been recognized as having the best performance, it is then used for other data sets.).
Examiner notes that, under the broadest reasonable interpretation, the DNN model with the most desired trait (smallest steady deviation) is selected. In Zhang, D_net1 was selected as the model and is used further on other datasets.
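As a sketch of the selection recited in step D), assuming each Y-combination has already been paired with its trained DNN model and steady deviation, the selection reduces to taking a minimum over the steady deviations. The function name, model labels, and deviation values below are hypothetical and are not drawn from Zhang or the application.

```python
def select_filter_module(results):
    """Return (combination, model, steady_deviation) with the smallest deviation.

    `results` maps each Y-combination (a tuple of algorithm names) to a
    (dnn_model, steady_deviation) pair produced by the training operations.
    """
    best_combo = min(results, key=lambda combo: results[combo][1])
    best_model, best_dev = results[best_combo]
    return best_combo, best_model, best_dev


# Hypothetical training outcomes for Z = 2 combinations.
results = {
    ("floyd_steinberg",): ("model_A", 0.042),
    ("ordered_bayer",): ("model_B", 0.031),
}
print(select_filter_module(results))  # (('ordered_bayer',), 'model_B', 0.031)
```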
Zhang does not explicitly teach: and a set of a number (X) of dithering algorithms, where X >= 2, the method comprising steps of: A) generating a number (Z) of Y-combinations of dithering algorithms from the set of the number (X) of dithering algorithms, each of the Y-combinations including a number (Y) of dithering algorithms, where 1 <= Y <= (X-1); B) for each of the number (Z) of Y-combinations of dithering algorithms, using the number (Y) of dithering algorithms of the Y-combination to perform a dithering operation on a to-be-processed data group represented in (a) number of bit(s), so as to obtain, in total, a number (Z) of size-reduced data groups each being represented in (b) number of bit(s), where 1 <= b <= (a-1);
Simpson teaches and a set of a number (X) of dithering algorithms, where X >= 2, the method comprising steps of:
A) generating a number (Z) of Y-combinations of dithering algorithms from the set of the number (X) of dithering algorithms, each of the Y-combinations including a number (Y) of dithering algorithms, where 1 <= Y <= (X-1) (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; Simpson teaches that dithering algorithms/techniques are applied to 100 duplicate datasets independently, and the results are then averaged.);
Examiner notes that this maps to the broadest reasonable interpretation of the claims. Simpson is in essence teaching the use of multiple dithering algorithms. Averaging the results of multiple dithering algorithms on duplicate datasets takes into account the effect of all dithering algorithms used, and maps to the combination of dithering algorithms. Furthermore, under the broadest reasonable interpretation, if X = 2, which satisfies “X >= 2,” then Y must be 1; thus, even the use of a single dithering algorithm would map to the claims.
B) for each of the number (Z) of Y-combinations of dithering algorithms, using the number (Y) of dithering algorithms of the Y-combination to perform a dithering operation on a to-be-processed data group represented in (a) number of bit(s), so as to obtain, in total, a number (Z) of size-reduced data groups each being represented in (b) number of bit(s), where 1 <= b <= (a-1) (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; Simpson teaches that dithering algorithms/techniques are applied to 100 duplicate datasets independently, and the results are then averaged.);
Examiner notes that as previously explained, the parallel dithering that Simpson teaches maps to the combination of dithering algorithms. Examiner further notes that any operation that reduces the size of data will inherently reduce the number of bits it takes to represent the data as well.
It would have been obvious before the effective filing date to modify the teachings of Zhang with those of Simpson because Simpson’s method of parallel dither for regularization achieves substantially better results than are possible with batch-SGD (Simpson; Abstract, page 1).
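To illustrate what a dithering operation that takes an (a)-bit data group to (b) bits in step B) might look like, the sketch below requantizes a-bit samples to b bits with simple random (noise) dithering. Random dithering is a stand-in chosen for brevity; the claims as quoted here do not name particular dithering algorithms, and `random_dither` is a hypothetical helper, not Simpson’s or Applicant’s method.

```python
import random


def random_dither(samples, a, b, seed=0):
    """Requantize a-bit integer samples to b bits using random (noise) dither.

    Each sample is rescaled to the b-bit range, a uniform offset in [0, 1) is
    added, and the result is floored. On average the output equals the rescaled
    input, so quantization error becomes noise instead of banding.
    """
    if not 1 <= b <= a - 1:
        raise ValueError("claim 1 requires 1 <= b <= a-1")
    rng = random.Random(seed)        # seeded for reproducibility
    max_in = (1 << a) - 1            # largest a-bit value
    max_out = (1 << b) - 1           # largest b-bit value
    out = []
    for s in samples:
        scaled = s * max_out / max_in          # real value in [0, max_out]
        q = int(scaled + rng.random())         # floor after adding dither noise
        out.append(min(q, max_out))            # defensive clamp to the b-bit range
    return out


data_8bit = [0, 37, 128, 200, 255]
print(random_dither(data_8bit, a=8, b=2))  # values in 0..3; 0 -> 0 and 255 -> 3
```

Each output value fits in b bits, so the size-reduced data group needs fewer bits per sample than the a-bit input.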
	Neither Zhang nor Simpson teaches that the number (Z) is equal to [formula image 1].
	Tecmath teaches that the number (Z) is equal to [formula image 1]. As previously explained, for purposes of examination, [formula image 1] is interpreted as a combination of the form xCy. Tecmath teaches that such a combination can be calculated as x! / ((x-y)! * y!), which yields a number (see timestamp 4:59, or screenshot 1, for the formula, and timestamp 6:57, or screenshot 2, for an example).
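The formula attributed to Tecmath can be checked directly. A minimal Python sketch (the helper name `combinations_count` is hypothetical), cross-checked against the standard library’s binomial coefficient `math.comb`:

```python
from math import comb, factorial


def combinations_count(x, y):
    """Number of ways to choose y items from x: x! / ((x - y)! * y!)."""
    if not 0 <= y <= x:
        raise ValueError("require 0 <= y <= x")
    return factorial(x) // (factorial(x - y) * factorial(y))


# Cross-check the factorial formula against the standard-library binomial coefficient.
for x in range(2, 10):
    for y in range(1, x):
        assert combinations_count(x, y) == comb(x, y)

print(combinations_count(2, 1))  # 2  (the X=2, Y=1 case in Applicant's remarks)
print(combinations_count(5, 2))  # 10
```

Integer division is exact here because (x-y)! * y! always divides x!.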
	It would have been obvious before the effective filing date to use a combination to calculate the number of ways in which a given number of algorithms can be selected from a pool of available algorithms. Indeed, combinations are commonly used in the context of selecting a subset of items or numbers from a larger group (see timestamp 6:57, or screenshot 2, for an example).
Regarding claim 2, Simpson teaches the method of Claim 1, wherein:
step A) includes generating a number (Z) of 2-combinations of dithering algorithms from the set of the number (X) of dithering algorithms (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; The dithering is applied to 100 datasets and then the results are averaged.);
Examiner notes that under the broadest reasonable interpretation, Z could be any number, as no strict limit has been set on variable Z. Furthermore, according to Simpson, the dithering is applied independently.
step B) includes, for each of the number (Z) of 2-combinations of dithering algorithms, performing a dithering operation on the to-be-processed data group using one of the dithering algorithms in the 2-combination, so as to obtain a first part of the size-reduced data group being represented in (m) number of bit(s) (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; The elements are dithered independently and then the parallel set of gradients is averaged.),
Examiner notes that each of the elements is dithered independently. Under the broadest reasonable interpretation, the first part of the size-reduced data group would be one of the dithered elements before they are averaged. If the same dithering algorithm were applied to the first 50 datasets, then those would form the first group.
performing a dithering operation on the to-be-processed data group using another one of the dithering algorithms in the 2-combination, so as to obtain a second part of the size-reduced data group being represented in (n) number of bit(s) (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; The elements are dithered independently and then the parallel set of gradients is averaged.),
Examiner notes that each of the elements is dithered independently. Under the broadest reasonable interpretation, the second part of the size-reduced data group would be another one of the dithered elements before they are averaged. If another dithering algorithm were applied to the second 50 datasets, then those would form the second group.
where m+n=b, and combining the first part and the second part to obtain the size-reduced data group (Simpson; II Method; page 2, left column, paragraph 2, starting with “Parallel dither and dropout…”; Each parallel set of gradients is averaged and applied.).
Examiner notes that, under the broadest reasonable interpretation, averaging groups of dithered datasets is a form of combination. As previously explained, if the datasets were originally separated into a first 50 sets and a second 50 sets, then the two groups combined would equal the total of 100 that Simpson teaches.
The motivation to combine Zhang with Simpson is the same as that used for the claim 1 rejection.
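One possible, hypothetical reading of the claim 2 requirement that an m-bit first part and an n-bit second part be combined into a b-bit size-reduced data group (m+n=b) is per-sample bit concatenation. The sketch below illustrates only that reading; it is not the interpretation applied above (which treats averaging as the combination), and `combine_parts` is a hypothetical helper, not Applicant’s method.

```python
def combine_parts(first, second, m, n):
    """Pack an m-bit first part and an n-bit second part into (m + n)-bit codes.

    Hypothetical reading: for each sample, the first part fills the high m bits
    and the second part fills the low n bits, so every result fits in b = m + n bits.
    """
    if len(first) != len(second):
        raise ValueError("parts must have the same length")
    combined = []
    for hi, lo in zip(first, second):
        if not (0 <= hi < (1 << m) and 0 <= lo < (1 << n)):
            raise ValueError("value does not fit in its bit width")
        combined.append((hi << n) | lo)
    return combined


# m = 2 and n = 2, so each combined code uses b = 4 bits.
print(combine_parts([3, 1], [2, 0], m=2, n=2))  # [14, 4]
```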
Regarding claim 3, Zhang teaches the method of Claim 1, wherein in step C), each of the training operations includes the sub-steps of:
C1) inputting one of the size-reduced data groups into the DNN, so as to obtain the training result (Zhang; 5 Experimental Results, page 251, left column, paragraph 4-right column, paragraph 2, starting with “The performance of…”; Two public datasets are used to evaluate the performance by modifying parameter settings, classification, accuracy, etc.);
Examiner notes that Zhang teaches inputting training sets into the neural network to evaluate performance.
C2) comparing the training result with the predetermined expectation so as to obtain a deviation between the training result and the predetermined expectation as an intermediate deviation; C3) inputting the intermediate deviation into the DNN; (Zhang; 5 Experimental Results, page 251, right column, paragraph 3, starting with “In order to evaluate the performance…”; Formulas are given in which the number of correctly classified image patches is divided by the total number of image patches in each class to obtain a correct classification rate, indicating how often correct classifications occur. The neural network classifier is run N times with data sets of halftone images.);
Examiner notes that under the broadest reasonable interpretation, the claims are describing training a neural network with training sets.
C4) repeating sub-steps C1) to C3) until the intermediate deviation currently obtained in sub-step C1) does not vary with respect to the intermediate deviation obtained in a previous iteration of sub-step C1); (Zhang; 5.2 Deep neural network parameter decision, page 252, left column paragraph 2-right column, paragraph 1, starting with “The number of layers…”; The network is trained up to 600 times, until one with superior performance is obvious.) and
Examiner notes that under the broadest reasonable interpretation, the claims are training a neural network with training sets until one network and format with the best performance, measured in terms of deviation, is found. Zhang teaches that networks are trained up to 600 times. Finally, one network format (D_net1 with 296 neurons) has obviously superior performance.
C5) after sub-step C4), outputting the DNN as the DNN model and the intermediate deviation currently obtained in sub-step C1) as the steady deviation (Zhang; 5.2 Deep neural network parameter decision, page 252, right column, paragraph 2, starting with “The size of training set…”; After determining the ideal neural network format and parameters, the selected network is used on other datasets.).
Examiner notes that once Zhang has found the ideal neural network format, it is then applied to additional, different datasets.
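Sub-steps C1) to C5) describe an iterate-until-steady training loop. The sketch below shows the same control flow on a toy one-weight linear model trained by gradient descent, assuming mean squared error as the “deviation” and a gradient step as the way the deviation is fed back into the network. These are illustrative assumptions, not details taken from Zhang or the claims, and `train_until_steady` is a hypothetical name.

```python
def train_until_steady(xs, targets, lr=0.05, tol=1e-9, max_iters=10_000):
    """Toy analogue of sub-steps C1)-C5) for a one-weight model y = w * x.

    Returns the trained weight (standing in for the DNN model) and the final
    deviation (standing in for the steady deviation).
    """
    w = 0.0
    prev_dev = None
    for _ in range(max_iters):
        preds = [w * x for x in xs]                       # C1) training result
        # C2) intermediate deviation, here the mean squared error vs. the expectation
        dev = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(xs)
        if prev_dev is not None and abs(dev - prev_dev) < tol:
            break                                         # C4) deviation no longer varies
        grad = 2 * sum((p - t) * x for p, t, x in zip(preds, targets, xs)) / len(xs)
        w -= lr * grad                                    # C3) feed the deviation back
        prev_dev = dev
    return w, dev                                         # C5) model and steady deviation


w, steady = train_until_steady([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 3), steady < 1e-6)  # 2.0 True
```

The stopping test compares consecutive deviations rather than the deviation itself, mirroring the claim language that the intermediate deviation “does not vary” between iterations.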
Regarding claim 4, Zhang teaches the method of Claim 2, wherein in step C), each of the training operations includes the sub-steps of:
C1) inputting one of the size-reduced data groups into the DNN, so as to obtain the training result (Zhang; 5 Experimental Results, page 251, left column, paragraph 4-right column, paragraph 2, starting with “The performance of…”; Two public datasets are used to evaluate the performance by modifying parameter settings, classification, accuracy, etc.);
Examiner notes that Zhang teaches inputting training sets into the neural network to evaluate performance.
C2) comparing the training result with the predetermined expectation so as to obtain a deviation between the training result and the predetermined expectation as an intermediate deviation; C3) inputting the intermediate deviation into the DNN; (Zhang; 5 Experimental Results, page 251, right column, paragraph 3, starting with “In order to evaluate the performance…”; Formulas are given in which the number of correctly classified image patches is divided by the total number of image patches in each class to obtain a correct classification rate, indicating how often correct classifications occur. The neural network classifier is run N times with data sets of halftone images.);
Examiner notes that under the broadest reasonable interpretation, the claims are describing training a neural network with training sets.
C4) repeating sub-steps C1) to C3) until the intermediate deviation currently obtained in sub-step C1) does not vary with respect to the intermediate deviation obtained in a previous iteration of sub-step C1); (Zhang; 5.2 Deep neural network parameter decision, page 252, left column paragraph 2-right column, paragraph 1, starting with “The number of layers…”; The network is trained up to 600 times, until one with superior performance is obvious.) and
Examiner notes that under the broadest reasonable interpretation, the claims are training a neural network with training sets until one network and format with the best performance, measured in terms of deviation, is found. Zhang teaches that networks are trained up to 600 times. Finally, one network format (D_net1 with 296 neurons) has obviously superior performance.
C5) after sub-step C4), outputting the DNN as the DNN model and the intermediate deviation currently obtained in sub-step C1) as the steady deviation (Zhang; 5.2 Deep neural network parameter decision, page 252, right column, paragraph 2, starting with “The size of training set…”; After determining the ideal neural network format and parameters, the selected network is used on other datasets.).
Examiner notes that once Zhang has found the ideal neural network format, it is then applied to additional, different datasets.
Regarding claim 17, Zhang in view of Simpson further in view of Tecmath [hereinafter Zhang-Simpson-Tecmath] teaches all the limitations and motivations of claim 1 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 1 applies equally as well to those elements of claim 17. The claims additionally recite a computer system comprising a data storage and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang with Simpson is the same as that used for the claim 1 rejection.
Regarding claim 18, Zhang-Simpson-Tecmath teaches all the limitations and motivations of claim 2 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 2 applies equally as well to those elements of claim 18. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang with Simpson is the same as that used for the claim 1 rejection.
Regarding claim 19, Zhang-Simpson-Tecmath teaches all the limitations and motivations of claim 3 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 3 applies equally as well to those elements of claim 19. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
Regarding claim 20, Zhang-Simpson-Tecmath teaches all the limitations and motivations of claim 4 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 4 applies equally as well to those elements of claim 20. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
Regarding claim 37, Zhang teaches the computer system of Claim 17, wherein said data storage (1) and said processor (2) are integrated in a computer device (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM).
Examiner notes that the hardware system described by Zhang must inherently have data storage and a processor integrated in the device.
Regarding claim 38, Zhang teaches the computer system of Claim 17, wherein said data storage (1) is coupled to said processor (2) using a wired connection or a wireless connection (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), MATLAB R2014a, an Intel Xeon(R) E3-1245 v3 CPU, 3.4 GHz processor, and 8GB of RAM).

Claims 5, 7, 8, 9, 21, 23, 24, 25, 26, 27, 29, 30 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang-Simpson-Tecmath further in view of Molchanov, Pavlo, et al. "Pruning convolutional neural networks for resource efficient inference." arXiv preprint arXiv:1611.06440 (2016) [hereinafter Molchanov].
Regarding claim 5, Zhang teaches the method of Claim 1, wherein:
the data-recognition model includes a hidden layer set including a plurality of hidden layers that are connected in series, each of the plurality of hidden layers including a plurality of neurons, each of the neurons having a weight matrix containing a plurality of weights; (Zhang; Figure 3, 5.2 Deep neural network parameter decision; The base system flowchart has hidden layers with neurons connected in series… One of the factors for finding the optimal neural network format is altering the number of hidden layers. For example, networks with two or three hidden layers are tested with 50, 100, and 196 neurons.) and
Examiner notes that all neural networks have at least one layer of neurons. Furthermore, by definition, deep neural networks have multiple hidden layers. The claims and Zhang both refer to deep neural networks. Furthermore, under the broadest reasonable interpretation, hidden layers connected in series can mean layers connected in some sort of series, which they must be, since they are layers within the same deep neural network.
the method further comprises E) for each of the neurons included in the plurality of hidden layers, calculating an average of the plurality of weights of the neuron, and a variance of the plurality of weights (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Weight matrix is encoded and is connected to the code and the reconstructed data.),
Examiner notes that regularization terms on weight are added can be added in the auto-encoder stage of the neural network.
when it is determined that a ratio between the average of the plurality of weights and the variance of the plurality of weights is larger than a predetermined threshold, keeping the weight matrix unchanged (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Weight matrix is encoded and is connected to the code and the reconstructed data.),
when it is determined that the ratio between the average of the plurality of weights and the variance of the plurality of weights is not larger than the predetermined threshold, calculating a reference value for the neuron based on the average of the plurality of weights (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Regularization terms on weight are added can be added in the auto-encoder stage of the neural network.),
Examiner notes that Zhang teaches that regularization is used to make a tradeoff when there is a difference between the reconstruction objective and the weight decay. It is activated only when needed, and ρ acts as the threshold needed to activate it.
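For illustration only, the ratio test recited in step E of claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not applicant's implementation and not Zhang's regularization scheme: it assumes the population variance, treats a zero variance as an unbounded ratio, and uses the average itself as the reference value (one of the options recited in claim 6). The function name reference_value_or_none is hypothetical.

```python
from statistics import mean, pvariance


def reference_value_or_none(weights, threshold):
    """Hypothetical sketch of the ratio test recited in step E of claim 5.

    Returns None when the mean/variance ratio is larger than the threshold
    (weight matrix kept unchanged); otherwise returns a reference value.
    """
    avg = mean(weights)
    var = pvariance(weights)  # assumption: population variance
    # Assumption: a zero variance is treated as an unbounded ratio.
    ratio = float("inf") if var == 0 else avg / var
    if ratio > threshold:
        return None  # keep the weight matrix unchanged
    # Assumption: the reference value is the average itself (cf. claim 6).
    return avg
```

With weights [0.5, -0.3, 0.4] and a threshold of 10, the ratio is roughly 1.58, so the sketch returns the average, approximately 0.2, as the reference value.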
F) after step E), applying at least one golden sample to the data-recognition model, so as to obtain a plurality of outputs respectively from the neurons in the data-recognition model, (Zhang; 5.2 Deep neural network parameter decision, pages 252-253, right column paragraphs 1-2 starting with “Selection of the number of hidden neurons…”; After determining D_net1 is the ideal neural network model, it is trained with additional training sets.) and
Examiner notes that under the broadest reasonable interpretation, golden sample can refer to a training data set where the inputs and outputs are both known.
Zhang does not explicitly teach when it is determined that the reference value is substantially equal to zero, deleting the neuron from the data-recognition model, and when it is determined that the reference value is substantially not equal to zero, substituting the reference value for the weight matrix of the neuron; and for each of the neurons included in the plurality of hidden layers and having the weight matrix, deleting the neuron from the data-recognition model when it is determined that the output of the neuron is substantially equal to zero and retaining the neuron when otherwise.
Molchanov teaches:
when it is determined that the reference value is substantially equal to zero, deleting the neuron from the data-recognition model, (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned.) and
Under the broadest reasonable interpretation, if the reference value, which is based on the average of the plurality of weights, is zero, then the neuron essentially has no importance. Molchanov teaches that neurons with no importance are pruned.
when it is determined that the reference value is substantially not equal to zero, substituting the reference value for the weight matrix of the neuron; (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; Neurons that are not the least important are kept and not pruned.) and
Under the broadest reasonable interpretation, the reference value is derived from the average of the weight matrix of the neuron. Furthermore, if the reference value is not 0, then the neuron is still kept.
for each of the neurons included in the plurality of hidden layers and having the weight matrix, deleting the neuron from the data-recognition model when it is determined that the output of the neuron is substantially equal to zero and retaining the neuron when otherwise (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned, the others are not.).
It would have been obvious before the effective filing date to modify the teachings of Zhang-Simpson-Tecmath and combine them with the teachings of Molchanov because Molchanov’s teachings enable efficient inference and demonstrate superior performance compared to other criteria (Molchanov, abstract, page 1).
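For illustration only, the deletion-or-substitution branch of step E that the rejection maps to Molchanov can be sketched as below. This is a hedged sketch, not Molchanov's pruning criterion and not applicant's implementation: the tolerance tol standing in for "substantially equal to zero" and the choice to replace every weight with the reference value are assumptions, and the name prune_or_substitute is hypothetical.

```python
def prune_or_substitute(weights, reference, tol=1e-6):
    """Hypothetical sketch of the reference-value branch of claim 5, step E.

    Returns None when the neuron is to be deleted, otherwise the neuron's
    new weights, with every weight replaced by the reference value.
    """
    # "Substantially equal to zero" is modeled as |reference| <= tol (assumption).
    if abs(reference) <= tol:
        return None
    # Assumption: substitution keeps the original weight count.
    return [reference] * len(weights)
```

A caller would drop the neuron from the model whenever None is returned and otherwise store the returned list as the neuron's weight matrix.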

Regarding claim 7, Zhang teaches the method of Claim 5, wherein step F) includes: applying the golden sample to a first hidden layer in the plurality of hidden layers of the data-recognition model so as to obtain a plurality of outputs respectively from the neurons in the first hidden layer, each of the outputs including a plurality of output values (Zhang; 5.2 Deep neural network parameter decision, pages 253-254, left column paragraphs 1-4, starting with “In this section…”; Zhang teaches that training data sets are used after they have identified the ideal model (D_net1) with 296 hidden layer neurons. The output is a classification of groups of images, each group containing multiple images.);
Examiner notes that as previously stated, the golden sample can also be referred to as a training set, where the desired outcome is already known.
applying the outputs from the neurons in a preceding hidden layer of the plurality of hidden layers to a succeeding hidden layer of the plurality of hidden layers that is connected to and succeeds the preceding hidden layer so as to obtain a plurality of outputs from the neurons in the succeeding hidden layer (Zhang; 5.2 Deep neural network parameter decision, pages 253-254, left column paragraphs 1-4, starting with “In this section…”; D_net1 with 296 hidden neurons is trained, using set T, ten groups of testing images drawn from testing sets EA and EB.);
Examiner notes that what is being recited in the claims is inherent to all neural networks with hidden layers. Outputs of neurons in a preceding layer must go on to neurons in the succeeding layer as long as they are still connected.
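For illustration only, the layer-to-layer propagation described above can be sketched as follows: each layer's outputs become the next layer's inputs, and the outputs of every layer are collected. This is a minimal sketch, not Zhang's network: the ReLU activation, the absence of bias terms, and the name forward_collect are assumptions.

```python
def forward_collect(layers, x):
    """Propagate one input through hidden layers connected in series.

    layers: list of weight matrices; each matrix holds one row of weights
    per neuron. Returns the output vector of every layer, in order.
    """
    outputs = []
    for weight_matrix in layers:
        # Each neuron's output feeds the succeeding layer; ReLU is an assumption.
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weight_matrix]
        outputs.append(x)
    return outputs
```

For example, two layers with weights [[1, 0], [0, 1], [-1, -1]] and [[1, 1, 1]] map the input [2, 3] to first-layer outputs [2, 3, 0] and a second-layer output of 5.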
Zhang does not explicitly teach and for each of the neurons, deleting the neuron from the data-recognition model when it is determined that the output values of the outputs from the neuron are all substantially equal to zero, and retaining the neuron when otherwise.
Molchanov teaches and for each of the neurons, deleting the neuron from the data-recognition model when it is determined that the output values of the outputs from the neuron are all substantially equal to zero, and retaining the neuron when otherwise (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned.).
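For illustration only, deleting a neuron whose output values are all substantially zero can be sketched as removing that neuron's row from its own layer and the matching input column from the succeeding layer. This is a hedged sketch, not Molchanov's method: the tolerance tol and the names dead_neurons and remove_neurons are hypothetical.

```python
def dead_neurons(layer_outputs, tol=1e-6):
    """Indices of neurons whose outputs are all substantially zero.

    layer_outputs: one output vector per golden sample, for a single layer.
    """
    n = len(layer_outputs[0])
    return [j for j in range(n) if all(abs(s[j]) <= tol for s in layer_outputs)]


def remove_neurons(w_this, w_next, dead):
    """Delete dead neurons: their rows here and their input columns downstream."""
    dead = set(dead)
    new_this = [row for j, row in enumerate(w_this) if j not in dead]
    new_next = [[w for j, w in enumerate(row) if j not in dead] for row in w_next]
    return new_this, new_next
```

Neurons with at least one non-zero output across the golden samples are retained, matching the "retaining the neuron when otherwise" language.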
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 8, Zhang teaches the method of Claim 1, wherein:
the data-recognition model includes a hidden layer set including a plurality of hidden layers that are connected in series, each of the plurality of hidden layers including a plurality of neurons, each of the neurons having a weight matrix containing a plurality of weights; (Zhang; Figure 3, 5.2 Deep neural network decision; The base system flowchart has hidden layer with neurons connected in a series… one of the factors for finding the optimal neural network format is by altering the number of hidden layers. For example, networks with two hidden layers or three hidden layers are tested with 50, 100, and 196 neurons.) and
The examiner notes that all neural networks have at least one layer of neurons. Furthermore, by definition, deep neural networks have multiple hidden layers. The claims and Zhang both refer to deep neural networks. Furthermore, under the broadest reasonable interpretation, hidden layers connected in series can mean the layers are connected in some sort of series, which they must be, since they are layers within the same deep neural network.
The method further comprises G) applying at least one golden sample to the data-recognition model, so as to obtain a plurality of outputs respectively from the neurons in the data-recognition model, (Zhang; 5.2 Deep neural network parameter decision, pages 252-253, right column paragraphs 1-2 starting with “Selection of the number of hidden neurons…”; After determining D_net1 is the ideal neural network model, it is trained with additional training sets.) and
Examiner notes that under the broadest reasonable interpretation, golden sample can refer to a training data set where the inputs and outputs are both known.
F) for each of the neurons included in the plurality of hidden layers, calculating an average of the plurality of weights of the neuron, and a variance of the plurality of weights (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Weight matrix is encoded and is connected to the code and the reconstructed data.),
Examiner notes that regularization terms on the weights can be added in the auto-encoder stage of the neural network.
when it is determined that a ratio between the average of the plurality of weights and the variance of the plurality of weights is larger than a predetermined threshold, keeping the weight matrix unchanged (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Weight matrix is encoded and is connected to the code and the reconstructed data.),
when it is determined that the ratio between the average of the plurality of weights and the variance of the plurality of weights is not larger than the predetermined threshold, calculating a reference value for the neuron based on the average of the plurality of weights (Zhang; 4.1.1 Single-layer sparse auto-encoder; page 249, right column, paragraph 3 starting with “As shown in Fig. 5…”; Regularization terms on the weights can be added in the auto-encoder stage of the neural network.),
Examiner notes that Zhang teaches that regularization is used to make a tradeoff when there is a difference between the reconstruction objective and the weight decay. It is activated only when needed, and ρ acts as the threshold needed to activate it.
Zhang does not explicitly teach when it is determined that the reference value is substantially equal to zero, deleting the neuron from the data-recognition model, and when it is determined that the reference value is substantially not equal to zero, substituting the reference value for the weight matrix of the neuron; and for each of the neurons included in the plurality of hidden layers and having the weight matrix, deleting the neuron from the data-recognition model when it is determined that the output of the neuron is substantially equal to zero and retaining the neuron when otherwise.
Molchanov teaches:
when it is determined that the reference value is substantially equal to zero, deleting the neuron from the data-recognition model, (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned.) and
Under the broadest reasonable interpretation, if the reference value, which is based on the average of the plurality of weights, is zero, then the neuron essentially has no importance. Molchanov teaches that neurons with no importance are pruned.
when it is determined that the reference value is substantially not equal to zero, substituting the reference value for the weight matrix of the neuron; (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; Neurons that are not the least important are kept and not pruned.) and
Under the broadest reasonable interpretation, the reference value is derived from the average of the weight matrix of the neuron. Furthermore, if the reference value is not 0, then the neuron is still kept.
for each of the neurons included in the plurality of hidden layers and having the weight matrix, deleting the neuron from the data-recognition model when it is determined that the output of the neuron is substantially equal to zero and retaining the neuron when otherwise (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned, the others are not.).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.

Regarding claim 9, Zhang teaches the method of Claim 8, wherein step G) includes: applying the golden sample to a first hidden layer in the plurality of hidden layers of the data-recognition model so as to obtain a plurality of outputs respectively from the neurons in the first hidden layer, each of the outputs including a plurality of output values (Zhang; 5.2 Deep neural network parameter decision, pages 253-254, left column paragraphs 1-4, starting with “In this section…”; Zhang teaches that training data sets are used after they have identified the ideal model (D_net1) with 296 hidden layer neurons. The output is a classification of groups of images, each group containing multiple images.);
Examiner notes that as previously stated, the golden sample can also be referred to as a training set, where the desired outcome is already known.
applying the outputs from the neurons in a preceding hidden layer of the plurality of hidden layers to a succeeding hidden layer of the plurality of hidden layers that is connected to and succeeds the preceding hidden layer so as to obtain a plurality of outputs from the neurons in the succeeding hidden layer (Zhang; 5.2 Deep neural network parameter decision, pages 253-254, left column paragraphs 1-4, starting with “In this section…”; D_net1 with 296 hidden neurons is trained, using set T, ten groups of testing images drawn from testing sets EA and EB.);
Examiner notes that what is being recited in the claims is inherent to all neural networks with hidden layers. Outputs of neurons in a preceding layer must go on to neurons in the succeeding layer as long as they are still connected.
Zhang does not explicitly teach and for each of the neurons, deleting the neuron from the data-recognition model when it is determined that the output values of the outputs from the neuron are all substantially equal to zero, and retaining the neuron when otherwise.
Molchanov teaches and for each of the neurons, deleting the neuron from the data-recognition model when it is determined that the output values of the outputs from the neuron are all substantially equal to zero, and retaining the neuron when otherwise (Molchanov; Figure 1, 2.2 Criteria for Pruning, page 3; The least important neuron is removed or pruned.).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 21, Zhang-Simpson-Tecmath further in view of Molchanov [hereinafter Zhang-Simpson-Tecmath-Molchanov] teaches all the limitations and motivations of claim 5 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 5 applies equally as well to those elements of claim 21. The claims additionally recite a computer system, data storage, and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 23, Zhang-Simpson-Tecmath-Molchanov teaches all the limitations and motivations of claim 7 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 7 applies equally as well to those elements of claim 23. The claims additionally recite a computer system, data storage, and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 24, Zhang teaches the computer system of Claim 21, wherein said intra-layer compression module (43) and said inter-layer compression module (44) are implemented using software stored in said data storage (1) (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM).
Examiner notes that the system Zhang describes maps to the limitations as claimed. The system described must contain data storage to store software.
Regarding claim 25, Zhang teaches the computer system of Claim 21, wherein said intra-layer compression module (43) and said inter-layer compression module (44) are implemented using one of firmware included in a microcontroller, an application-specific integrated circuit (ASIC) chip and a programmable logic device (PLD) (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM).
Examiner notes that the system Zhang describes maps to the limitations as claimed. The system described must contain microcontrollers and circuit chips and programmable logic devices.
Regarding claim 26, Zhang-Simpson-Tecmath-Molchanov teaches all the limitations and motivations of claim 8 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 8 applies equally as well to those elements of claim 26. The claims additionally recite a computer system, data storage, and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 27, Zhang-Simpson-Tecmath-Molchanov teaches all the limitations and motivations of claim 9 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 9 applies equally as well to those elements of claim 27. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivation to combine Zhang-Simpson-Tecmath with Molchanov is the same as that previously used for the claim 5 rejection.
Regarding claim 29, Zhang teaches the computer system of Claim 26, wherein said intra-layer compression module (43) and said inter-layer compression module (44) are implemented using software stored in said data storage (1) (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM).
Examiner notes that the system Zhang describes maps to the limitations as claimed. The system described must contain data storage to store software.
Regarding claim 30, Zhang teaches the computer system of Claim 26, wherein said intra-layer compression module (43) and said inter-layer compression module (44) are implemented using one of firmware included in a microcontroller, an application-specific integrated circuit (ASIC) chip and a programmable logic device (PLD) (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”; The networks were tested on the testing platform consisting of Windows 7 (64 bit), matlab R2014a, an Intel Xeon(R) E31245 v3 CPU, 3.4 GHz processor and 8GB of RAM).
Examiner notes that the system Zhang describes maps to the limitations as claimed. The system described must contain microcontrollers and circuit chips and programmable logic devices.

Claims 6, 10, 22, 28 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang-Simpson-Tecmath-Molchanov further in view of StackOverflow, et al. “How to Get Mean, Median, and Other Statistics over Entire Matrix, Array or Dataframe?” Stack Overflow, 27 Mar. 2013 [hereinafter StackOverflow].
Regarding claim 6, StackOverflow teaches the method of Claim 5, wherein the reference value is the average of the plurality of weights, a number approximating the average of the plurality of weights, or a mode of the plurality of weights that approximates the average of the plurality of weights (StackOverflow; Ways to find the average of matrices are shared.).
Examiner notes that this claim’s “and” is being interpreted as an “or,” as previously noted under the 112(b) rejection. Furthermore, under the broadest reasonable interpretation, the plurality of weights can be in the form of a matrix. StackOverflow teaches that ways of finding the average of values within a matrix are already very well-known and in fact have simple implementations in many programming or scripting languages.
It would have been obvious before the effective filing date to modify the teachings of Zhang-Simpson-Tecmath-Molchanov and combine it with StackOverflow because StackOverflow teaches ways to find the average of values within a matrix with simple function calls rather than brute forcing. 
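For illustration only, the three reference-value alternatives recited in claim 6 can each be computed over a weight matrix with standard library calls, which is the kind of simple implementation the rejection relies on. This Python sketch does not reproduce the StackOverflow thread; it uses the standard statistics module (multimode requires Python 3.8 or later), and reading "a number approximating the average" as rounding, and choosing the mode nearest the average among tied modes, are assumptions. The name reference_candidates is hypothetical.

```python
from statistics import mean, multimode


def reference_candidates(matrix, decimals=2):
    """Return (average, rounded average, mode nearest the average)."""
    flat = [w for row in matrix for w in row]  # flatten the weight matrix
    avg = mean(flat)
    approx = round(avg, decimals)  # assumption: "approximating" means rounding
    # Assumption: among tied modes, take the one closest to the average.
    mode_near_avg = min(multimode(flat), key=lambda m: abs(m - avg))
    return avg, approx, mode_near_avg
```

For the matrix [[1, 2, 2], [3, 2, 4]], the sketch yields an average of about 2.333, a rounded value of 2.33, and a mode of 2.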
Regarding claim 10, StackOverflow teaches the method of Claim 9, wherein the reference value is the average of the plurality of weights, a number approximating the average of the plurality of weights, or a mode of the plurality of weights that approximates the average of the plurality of weights (StackOverflow; Ways to find the average of matrices are shared.).
Examiner notes that this claim’s “and” is being interpreted as an “or,” as previously noted under the 112(b) rejection. Furthermore, under the broadest reasonable interpretation, the plurality of weights can be in the form of a matrix. StackOverflow teaches that ways of finding the average of values within a matrix are already very well-known and in fact have simple implementations in many programming or scripting languages.
The motivation to combine Zhang-Simpson-Tecmath-Molchanov with StackOverflow is the same as that previously used for the claim 6 rejection.
Regarding claim 22, Zhang-Simpson-Tecmath-Molchanov further in view of StackOverflow [hereinafter Zhang-Simpson-Tecmath-Molchanov-StackOverflow] teaches all the limitations and motivations of claim 6 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 6 applies equally as well to those elements of claim 22.
The motivation to combine Zhang-Simpson-Tecmath-Molchanov with StackOverflow is the same as that previously used for the claim 6 rejection.
Regarding claim 28, Zhang-Simpson-Tecmath-Molchanov further in view of StackOverflow [hereinafter Zhang-Simpson-Tecmath-Molchanov-StackOverflow] teaches all the limitations and motivations of claim 10 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 10 applies equally as well to those elements of claim 28.
The motivation to combine Zhang-Simpson-Tecmath-Molchanov with StackOverflow is the same as that previously used for the claim 6 rejection.

Claims 11, 13, 14, 16, 31, 33, 34, 36 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang-Simpson-Tecmath further in view of Liang, Ming, Xiaolin Hu, and Bo Zhang. "Convolutional neural networks with intra-layer recurrent connections for scene labeling." Advances in neural information processing systems 28 (2015) [hereinafter Liang] further in view of Wang, Gang. "A novel neural network model specified for representing logical relations." arXiv preprint arXiv:1708.00580 (2017) [hereinafter Wang], further in view of Yellamraju, Suryateja, et al. "Design of various logic gates in neural networks." 2013 Annual IEEE India Conference (INDICON). IEEE, 2013 [hereinafter Yellamraju].
Regarding claim 11, Zhang teaches the method of Claim 1, wherein: the data-recognition model includes a hidden layer set including a plurality of hidden layers that are connected in series, each of the plurality of hidden layers including a plurality of neurons, each of the neurons having a weight matrix containing a plurality of weights respectively for a plurality of multiplication operations (Zhang; Figure 3, 5.2 Deep neural network decision; The base system flowchart has hidden layer with neurons connected in a series… one of the factors for finding the optimal neural network format is by altering the number of hidden layers. For example, networks with two hidden layers or three hidden layers are tested with 50, 100, and 196 neurons.);
The examiner notes that all neural networks have at least one layer of neurons. Furthermore, by definition, deep neural networks have multiple hidden layers. The claims and Zhang both refer to deep neural networks. Furthermore, under the broadest reasonable interpretation, hidden layers connected in series can mean the layers are connected in some sort of series, which they must be, since they are layers within the same deep neural network.
Zhang does not explicitly teach and the method further comprises I) categorizing the plurality of neurons into a plurality of groups, wherein each of the groups includes one representative neuron, and for each neuron included in one of the groups, the weights in the weight matrix of the neuron satisfy a predetermined criterion with reference to the representative neuron of said one of the groups; J) generating, for each of the neurons, a candidate circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations, and a total number of logic gates for implementing the logic circuits; and K) for each of the groups, from among the candidate circuit data sets respectively for the neurons, selecting one candidate circuit data set having the smallest number of logic gates as a common circuit data set, and generating a layout for the logic circuits represented by the common circuit data set.
Liang teaches and the method further comprises I) categorizing the plurality of neurons into a plurality of groups, wherein each of the groups includes one representative neuron, and for each neuron included in one of the groups, the weights in the weight matrix of the neuron satisfy a predetermined criterion with reference to the representative neuron of said one of the groups (Liang; 3.1 RCNN, page 4, paragraphs 1-3, Figure 2; All neurons in the same layer are connected in series, where the last neuron (that receives connections) is then connected to the neurons of the next layer.);
Under the broadest reasonable interpretation, a layer can be considered a group of neurons. The last neuron of the layer that receives all the output connections of other neurons within the layer acts as the representative neuron for the group.
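For illustration only, the categorizing step recited in step I can be sketched as a greedy grouping in which the first unmatched neuron becomes a group's representative. This is not Liang's RCNN and not applicant's method: the claim does not specify the "predetermined criterion", so the criterion used here (every weight within tol of the representative's corresponding weight) is a hypothetical example, as is the name group_neurons. Weight rows are assumed to be non-empty and of equal length.

```python
def group_neurons(weight_rows, tol):
    """Greedy grouping: returns a list of (representative_index, member_indices)."""
    groups = []
    for i, w in enumerate(weight_rows):
        for rep, members in groups:
            # Hypothetical criterion: every weight within tol of the representative's.
            if max(abs(a - b) for a, b in zip(w, weight_rows[rep])) <= tol:
                members.append(i)
                break
        else:
            groups.append((i, [i]))  # first unmatched neuron becomes a representative
    return groups
```

With a tolerance of 0.1, neurons with weights [1.0, 2.0] and [1.05, 1.95] fall into one group, and neurons with weights [5.0, 5.0] and [5.02, 4.99] fall into another.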
It would have been obvious before the effective filing date to modify the teachings of Zhang-Simpson-Tecmath and combine it with Liang because Liang’s method allows seamless integration of feature extraction and context modulation which typically entail separate modules for the two steps. The model outperforms many state-of-the-art models in accuracy and efficiency (Liang; Abstract; page 1).
Liang does not explicitly teach J) generating, for each of the neurons, a candidate circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations, and a total number of logic gates for implementing the logic circuits; and K) for each of the groups, from among the candidate circuit data sets respectively for the neurons, selecting one candidate circuit data set having the smallest number of logic gates as a common circuit data set, and generating a layout for the logic circuits represented by the common circuit data set.
Wang teaches J) generating, for each of the neurons, a candidate circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations, and a total number of logic gates for implementing the logic circuits; (Wang; III Simulating Logic Gates, pages 2-3, starting with right column, paragraph 5, starting with “In this section…”; The logic AND gate can be emulated by a simple PLDNN having three neurons and one CEL with two PELs.) and
Examiner notes that since each of the representative neurons is in fact made up of a group of connected neurons, the neuron can be a group of at least three neurons. Examiner further notes that in Boolean Algebra, AND operators have been considered synonymous with multiplication (A AND 0 = 0, A AND 1 = A, etc.).
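For illustration only, the equivalence noted above between Boolean AND and multiplication on the operands 0 and 1 can be checked exhaustively, together with a single threshold unit that emulates AND. The threshold unit shown is a generic unit with unit weights and a threshold of 1.5, not Wang's PLDNN; the names threshold_and and truth_table are hypothetical.

```python
from itertools import product


def threshold_and(a, b, threshold=1.5):
    """Generic threshold unit with unit weights (not Wang's PLDNN)."""
    return 1 if a + b >= threshold else 0


def truth_table():
    """Rows of (a, b, a AND b, a * b, threshold-unit output) for a, b in {0, 1}."""
    return [(a, b, a & b, a * b, threshold_and(a, b)) for a, b in product((0, 1), repeat=2)]
```

In every one of the four rows, a AND b, a * b, and the threshold-unit output agree.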
It would have been obvious before the effective filing date to modify the teachings of Zhang-Simpson-Tecmath and Liang and combine it with Wang because Wang’s model allows neural networks to represent logical relations more directly and efficiently (Wang; Abstract, page 1).
Wang does not explicitly teach K) for each of the groups, from among the candidate circuit data sets respectively for the neurons, selecting one candidate circuit data set having the smallest number of logic gates as a common circuit data set, and generating a layout for the logic circuits represented by the common circuit data set.
Yellamraju teaches K) for each of the groups, from among the candidate circuit data sets respectively for the neurons, selecting one candidate circuit data set having the smallest number of logic gates as a common circuit data set, and generating a layout for the logic circuits represented by the common circuit data set (Yellamraju; II Circuit Design, page 3, left column paragraph 5, starting with “The neuron computes…”; The level of neuron activation or threshold is determined by the voltage present on the capacitor in the leaky storage node.).
Examiner notes that the threshold can act as the criterion that determines which circuit has the smallest number of logic gates, as circuits with more gates will use more energy and thus require a higher voltage. The lower the threshold, the lower the acceptable voltage; if the threshold is low enough that only one circuit satisfies it, then that circuit is the one with the fewest gates.
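For reference, the selection recited in step K, read on its face, amounts to a per-group minimum over gate counts. The sketch below illustrates only that claim language; it is not Yellamraju's circuit, it does not generate a layout, and the names select_common_circuits, groups, and gate_counts as well as the gate counts in the example are hypothetical.

```python
def select_common_circuits(groups, gate_counts):
    """For each group, pick the member whose candidate circuit data set
    needs the fewest logic gates; that data set becomes the common one.

    groups: group id -> list of neuron ids
    gate_counts: neuron id -> total logic gates of its candidate data set
    """
    return {g: min(members, key=gate_counts.__getitem__) for g, members in groups.items()}
```

With hypothetical counts of 40 and 32 gates in group A and 18 and 25 gates in group B, the sketch selects neuron 1 for group A and neuron 2 for group B; ties resolve to the first-listed neuron.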
It would have been obvious before the effective filing date to modify the teachings of Zhang-Simpson-Tecmath, Liang and Wang and combine it with Yellamraju because Yellamraju’s implementation further improves upon known Boolean operations using neurons (Yellamraju; abstract; page 1).
	Regarding claim 13, Wang teaches the method of Claim 11, wherein each of the logic circuits includes an input port for receiving an input parameter used in the multiplication operation with the weight, (Wang; III Simulating Logic Gates, page 2, right column paragraph 6, starting with “In this section…”; System has input neurons and output neuron.) and
Examiner notes that since the logic circuits are represented by neurons within a neural network, they inherently have input ports. Under the broadest reasonable interpretation, the input neurons that accept the input into the AND logic circuit as taught by Wang would act as the input port.
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.
an output port for outputting a calculation result of the multiplication operation (Wang; III Simulating Logic Gates, page 2, right column paragraph 6, starting with “In this section…”; System has input neurons and output neuron.).
Examiner further notes that since the logic circuits are represented by neurons within a neural network, they inherently have output ports. Under the broadest reasonable interpretation, the output neuron that outputs the result of the AND logic circuit as taught by Wang would act as the output port. Multiplication is one of the operations that can be performed by the neural network.
Regarding claim 14, Zhang teaches the method of Claim 1, wherein: the data-recognition model includes a hidden layer set including a plurality of hidden layers that are connected in series, each of the plurality of hidden layers including a plurality of neurons, each of the neurons having a weight matrix containing a plurality of weights respectively for a plurality of multiplication operations (Zhang; Figure 3, 5.2 Deep neural network decision; The base system flowchart has hidden layer with neurons connected in a series… one of the factors for finding the optimal neural network format is by altering the number of hidden layers. For example, networks with two hidden layers or three hidden layers are tested with 50, 100, and 196 neurons.);
The examiner notes that every neural network has at least one layer of neurons. Furthermore, by definition, deep neural networks have multiple hidden layers. The claims and Zhang both refer to deep neural networks. Furthermore, under the broadest reasonable interpretation, “hidden layers that are connected in series” encompasses layers connected in sequence, which they necessarily are, since they are layers within the same deep neural network.
Zhang does not explicitly teach and the method further comprises L) categorizing the plurality of neurons into a plurality of groups, wherein each of the groups includes one representative neuron, and for each neuron included in one of the groups, the weights in the weight matrix of the neuron satisfy a predetermined criterion with reference to the representative neuron of said one of the groups; M) generating, for each of the groups, a common circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations of the weights of the representative neuron of the group; and N) generating, for each of the groups, a layout for the logic circuits represented by the common circuit data set.
Liang teaches and the method further comprises L) categorizing the plurality of neurons into a plurality of groups, wherein each of the groups includes one representative neuron, and for each neuron included in one of the groups, the weights in the weight matrix of the neuron satisfy a predetermined criterion with reference to the representative neuron of said one of the groups; (Liang; 3.1 RCNN, page 4, paragraphs 1-3, Figure 2; All neurons in the same layer are connected in series, where the last neuron (that receives connections) is then connected to the neurons of the next layer.);
Under the broadest reasonable interpretation, a layer can be considered a group of neurons. The last neuron of the layer that receives all the output connections of other neurons within the layer acts as the representative neuron for the group.
Liang does not explicitly teach M) generating, for each of the groups, a common circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations of the weights of the representative neuron of the group; and N) generating, for each of the groups, a layout for the logic circuits represented by the common circuit data set.
Wang teaches M) generating, for each of the groups, a common circuit data set representing a plurality of logic circuits that correspond respectively to the multiplication operations of the weights of the representative neuron of the group; and (Wang; III Simulating Logic Gates, pages 2-3, starting with right column, paragraph 5, starting with “In this section…”; The logic AND gate can be emulated by a simple PLDNN having three neurons and one CEL with two PELs.)
Examiner notes that since each representative neuron is in fact made up of a group of connected neurons, the representative neuron can correspond to a group of at least three neurons, such as Wang's three-neuron PLDNN. Examiner further notes that in Boolean algebra, the AND operation is equivalent to multiplication on binary values (A AND 0 = 0, A AND 1 = A, etc.). Collectively, the operators and the input/output neurons represented by the groups of neurons act as the common circuit data set representing a plurality of logic circuits.
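For illustration only, the equivalence between Boolean AND and multiplication on binary values can be checked exhaustively over all four input pairs. The short Python sketch below is a hypothetical example; the function names `and_gate` and `truth_table` are assumed for illustration and do not come from Wang or from the claims.

```python
# Hypothetical illustration: Boolean AND coincides with integer
# multiplication when both inputs are restricted to {0, 1}.
from itertools import product


def and_gate(a: int, b: int) -> int:
    """Boolean AND on single bits, using Python's bitwise AND operator."""
    return a & b


def truth_table() -> list[tuple[int, int, int, int]]:
    """Return (a, b, a AND b, a * b) for every pair of binary inputs."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        rows.append((a, b, and_gate(a, b), a * b))
    return rows


for a, b, gate_out, product_out in truth_table():
    print(f"{a} AND {b} = {gate_out}; {a} * {b} = {product_out}")
```

Each printed row shows identical AND and product values, which is the sense in which an AND gate can stand in for a multiplication of binary operands.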
Wang does not explicitly teach N) generating, for each of the groups, a layout for the logic circuits represented by the common circuit data set.
Yellamraju teaches N) generating, for each of the groups, a layout for the logic circuits represented by the common circuit data set. (Yellamraju; II Circuit Design, page 3, left column paragraph 5, starting with “The neuron computes…”; The level of neuron activation or threshold is determined by the voltage present on the capacitor in the leaky storage node.).
Examiner notes that the threshold can act as the criterion that determines which circuit has the fewest logic gates, since circuits with more gates use more energy and thus require a higher voltage. The lower the threshold, the lower the acceptable voltage; if the threshold is low enough that only one circuit satisfies it, that circuit is the one with the fewest logic gates. By definition, if the ideal circuit with the fewest logic gates has been selected, it must have been generated.
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.
Regarding claim 16, Wang teaches the method of Claim 14, wherein each of the logic circuits of the common circuit data set includes an input port for receiving an input parameter so as to perform a calculation using the weight matrix, and (Wang; III Simulating Logic Gates, page 2, right column, paragraph 6, starting with “In this section…”; System has input neurons and output neuron.)
Examiner notes that since the logic circuits are represented by neurons within a neural network, they inherently have input ports. Under the broadest reasonable interpretation, the input neurons that accept the input into the AND logic circuit as taught by Wang would act as the input port.
an output port for outputting a calculation result (Wang; III Simulating Logic Gates, page 2, right column, paragraph 6, starting with “In this section…”; System has input neurons and output neuron.).
Examiner further notes that since the logic circuits are represented by neurons within a neural network, they inherently have output ports. Under the broadest reasonable interpretation, the output neuron that outputs the result of the AND logic circuit as taught by Wang would act as the output port. Multiplication is one of the operations that can be performed by the neural network.
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.

Regarding claim 31, Zhang-Simpson-Tecmath further in view of Liang [hereinafter Zhang-Simpson-Tecmath-Liang] further in view of Wang [hereinafter Zhang-Simpson-Tecmath-Liang-Wang] further in view of Yellamraju [hereinafter Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju] teaches all the limitations and motivations of claim 11 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 11 applies equally as well to those elements of claim 31. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.
Regarding claim 33, Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju teaches all the limitations and motivations of claim 13 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 13 applies equally as well to those elements of claim 33.
The claims additionally recite a computer system. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.
Regarding claim 34, Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju teaches all the limitations and motivations of claim 14 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 14 applies equally as well to those elements of claim 34. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.
Regarding claim 36, Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju teaches all the limitations and motivations of claim 16 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 16 applies equally as well to those elements of claim 36. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath with Liang, Wang, and Yellamraju are the same motivations previously used for the claim 11 rejection.

Claims 12, 15, 32, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju further in view of Augasta, M., and Thangairulappan Kathirvalavakumar. "Pruning algorithms of neural networks—a comparative study." Open Computer Science 3.3 (2013): 105-115 [hereinafter Augasta].
Regarding claim 12, Augasta teaches the method of Claim 11, wherein step I) includes, for each of the groups: calculating, for each of the neurons that has not been categorized, a summation of weight differences with respect to the representative neuron of the group, each of the weight differences being calculated as an absolute value of a difference between one of the weights of the neuron and a corresponding one of the weights of the representative neuron; (Augasta; 3 Survey of pruning algorithms, Page 108, paragraph 4 starting with “Zeng and Yeung have proposed…”; Summation of absolute values of its outgoing weights…) 
and when it is determined that a neuron satisfies the predetermined criterion with the representative neuron, assigning the neuron to the group, wherein the predetermined criterion is that the summation is smaller than a predetermined threshold (Augasta; 3 Survey of pruning algorithms, Page 108, paragraph 4 starting with “Zeng and Yeung have proposed…”; method then prunes the hidden neurons with lowest relevance).
Examiner notes that “lowest relevance” does not necessarily mean below a threshold. However, since relevance can be user-defined, the user can define “lowest relevance” such that values smaller than a predetermined threshold are the desired feature.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju and combine them with Augasta because the algorithm takes into account expected input deviation and estimates the relevance of a neuron (Augasta; page 108, paragraph 4, starting with “Zeng and Yeung…”).
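For illustration only, the grouping criterion recited in claim 12 (a summation of absolute weight differences with respect to the representative neuron, compared against a predetermined threshold) can be sketched as follows. This Python sketch models the claim language itself, not Zeng and Yeung's relevance measure as described in Augasta; the function names, neuron labels, weight values, and threshold are all hypothetical.

```python
# Hypothetical sketch of the claimed grouping step: for each neuron not yet
# categorized, sum the absolute differences between its weights and the
# representative neuron's weights, and assign it to the group if the sum
# is smaller than a predetermined threshold.
from typing import Dict, List, Sequence, Set


def weight_difference_sum(weights: Sequence[float], rep_weights: Sequence[float]) -> float:
    """Sum of |w_i - r_i| over corresponding weights (an L1 distance)."""
    if len(weights) != len(rep_weights):
        raise ValueError("weight vectors must have the same length")
    return sum(abs(w - r) for w, r in zip(weights, rep_weights))


def build_group(
    neurons: Dict[str, Sequence[float]],
    representative: str,
    threshold: float,
    categorized: Set[str],
) -> List[str]:
    """Collect uncategorized neurons whose weight-difference sum to the
    representative is below `threshold`; mark them as categorized."""
    group = [representative]
    categorized.add(representative)
    rep_weights = neurons[representative]
    for name, weights in neurons.items():
        if name in categorized:
            continue  # only neurons that have not been categorized are tested
        if weight_difference_sum(weights, rep_weights) < threshold:
            group.append(name)
            categorized.add(name)
    return group


# Hypothetical example: three neurons with three weights each.
neurons = {
    "n0": [0.5, -0.2, 0.1],
    "n1": [0.55, -0.25, 0.1],
    "n2": [1.0, 0.3, -0.4],
}
categorized: Set[str] = set()
group = build_group(neurons, "n0", threshold=0.2, categorized=categorized)
print(group)  # ['n0', 'n1']
```

Here n1 differs from n0 by a total of about 0.1, below the assumed threshold of 0.2, while n2 differs by 1.5 and remains uncategorized for a later group.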
Regarding claim 15, Augasta teaches the method of Claim 14, wherein step I) includes, for each of the groups: calculating, for each of the neurons that has not been categorized, a summation of differences of weights, with respect to the representative neuron of the group, in absolute value; and (Augasta; 3 Survey of pruning algorithms, Page 108, paragraph 4 starting with “Zeng and Yeung have proposed…”; Summation of absolute values of its outgoing weights of neurons…) 
when it is determined that a neuron satisfies the criterion with the representative neuron, assigning the neuron to the group, wherein the criterion is that the summation is smaller than a predetermined threshold (Augasta; 3 Survey of pruning algorithms, Page 108, paragraph 4 starting with “Zeng and Yeung have proposed…”; method then prunes the hidden neurons with lowest relevance).
Examiner notes that “lowest relevance” does not necessarily mean below a threshold. However, since relevance can be user-defined, the user can define “lowest relevance” such that values smaller than a predetermined threshold are the desired feature.
The motivations to combine Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju and Augasta are the same motivations previously used for the claim 12 rejection.
Regarding claim 32, Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju further in view of Augasta [hereinafter Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju-Augasta] teaches all the limitations and motivations of claim 12 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 12 applies equally as well to those elements of claim 32. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju and Augasta are the same motivations previously used for the claim 12 rejection.
Regarding claim 35, Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju-Augasta teaches all the limitations and motivations of claim 15 in system form rather than method form. Therefore, the supporting rationale of the rejection to claim 15 applies equally as well to those elements of claim 35. The claims additionally recite a computer system and a processor. Zhang teaches a testing platform consisting of Windows 7 (64-bit), MATLAB R2014a, an Intel Xeon(R) E31245 v3 CPU at 3.4 GHz, and 8GB of RAM (Zhang; 5.1 Dataset and performance evaluation, page 251, right column, paragraph 2 starting with “We evaluate…”).
The motivations to combine Zhang-Simpson-Tecmath-Liang-Wang-Yellamraju and Augasta are the same motivations previously used for the claim 12 rejection.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC WU whose telephone number is (571)272-3380. The examiner can normally be reached Monday-Friday between 9AM and 6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ERIC C WU/Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128