DETAILED ACTION

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/26/2022 has been entered.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 14-18, 21-25, and 28-32 are rejected under 35 U.S.C. 103 as being unpatentable over GREER (US 2008/0152217) in view of Yan et al (US 2018/0218518).
As to claim 14, GREER teaches an apparatus comprising: 
at least one processor (paragraph [0314]...a single processor or by a large number of processors); and 
a memory (paragraph [0313]...stored onto a volatile or non-volatile storage medium) to store instructions that, when executed by the at least one processor, cause the at least one processor to: 
access data representing a deep convolutional (paragraph [0150]...convolution kernel) neural network (CNN) model (paragraph [0009]...neural model), wherein the CNN model comprises a plurality of kernels associated with a plurality of kernel elements (paragraph [0203]...reproducing kernel only for case of one-dimensional wavelets φ_{x,s}. For the multidimensional case, we note that the definition (20) is expressed as an inner product on R which can be easily extended to the inner product on any spectral manifold J. In general, reproducing kernels require only the mathematical structure of a Hilbert space); 
train the CNN model to sparsify and fine-tune (paragraph [0280]...The RKE signal can help train the input and latch NMs by serving as an error signal that helps to "fine tune" the NM association formation process so that they generate outputs that are more accurate) the CNN model (paragraph [0049]...The network weights, learned during the training process, act as long-term memory capable of recalling associations, but cannot serve as the working memory required for computations), wherein the training produces a plurality of intermediate models corresponding to different versions of a sparsified model for the CNN model (paragraph [0185]...the nodes of the neural network model are partitioned into the input layer, the hidden layer and the output layer. In the neural manifold model, the input layer is analogous to the input manifold H and the output layer is analogous to the output manifold Q. Both H and Q represent the continuous distribution of neurotransmitters in physical space. The "hidden" layer is the space N_{H,Q}, which equals the Cartesian product of two function spaces, the space of all possible measures on H and the space all possible output functions on Q. The individual neurons g_i are points in this infinite-dimensional product space), and each intermediate model has an associated set of kernel elements corresponding to zero values (paragraph [0199]...From the definition of the reproducing kernel (20), we see that at a fixed position (x_0,s_0) in the spectral manifold, the kernel K(x_0,s_0,ξ,η) is zero or near zero for values of (ξ,η) where the spectral functions φ_{x_0,s_0} and φ_{ξ,η} do not overlap); and 
control the training to cause the training to bypass mathematical operations involving the kernel elements corresponding to zero values (paragraph [0276]...As is the case with a neural network, in a general association NM there is no direct path from the input to the output. We refer to a pathway that bypasses one of the four general association NMs as a shunt. Without shunts, it is not possible to copy one of the inputs [r_i] or [s_i] directly to the outputs p or q. In FIG. 11, we have shown a single shunt for the input NMs of the interior Λ-MAP with a heavy dashed line. Shunts on the other three NMs are also possible but have been omitted from the diagram).
GREER fails to explicitly show/teach wherein the training performs multiple training iterations on the CNN model and maintains the CNN model at a given sparsity level during the multiple training iterations, and fine-tunes the sparsified model to produce the plurality of intermediate models. 
However, Yan et al teaches wherein the training performs multiple training iterations on the CNN model and maintains the CNN model at a given sparsity level during the multiple training iterations (paragraph [0034]...Sparsity in a layer of a CNN is defined as the fraction of zeros in the layer's weight and input activation matrices. The primary technique for creating weight sparsity is to prune the network during training. In one embodiment, any weight with an absolute value that is close to zero (e.g. below a defined threshold) is set to zero. In one embodiment, the compaction engine 215 sets weights having absolute values below a defined threshold to zero. If the weights are in a compacted format, the compaction engine 215 reformats the weights as needed after setting one or more weights to zero to produce compacted weights. The pruning process has the effect of removing weights from the filters, and sometimes even forcing an output activation to always equal zero. The remaining network may be retrained, to regain the accuracy lost through naïve pruning. The result is a smaller network with accuracy extremely close to the original network. The process can be iteratively repeated to reduce network size while maintaining accuracy), and fine-tunes the sparsified model to produce the plurality of intermediate models (paragraph [0034]...The pruning process has the effect of removing weights from the filters, and sometimes even forcing an output activation to always equal zero. The remaining network may be retrained, to regain the accuracy lost through naïve pruning. The result is a smaller network with accuracy extremely close to the original network. The process can be iteratively repeated to reduce network size while maintaining accuracy; Examiner’s Note: each iteration is considered an intermediate model, so multiple iterations will produce multiple intermediate models).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, for GREER’s training to perform multiple training iterations on the CNN model, maintain the CNN model at a given sparsity level during the multiple training iterations, and fine-tune the sparsified model to produce the plurality of intermediate models, as in Yan et al, for the purpose of improving the efficiency of neural network calculations.
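
For illustration only, the prune-then-retrain cycle described in the quoted passage of Yan et al (paragraph [0034]) can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not code from either reference: the function names (prune, retrain_step, iterative_prune), the thresholds, the learning rate, and the toy gradient are all hypothetical. Each round zeroes weights below a threshold, retrains while holding the pruned positions at zero (so the sparsity level does not change during that round), and records the result as one intermediate model.

```python
# Illustrative sketch only; not taken from GREER or Yan et al.
# All names, thresholds, and the toy gradient below are hypothetical.

def prune(weights, threshold):
    """Zero every weight whose absolute value is below threshold; return (weights, keep_mask)."""
    mask = [abs(w) >= threshold for w in weights]
    return [w if keep else 0.0 for w, keep in zip(weights, mask)], mask

def retrain_step(weights, grads, mask, lr=0.1):
    """One gradient step; pruned positions are skipped, so they stay exactly zero."""
    return [w - lr * g if keep else 0.0 for w, g, keep in zip(weights, grads, mask)]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

def iterative_prune(weights, thresholds, grad_fn, iters_per_round=3):
    """Prune at each threshold, retrain at that fixed sparsity, and keep each result."""
    intermediates = []
    for threshold in thresholds:
        weights, mask = prune(weights, threshold)
        for _ in range(iters_per_round):
            weights = retrain_step(weights, grad_fn(weights), mask)
        intermediates.append(list(weights))
    return intermediates

def toy_grad(weights):
    """Stand-in for a real loss gradient: pulls each weight toward 1.0."""
    return [w - 1.0 for w in weights]

models = iterative_prune([0.05, -0.4, 0.9, 0.2, -0.01, 0.6], [0.1, 0.5], toy_grad)
for i, m in enumerate(models):
    print(f"intermediate model {i}: sparsity = {sparsity(m):.2f}")
```

In this sketch the keep-mask is what holds the sparsity level constant across the retraining iterations of a round; a real training pipeline would apply the same idea per layer and use an actual loss gradient.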

As to claim 15, GREER teaches the apparatus, wherein the instructions, when executed by the at least one processor, cause the at least one processor to: 
store nonzero values for the kernel elements of the plurality of kernel elements in the memory which do not correspond to zero values (paragraph [0066]...second, wavelet networks are defined in terms of a set of functions that form a wavelet basis; while the methods described herein are defined in terms of the continuous wavelet transform (CWT) or a wavelet frame that contains redundant information. This permits the creation of associations with fewer non-zero coefficients, and allows the use of the reproducing kernel to reduce noise and create stability in the recursive connections); and 
store data in the memory representing a mask (paragraph [0085]...the masking and multiplexing operations and the orthogonal projections) identifying the kernel elements corresponding to zero values.

As to claims 16 and 18, Yan et al teaches an apparatus, wherein the mask (paragraph [0038]...a weight zero mask register 255) comprises bits, and a given bit of the bits represents multiple kernel weights having corresponding zero values (paragraph [0039]...multi-bit values (weights and/or input activations) and a single bit signal indicates which of the multi-bit values equals zero).
It would have been obvious for the mask to comprise bits, with a given bit of the bits representing multiple kernel weights having corresponding zero values, as in Yan et al, for the same reasons as above. 
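
For illustration only, the claim language "a given bit of the bits represents multiple kernel weights having corresponding zero values" can be pictured as a mask with one bit per group of weights. The following minimal Python sketch shows that reading of the limitation; it is not code from Yan et al, and the names group_zero_mask and masked_dot, the group size, and the sample values are hypothetical.

```python
# Illustrative sketch only; names, group size, and values are hypothetical.

def group_zero_mask(weights, group_size):
    """Return one bit per group: 1 if every weight in the group is zero, else 0."""
    return [
        int(all(w == 0 for w in weights[i:i + group_size]))
        for i in range(0, len(weights), group_size)
    ]

def masked_dot(weights, inputs, mask, group_size):
    """Dot product that skips every group the mask flags as all-zero."""
    total = 0.0
    for g, is_zero in enumerate(mask):
        if is_zero:
            continue  # no multiply-accumulate is performed for this group
        start = g * group_size
        for w, x in zip(weights[start:start + group_size],
                        inputs[start:start + group_size]):
            total += w * x
    return total

weights = [0, 0, 0, 0, 1.5, 0, -2.0, 0.5]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mask = group_zero_mask(weights, 4)
result = masked_dot(weights, inputs, mask, 4)
print(mask, result)
```

A coarser mask of this kind trades precision for size: isolated zeros inside a mostly nonzero group are still multiplied, but the mask needs only one bit per group.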

As to claim 17, Yan et al teaches an apparatus, wherein the nonzero values comprise columns or rows of nonzero data (paragraph [0044]...FIG. 3A illustrates a conceptual diagram of input data 305 and a compact data format 300, in accordance with one embodiment. The weights and activations are organized as matrices, such as the 2×4 matrix of the input data 305. The compaction engine 215 may be configured to convert the input data 305 to a compact data format 300. The non-zero values are extracted from the input data 305 to generate the non-zero data 315. A zero bitmask 310 is generated indicating positions of non-zero values and zeros in the input data 305. As shown in FIG. 3, the positions of non-zero values are indicated by bits set TRUE (e.g., logic one) and the positions of zeros are indicated by bits set FALSE (e.g., logic zero). In another embodiment, the positions of zeroes are indicated by bits set TRUE and the positions of non-zero values are indicated by bits set FALSE. Note that each bit in the zero bitmask corresponds to a setting of the single bit signal that is output by the expansion engine).
It would have been obvious for the nonzero values to comprise columns or rows of nonzero data, as in Yan et al, for the same reasons as above. 
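
For illustration only, the compact data format described in the quoted passage of Yan et al (paragraph [0044]) pairs the extracted non-zero values with a zero bitmask marking their positions. The following minimal Python sketch follows the first embodiment quoted above (bits set TRUE for non-zero values); it is not code from the reference, the function names compact and expand are hypothetical, and the 2×4 sample matrix is made up rather than taken from FIG. 3A.

```python
# Illustrative sketch only; function names and sample values are hypothetical.

def compact(matrix):
    """Split a matrix into its non-zero values and a row-major bitmask (True = non-zero)."""
    flat = [v for row in matrix for v in row]
    bitmask = [v != 0 for v in flat]
    nonzero = [v for v in flat if v != 0]
    return nonzero, bitmask

def expand(nonzero, bitmask, cols):
    """Rebuild the original matrix from the non-zero values and the bitmask."""
    values = iter(nonzero)
    flat = [next(values) if bit else 0 for bit in bitmask]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]

input_data = [[0, 3, 0, 7],
              [5, 0, 0, 1]]  # made-up 2x4 example
nonzero, bitmask = compact(input_data)
restored = expand(nonzero, bitmask, cols=4)
print(nonzero, bitmask, restored == input_data)
```

The alternative embodiment quoted above, in which TRUE marks the zeros instead, would only invert the comparison in compact and the test in expand.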

Claim 21 has similar limitations as claim 14. Therefore, the claim is rejected for the same reasons as above.

Claim 22 has similar limitations as claim 15. Therefore, the claim is rejected for the same reasons as above.

Claim 23 has similar limitations as claim 16. Therefore, the claim is rejected for the same reasons as above.

Claim 24 has similar limitations as claim 17. Therefore, the claim is rejected for the same reasons as above.

Claim 25 has similar limitations as claim 18. Therefore, the claim is rejected for the same reasons as above.

Claim 28 has similar limitations as claim 14. Therefore, the claim is rejected for the same reasons as above.

Claim 29 has similar limitations as claim 15. Therefore, the claim is rejected for the same reasons as above.

Claim 30 has similar limitations as claim 16. Therefore, the claim is rejected for the same reasons as above.

Claim 31 has similar limitations as claim 17. Therefore, the claim is rejected for the same reasons as above.

Claim 32 has similar limitations as claim 18. Therefore, the claim is rejected for the same reasons as above.

Response to Arguments
Applicant's arguments filed 8/26/2022 have been fully considered but they are not persuasive. 
GREER fails to explicitly show/teach wherein the training performs multiple training iterations on the CNN model and maintains the CNN model at a given sparsity level during the multiple training iterations, and fine-tunes the sparsified model to produce the plurality of intermediate models. 
However, Yan et al teaches wherein the training performs multiple training iterations on the CNN model and maintains the CNN model at a given sparsity level during the multiple training iterations (paragraph [0034]...Sparsity in a layer of a CNN is defined as the fraction of zeros in the layer's weight and input activation matrices. The primary technique for creating weight sparsity is to prune the network during training. In one embodiment, any weight with an absolute value that is close to zero (e.g. below a defined threshold) is set to zero. In one embodiment, the compaction engine 215 sets weights having absolute values below a defined threshold to zero. If the weights are in a compacted format, the compaction engine 215 reformats the weights as needed after setting one or more weights to zero to produce compacted weights. The pruning process has the effect of removing weights from the filters, and sometimes even forcing an output activation to always equal zero. The remaining network may be retrained, to regain the accuracy lost through naïve pruning. The result is a smaller network with accuracy extremely close to the original network. The process can be iteratively repeated to reduce network size while maintaining accuracy), and fine-tunes the sparsified model to produce the plurality of intermediate models (paragraph [0034]...The pruning process has the effect of removing weights from the filters, and sometimes even forcing an output activation to always equal zero. The remaining network may be retrained, to regain the accuracy lost through naïve pruning. The result is a smaller network with accuracy extremely close to the original network. The process can be iteratively repeated to reduce network size while maintaining accuracy; Examiner’s Note: each iteration is considered an intermediate model, so multiple iterations will produce multiple intermediate models).
Therefore, GREER in view of Yan et al clearly shows all the limitations as claimed. 




Allowable Subject Matter
Claims 19, 20, 26, 27, 33, and 34 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRANDON S COLE/           Primary Examiner, Art Unit 2128