DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 17 May 2022, in response to the Office Action mailed 17 February 2022.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.

The rejection of claims 9-13 under 35 U.S.C. 101 has been withdrawn due to the amendments filed.

The rejections of claims 9-13, 16, and 17 under 35 U.S.C. 112, second paragraph, have been withdrawn due to the amendments filed.


Claim Objections
Claims 10-13 are objected to because of the following informalities:  the claims refer to “the at least one machine-readable storage medium of claim 9” but appear as though they should refer to “the at least one non-transitory machine-readable storage medium of claim 9”.  Appropriate correction is required.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Aliabadi (US 2018/0096226) in view of David (US 2019/0080243) or, alternatively, over David in view of Aliabadi, both as described below.

As per claim 1, Aliabadi teaches an apparatus comprising: a processor [an implementation of CNNs on a computing device including multiple processors executing SIMD instructions (paras. 0028, 0067, etc.)]; and memory coupled to the processor, the memory comprising instructions [an implementation of CNNs on a computing device including multiple processors executing SIMD instructions (paras. 0028, 0067, etc.) from a memory (para. 0034, etc.)] which, when executed by the processor, cause the processor to: generate, based in part on a receptive field size and a number of learnable parameters, a plurality of filters for a convolutional neural network (CNN) [a kernel of a CNN is divided into multiple runnels (filters) (para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (para. 0031, etc.) and a size of a number of feature maps (receptive field size) (para. 0256, etc.)], wherein the number of learnable parameters is based on a computing characteristic of the apparatus [a kernel of a CNN is divided into multiple runnels (filters) (para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (para. 0031, etc.)]; and train the CNN on a validation set [the CNN may be trained to learn the parameters of the model using training data (para. 0025, etc.)].
While Aliabadi teaches selecting and training filters of a CNN (see above) it does not explicitly teach each filter comprising the number of learnable parameters arranged in different random configurations on the filter, select one filter from the plurality of filters based on a convergence speed, for each of the plurality of filters, of the CNN; and train the CNN on a validation set using the one filter from the plurality of filters.
David teaches an apparatus comprising: a processor [a system including a processor executing instructions from memory (para. 0060, etc.)]; and memory coupled to the processor, the memory comprising instructions [a system including a processor executing instructions from memory (para. 0060, etc.)] which, when executed by the processor, cause the processor to: generate filters, each filter comprising the number of learnable parameters arranged in different random configurations on the filter [filters of a CNN may be modified by selecting and modifying filters, as well as changing random weights with random values (abstract; paras. 0045-47; etc.)], select one filter from the plurality of filters based on a convergence speed, for each of the plurality of filters, of the CNN [the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (paras. 0034, 0045-47, etc.)]; and train the CNN on a validation set using the one filter from the plurality of filters [the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (paras. 0034, 0045-47, etc.), used to train the CNN on a training set (para. 0009, etc.)].
Aliabadi and David are analogous art, as they are within the same field of endeavor, namely optimizing CNNs.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include random mutation of filter weights and selecting filters for speed of convergence, in the optimization of the CNN, as taught by David, for the selection and modification of weights for optimizing a CNN in the system taught by Aliabadi.
David provides motivation as [choosing filters to improve convergence speed improves and speeds up training of the CNN while randomization allows greater exploration (paras. 0034, 0045-47, etc.)].
Alternatively/additionally it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include selecting a number of learnable parameters of filters based upon a computing characteristic of the apparatus for SIMD instructions/execution, as taught by Aliabadi, for the selection of filters in the system taught by David.
Aliabadi provides motivation as [by fitting the number of weights in the filters to the size of registers used by the ISA to implement the convolution layers, the system can more efficiently implement the CNN (para. 0031, etc.) while using a SIMD architecture allows for improved parallel computations (para. 0067, etc.)].
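For illustration only, the limitation "each filter comprising the number of learnable parameters arranged in different random configurations on the filter" can be pictured with the minimal sketch below. It is not taken from Aliabadi, David, or the application; the function name `generate_filter_masks`, the representation of a configuration as a tuple of (row, column) positions, and the fixed seed are all assumptions made for the example.

```python
import math
import random


def generate_filter_masks(field_size, num_params, num_filters, seed=0):
    """Return `num_filters` distinct random configurations.

    Each configuration is a sorted tuple of (row, col) positions marking
    where the learnable parameters sit inside a field_size x field_size
    receptive field. All names here are hypothetical.
    """
    positions = [(r, c) for r in range(field_size) for c in range(field_size)]
    if num_params > len(positions):
        raise ValueError("more parameters than positions in the receptive field")
    if num_filters > math.comb(len(positions), num_params):
        raise ValueError("not enough distinct configurations exist")

    rng = random.Random(seed)
    masks = set()
    # Keep sampling until the requested number of distinct layouts is found.
    while len(masks) < num_filters:
        masks.add(tuple(sorted(rng.sample(positions, num_params))))
    return sorted(masks)


if __name__ == "__main__":
    # Example: four candidate filters, each with 8 parameters on a 5 x 5 field.
    for mask in generate_filter_masks(field_size=5, num_params=8, num_filters=4):
        print(mask)
```

Representing each configuration as a set of positions keeps the parameter count fixed while letting only the arrangement vary, which is the distinction the claim language draws.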

As per claim 2, Aliabadi/David teaches wherein the computing characteristic is the ability of the processor to execute a Single Instruction, Multiple Data (SIMD) instruction set [each runnel (filter) is based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.)].
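As a worked example of how a weight count could follow from a SIMD register size, the sketch below divides an assumed register width by an assumed per-weight width. The widths used (256-bit registers with 32-bit weights, 512-bit registers with 8-bit weights) are hypothetical and are not stated in the record; they were chosen only because the results happen to match the counts recited in claims 5 and 19.

```python
def weights_per_register(register_bits, weight_bits):
    """Number of weights that fill one SIMD register exactly.

    Hypothetical helper: both widths are assumptions supplied by the caller.
    """
    if register_bits <= 0 or weight_bits <= 0:
        raise ValueError("widths must be positive")
    if register_bits % weight_bits != 0:
        raise ValueError("weight width does not evenly divide the register width")
    return register_bits // weight_bits


if __name__ == "__main__":
    print(weights_per_register(256, 32))  # 256 / 32 = 8
    print(weights_per_register(512, 8))   # 512 / 8 = 64
```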

As per claim 3, Aliabadi/David teaches wherein the filter is disposed in a channel of the CNN [each layer of the CNN including the kernels including the filters has M input channels to the runnels (Aliabadi: para. 0030, etc.); and each CNN includes a hierarchy of layers, each including one or more channels (David: para. 0010, etc.)].

As per claim 4, Aliabadi/David teaches wherein the filter is disposed in a layer of the CNN [each layer of the CNN including the kernels including the filters has M input channels to the runnels (Aliabadi: para. 0030, etc.); and each CNN includes a hierarchy of layers, each including one or more channels (David: para. 0010, etc.)].

As per claim 5, while Aliabadi/David teaches various receptive field sizes and numbers of parameters (see, e.g., David: para. 0031; Aliabadi: fig. 1; paras. 0084-85; etc.), it does not explicitly teach wherein the receptive field size is 5 x 5 and the number of learnable parameters is 8.  However, it has been held that where the general conditions of a claim are disclosed in the prior art, discovering the optimum or working ranges involves only routine skill in the art.  In re Aller, 105 USPQ 233 (CCPA 1955).  Furthermore, it has been held that a change in size is within the level of ordinary skill in the art.  In re Rose, 105 USPQ 237 (CCPA 1955).

As per claim 6, Aliabadi/David teaches the memory comprising instructions which, when executed by the processor, cause the processor to: generate, based in part on the receptive field size and the number of learnable parameters, a second filter comprising the number of learnable parameters, wherein the learnable parameters are arranged in a second configuration on the second filter [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.); where filters of a CNN may be modified by selecting and modifying filters, as well as changing random weights with random values (David: abstract; paras. 0045-47; etc.)]; and train a second CNN on the validation set using the second filter [the CNN may be trained to learn the parameters of the model using training data (Aliabadi: para. 0025, etc.) where the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.), used to train the CNN on a training set (David: para. 0009, etc.)].

As per claim 7, Aliabadi/David teaches wherein the second CNN converges faster than the CNN [the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.)].

As per claim 8, Aliabadi/David teaches instructions which, when executed by the processor, cause the processor to store the second configuration in a database [the chromosomes comprising the filters may be stored in a database (David: para. 0056, etc.)].
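For illustration only, storing a filter configuration in a database might look like the sketch below, which uses Python's standard `sqlite3` and `json` modules. The table name `filter_configs`, its columns, and the helper names are assumptions made for the example; neither reference is cited as describing this schema.

```python
import json
import sqlite3


def store_configuration(conn, name, field_size, positions):
    """Save one configuration (hypothetical schema) under a given name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filter_configs ("
        "name TEXT PRIMARY KEY, field_size INTEGER, positions TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO filter_configs VALUES (?, ?, ?)",
        (name, field_size, json.dumps(sorted(positions))),
    )
    conn.commit()


def load_configuration(conn, name):
    """Return (field_size, positions) for a stored name, or None if absent."""
    row = conn.execute(
        "SELECT field_size, positions FROM filter_configs WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    field_size, positions = row
    # JSON stores pairs as lists; convert them back to tuples.
    return field_size, [tuple(p) for p in json.loads(positions)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_configuration(conn, "second", 5, [(2, 3), (0, 1)])
    print(load_configuration(conn, "second"))
```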

As per claim 9, Aliabadi/David teaches at least one non-transitory machine-readable storage medium comprising instructions [an implementation of CNNs on a computing device including multiple processors executing SIMD instructions (Aliabadi: paras. 0028, 0067, etc.) from a memory (Aliabadi: para. 0034, etc.); and/or a system including a processor executing instructions from memory (David: para. 0060, etc.)] that, when executed by a processor, cause the processor to: define a filter dimension to be used in a convolution layer of a convolutional neural network (CNN), wherein the filter dimension determines a receptive field size of the convolutional layer [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.); filters of a CNN may be defined in a number of chromosomes and modified by selecting and modifying filters via mutation (David: abstract; paras. 0045-47; etc.)]; specify a number of learnable parameters based on a computing characteristic of the processor [a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.)]; generate a plurality of filters, each of the plurality of filters comprising the receptive field size and comprising the specified number of learning parameters [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.)], wherein the arrangement of learning parameters is random and distinct for each of the plurality of filters [filters of a CNN may be modified by selecting and modifying filters, as well as changing random weights with random values (David: abstract; paras. 0045-47; etc.)]; and execute the CNN using at least one of the plurality of filters [the CNN may be trained to learn the parameters of the model using training data before execution (Aliabadi: para. 0025, etc.) where the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.), used to train the CNN on a training set before execution (David: para. 0009, etc.)].
Examiner’s Note: the reasoning and motivation for the combination is provided above, in the rejection of claim 1.

As per claim 10, Aliabadi/David teaches instructions that further cause the processor to specify the number of learnable parameters based on a Single Instruction, Multiple Data (SIMD) computing characteristic of the processor [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.)].

As per claim 11, Aliabadi/David teaches instructions that further cause the processor to: use the at least one of the plurality of filters in a channel of the CNN; and use a second of the plurality of filters in a layer of the CNN [each layer of the CNN including the kernels including the filters has M input channels to the runnels (Aliabadi: para. 0030, etc.); and each CNN includes a hierarchy of layers, each including one or more channels (David: para. 0010, etc.)].

As per claim 12, Aliabadi/David teaches instructions that further cause the processor to select the at least one of the plurality of filters based on which one converges the fastest when running the CNN [the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.)].
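For illustration only, selecting a filter "based on which one converges the fastest" can be sketched as below. The measure of convergence used here (the first epoch at which the recorded loss falls to or below a threshold) is an assumption for the example, as are the function names and the sample loss histories; the record does not define convergence in these terms.

```python
def epochs_to_converge(losses, threshold):
    """Return the 1-based epoch at which loss first reaches the threshold."""
    for epoch, loss in enumerate(losses, start=1):
        if loss <= threshold:
            return epoch
    return None  # never reached the threshold


def select_fastest(histories, threshold):
    """Pick the candidate whose loss history reaches the threshold soonest.

    `histories` maps a candidate filter name to its per-epoch loss values.
    Returns (name, epochs), or (None, None) if no candidate converges.
    """
    best_name, best_epochs = None, None
    for name, losses in histories.items():
        epochs = epochs_to_converge(losses, threshold)
        if epochs is not None and (best_epochs is None or epochs < best_epochs):
            best_name, best_epochs = name, epochs
    return best_name, best_epochs


if __name__ == "__main__":
    # Hypothetical loss curves for three candidate filters.
    histories = {
        "filter_a": [1.0, 0.5, 0.2, 0.05],
        "filter_b": [1.0, 0.08, 0.05],
        "filter_c": [1.0, 0.9, 0.8],
    }
    print(select_fastest(histories, threshold=0.1))
```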

As per claim 13, Aliabadi/David teaches instructions that further cause the processor to use the at least one of the plurality of filters to perform training and inference of the CNN [the CNN may be trained to learn the parameters of the model using training data before execution (Aliabadi: para. 0025, etc.) where the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.), used to train the CNN on a training set before execution (David: para. 0009, etc.)].

As per claim 14, Aliabadi/David teaches an apparatus comprising: a multi-processor supporting execution of a Single Instruction Multiple Data (SIMD) instruction set [an implementation of CNNs on a computing device including multiple processors executing SIMD instructions (Aliabadi: paras. 0028, 0067, etc.) from a memory (Aliabadi: para. 0034, etc.)]; a SIMD register to be used when executing the SIMD instruction set [the SIMD instructions are executed using SIMD registers (Aliabadi: para. 0031, etc.)]; a memory coupled to the multi-processor, the memory comprising instructions [an implementation of CNNs on a computing device including multiple processors executing SIMD instructions (Aliabadi: paras. 0028, 0067, etc.) from a memory (Aliabadi: para. 0034, etc.) and/or a system including a processor executing instructions from memory (David: para. 0060, etc.)] which when executed by the multi-processor cause the multi-processor to: generate, based in part on a receptive field size and a number of learnable parameters, a plurality of filters for a convolutional neural network (CNN) [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.)], each filter comprising a number of learnable parameters arranged in different random configurations on the filter [filters of a CNN may be modified by selecting and modifying filters, as well as changing random weights with random values (David: abstract; paras. 0045-47; etc.)], wherein the number of learnable parameters is based on a computing characteristic of the multi-processor to execute the SIMD instruction set [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.)]; select a first filter from the plurality of filters based on a convergence speed, for each of the plurality of filters, of the CNN [the modification of filters and creation of new filters includes selecting filter modifications that produce faster convergence (David: paras. 0034, 0045-47, etc.)]; and embed the first filter from the plurality of filters in a channel of the CNN, the CNN comprising a plurality of channels, wherein the CNN is executed by the multi-processor using the first filter and the SIMD instruction set [each layer of the CNN including the kernels including the runnels has M input channels to the runnels (Aliabadi: para. 0030, etc.) and the CNN is executed using SIMD instructions on a SIMD processor (Aliabadi: para. 0067)].
Examiner’s Note: the reasoning and motivation for the combination is provided above, in the rejection of claim 1.

As per claim 15, Aliabadi/David teaches the memory further comprising instructions which, when executed by the multi-processor, cause the multiprocessor to: select a second filter from the plurality of filters based on a convergence speed, for each of the plurality of filters, of the CNN; and embed the second filter from the plurality of filters in a second channel of the CNN [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.); where filters of a CNN may be selected for convergence speed and modified by selecting and modifying filters, as well as changing random weights with random values (David: abstract; paras. 0034, 0045-47; etc.)].

As per claim 16, Aliabadi/David teaches the memory further comprising instructions which, when executed by the multi-processor, cause the multiprocessor to: select a third filter from the plurality of filters based on a convergence speed, for each of the plurality of filters, of the CNN; and embed the third filter from the plurality of filters in a layer of the CNN [a kernel of a CNN is divided into multiple runnels (filters) (Aliabadi: para. 0006, etc.) each based on a number of weights (learnable parameters) that is based on a size of SIMD registers used for SIMD instructions (Aliabadi: para. 0031, etc.) and a size of a number of feature maps (receptive field size) (Aliabadi: para. 0256, etc.); where filters of a CNN may be selected for convergence speed and modified by selecting and modifying filters, as well as changing random weights with random values (David: abstract; paras. 0034, 0045-47; etc.)].

As per claim 17, Aliabadi/David teaches wherein the receptive field size and the number of learnable parameters for the plurality of filters are saved in the memory [the chromosomes comprising the filters may be stored in a database (David: para. 0056, etc.)].

As per claim 20, see the rejection of claim 8, above.


Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Aliabadi and David as applied to claim 14 above, and further in view of well-known practices in the art.


As per claim 18, while Aliabadi/David teaches various receptive field sizes and numbers of parameters (see, e.g., Aliabadi: fig. 1; paras. 0084-85; etc.), it does not explicitly teach wherein the receptive field size is 5 x 5 and the number of learnable parameters is 8.  However, it has been held that where the general conditions of a claim are disclosed in the prior art, discovering the optimum or working ranges involves only routine skill in the art.  In re Aller, 105 USPQ 233 (CCPA 1955).  Furthermore, it has been held that a change in size is within the level of ordinary skill in the art.  In re Rose, 105 USPQ 237 (CCPA 1955).

As per claim 19, while Aliabadi/David teaches various receptive field sizes and numbers of parameters (see, e.g., David: para. 0031; Aliabadi: fig. 1; etc.), it does not explicitly teach wherein the receptive field size is 10 x 10 and the number of learnable parameters is 64.  However, it has been held that where the general conditions of a claim are disclosed in the prior art, discovering the optimum or working ranges involves only routine skill in the art.  In re Aller, 105 USPQ 233 (CCPA 1955).  Furthermore, it has been held that a change in size is within the level of ordinary skill in the art.  In re Rose, 105 USPQ 237 (CCPA 1955).


Response to Arguments
Applicant's arguments filed 17 May 2022 have been fully considered but they are not persuasive.

Applicant argues that the cited art does not teach “each filter comprising the number of learnable parameters arranged in different random configurations on the filter” because David teaches that prior, entirely random mutations “often enlarge the search space too much so that deep CNNs does not converge to optimal values in practical or finite time or converge to sub-optimal values (in comparison to standard backpropagation) given similar amount of training time” (see para. 0041).
However, David also teaches “Some embodiments of the invention balance randomization (achieved by randomly selecting filters during recombination) with a constrained (non-random) recursive or propagating error correction mutation values that correct errors in the neuron weights” (see para. 0042), “Whereas conventional random mutation values cause GAs to oscillate wildly, some embodiments of the invention stabilize GAs by mutating weights by values shifted or corrected based on their errors” (para. 0045), that “the mutation would involve random modifications to the values of learning rate and momentum within the pre-specified reasonable ranges” (para. 0046) and “To expand the relatively smaller search space of the error correction models, some embodiments of the invention may perform additional mutations of the chromosome 304, for example, setting a sparse random subset (e.g., 1%) of the CNN weights to zero, to random values, or adding random values (e.g., noise). Zeroing mutations in mutated chromosome 306 may decrease or reset active connections and regularize the NN to correct false connections and prevent false correlations due to “over-training.” In this way, correct correlations will propagate to mutated chromosomes 306 and incorrect correlations will fade away.” (para. 0047).  This is within the broadest reasonable interpretation of “each filter comprising the number of learnable parameters arranged in different random configurations on the filter”.
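For illustration only, the additional mutation quoted from David para. 0047 (setting a sparse random subset, e.g., 1%, of the weights to zero, to random values, or adding random noise) can be pictured with the sketch below. It is not David's implementation; the function name `sparse_mutate`, the flat list of weights, the `scale` parameter, and the choice of uniform and Gaussian draws are assumptions made for the example.

```python
import random


def sparse_mutate(weights, fraction=0.01, mode="zero", scale=0.1, seed=None):
    """Return a copy of `weights` with a sparse random subset mutated.

    mode "zero"   sets the chosen weights to 0.0
    mode "random" replaces them with uniform values in [-scale, scale]
    mode "noise"  adds Gaussian noise with standard deviation `scale`
    All names and distributions here are hypothetical.
    """
    if mode not in ("zero", "random", "noise"):
        raise ValueError("unknown mutation mode: " + mode)

    rng = random.Random(seed)
    mutated = list(weights)  # leave the caller's list untouched
    if not mutated:
        return mutated

    # Mutate at least one weight so small inputs still change.
    count = max(1, round(fraction * len(mutated)))
    for i in rng.sample(range(len(mutated)), count):
        if mode == "zero":
            mutated[i] = 0.0
        elif mode == "random":
            mutated[i] = rng.uniform(-scale, scale)
        else:
            mutated[i] += rng.gauss(0.0, scale)
    return mutated


if __name__ == "__main__":
    original = [1.0] * 1000
    zeroed = sparse_mutate(original, fraction=0.01, mode="zero", seed=1)
    print(sum(1 for w in zeroed if w == 0.0), "of", len(zeroed), "weights zeroed")
```

Because only the selected positions change, the rest of each filter is carried over unchanged, which matches the quoted passage's description of the mutation as sparse.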


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-20 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Annapureddy (US 2016/0217369) – discloses a system including compression of filters in a CNN.
Ren (US 2018/0268284) – discloses trimming layers of a CNN including reducing filter sizes.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE GIROUX/Primary Examiner, Art Unit 2128