Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 6/3/2022.
Applicant's arguments/remarks made in amendment filed 6/3/2022.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/5/2022 has been entered.
 
Claims 2, 11, 18, and 25 are cancelled.
Claims 1, 5-7, 10, 14-17, 21-24, and 29-30 are amended.
Claims 31-34 are new.
Claims 1, 3-10, 12-17, 19-24, and 26-34 are pending.
Response to Arguments
Applicant presents arguments with regard to priority.  Each is addressed.
Applicant argues “the cited references fail to teach or suggest each and every element recited in the claims” as amended.  (Remarks, page 12, paragraph 5.) The argument is moot in view of new grounds of rejection necessitated by amendment.  See detailed rejection.
Applicant argues “Independent claims 10, 17, and 24 recite the same or analogous elements as claim 1…Applicants submit that the cited references, taken individually or in combination, do not disclose or suggest each and every element recited in the amended independent claims.”  (Remarks, page 13, paragraph 3.) However, claim 1 remains rejected.  Therefore claims 10, 17, and 24 remain rejected as well. 
Applicant argues that “claims 3-9, 12-16, 19-23, and 26-30 are patentable over the cited reference for at least the reasons discussed above with respect to claims 1, 10, 17, and 24, as well as for the elements recited therein.” (Remarks, page 16, paragraph 1.)  However, the independent claims remain rejected.  Therefore the dependent claims remain rejected as well.  See detailed rejection.
	
Claim Interpretation – 35 U.S.C. § 112(f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
Claims 24, 26-30, and 34, which explicitly use the word “means,” are being interpreted under 35 U.S.C. 112(f).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-10, 12-17, 19-24, and 26-34 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (You Look Twice: GaterNet for Dynamic Filter Selection in CNNs, herein Chen), and Veit et al (Convolutional Networks with Inference Graphs, herein Veit).
Regarding claim 1,
	Chen teaches a method for a neural network, comprising: (Chen, Page 1, column 2, paragraph 2, line 1 “In this paper, we propose a novel framework called GaterNet for input-dependent dynamic filter selection in convolutional neural networks (CNNs), shown in Fig. 1.”

[Figure: Chen, Fig. 1 (image not reproduced).]

In other words, novel framework called GaterNet for input-dependent dynamic filter selection in convolutional neural networks is a method for a neural network.)
	receiving, by a processor in a computing device, input in a layer in the neural network, the layer including two or more filters (Chen, page 1, column 1, paragraph 2, line 5 “In machine learning, conditional computation has been proposed to have a similar mechanism in deep learning models.” And, page 1, paragraph 1, line 4 “In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs).” And, page 3, column 1, paragraph 3, line 2 “Given an input, the gater network decides the set of filters in the backbone network for use while the backbone network does the actual prediction.”  Examiner notes it is implicit that a computer-implemented method such as a neural network requires a processor in order to execute.  In other words, computation is processor, input is receiving…input, CNNs have layers is a layer in a neural network and set of filters is layer including two or more filters.)
	and two or more gating functionality components, wherein each of the two or more gating functionality components (Chen, Figure 1, “The gater extracts features and generates sparse binary gates for selecting filters in the backbone network in an input-dependent manner.”  In other words, sparse binary gates is two or more gating functionality components, sparse binary gates for selecting filters is components are configured to determine whether to activate at least one of the two or more filters, and filters is two or more filters.) comprise
	[an estimate relevance portion configured to generate a relevance value that identifies a relevance of at least one of the two or more filters in the layer based on the received input;]
	activating or deactivating the two or more filters (Chen, page 4, column 2, paragraph 2, line 1 “We use binary gates other than attention [31] or other real-valued gates for two reasons.  Firstly, binary gates can completely deactivate some filters for each input, and hence those filters will not be influenced by the irrelevant inputs.” And, page 1, column 1, paragraph 1, line 15 “…a global gater network is introduced to generate binary gates for selectively activating filters in the backbone network based on each input.” In other words, selectively activating filters is activating, and deactivate some filters is deactivating filters.); and
	applying the received input to active filters in the layer to generate an activation (Chen, page 3, column 1, paragraph 3, line 3 “Given an input, the gater network decides the set of filters in the backbone network for use while the backbone network does the actual prediction.” In other words, given an input is received input, decides the set of filters for use in the backbone network is applying the received input to active filters in the layer, and does the actual prediction is generate an activation.).
	Thus far, Chen does not explicitly teach an estimate relevance portion configured to generate a relevance value that identifies a relevance of at least one of the two or more filters in the layer based on the received input.
	Veit teaches an estimate relevance portion configured to generate a relevance value that identifies a relevance of [at least one of the two or more filters (Chen page 4, column 2, paragraph 2, line 1) – page 6 of office action.] in the layer based on the received input (Veit, Fig. 2, and page 4, paragraph 7, line 1 “For the gate to be effective, it needs to address a few key challenges.  First, to estimate the relevance of its layer, the gate needs to understand its input features.  To prevent mode collapse into trivial solutions that are independent of the input features, such as always or never executing a layer, we found it to be of key importance for the gate to be stochastic.  We achieve this by adding noise to the estimated relevance.  Second, the gate needs to make a discrete decision, while still providing gradients for the relevance estimation.  We achieve this with the Gumbel-Max trick and its softmax relaxation.  Third, the gate needs to operate with low computational cost.  Figure 2 provides an overview of the two key components of the proposed gate.  The first one efficiently estimates the relevance of the respective layer for the current image.  The second component makes a discrete decision by sampling using Gumbel-Softmax [18,24].”

[Figure: Veit, Fig. 2 (image not reproduced).]

In other words, from Fig. 2, the portion of the diagram labeled “estimating relevance” is estimate relevance portion (See FIG. 6B of the instant application.) configured to generate a relevance value, and relevance estimation is relevance value.)
	Both Chen and Veit are directed to speeding up inference in convolutional neural networks.  Chen teaches using gates at the filter level to evaluate whether to activate filters in a layer, but does not explicitly teach determining whether filters are “relevant”.  Veit teaches using gates at the layer level and Gumbel-Softmax to determine whether a layer is “relevant” and should be executed.  In view of the teaching of Chen, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Veit into Chen.  This would result in applying Gumbel-Softmax at the filter level to estimate whether filters are relevant in order to further speed up inference.
	One of ordinary skill in the art would be motivated to do this because the ever-increasing size of convolutional neural networks has caused a corresponding increase in inference time, thus creating a need for improving speed. (Veit, page 1, paragraph 2, line 4 “To shed light on this, it is important to note that due to this success, ConvNets are used to classify increasingly large sets of visually diverse categories.  Thus, most parameters model high-level features that, in contrast to low-level and many mid-level concepts, cannot be broadly shared across categories.  As a result, the networks become larger and slower as the number of categories rises.  Moreover, for any given input image the number of computed features focusing on unrelated concepts increases.  What if, after identifying that an image contains a bird, a ConvNet could move directly to a layer that can distinguish different bird species, without executing intermediate layers that specialize in unrelated aspects?”)
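For illustration only, the proposed combination (a per-filter gate that first estimates relevance from the input and then makes a discrete keep-or-skip decision with the Gumbel-Max trick) might be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Chen, Veit, or the instant application: the linear relevance estimator, its weights, and all function names are hypothetical.

```python
import math
import random


def estimate_relevance(pooled, weights, bias):
    """Hypothetical linear relevance estimator (an assumption).

    Returns two logits, [skip, keep], for one filter, computed from a
    pooled description of the layer input.
    """
    score = sum(w * p for w, p in zip(weights, pooled)) + bias
    return [0.0, score]


def gumbel_noise(rng):
    """Sample standard Gumbel noise as -log(-log(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gate_decision(logits, rng=None):
    """Gumbel-Max decision: argmax over (logit + Gumbel noise).

    With rng=None no noise is added, giving a deterministic decision.
    Returns 1 (activate the filter) or 0 (deactivate it).
    """
    if rng is not None:
        logits = [value + gumbel_noise(rng) for value in logits]
    return 1 if logits[1] > logits[0] else 0


def gate_filters(pooled, per_filter_params, rng=None):
    """One binary gate per filter; per_filter_params holds (weights, bias)."""
    return [
        gate_decision(estimate_relevance(pooled, w, b), rng)
        for w, b in per_filter_params
    ]
```

With a pooled input of [1.0, 2.0], a filter whose hypothetical weights are [1.0, 1.0] is kept and one whose weights are [-1.0, -1.0] is skipped when no noise is applied.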
Regarding claim 3,
	The combination of Chen and Veit teaches the method of claim 1, further comprising enforcing conditionality on the gating functionality components by back propagating a loss function to approximate a discrete decision of at least one of the gating functionality components with a continuous representation. (Chen, page 2, column 1, paragraph 3, line 10 “We propose a new framework for dynamic filter selection in CNNs.  The core of the idea is to introduce a dedicated gater network to take a glimpse of the input, and then generate input-dependent binary gates to select filters in the backbone network for processing the input. By using Improved SemHash, the gater network can be jointly trained with the backbone in an end-to-end fashion through back-propagation.” In other words, input-dependent binary gates to select filters is gate functionality, and jointly trained with the backbone in an end-to-end fashion through back-propagation is enforcing conditionality on the gating functionality by back propagating a loss function.)
Regarding claim 4,
	The combination of Chen and Veit teaches the method of claim 3, wherein
	back propagating the loss function comprises performing Batch-wise conditional regularization operations to match batch-wise statistics of one or more of the gating functionality components to a prior distribution. (Chen, page 4, column 1, paragraph 3 “So far, one important question still remains unanswered: how to generate binary gates g from g’ such that we can back-propagate the error through the discrete gates to the gater?  In this paper, we adopt a method called Improved SemHash [17,18]. During training, we first draw noise from a c-dimensional Gaussian distribution with mean 0 and standard deviation 1.  The noise $\epsilon$ is added to g’ to get a noisy version of the vector: $g'_{\epsilon} = g' + \epsilon$.  Two vectors are then computed from $g'_{\epsilon}$: $g_{\alpha} = \sigma'(g'_{\epsilon})$ and $g_{\beta} = \mathbf{1}(g'_{\epsilon} > 0)$, where $\sigma'$ is the saturating sigmoid function [19, 16]: $\sigma'(x) = \max(0, \min(1, 1.2\sigma(x) - 0.1))$, with $\sigma$ being the sigmoid function.”
In other words, c-dimensional Gaussian distribution is batch-wise statistics of prior distribution, g is the binary gating function derived from g’, and Improved SemHash is the back-propagation algorithm.)
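For illustration only, the noise-and-threshold step that the quoted passage attributes to Improved SemHash might be sketched as follows. This is a minimal sketch and not Chen's implementation. It assumes the saturating sigmoid has the form max(0, min(1, 1.2·sigmoid(x) − 0.1)), and the function names are hypothetical. How the two resulting vectors are mixed during training and how gradients are routed are not shown.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def saturating_sigmoid(x):
    """Assumed form: max(0, min(1, 1.2 * sigmoid(x) - 0.1))."""
    return max(0.0, min(1.0, 1.2 * sigmoid(x) - 0.1))


def semhash_gates(g_prime, rng, training=True):
    """Compute the real-valued and binary gate vectors from g'.

    During training, unit Gaussian noise is added to each entry of g';
    at inference time (training=False) no noise is added.
    Returns (g_alpha, g_beta): saturating-sigmoid values in [0, 1] and
    the thresholded binary gates.
    """
    noise = [rng.gauss(0.0, 1.0) if training else 0.0 for _ in g_prime]
    noisy = [g + e for g, e in zip(g_prime, noise)]
    g_alpha = [saturating_sigmoid(v) for v in noisy]
    g_beta = [1.0 if v > 0 else 0.0 for v in noisy]
    return g_alpha, g_beta
```

Because the saturating sigmoid reaches exactly 0 and 1 for moderately large inputs, the real-valued vector and the binary vector coincide once the gate logits are confident.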
Regarding claim 5,
	The combination of Chen and Veit teaches the method of claim 1, further comprising: 
	global average pooling the received input to generate a global average pooling result; and applying the global average pooling result to the gating functionality components of each filter to generate a binary value for each filter. (Chen, page 3, column 2, paragraph 5, line 3 “Similar to the backbone network, any existing CNN architectures can be used...” and page 3, column 2, paragraph 1, line 1 “Instead, the gater network processes the input to generate an input-dependent gating mask – a binary vector.  The vector is then used to dynamically select a particular subset of filters in the backbone network for the current input.” And, Equation 3:

$$O_i^l(x) = \begin{cases} \mathbf{0}, & \text{if } g_i^l = 0 \\ \phi(F_i^l * I^l(x)), & \text{if } g_i^l = 1 \end{cases}$$

“Here $g_i^l$ is the entry in g corresponding to the i-th filter at layer l, and 0 is a 2-D feature map with all its elements being 0. That is, the i-th filter will be applied to $I^l(x)$ to extract features only when $g_i^l$ = 1.  If $g_i^l$ = 0, the i-th filter is skipped and 0 is used as the output instead.”  Examiner notes that CNN architectures typically include global average pooling to reduce the spatial size of the representation to reduce the number of parameters and computation in the network. Therefore, “any existing CNN architectures can be used” means global average pooling is included. See FIG. 3A of instant specification. Chen explicitly discloses using a CNN architecture, among others.  Chen then describes generating an input-dependent gating mask – a binary vector.  This is applying the global average pooling result to the gating functionality components.  The gating mask is then applied to equation 3 which generates a binary value for each filter.)
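For illustration only, the two steps discussed above (global average pooling of the input, then one binary gate per filter that either runs the filter or substitutes an all-zero feature map) might be sketched as follows. This is a minimal sketch under stated assumptions. The linear gate head and all names are hypothetical, whereas Chen's gater is described as a separate network.

```python
def global_average_pool(feature_maps):
    """Reduce each 2-D channel to its mean, giving one value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def binary_gates(pooled, weights, biases):
    """Hypothetical linear gate head: one binary value per filter."""
    gates = []
    for w_row, b in zip(weights, biases):
        score = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1 if score > 0 else 0)
    return gates


def gated_outputs(x, filter_fns, gates, out_shape):
    """Run a filter only when its gate is 1; otherwise emit a zero map.

    filter_fns are callables standing in for the per-filter convolution
    and activation; skipped filters are never evaluated.
    """
    rows, cols = out_shape
    outputs = []
    for fn, gate in zip(filter_fns, gates):
        if gate == 1:
            outputs.append(fn(x))
        else:
            outputs.append([[0.0] * cols for _ in range(rows)])
    return outputs
```

For a two-channel input whose channel means are 4.0 and -2.0, an identity-weighted gate head yields gates [1, 0], so only the first filter is evaluated and the second output is an all-zero map.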
Regarding claim 6,
	The combination of Chen and Veit teaches the method of claim 5, further comprising: 
	identifying filters that may be ignored without impacting accuracy of the activation. (See mapping of claim 1. Chen teaches identifying filters that may be ignored without impacting accuracy of the activation.)
Regarding claim 7,
	The combination of Chen and Veit teaches the method of claim 1, 
wherein: 
receiving the input in the layer of the neural network comprises receiving the input in a convolution layer of a residual neural network (ResNet) (Chen, page 3, column 1, paragraph 4, line 1 “The backbone network is the main module of our model, which extracts features from input and makes the final prediction.  Any existing CNN architectures such as ResNet [10], Inception [28] and DenseNet [13] can be readily used as the backbone network in our GaterNet.” In other words, the backbone network layer receives input and ResNet architecture can be used as the backbone network.); and
	the method further comprises identifying filters associated with the convolution layer of the ResNet that may be ignored without impacting accuracy of the activation. (Chen, Figure 1, “The gater extracts features and generates sparse binary gates for selecting filters in the backbone network in an input-dependent manner.” And, page 3, column 1, paragraph 3, line 3 “Given an input, the gater network decides the set of filters in the backbone network for use while the backbone network does the actual prediction.” And, page 3, column 1, paragraph 4, line 3 “Any existing CNN architectures such as ResNet [10], Inception [28] and DenseNet [13] can be readily used as the backbone network in our GaterNet.”  In other words, binary gates are two or more gating functionality components, decides the set of filters for use in the backbone network is deactivate filters that are determined not to be relevant to the received input or activate filters that are determined to be relevant to the received input associated with the convolution layer, CNN is neural network with convolution layers, prediction is generate an activation, and ResNet is ResNet.)
Regarding claim 8,
	The combination of Chen and Veit teaches the method of claim 7, 
	wherein receiving the input in the layer of the neural network comprises receiving a set of three-dimensional input feature maps that form a channel of input feature maps in a layer that includes two or more three-dimensional filters. (Chen, page 3, column 1, paragraph 5, line 1 “Let us first consider a standalone backbone CNN without the gater network.  Given an input image x, the output of the l-th convolutional layer is a 3-D feature map $O^l(x)$. In a conventional CNN, $O^l(x)$ is computed as:

$$O_i^l(x) = \phi(F_i^l * I^l(x))$$

where $O_i^l(x)$ is the i-th channel of feature map $O^l(x)$, $F_i^l$ is the i-th 3-D filter, $I^l(x)$ is the 3-D input feature map to the l-th layer, $\phi$ denotes the element-wise nonlinear activation function, and * denotes convolution.” And, page 3, column 1, paragraph 3, line 3 “Given an input, the gater network decides the set of filters in the backbone network for use while the backbone network does the actual prediction.” In other words, 3-D input feature map is three-dimensional input feature map and set of filters is two or more three-dimensional filters.)
Regarding claim 9,
	The combination of Chen and Veit teaches the method of claim 8, 
	wherein applying the received input to active filters in the layer to generate an activation comprises: convolving the channel of input feature maps with one or more of the two or more three-dimensional filters to generate results; and summing the generated results to generate the activation of the convolution layer as a channel of output feature maps. (Chen, See mapping of claim 8, and Equation 1. In other words, equation 1 shows the i-th channel of input feature maps is convolved with the i-th 3-D filter, l filters is two or more three-dimensional filters, and $O_i^l(x)$ is the summation of the generated results.)
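For illustration only, the convolve-and-sum operation of Equation 1 (one 3-D filter applied to a 3-D input feature map to produce one output channel) might be sketched as follows. This is a minimal sketch under stated assumptions: ReLU stands in for the unspecified element-wise nonlinearity, stride 1 with no padding is assumed, and the function names are hypothetical. As is conventional in CNN implementations, the "convolution" is computed without flipping the kernel.

```python
def relu(v):
    """Assumed element-wise nonlinearity (the source does not name one)."""
    return v if v > 0.0 else 0.0


def conv_channel(inp, filt, activation=relu):
    """Apply one 3-D filter to a 3-D input, giving one 2-D output channel.

    inp is indexed [channel][row][col]; filt is indexed the same way and
    must have as many channels as inp. Products are summed over every
    input channel and kernel position before the activation is applied.
    """
    channels = len(inp)
    height, width = len(inp[0]), len(inp[0][0])
    k_h, k_w = len(filt[0]), len(filt[0][0])
    out = []
    for r in range(height - k_h + 1):
        row = []
        for c in range(width - k_w + 1):
            total = 0.0
            for ch in range(channels):
                for dr in range(k_h):
                    for dc in range(k_w):
                        total += inp[ch][r + dr][c + dc] * filt[ch][dr][dc]
            row.append(activation(total))
        out.append(row)
    return out


def conv_layer(inp, filters):
    """Stack one output channel per filter to form the layer output."""
    return [conv_channel(inp, f) for f in filters]
```

Each filter yields exactly one channel of the output feature map, so a layer with two filters produces a two-channel output.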
Regarding claim 31,
	The combination of Chen and Veit teaches the method of claim 1, wherein:
	each of the two or more gating functionality components further comprise a straight-through Gumbel sampling portion configured to use the generated relevance value to determine whether to activate or deactivate the at least one of the two or more filters in the layer (Veit, See mapping of Claim 1, and Fig. 2.  In other words, from Fig. 2, straight-through Gumbel sampling is straight-through Gumbel sampling portion configured to determine whether to activate or deactivate at least one of the two or more filters in the layer.); and
	activating or deactivating the two or more filters in the layer based on outputs of the two or more gating functionality components comprises activating or deactivating the two or more filters in the layer based on outputs of the straight-through Gumbel sampling portions of the two or more gating functionality components (Veit, See mapping of claim 1, and Fig. 2.   In other words, from Fig. 2 description, second part decides whether to execute … given the estimated relevance is activating or deactivating the two or more filters based on outputs of the straight-through Gumbel sampling portions.).
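For illustration only, a straight-through Gumbel sampling step for a single two-way (keep or skip) gate might be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Veit or the instant application. The function names, the explicit noise arguments, and the temperature parameter are hypothetical. The forward pass uses the hard decision, while the gradient of the soft two-class Gumbel-Softmax relaxation is returned for use in the backward pass.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def straight_through_gate(relevance, noise_keep=0.0, noise_skip=0.0,
                          temperature=1.0):
    """Return (hard, soft, grad) for one keep/skip gate.

    soft: two-class Gumbel-Softmax probability of "keep", which for
          logits [0, relevance] reduces to a sigmoid of the noisy margin.
    hard: discrete decision (1 = keep) used in the forward pass.
    grad: d(soft)/d(relevance), substituted for the zero-almost-everywhere
          gradient of the hard step in the backward pass.
    """
    z = (relevance + noise_keep - noise_skip) / temperature
    soft = sigmoid(z)
    hard = 1 if soft > 0.5 else 0
    grad = soft * (1.0 - soft) / temperature
    return hard, soft, grad
```

The noise arguments would normally be independent standard Gumbel samples; they are explicit parameters here so that the example stays deterministic.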
Claims 10, 12-16, and 32 are computing device claims comprising: a processor configured with processor-executable instructions corresponding to method claims 1, 3-7, and 31, respectively.  Otherwise, they are the same.  It is implicit that a method for a neural network requires a computing device comprising a processor configured with processor-executable instructions in order to execute.  Therefore, claims 10, 12-16, and 32 are rejected for the same reasons as claims 1, 3-7, and 31, respectively.
Claims 17, 19-23, and 33 are non-transitory processor-readable storage medium claims corresponding to method claims 1, 3-7, and 31, respectively.  Otherwise, they are the same.  It is implicit that a method for a neural network requires one or more non-transitory processor-readable storage media in order to execute.  Therefore, claims 17, 19-23, and 33 are rejected for the same reasons as claims 1, 3-7, and 31, respectively.
Claims 24, 26-30, and 34 are computer device claims corresponding to method claims 1, 3-7, and 31, respectively.  Otherwise, they are the same.  Therefore, claims 24, 26-30, and 34 are rejected for the same reasons as claims 1, 3-7, and 31, respectively.

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145