DETAILED ACTION
Status of Claims
This action is in response to the application filed on 6/6/2019 for application 16/434,145. Claims 1 – 11 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/6/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the recited claim limitation “an output section” must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.


Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance. 

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
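An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.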

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
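Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.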

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
a computation section configured to … in Claims 1, 9, and 11
a deletion section configured to … in Claims 1 and 11
a second learning unit … configured to … in Claims 1 and 11
a learning adjustment section configured to … in Claim 2
an output section configured to … in Claim 9

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1 – 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claims 1 and 9 – 11 recite the limitation “the next level processing layer”. There is insufficient antecedent basis for this limitation in the claims. For examination purposes, the term is interpreted as “processing layer at a next level” as recited earlier in the claim.
All claims depending from Claim 1 are rejected for the same reason as Claim 1.

Claims 2 – 4 recite the limitation “the adjustment value”. There is insufficient antecedent basis for this limitation in the claims or in the claims from which they depend. For examination purposes, the term is interpreted as “the predetermined adjustment value” as recited earlier in the claim.

Claims 6 – 8 recite limitations including “the plurality of processing layers” and “the plurality of respective processing layers”. There is insufficient antecedent basis for these limitations in the claims or in the claims from which they depend. For examination purposes, the terms are interpreted as “a plurality of levels of processing layers” as recited in Claim 1, from which these claims depend.



Claims 1 and 9 – 11 recite the limitation “an attention layer for a neural network including …”. This limitation is indefinite because it is not clear whether “including” modifies the attention layer or the neural network. One of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  Further clarification is required. For examination purposes, “including” is interpreted as modifying the neural network.

In addition, the following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. 
a computation section configured to … in Claims 1, 9, and 11
a deletion section configured to … in Claims 1 and 11
a second learning unit … configured to … in Claims 1 and 11
In particular, the corresponding description found in the Specification for each of the generic placeholders listed above substantially reiterates the claim language and does not describe the structure that performs the corresponding functions.

Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. For examination purposes, each of these elements has been interpreted as any structure configured to perform the claimed functions.

Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 6 – 8, and 10 – 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, SCA-CNN Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, arXiv, 2017 in view of He, Channel Pruning for Accelerating Very Deep Neural Networks, arXiv, 2017, McCaffrey, Neural Network L1 Regularization Using Python, Visual Studio Magazine, 2017 and Talby, Lessons learned turning machine learning models into real products and services, 2018.

Regarding Claim 1, Chen discloses: A neural network … device (Chen, page. 4 col 2, para. 3 where we use Caffe for deep neural network evaluation; Caffe is a framework deployed on a computer system with a GPU [neural network load reduction device]) comprising: an attention module (Chen, fig. 2, lower diagram of attention processing module) including:
an attention layer (Chen, fig. 2, lower diagram of channel-wise attention model) for a neural network including a plurality of levels of processing layer that are connected together by a plurality of channels (Chen, fig. 2, where the upper diagram shows a multi-layer [plurality of levels of processing layer] neural network in which each layer has a number of feature maps [plurality of channels] connected layer-wise), the attention layer being configured to compute an output feature value corresponding to each channel of a first number of channels based on an input feature value from each channel of the first number of channels in a predetermined processing layer and based on a parameter (Chen, fig. 2, where at layer l [predetermined processing layer]; eq. 7, where the channel-wise attention weight B = softmax(w_i' b + b_i') [feature value for each channel i] is computed with b = tanh((W_c * v + b_c) + W_hc h_{t-1}) [based on the input feature value v of each channel and based on parameter W_c] for each of the C channels [a first number of channels]);
a computation section configured to multiply the input feature values by the output feature values and to output a computed result obtained to a processing layer at a next level from the predetermined processing layer (Chen, page. 4, col 2, ln. 1 - 3 & eq. 8, where f_c(.) [computed result] is a channel-wise multiplication of the feature map channels and the corresponding channel weights; the output of f_c(.) is sent to the spatial attention [processing layer at a next level] model);
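The channel-wise attention mapping relied on above can be illustrated with a minimal sketch. This is not Chen's implementation: it assumes each channel is summarized by its mean activation, treats the hidden-state term W_hc h_{t-1} as a precomputed vector (h_proj), and uses hypothetical names (channel_attention, w_c, b_c, w_out, b_out) that appear in neither the reference nor the claims.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feature_maps, w_c, b_c, h_proj, w_out, b_out):
    """Return (weights, scaled_maps) for C channels.

    feature_maps: list of C channels, each a flat list of activations.
    w_c, b_c, h_proj, w_out: length-k lists (assumed shapes);
    h_proj stands in for the term W_hc h_{t-1}.
    """
    # v: one scalar per channel (assumption: mean pooling per channel)
    v = [sum(ch) / len(ch) for ch in feature_maps]
    scores = []
    for v_i in v:
        # b = tanh((W_c * v + b_c) + W_hc h_{t-1}), evaluated for channel i
        b = [math.tanh(w_c[j] * v_i + b_c[j] + h_proj[j])
             for j in range(len(w_c))]
        scores.append(sum(w_out[j] * b[j] for j in range(len(b))) + b_out)
    # B = softmax(w' b + b'): one attention weight per channel
    weights = softmax(scores)
    # Channel-wise multiplication: scale every activation of channel i
    scaled = [[weights[i] * x for x in ch]
              for i, ch in enumerate(feature_maps)]
    return weights, scaled
```

In this sketch the per-channel weights play the role of the "output feature values", and the scaled maps are the "computed result" passed to the next level.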
Chen does not explicitly disclose:
a load reduction device
a first learning unit connected to the neural network and configured to perform learning processing on the parameter using an error backpropagation method in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer;
 a channel selection section configured to select, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed by the attention layer after the learning processing has been performed and a predetermined threshold value;
a deletion section configured to change channels of the first number of channels into channels of a second number of channels by deleting the redundant channel from the predetermined processing layer;
and a second learning unit connected to the neural network and configured to perform learning processing on the neural network after the redundant channel has been deleted
He explicitly discloses:
a load reduction device (He, abs. ln. 1 – 2, where channel pruning [load reduction])
a first learning unit connected to the neural network and configured to perform learning processing on the parameter (He, where we solve this problem [with first learning unit] in two folds … solve W to reconstruction error [perform learning processing on the parameter]) … in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer (He, page. 3, col. 2, para. 3, where our approach could be applied at inference time, i.e., performing learning processing on the parameter in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer)
a channel selection section configured to select, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed by the attention layer after the learning processing has been performed and a predetermined threshold value (He, page. 3, col. 1, para. 6, where in the optimization step 1 [channel selection section], we will ignore ith channel [redundant channel] if Bi=0 [satisfying a predetermined relationship between the output feature value B and a predetermined threshold value 0]);
a deletion section configured to change channels of the first number of channels into channels of a second number of channels by deleting the redundant channel from the predetermined processing layer (He, page. 3, col. 1, para. 1, where prune [delete] the input channels from c [first number of channels] to desired c’ [second number of channels]);
Chen and He both teach channel weight assignment and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen’s disclosure of channel-wise attention, which multiplies each channel by a scale factor, with He’s disclosure of pruning channels having a low scale factor to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to accelerate a deep neural network with little accuracy loss (He, abs. ln. 1 – 3 & 10 – 14).
Chen in view of He does not explicitly disclose: 
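The combination described above, per-channel weights followed by pruning of channels whose weight meets a threshold condition, can be sketched as follows. This is only an illustration under stated assumptions: the comparison `value <= threshold` is an assumed form of the "predetermined relationship" (with threshold 0.0 it reproduces the Bi=0 case cited from He), the weight layout (one row per output channel) is assumed, and the function names are hypothetical.

```python
def select_redundant_channels(attention_values, threshold=0.0):
    """Return indices of channels whose attention value is <= threshold.

    Assumption: the 'predetermined relationship' is modeled as <=.
    """
    return [i for i, a in enumerate(attention_values) if a <= threshold]


def delete_channels(layer_weights, next_layer_weights, redundant):
    """Remove redundant channels from a layer and from its consumer.

    layer_weights:      rows indexed by the layer's output channel.
    next_layer_weights: each row has one entry per input channel,
                        i.e. per output channel of the pruned layer.
    """
    drop = set(redundant)
    # Drop the rows (filters) that produce the redundant channels
    kept_layer = [row for i, row in enumerate(layer_weights)
                  if i not in drop]
    # Drop the matching input columns in the next-level layer
    kept_next = [[w for i, w in enumerate(row) if i not in drop]
                 for row in next_layer_weights]
    return kept_layer, kept_next
```

Deleting a channel touches two weight sets: the filter that produces it and the next layer's inputs that consume it, which is why the channel count changes from a first number to a second number in both places.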
using an error backpropagation method
and a second learning unit connected to the neural network and configured to perform learning processing on the neural network after the redundant channel has been deleted
McCaffrey explicitly discloses:
a first learning unit connected to the neural network and configured to perform learning processing on the parameter using an error backpropagation method (McCaffrey, page. 1, para. 3, where you apply an optimization algorithm, typically back-propagation, to find weights and bias values that minimize some error metric between the computed output values and the correct output values)
Chen (in view of He) and McCaffrey both teach optimization of regression problems in neural networks and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the disclosure of Chen (in view of He) of optimizing linear regression with McCaffrey’s disclosure of using back-propagation to solve the optimization of a linear regression problem to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification because the combination yields predictable results.
Chen in view of He and McCaffrey does not explicitly disclose:
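The limitation at issue, learning the attention parameter by error backpropagation while learning is suspended for the predetermined layer and the next-level layer, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: a scalar model y = w_next * (sigmoid(theta) * (w_layer * x)), a squared-error loss, plain gradient descent, and hypothetical names. It is not taken from Chen, He, McCaffrey, or the application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_layer, theta, w_next):
    """Layer -> attention gate -> next-level layer (all scalars)."""
    feature = w_layer * x          # predetermined processing layer
    gate = sigmoid(theta)          # attention output for the channel
    return w_next * (gate * feature), feature, gate


def train_attention_only(samples, w_layer, theta, w_next,
                         lr=0.1, epochs=200):
    """Update only theta; w_layer and w_next stay frozen."""
    for _ in range(epochs):
        for x, target in samples:
            y, feature, gate = forward(x, w_layer, theta, w_next)
            # Backpropagate the squared error through the frozen layers
            d_y = 2.0 * (y - target)
            d_gate = d_y * w_next * feature
            d_theta = d_gate * gate * (1.0 - gate)
            theta -= lr * d_theta  # the only parameter that changes
    return theta


def mean_squared_error(samples, w_layer, theta, w_next):
    errors = [(forward(x, w_layer, theta, w_next)[0] - t) ** 2
              for x, t in samples]
    return sum(errors) / len(errors)
```

The gradient still flows through w_next and w_layer, but neither is updated, which is one way to read "learning processing has been suspended" for those layers.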
and a second learning unit connected to the neural network and configured to perform learning processing on the neural network after the redundant channel has been deleted
Talby explicitly discloses: 
and a second learning unit connected to the neural network and configured to perform learning processing on the neural network after the redundant channel has been deleted (Talby, page. 3, para. 4, where no matter what the model is predicting, some level of retraining is necessary; i.e., during production inference and after the channel is pruned (He, page. 3, para. 3, ln. 7 – 8), the model still needs regular retraining during production. The retraining is a different training [second learning unit], not the channel weight training)
Chen (in view of He and McCaffrey) and Talby both teach neural networks in a real production environment and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the disclosure of Chen (in view of He and McCaffrey) of operating a neural network in a production environment/inference mode with Talby’s disclosure of periodic retraining to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to provide accurate predictions and recommendations (Talby, page. 3, para. 2, ln. 7 – 8).

Regarding Claim 6, which depends from Claim 1, Chen in view of He, McCaffrey and Talby discloses the device of Claim 1. Chen in view of He, McCaffrey and Talby further discloses:
the neural network load reduction device further includes one of each of the attention module, the channel selection section, and the deletion section corresponding to each of the plurality of processing layers (Chen, page. 2, col. 2, para. 5, ln. 6, where SCA-CNN also incorporates the channel-wise attention at multiple layers [attention module corresponding to each of the processing layers]; He, page. 3, col. 2, para. 4, where we apply our approach layer by layer sequentially [channel selection section and deletion section at each corresponding layer]); and the second learning unit is configured to perform learning processing on the neural network after the redundant channel has been deleted from each of the plurality of processing layers (Talby, page. 3, para. 4, where no matter what the model is predicting, some level of retraining is necessary; i.e., during production inference and after the channel is pruned (He, page. 3, para. 3, ln. 7 – 8), the model still needs regular retraining during production. The retraining is a different training [second learning unit], not the channel weight training).


Regarding Claim 7, Chen in view of He, McCaffrey and Talby further discloses:
wherein the attention modules corresponding to the plurality of respective processing layers are configured so as to be common to some or all of the processing layers (Chen, page. 2, col. 2, para. 5, ln. 6, where SCA-CNN also incorporates the channel-wise attention at multiple layers; eq. 1 – eq. 4, where the optimization problem is common among layers that incorporate the channel-wise attention).

Regarding Claim 8, which depends from Claim 1, Chen in view of He, McCaffrey and Talby discloses the device of Claim 1. Chen in view of He, McCaffrey and Talby further discloses:
wherein the attention modules corresponding to the plurality of respective processing layers are configured so as to be different modules (He, fig. 3, & page. 4, col. 1, para. 1, where for the first layer, the challenge is that the large input feature map width can't be easily pruned … for the last layer, accumulated error from the shortcut is hard to be recovered … to address these challenges, we propose several variants of our approach; i.e., the pruning approaches for different layers use different algorithms [different modules]).

Regarding Claim 10, Chen discloses: A neural network … method comprising:
computing, for a neural network including a plurality of levels of processing layers that are connected together by a plurality of channels (Chen, fig. 2, where the upper diagram shows a multi-layer [plurality of levels of processing layer] neural network in which each layer has a number of feature maps [plurality of channels] connected layer-wise), an output feature value corresponding to each channel of a first number of channels based on an input feature value from each channel of the first number of channels in a predetermined processing layer and based on a parameter (Chen, fig. 2, where at layer l [predetermined processing layer]; eq. 7, where the channel-wise attention weight B = softmax(w_i' b + b_i') [feature value for each channel i] is computed with b = tanh((W_c * v + b_c) + W_hc h_{t-1}) [based on the input feature value v of each channel and based on parameter W_c] for each of the C channels [a first number of channels]),
and multiplying the input feature values by the output feature values and outputting a computed result obtained to a processing layer at a next level from the predetermined processing layer (Chen, page. 4, col 2, ln. 1 - 3 & eq. 8, where f_c(.) is a channel-wise multiplication of the feature map channels and the corresponding channel weights; the output of f_c(.) is sent to the spatial attention [processing layer at a next level] model);
Chen does not explicitly disclose:
neural network load reduction method
connecting to the neural network and performing learning processing on the parameter using an error backpropagation method in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer;
selecting, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed after the learning processing has been performed and a predetermined threshold value;
changing channels of the first number of channels into channels of a second number of channels by deleting the redundant channel from the predetermined processing layer;
and connecting to the neural network and performing learning processing on the neural network after the redundant channel has been deleted.
He explicitly discloses:
neural network load reduction method (He, abs. ln. 1 – 2, where channel pruning [load reduction])
connecting to the neural network and performing learning processing on the parameter (He, where we solve this problem in two folds … solve W to reconstruction error [perform learning processing on the parameter]) … in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer (He, page. 3, col. 2, para. 3, where our approach could be applied at inference time, i.e., performing learning processing on the parameter in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer)
selecting, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed after the learning processing has been performed and a predetermined threshold value (He, page. 3, col. 1, para. 6, where in the optimization step 1, we will ignore ith channel [redundant channel] if Bi=0 [satisfying a predetermined relationship between the output feature value B and a predetermined threshold value 0]);
changing channels of the first number of channels into channels of a second number of channels by deleting the redundant channel from the predetermined processing layer (He, page. 3, col. 1, para. 1, where prune the input channels from c [first number of channels] to desired c’ [second number of channels]);
Chen and He both teach channel weight assignment and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen’s disclosure of channel-wise attention, which multiplies each channel by a scale factor, with He’s disclosure of pruning channels having a low scale factor to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to accelerate a deep neural network with little accuracy loss (He, abs. ln. 1 – 3 & 10 – 14).
Chen in view of He does not explicitly disclose: 
using an error backpropagation method
and connecting to the neural network and performing learning processing on the neural network after the redundant channel has been deleted.
McCaffrey explicitly discloses:
connecting to the neural network and performing learning processing on the parameter using an error backpropagation method (McCaffrey, page. 1, para. 3, where you apply an optimization algorithm, typically back-propagation, to find weights and bias values that minimize some error metric between the computed output values and the correct output values)
Chen (in view of He) and McCaffrey both teach optimization of regression problems in neural networks and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the disclosure of Chen (in view of He) of optimizing linear regression with McCaffrey’s disclosure of using back-propagation to solve the optimization of a linear regression problem to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification because the combination yields predictable results.
Chen in view of He and McCaffrey does not explicitly disclose:
and connecting to the neural network and performing learning processing on the neural network after the redundant channel has been deleted.
Talby explicitly discloses: 
and connecting to the neural network and performing learning processing on the neural network after the redundant channel has been deleted (Talby, page. 3, para. 4, where no matter what the model is predicting, some level of retraining is necessary; i.e., during production inference and after the channel is pruned (He, page. 3, para. 3, ln. 7 – 8), the model still needs regular retraining during production. The retraining is a different training [second learning unit], not the channel weight training)


Regarding Claim 11, Claim 11 is the non-transitory computer-readable storage medium claim corresponding to Claim 1. Chen further discloses: a non-transitory computer-readable storage medium storing a program that causes a computer to function as a neural network load reduction device (Chen, page. 4 col 2, para. 3 where we use Caffe for deep neural network evaluation; Caffe is a software framework deployed on a computer system and stored in memory [non-transitory computer-readable storage medium] with instructions to perform neural network functions). Claim 11 is rejected for the same reasons as Claim 1.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, SCA-CNN Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, arXiv, 2017 in view of He, Channel Pruning for Accelerating Very Deep Neural Networks, arXiv, 2017, McCaffrey, Neural Network L1 Regularization Using Python, Visual Studio Magazine, 2017 and Talby, Lessons learned turning machine learning models into real products and services, 2018, further in view of Liu, Learning Efficient Convolutional Networks through Network Slimming, arXiv, 2017.


Regarding Claim 5, Chen in view of He, McCaffrey and Talby does not explicitly disclose:
wherein the channel selection section is configured to select, as the redundant channel, a channel in which the output feature value is below the predetermined threshold value.
Liu explicitly discloses:
wherein the channel selection section is configured to select, as the redundant channel, a channel in which the output feature value is below the predetermined threshold value (Liu, page. 4, col. 2, para. 2, where we prune 70% of the channels with lower scaling factors [output feature value]; i.e., prune the channels whose scaling factor is lower than the scaling factor [threshold value] of the channel at the 70% position among all the channels).
Chen (in view of He, McCaffrey and Talby) and Liu both teach channel pruning techniques in neural networks and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the disclosure of Chen (in view of He, McCaffrey and Talby) of a channel pruning technique with Liu’s disclosure of pruning by a set threshold to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to set a target percentile (Liu, page 4, col. 2, para. 2, ln. 6 – 10).
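The percentile reading of Liu applied above (prune the channels whose scaling factor falls below the value found at the 70% position) can be sketched as follows. This is an illustrative reading only, not Liu's code: the function names are hypothetical, the ratio is passed as an integer percentage to avoid floating-point index rounding, and absolute values are used on the assumption that magnitude is what matters.

```python
def percentile_threshold(scaling_factors, prune_percent):
    """Return the scaling-factor value at the prune_percent position.

    Channels strictly below this value are candidates for pruning.
    """
    ordered = sorted(abs(s) for s in scaling_factors)
    cut = len(ordered) * prune_percent // 100
    cut = min(cut, len(ordered) - 1)  # keep the index in range
    return ordered[cut]


def channels_below(scaling_factors, threshold):
    """Indices of channels whose scaling factor is below threshold."""
    return [i for i, s in enumerate(scaling_factors)
            if abs(s) < threshold]
```

For ten channels and prune_percent=70, the threshold is the eighth-smallest factor, so the seven channels with smaller factors are selected as redundant.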

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, SCA-CNN Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, arXiv, 2017 in view of He, Channel Pruning for Accelerating Very Deep Neural Networks, arXiv, 2017, and McCaffrey, Neural Network L1 Regularization Using Python, Visual Studio Magazine, 2017.

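Regarding Claim 9, Chen discloses: an information processing unit (Chen, page 4, col. 2, para. 3, where we use Caffe [information processing unit] for deep neural network evaluation) comprising: an attention module (Chen, fig. 2, lower diagram of attention processing module) including: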
an attention layer (Chen, fig. 2, lower diagram of channel-wise attention model) for a neural network including a plurality of levels of processing layer that are connected together by a plurality of channels (Chen, fig. 2, where the upper diagram shows a multi-layer [plurality of levels of processing layer] neural network in which each layer has a number of feature maps [plurality of channels] connected layer-wise), the attention layer being configured to compute an output feature value corresponding to each channel of a first number of channels based on an input feature value from each channel of the first number of channels in a predetermined processing layer and based on a parameter (Chen, fig. 2, at layer l [predetermined processing layer]; eq. 7, which computes the channel-wise attention weight B = softmax(Wi'b + bi') [feature value for each channel i], with b = tanh((Wc*v + bc) + Whc*ht-1) [based on the input feature value v of each channel and based on the parameter Wc], for each of the C channels [a first number of channels]);
a computation section configured to multiply the input feature values by the output feature values and to output a computed result obtained to a processing layer at a next level from the predetermined processing layer (Chen, page 4, col. 2, ln. 1 - 3 & eq. 8, where fc(.) is a channel-wise multiplication of the feature map channels and the corresponding channel weights; the result of fc(.) is sent to the spatial attention model [processing layer at a next level]);
and an output section configured to perform output according to the output feature values computed by the attention layer after the learning processing has been performed (Chen, tbl. 1 & fig. 2, where the model produces output [by output layer] after the training of the attention is done).
Chen does not explicitly disclose:
a channel selection section configured to select, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed by the attention layer after the learning processing has been performed and a predetermined threshold value;
He explicitly discloses:
a first learning unit connected to the neural network and configured to perform learning processing on the parameter (He, page 3, col. 1, para. 5, where we solve this problem [with first learning unit] in two folds … solve W to reconstruction error [perform learning processing on the parameter]) … in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer (He, page 3, col. 2, para. 3, where our approach could be applied at inference time, i.e., performing learning processing on the parameter in a state in which learning processing has been suspended at least for the predetermined processing layer and the next level processing layer);
a channel selection section configured to select, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed by the attention layer after the learning processing has been performed and a predetermined threshold value (He, page 3, col. 1, para. 6, where in the optimization step 1 [channel selection section], we will ignore the ith channel [redundant channel] if Bi=0 [satisfying a predetermined relationship between the output feature value B and a predetermined threshold value 0]);
Chen and He both teach channel weight assignment and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen’s disclosure of channel-wise attention by multiplying a scale factor to each 
Chen in view of He does not explicitly disclose:
a first learning unit connected to the neural network and configured to perform learning processing on the parameter using an error backpropagation method
McCaffrey explicitly discloses:
a first learning unit connected to the neural network and configured to perform learning processing on the parameter using an error backpropagation method (McCaffrey, page 1, para. 3, where you apply an optimization algorithm, typically back-propagation, to find weights and bias values that minimize some error metric between the computed output values and the correct output values).
Chen (in view of He) and McCaffrey both teach optimization of regression problems in neural networks and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen (in view of He)’s disclosure of optimizing linear regression with McCaffrey’s disclosure of using back-propagation to solve the optimization of a linear regression problem to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification because the combination yields predictable results.
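For illustration only, the attention layer, computation section and channel selection section discussed in the above mapping can be sketched as follows. This is a simplified sketch under stated assumptions, not the implementation of Chen, He, or the application: the hidden-state term of Chen's eq. 7 is omitted, the parameters w_c, b_c, w_i and b_i are scalars with hypothetical values, and the threshold of 0.1 is hypothetical (the mapping to He above uses the relationship Bi=0).

```python
import math

# Illustrative sketch only; function names, parameters and values are hypothetical.

def channel_attention(v, w_c, b_c, w_i, b_i):
    """Simplified weights: B = softmax(w_i * tanh(w_c * v + b_c) + b_i)."""
    scores = [w_i * math.tanh(w_c * x + b_c) + b_i for x in v]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_attention(feature_maps, weights):
    """Channel-wise multiplication: scale every value of channel i by weights[i]."""
    return [[w * x for x in channel] for channel, w in zip(feature_maps, weights)]

def redundant_channels(weights, threshold):
    """Indices of channels whose attention weight is at or below the threshold."""
    return [i for i, w in enumerate(weights) if w <= threshold]


feature_maps = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0], [-4.0, -2.0]]
v = [sum(ch) / len(ch) for ch in feature_maps]  # mean-pooled value per channel
weights = channel_attention(v, w_c=1.0, b_c=0.0, w_i=2.0, b_i=0.0)
scaled = apply_attention(feature_maps, weights)  # passed to the next layer
selected = redundant_channels(weights, threshold=0.1)
print(selected)
```

In this sketch the weights are computed from each channel's own input feature value, the scaled feature maps are what would be passed to the next processing layer, and the selection step compares the weights against the threshold.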

Allowable Subject Matter
Claims 2 – 4 would be allowable if rewritten or amended to overcome the rejection under 35 U.S.C. 112(b) and in independent form including all of the limitations of the base claim and any intervening claims.


The closest prior art, Molchanov, Pruning Convolutional Neural Networks for Resource Efficient Inference, arXiv, 2017, discloses a layer-wise normalization step to return “raw” values, whose scale varies with the depth of the parameter. However, the scaling does not depend on the number of channels.
Claims 3 and 4 include allowable subject matter for the same reason as pointed out with respect to Claim 2.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571) 272-9354. The examiner can normally be reached on Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-




/S.C./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122