DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 11/07/2022. In the current amendments, claims 1, 2, 5, 6, 9, 11, 12, 14, 16, and 18-22 are amended. Claims 1-22 are pending and have been examined.
In response to amendments and remarks filed on 11/07/2022, the objection to claims 1-22 and the 35 U.S.C. 112(b) rejection of claims 5, 6, and 9 made in the previous Office Action have been withdrawn.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/05/2022 was filed after the mailing date of the Non-Final Rejection on 08/16/2022.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“REVISITING DYNAMIC CONVOLUTION VIA MATRIX DECOMPOSITION”) in view of Duan et al. (“MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification”) and further in view of Bello et al. (“Attention Augmented Convolutional Networks”).
Regarding Claim 1,
Li et al. teaches a relative attention mechanism that is configured to...determine a sum of a static convolution kernel with an adaptive attention matrix (Figure 1 and pg. 1 last full paragraph to pg. 2: 
[Image: media_image1.png]
teach a dynamic attention mechanism (corresponds to relative attention mechanism) that determines a sum of a static convolution kernel $W_0$ with a dynamic attention (corresponds to adaptive attention) matrix).
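For illustration only, the recited sum of a static kernel with an input-dependent matrix can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Li et al. or of the applicant: the function names, the 2 × 2 shapes, and the way the adaptive term is produced (an outer product of the input with itself) are hypothetical choices made so the example runs.

```python
def add_static_and_adaptive(static_kernel, adaptive_matrix):
    """Element-wise sum W_0 + A of two equally shaped 2-D lists."""
    if len(static_kernel) != len(adaptive_matrix):
        raise ValueError("row counts differ")
    result = []
    for w_row, a_row in zip(static_kernel, adaptive_matrix):
        if len(w_row) != len(a_row):
            raise ValueError("column counts differ")
        result.append([w + a for w, a in zip(w_row, a_row)])
    return result


def adaptive_from_input(x):
    """Hypothetical input-dependent matrix: outer product of x with itself."""
    return [[xi * xj for xj in x] for xi in x]


# W0 stays fixed once trained; the adaptive term changes with each input x.
W0 = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
W = add_static_and_adaptive(W0, adaptive_from_input(x))
```

The point of the sketch is only that the static term is shared across inputs while the second term is recomputed per input; how the cited reference actually computes its dynamic term is shown in the figure it reproduces, not here.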
Li et al. does not appear to explicitly teach providing, by the computing system, the input data to a machine-learned convolutional attention network, the machine-learned convolutional attention network comprising two or more network stages, the two or more network stages comprising one or more attention stages and one or more convolutional stages,...in response to providing the input data to the machine-learned convolutional attention network, receiving, by the computing system, a machine-learning prediction from the machine-learned convolutional attention network.
However, Duan et al. teaches providing, by the computing system, the input data to a machine-learned convolutional attention network, the machine-learned convolutional attention network comprising two or more network stages, the two or more network stages comprising one or more attention stages and one or more convolutional stages (pg. 3 Section 2.4: “Overall, MADNN consisted of a stem module in ResNeXt, four modified multi-scale attention modules attached one by another, a global averaged pooling layer, and a fully-connected output layer (Figure 3)” and Fig. 3:

[Image: media_image2.png]


teach the Multi-scale Attention Deep Neural Network (MADNN) architecture (corresponds to computing system) wherein inputs are provided to the MADNN (corresponds to a machine-learned convolutional attention network), and the MADNN comprises at least two network stages that include four multi-scale attention modules (correspond to attention stages) and a stem module in ResNeXt (corresponds to convolutional stage); pg. 3 Section 2.4: “The kernel size was changed to 3 from 7 in the stem convolutional layers in comparison to ResNeXt” teaches the stem in Fig. 3 refers to processing in stem convolutional layers; pg. 3 Section 2.5 teaches training the MADNN to produce a trained MADNN),
...in response to providing the input data to the machine-learned convolutional attention network, receiving, by the computing system, a machine-learning prediction from the machine-learned convolutional attention network (Fig. 3 and pg. 4 Section 3: “MADNN without ensemble learning was tested on the hidden datasets in the PhysioNet/Computing in Cardiology Challenge 2020. Scores were calculated according to a new scoring metric that awards partial credit to misdiagnoses as cardiologists [10]. The MADNN officially achieved a validation score of 0.446. It was also tested on a full hidden test set and officially achieved a score of 0.236” teach the MADNN (corresponds to a machine-learned convolutional attention network) of the MADNN architecture (corresponds to computing system) provides a machine-learning prediction output from the MADNN in response to inputs).
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).
Li et al. in view of Duan et al. does not appear to explicitly teach A computer-implemented method for performing computer vision with reduced computational cost and improved accuracy, the computer-implemented method comprising: obtaining, by a computing system comprising one or more computing devices, input data comprising an input tensor having one or more dimensions;...wherein at least one of the one or more attention stages comprise a relative attention mechanism that is configured to perform a Softmax normalization.
However, Bello et al. teaches A computer-implemented method for performing computer vision with reduced computational cost and improved accuracy, the computer-implemented method comprising (pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches GPU-based implementation (corresponds to computer-implemented) to perform image classification (corresponds to computer vision task); Table 2 and pg. 3289 Section 4.2: “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” teach latency time increase associated with the reference’s method (Attention Augmented (AA) Convolutional Networks) is reduced (corresponds to reduced computational cost) in comparison to other methods; pg. 3290 Section 4.3: “Our experiments show that Attention Augmentation yields accuracy improvements across all width multipliers” teaches the Attention Augmented (AA) Convolutional Networks method has improved accuracy):
obtaining, by a computing system comprising one or more computing devices, input data comprising an input tensor having one or more dimensions (Figure 2 and pg. 3287 Section 3 and 3.1: “We now formally describe our proposed Attention Augmentation method. We use the following naming conventions: H,W and Fin refer to the height, width and number of input filters of an activation map...Given an input tensor of shape (H, W, Fin)” teach obtaining input data comprising an input tensor having multiple dimensions; pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches computing system with a GPU (computing device));...
wherein at least one of the one or more attention stages comprise a relative attention mechanism that is configured to perform a Softmax normalization (pg. 3286 first full paragraph: “We develop a novel two-dimensional relative self-attention mechanism...that maintains translation equivariance while being infused with relative position information, making it well suited for images” teaches the attention stage comprises a relative attention mechanism; pg. 3287 Section 3.1:
[Image: media_image3.png]
[Image: media_image4.png]
 teaches the relative self-attention mechanism that is infused with relative position information (corresponds to relative attention mechanism) performs softmax normalization).
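For illustration only, a Softmax normalization applied to attention logits that include a relative-position term can be sketched as follows. This is a minimal one-dimensional sketch, not the formulation of Bello et al. or of the applicant: the function names, the `rel_bias` lookup keyed by the offset j - i, and the example values are assumptions introduced here.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention_weights(queries, keys, rel_bias):
    """Attention weights with a relative-position term added before softmax.

    rel_bias maps the offset (j - i) to a scalar. This lookup is an
    illustrative assumption, not the formulation in the cited reference.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        logits = []
        for j, k in enumerate(keys):
            content = sum(qc * kc for qc, kc in zip(q, k))
            logits.append((content + rel_bias[j - i]) / math.sqrt(d))
        # Each row is normalized so its weights are positive and sum to 1.
        weights.append(softmax(logits))
    return weights


queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
rel_bias = {-1: 0.0, 0: 0.5, 1: 0.0}
attn = relative_attention_weights(queries, keys, rel_bias)
```

Because the relative term depends only on the offset between positions, shifting both positions by the same amount leaves the weight unchanged, which is the sense in which such a term carries relative rather than absolute position information.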
Li et al., Duan et al., and Bello et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Bello et al., into the disclosed invention of Li et al. in view of Duan et al.
One of ordinary skill in the art would have been motivated to make this modification because “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” and “Attention Augmentation yields accuracy improvements across all width multipliers” (Bello et al. pg. 3289 Section 4.2 & pg. 3290 Section 4.3).
Regarding Claim 2,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 1.
Duan et al. further teaches wherein the two or more network stages comprise an S0 stage, an S1 stage, an S2 stage, an S3 stage, and an S4 stage (pg. 3 Section 2.4: “Overall, MADNN consisted of a stem module in ResNeXt, four modified multi-scale attention modules attached one by another, a global averaged pooling layer, and a fully-connected output layer (Figure 3)” and Fig. 3:

[Image: media_image2.png]


teach the MADNN comprises at least two network stages that include a stem module in ResNeXt (corresponds to S0 stage) and four multi-scale attention modules (correspond to an S1 stage, an S2 stage, an S3 stage, and an S4 stage)).
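For illustration only, the correspondence relied on above between the quoted MADNN module order and the claimed S0 through S4 stages can be written out as follows. This is a minimal sketch of the mapping as read in this action: the module names follow the quoted passage of Duan et al., while the "kind" labels, the constant and function names, and the rule of numbering stages in order of appearance are hypothetical annotations.

```python
# Module order as quoted from Duan et al., Section 2.4. The "kind" labels
# are hypothetical annotations added for this sketch.
MADNN_LAYOUT = [
    ("stem module", "convolutional"),
    ("multi-scale attention module 1", "attention"),
    ("multi-scale attention module 2", "attention"),
    ("multi-scale attention module 3", "attention"),
    ("multi-scale attention module 4", "attention"),
    ("global average pooling", "pooling"),
    ("fully-connected output", "output"),
]

STAGE_KINDS = ("convolutional", "attention")


def map_to_claimed_stages(layout):
    """Label convolutional and attention modules S0, S1, ... in order."""
    stages = {}
    for name, kind in layout:
        if kind in STAGE_KINDS:
            stages[f"S{len(stages)}"] = name
    return stages


def conv_precedes_attention(layout):
    """Return True if every convolutional module precedes every attention module."""
    kinds = [kind for _, kind in layout if kind in STAGE_KINDS]
    last_conv = max(i for i, k in enumerate(kinds) if k == "convolutional")
    first_attention = min(i for i, k in enumerate(kinds) if k == "attention")
    return last_conv < first_attention


stages = map_to_claimed_stages(MADNN_LAYOUT)
```

Under these assumptions the stem module is labeled S0 and the four attention modules are labeled S1 through S4; the pooling and output layers are left unlabeled because the mapping above does not assign them to a claimed stage.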
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).
Regarding Claim 3,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Duan et al. further teaches wherein the S0 stage comprises a two-layer convolutional stem network (pg. 3 Section 2.4: “Overall, MADNN consisted of a stem module in ResNeXt, four modified multi-scale attention modules attached one by another, a global averaged pooling layer, and a fully-connected output layer (Figure 3)” and Fig. 3:

[Image: media_image2.png]


and pg. 3 Section 2.4: “The kernel size was changed to 3 from 7 in the stem convolutional layers in comparison to ResNeXt” teach the stem in Fig. 3 refers to processing in stem convolutional layers, thus indicating that the S0 stage of “stem module in ResNeXt” comprises at least a two-layer convolutional stem network).
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).
Regarding Claim 13,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 1.
Bello et al. further teaches wherein the machine-learning prediction comprises a computer vision output (pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches the machine-learned prediction comprises image classification output (corresponds to computer vision output)).
Li et al., Duan et al., and Bello et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Bello et al., into the disclosed invention of Li et al. in view of Duan et al.
One of ordinary skill in the art would have been motivated to make this modification because “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” and “Attention Augmentation yields accuracy improvements across all width multipliers” (Bello et al. pg. 3289 Section 4.2 & pg. 3290 Section 4.3).
Regarding Claim 14,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 1.
Bello et al. further teaches wherein the machine-learning prediction comprises a classification output (pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches the machine-learned prediction comprises image classification output).
Li et al., Duan et al., and Bello et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Bello et al., into the disclosed invention of Li et al. in view of Duan et al.
One of ordinary skill in the art would have been motivated to make this modification because “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” and “Attention Augmentation yields accuracy improvements across all width multipliers” (Bello et al. pg. 3289 Section 4.2 & pg. 3290 Section 4.3).
Regarding Claim 15,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 1.
Duan et al. further teaches wherein the one or more convolutional stages are sequentially prior to the one or more attention stages in the two or more network stages (Fig. 3:

[Image: media_image2.png]


teaches the MADNN comprises at least two network stages wherein a stem module in ResNeXt (corresponds to convolutional stages) is sequentially prior to the four multi-scale attention modules (correspond to attention stages)).
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).
Regarding Claim 16,
Li et al. teaches a relative attention mechanism, the relative attention mechanism configured to...determine a sum of a static convolution kernel with an adaptive attention matrix (Figure 1 and pg. 1 last full paragraph to pg. 2: 
[Image: media_image1.png]
teach a dynamic attention mechanism (corresponds to relative attention mechanism) that determines a sum of a static convolution kernel $W_0$ with a dynamic attention (corresponds to adaptive attention) matrix).
Li et al. does not appear to explicitly teach providing, by the computing system, the input data to a machine-learned convolutional attention network, the machine-learned convolutional attention network comprising: a downsampling stage configured to reduce a spatial resolution relative to the input...; and one or more attention blocks comprising,...in response to providing the input data to the machine-learned convolutional attention network, receiving, by the computing system, a machine-learning prediction from the machine-learned convolutional attention network.
However, Duan et al. teaches providing, by the computing system, the input data to a machine-learned convolutional attention network, the machine-learned convolutional attention network comprising: a downsampling stage configured to reduce a spatial resolution relative to the input...; and one or more attention blocks comprising (pg. 3 Section 2.4: “Overall, MADNN consisted of a stem module in ResNeXt, four modified multi-scale attention modules attached one by another, a global averaged pooling layer, and a fully-connected output layer (Figure 3)” and Fig. 3:

[Image: media_image2.png]

teach the Multi-scale Attention Deep Neural Network (MADNN) architecture (corresponds to computing system) wherein inputs are provided to the MADNN (corresponds to a machine-learned convolutional attention network), and the MADNN comprises at least two network stages that include four multi-scale attention modules (correspond to attention blocks) and a stem module in ResNeXt (corresponds to convolutional stage); pg. 3 Section 2.4: “For the purpose of extracting features from 1-dimensional signals, the original 2-dimensional CNNs in SENet and SKNet were modified to 1-dimensional CNNs correspondingly. Inspired by ResNeXt, each branch of convolutional layers in the proposed multi-scale attention module shared the same kernel size. The kernel size was changed to 3 from 7 in the stem convolutional layers in comparison to ResNeXt. Applying convolutional layers with large kernel size will supress the features of high-frequency in ECG signals” teaches the stem module in ResNeXt performs downsampling of the input by suppressing certain features in the input data (corresponds to reduce a spatial resolution relative to the input); pg. 3 Section 2.5 teaches training the MADNN to produce a trained MADNN)...
in response to providing the input data to the machine-learned convolutional attention network, receiving, by the computing system, a machine-learning prediction from the machine-learned convolutional attention network (Fig. 3 and pg. 4 Section 3: “MADNN without ensemble learning was tested on the hidden datasets in the PhysioNet/Computing in Cardiology Challenge 2020. Scores were calculated according to a new scoring metric that awards partial credit to misdiagnoses as cardiologists [10]. The MADNN officially achieved a validation score of 0.446. It was also tested on a full hidden test set and officially achieved a score of 0.236” teach the MADNN (corresponds to a machine-learned convolutional attention network) of the MADNN architecture (corresponds to computing system) provides a machine-learning prediction output from the MADNN in response to inputs).
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).
Li et al. in view of Duan et al. does not appear to explicitly teach A computer-implemented method for performing computer vision with reduced computational cost and improved accuracy, the computer-implemented method comprising: obtaining, by a computing system comprising one or more computing devices, input data comprising an input tensor having one or more dimensions;...input tensor...one or more attention blocks comprising a relative attention mechanism, the relative attention mechanism configured to perform a Softmax normalization.
However, Bello et al. teaches A computer-implemented method for performing computer vision with reduced computational cost and improved accuracy, the computer-implemented method comprising (pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches GPU-based implementation (corresponds to computer-implemented) to perform image classification (corresponds to computer vision task); Table 2 and pg. 3289 Section 4.2: “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” teach latency time increase associated with the reference’s method (Attention Augmented (AA) Convolutional Networks) is reduced (corresponds to reduced computational cost) in comparison to other methods; pg. 3290 Section 4.3: “Our experiments show that Attention Augmentation yields accuracy improvements across all width multipliers” teaches the Attention Augmented (AA) Convolutional Networks method has improved accuracy):
obtaining, by a computing system comprising one or more computing devices, input data comprising an input tensor having one or more dimensions;...input tensor (Figure 2 and pg. 3287 Section 3 and 3.1: “We now formally describe our proposed Attention Augmentation method. We use the following naming conventions: H,W and Fin refer to the height, width and number of input filters of an activation map...Given an input tensor of shape (H, W, Fin)” teach obtaining input data comprising an input tensor having multiple dimensions; pg. 3289 Section 4.2: “Table 2. Image classification performance of different attention mechanisms on the ImageNet dataset. Δ refers to the increase in latency times compared to the ResNet50 on a single Tesla V100 GPU with Tensorflow using a batch size of 128” teaches computing system with a GPU (computing device))...
one or more attention blocks comprising a relative attention mechanism, the relative attention mechanism configured to perform a Softmax normalization (pg. 3286 first full paragraph: “We develop a novel two-dimensional relative self-attention mechanism...that maintains translation equivariance while being infused with relative position information, making it well suited for images” teaches the attention block comprises a relative attention mechanism; pg. 3287 Section 3.1:
[Image: media_image3.png]
[Image: media_image4.png]
 teaches the relative self-attention mechanism that is infused with relative position information (corresponds to relative attention mechanism) performs softmax normalization).
Li et al., Duan et al., and Bello et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Bello et al., into the disclosed invention of Li et al. in view of Duan et al.
One of ordinary skill in the art would have been motivated to make this modification because “Attention Augmentation offers a competitive accuracy/computational trade-off compared to previously proposed attention mechanisms” and “Attention Augmentation yields accuracy improvements across all width multipliers” (Bello et al. pg. 3289 Section 4.2 & pg. 3290 Section 4.3).
Regarding Claim 17,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 16.
Duan et al. further teaches wherein the downsampling stage comprises a convolution stem (pg. 3 Section 2.4: “For the purpose of extracting features from 1-dimensional signals, the original 2-dimensional CNNs in SENet and SKNet were modified to 1-dimensional CNNs correspondingly. Inspired by ResNeXt, each branch of convolutional layers in the proposed multi-scale attention module shared the same kernel size. The kernel size was changed to 3 from 7 in the stem convolutional layers in comparison to ResNeXt. Applying convolutional layers with large kernel size will supress the features of high-frequency in ECG signals” teaches the stem module in ResNeXt (corresponds to downsampling stage) comprises stem convolutional layers (correspond to convolution stem)).
Li et al. and Duan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Duan et al., into the disclosed invention of Li et al.
One of ordinary skill in the art would have been motivated to make this modification “to design a novel DNN that each type of attention module can be optimized separately with diverse attention weights to the outputs of different convolutional kernels. Thus, we proposed MADNN: a network sequentially combines kernel-wise attention modules in SENet and kernel-selective modules in SKNet” (Duan et al. pg. 1 Section 1).

Claims 4, 6, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“REVISITING DYNAMIC CONVOLUTION VIA MATRIX DECOMPOSITION”) in view of Duan et al. (“MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification”) in view of Bello et al. (“Attention Augmented Convolutional Networks”) and further in view of Su et al. (“Concrete Cracks Detection Using Convolutional Neural Network Based on Transfer Learning”).
Regarding Claim 4,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein the S1 stage comprises one or more convolutional blocks with squeeze excitation.
However, Su et al. teaches wherein the S1 stage comprises one or more convolutional blocks with squeeze excitation (Fig. 4 teaches a neural network architecture with multiple MBConv blocks in various stages of the architecture (including a stage S1); Fig. 5 and pg. 4 first full paragraph: “The core component of the network is a mobile inverted bottleneck convolution module (MBConv). Figure 5 shows the framework of this module. The design of this module is inspired by inverted residual and residual structure. Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance. Finally, 1 × 1 convolution operation is used to reduce the dimension, and a residual connection is added” teach the mobile inverted bottleneck convolution module (MBConv) (corresponds to one or more convolutional blocks) is implemented with squeeze excitation).
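For illustration only, the squeeze-and-excitation step that the quoted passage places after the 3 × 3 or 5 × 5 convolution can be sketched in its generally known form: a per-channel global average (squeeze), a small two-layer gate ending in a sigmoid (excitation), and a per-channel rescaling. This is a minimal sketch, not the implementation of Su et al. or of the applicant: the function names, the flat-list channel layout, and the weight values are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic function used as the channel gate."""
    return 1.0 / (1.0 + math.exp(-v))


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Rescale each channel by a gate computed from global channel statistics.

    feature_maps: list of channels, each a flat list of activations.
    w_reduce: hypothetical weights shaped [hidden][channels].
    w_expand: hypothetical weights shaped [channels][hidden].
    """
    # Squeeze: one global average per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_expand]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]


features = [[1.0, 3.0], [2.0, 2.0]]
w_reduce = [[0.5, 0.5]]
w_expand = [[1.0], [-1.0]]
scaled = squeeze_excite(features, w_reduce, w_expand)
```

With these example weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the block emphasizes one channel relative to the other without changing the spatial layout of either.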
Li et al., Duan et al., Bello et al., and Su et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Su et al., into the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al.
One of ordinary skill in the art would have been motivated to make this modification because “The core component of the network is a mobile inverted bottleneck convolution module (MBConv)...Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance” (Su et al. pg. 4 first full paragraph).
Regarding Claim 6,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein each of the S2 stage, S3 stage, or S4 stage comprising a convolutional stage comprise a mobile inverted bottleneck convolution (MBConv) block.
However, Su et al. teaches wherein each of the S2 stage, S3 stage, or S4 stage comprising a convolutional stage comprise a mobile inverted bottleneck convolution (MBConv) block (Fig. 4 teaches a neural network architecture with multiple MBConv blocks in various stages of the architecture including multiple consecutive “MBConv1, 3 × 3” blocks and “MBConv6, 5 × 5” blocks wherein each of these blocks can be considered a convolutional stage comprising a MBConv block in an architecture with more than 5 stages).
Li et al., Duan et al., Bello et al., and Su et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Su et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al.
One of ordinary skill in the arts would have been motivated to make this modification because “The core component of the network is a mobile inverted bottleneck convolution module (MBConv)...Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance” (Su et al. pg. 4 first full paragraph).

Regarding Claim 9,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein each of the S0 stage, the S1 stage, and the S4 stage comprises two blocks, and wherein the S2 stage and the S3 stage each comprise greater than two blocks.
However, Su et al. teaches wherein each of the S0 stage, the S1 stage, and the S4 stage comprises two blocks, and wherein the S2 stage and the S3 stage each comprise greater than two blocks (Fig. 4 teaches a neural network architecture with multiple MBConv blocks in various stages of the architecture wherein each of the two consecutive “MBConv6, 5 × 5” blocks can be considered the S0 stage, the S1 stage, and the S4 stage, and wherein each of the three consecutive “MBConv6, 3 × 3” blocks can be considered the S2 stage and the S3 stage).
Li et al., Duan et al., Bello et al., and Su et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Su et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al.
One of ordinary skill in the arts would have been motivated to make this modification because “The core component of the network is a mobile inverted bottleneck convolution module (MBConv)...Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance” (Su et al. pg. 4 first full paragraph).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“REVISITING DYNAMIC CONVOLUTION VIA MATRIX DECOMPOSITION”) in view of Duan et al. (“MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification”) in view of Bello et al. (“Attention Augmented Convolutional Networks”) in view of Su et al. (“Concrete Cracks Detection Using Convolutional Neural Network Based on Transfer Learning”) and further in view of Pendse et al. (“Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation”).
Regarding Claim 5,
Li et al. in view of Duan et al. in view of Bello et al. in view of Su et al. teaches the computer-implemented method of claim 4.
Su et al. further teaches wherein the one or more convolutional blocks of the S1 stage comprise mobile inverted bottleneck convolution (MBConv) blocks, the MBConv blocks configured to expand channel size from an original channel size of an input to the one or more convolutional blocks (Fig. 4 teaches a neural network architecture with multiple MBConv blocks in various stages of the architecture (including a stage S1); Fig. 5 and pg. 4 first full paragraph: “The core component of the network is a mobile inverted bottleneck convolution module (MBConv). Figure 5 shows the framework of this module. The design of this module is inspired by inverted residual and residual structure. Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance. Finally, 1 × 1 convolution operation is used to reduce the dimension, and a residual connection is added” teach the MBConv blocks increase (expand) the channel size from an original channel size of the input wherein “the dimension of images is increased via 1 × 1 convolution in order to extract more feature information”).
Li et al., Duan et al., Bello et al., and Su et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Su et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al.
One of ordinary skill in the arts would have been motivated to make this modification because “The core component of the network is a mobile inverted bottleneck convolution module (MBConv)...Before performing on 3 × 3 or 5 × 5 convolution, the dimension of images is increased via 1 × 1 convolution in order to extract more feature information. The Squeeze-and-Excitation (SE)...model is added after 3 × 3 or 5 × 5 convolution operation to further improve performance” (Su et al. pg. 4 first full paragraph).
Li et al. in view of Duan et al. in view of Bello et al. in view of Su et al. does not appear to explicitly teach and subsequently project the expanded channel size back to the original channel size.
However, Pendse et al. teaches and subsequently project the expanded channel size back to the original channel size (Fig. 3 and pg. 393 Section 2.3: “Our architecture (Fig. 3) consists of a U-Net with multiple levels of contraction in the encoder (through 2 × 2 × 2 max pooling) and the same number of levels of expansion in the decoder (through trilinear interpolation for upsampling instead of transposed convolutions as was shown to be preferable in [6]). Each level consists of two convolutional blocks. In the encoder, the first block is a pointwise convolution that increases the number of channels and the second block is a reversible block where each of the components (F and G in Fig. 1) is a MBConvBlock with half the number of channels” teach the MBConv blocks increasing (expanding) number of channels and subsequently reversing the expansion to the original channel size).
Li et al., Duan et al., Bello et al., Su et al., and Pendse et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Pendse et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al. in view of Su et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage “benefits of replacing a standard convolutional block with a MobileNet inverted residual with linear bottleneck block inside the reversible block of the encoder. This more parameter efficient MBConvBlock results in faster convergence while still fitting in a 16GB GPU” (Pendse et al. pg. 395 Section 4 to pg. 396).

Claims 7 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“REVISITING DYNAMIC CONVOLUTION VIA MATRIX DECOMPOSITION”) in view of Duan et al. (“MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification”) in view of Bello et al. (“Attention Augmented Convolutional Networks”) and further in view of Pendse et al. (“Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation”).
Regarding Claim 7,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein a number of channels is doubled for at least one of the S1 stage, the S2 stage, the S3 stage, or the S4 stage.
However, Pendse et al. teaches wherein a number of channels is doubled for at least one of the S1 stage, the S2 stage, the S3 stage, or the S4 stage (Fig. 3 teaches a number of channels is doubled for multiple levels (stages) of MBConv blocks in the encoder (for example, from 60 to 120) wherein the architecture has at least five levels of MBConv blocks (correspond to at least one of the S1 stage, the S2 stage, the S3 stage, or the S4 stage)).
Li et al., Duan et al., Bello et al., and Pendse et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Pendse et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al. 
One of ordinary skill in the arts would have been motivated to make this modification to leverage “benefits of replacing a standard convolutional block with a MobileNet inverted residual with linear bottleneck block inside the reversible block of the encoder. This more parameter efficient MBConvBlock results in faster convergence while still fitting in a 16GB GPU” (Pendse et al. pg. 395 Section 4 to pg. 396).
Regarding Claim 10,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 1.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein a spatial resolution gradually decreases over the two or more network stages.
However, Pendse et al. teaches wherein a spatial resolution gradually decreases over the two or more network stages (Fig. 3 and caption: “Our reversible U-Net architecture with MBConv blocks in the encoder and regular convolutional blocks in the decoder. The downsampling and upsampling stages are depicted by red and yellow arrows, respectively” and pg. 393 Section 2.3: “Our architecture (Fig. 3) consists of a U-Net with multiple levels of contraction in the encoder (through 2 × 2 × 2 max pooling) and the same number of levels of expansion in the decoder” teach the encoder performs downsampling gradually over multiple levels (correspond to at least two stages) of MBConv blocks by using max pooling to reduce the spatial resolution of input images; pg. 394 first paragraph teaches images are used as input data).
Li et al., Duan et al., Bello et al., and Pendse et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Pendse et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al. 
One of ordinary skill in the arts would have been motivated to make this modification to leverage “benefits of replacing a standard convolutional block with a MobileNet inverted residual with linear bottleneck block inside the reversible block of the encoder. This more parameter efficient MBConvBlock results in faster convergence while still fitting in a 16GB GPU” (Pendse et al. pg. 395 Section 4 to pg. 396).
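For illustration only, the two properties relied upon for claims 7 and 10 (channel count doubling per level, and spatial resolution reduced level by level through 2 × 2 × 2 max pooling) can be sketched as a simple shape trace. This is a minimal sketch under stated assumptions: the function name `encoder_stage_shapes`, the input volume size, and the number of levels shown are hypothetical; only the 60-to-120 channel example and the factor-of-2 pooling come from the cited passages of Pendse et al.

```python
def encoder_stage_shapes(input_size, base_channels, num_stages, pool=2):
    """Return (stage index, channels, spatial size) for each encoder level.

    Assumes every level doubles the channel count and that pooling between
    levels divides each spatial dimension by `pool` (illustrative only).
    """
    shapes = []
    size = tuple(input_size)
    channels = base_channels
    for stage in range(num_stages):
        shapes.append((stage, channels, size))
        # Max pooling between levels: spatial resolution decreases gradually.
        size = tuple(max(1, s // pool) for s in size)
        # Channel count is doubled for the next level (e.g. 60 -> 120).
        channels *= 2
    return shapes


if __name__ == "__main__":
    # Hypothetical 128 x 128 x 128 input volume with 60 base channels.
    for stage, channels, size in encoder_stage_shapes((128, 128, 128), 60, 3):
        print(f"level {stage}: {channels} channels, spatial size {size}")
```

Under these assumptions the trace shows channel width increasing while spatial resolution decreases across consecutive levels, which is the relationship the rejection maps onto the claimed stages.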

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“REVISITING DYNAMIC CONVOLUTION VIA MATRIX DECOMPOSITION”) in view of Duan et al. (“MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification”) in view of Bello et al. (“Attention Augmented Convolutional Networks”) and further in view of Tan et al. (“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”).
Regarding Claim 8,
Li et al. in view of Duan et al. in view of Bello et al. teaches the computer-implemented method of claim 2.
Li et al. in view of Duan et al. in view of Bello et al. does not appear to explicitly teach wherein a width of the S0 stage is less than or equal to a width of the S1 stage.
However, Tan et al. teaches wherein a width of the S0 stage is less than or equal to a width of the S1 stage (Fig. 2(e) teaches width of the two lowest blocks (correspond to S0 stage) is less than or equal to the width of the rest of the blocks (correspond to S1 stage)).
Li et al., Duan et al., Bello et al., and Tan et al. are analogous art to the claimed invention because they are directed to data processing using convolutional neural networks.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate limitation(s) above as taught by Tan et al. to the disclosed invention of Li et al. in view of Duan et al. in view of Bello et al. 
One of ordinary skill in the arts would have been motivated to make this modification to leverage “a simple yet effective compound scaling method” that “uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients” (Tan et al. pg. 1 last paragraph to pg. 2).

Response to Arguments
Applicant's arguments filed on 11/07/2022 with respect to the 35 U.S.C. 103 rejection of claims 1-10 and 13-17 have been fully considered but they are not persuasive.
Regarding claims 1 and 16, Applicant asserts “On allowing dependent claims 11 and 19 (see page 35 of the Action), the Office appears to admit that the references do not disclose, or suggest, "wherein the sum of the static convolution kernel with the adaptive attention matrix is applied prior to a SoftMax normalization by the relative attention mechanism." Furthermore, Applicant respectfully submits that the references fail to disclose applying the sum of the static convolution kernel with the adaptive attention matrix subsequent to a SoftMax normalization by the relative attention mechanism. Accordingly, the references fail to disclose, or suggest, suggests "a relative attention mechanism that is configured to perform a Softmax normalization and determine a sum of a static convolution kernel with an adaptive attention matrix."” (Remarks, pg. 11).
Examiner’s Response:
The Examiner respectfully disagrees. As pointed out by Applicant’s remarks, the limitation in claims 1 and 16 is “a relative attention mechanism that is configured to perform a Softmax normalization and determine a sum of a static convolution kernel with an adaptive attention matrix”. The scope of this limitation is different from that of original claims 11 and 19 as well as amended claims 11 and 19, which specify an order of operation with the recitation of “...applied prior to...”. Therefore, the indication of allowance of claims 11 and 19 does not affect the rejection of claims 1 and 16. For similar reasons, the recitation of “...applied subsequent to...” in claims 12 and 18 specifies an order of operation, thus rendering the scope of those claims different from that of claims 1 and 16.
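For illustration only, the difference between a limitation that merely recites both functions and a limitation that fixes their order can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the scalar one-dimensional setting, and the indexing of the static kernel by relative offset (i - j) are hypothetical and are not drawn from the application or from any cited reference. The "pre" branch adds the static term before Softmax normalization; the "post" branch adds it after.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention_weights(logits, static_kernel, order):
    """Combine input-dependent logits with an input-independent kernel.

    logits: n x n list of attention scores (the adaptive attention matrix).
    static_kernel: dict mapping relative offset (i - j) to a scalar weight
                   (hypothetical stand-in for a static convolution kernel).
    order: "pre" sums before softmax; "post" sums after softmax.
    """
    n = len(logits)
    bias = [[static_kernel[i - j] for j in range(n)] for i in range(n)]
    if order == "pre":
        return [softmax([logits[i][j] + bias[i][j] for j in range(n)])
                for i in range(n)]
    if order == "post":
        return [[a + b for a, b in zip(softmax(logits[i]), bias[i])]
                for i in range(n)]
    raise ValueError("order must be 'pre' or 'post'")


if __name__ == "__main__":
    logits = [[0.0, 1.0], [2.0, 0.0]]
    kernel = {-1: 0.5, 0: 0.0, 1: -0.5}
    print(relative_attention_weights(logits, kernel, "pre"))
    print(relative_attention_weights(logits, kernel, "post"))
```

Both branches perform a Softmax normalization and determine a sum of the two terms, yet they generally produce different weights: in the "pre" branch each row still sums to one, while in the "post" branch it does not.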

Regarding claims 1 and 16, Applicant asserts “At most, the Bello reference discusses augmenting convolutions with a self-attention mechanism (see Section 1 of Bello)...However, note that equation (1) does not include "a sum of a static convolution kernel," as recited by amended claim 1... Accordingly, equation (1) of Bello does not appear to disclose, or suggest, "a relative attention mechanism that is configured to perform a Softmax normalization and determine a sum of a static convolution kernel with an adaptive attention matrix."”; Applicant further compares Bello to the present Specification (Remarks, pg. 11-12).
Examiner’s Response:
The Examiner respectfully disagrees. MPEP 2111.01(I)-(II) provides the following, 
“Under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. The plain meaning of a term means the ordinary and customary meaning given to the term by those of ordinary skill in the art at the time of the invention... "Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment." Superguide Corp. v. DirecTV Enterprises, Inc., 358 F.3d 870, 875, 69 USPQ2d 1865, 1868 (Fed. Cir. 2004)” (emphasis added).

The limitation in question is the following, “a relative attention mechanism that is configured to perform a Softmax normalization and determine a sum of a static convolution kernel with an adaptive attention matrix.” The broadest reasonable interpretation of the limitation requires a relative attention mechanism performing two functions, including “perform a Softmax normalization and determine a sum of a static convolution kernel with an adaptive attention matrix”. The prior art rejection of claims 1 and 16 asserts that Bello teaches “a relative attention mechanism that is configured to perform a Softmax normalization”, but does not assert that Bello teaches “a relative attention mechanism that is configured to...determine a sum of a static convolution kernel with an adaptive attention matrix.” Indeed, the prior art rejection provides that Li et al. teaches a relative attention mechanism that is configured to...determine a sum of a static convolution kernel with an adaptive attention matrix (Figure 1 and pg. 1 last full paragraph to pg. 2). The prior art rejection further provides that Bello et al. teaches a relative attention mechanism that is configured to perform a Softmax normalization (pg. 3286 first full paragraph and pg. 3287 Section 3.1). The prior art rejection further provides a rationale to combine Li et al., Duan et al., and Bello et al. for which the remarks have not provided arguments. It is noted that in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Regarding the prior art rejection of dependent claims 2-10, 13-15, and 17, the Applicant relies on arguments above regarding independent claims 1 and 16. Therefore, the above responses are applicable to dependent claims 2-10, 13-15, and 17.


Allowable Subject Matter
Claims 20-22 are allowed.
Claims 11-12 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/Primary Examiner, Art Unit 2125