DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-18 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Leeman-Munk et al. (US 2018/0096078 A1), hereinafter “Leeman-Munk”, in view of Woo et al. (Sanghyun Woo et al., "CBAM: Convolutional Block Attention Module", arXiv:1807.06521v2, July 18, 2018), hereinafter “Woo”. Woo was cited by the applicant in the IDS submitted on January 12, 2021.
Consider Claims 1 and 10. 
Leeman-Munk teaches: 
- 1. A method of measuring an interaction force, the method comprising: / 10. An interaction force measuring apparatus comprising: (Leeman-Munk: Abstract; [0007] A method for visualizing convolutional neural networks can include generating a matrix of symbols to be positioned in a graphical user interface. Each symbol in the matrix can indicate a feature-map value that represents a likelihood of a particular feature being present or absent at a location in an input to a convolutional neural network. Each column in the matrix can have feature-map values generated by convolving the input to the convolutional neural network with a respective filter for identifying a specific feature in the input. The method can include generating a node-link diagram to be positioned in the graphical user interface. [0080] Although network devices 204-209 are shown in FIG. 2 as a mobile phone, laptop computer, tablet computer, temperature sensor, motion sensor, and audio sensor respectively, the network devices may be or include sensors that are sensitive to detecting aspects of their environment. For example, the network devices may include sensors such as water sensors, power sensors, electrical current sensors, chemical sensors, optical sensors, pressure sensors, geographic or position sensors (e.g., GPS), velocity sensors, acceleration sensors, flow rate sensors, among others.)
- 10. a memory storing instructions; and at least one processor configured to execute the instructions to at least: (Leeman-Munk: [0005], [0067] Network-attached data stores 110 can store data to be processed by the computing environment 114 as well as any intermediate or final data generated by the computing system in non-volatile memory. But in certain examples, the configuration of the computing environment 114 allows its operations to be performed such that intermediate and final data results can be stored solely in volatile memory (e.g., RAM), without a requirement that intermediate or final data results be stored to non-volatile types of memory (e.g., disk).)
- 1. generating feature maps corresponding to a plurality of sequential images; / 10. generate feature maps corresponding to a plurality of sequential images; (Leeman-Munk: [0007] Each column in the matrix can have feature-map values generated by convolving the input to the convolutional neural network with a respective filter for identifying a specific feature in the input. The method can include generating a node-link diagram to be positioned in the graphical user interface. The node-link diagram can represent a feed forward neural network that forms part of the convolutional neural network. [0057] A filter can be a two-dimensional matrix of weights that can be trained or tuned. During a forward pass, each of the filters slides across (e.g., is convolved with) an input to the convolutional neural network. As a filter slides over the input, dot products are computed between the filter's weights and each position in the input. This can result in a feature map that includes the filter's responses at every spatial position in the input. This process can be repeated for each filter to produce separate feature maps for each filter. The convolutional neural network can learn which filters activate when the filter "sees" a particular feature in the input.)
- 1. generating pooling maps respectively corresponding to feature map groups including a predetermined number of feature maps among the feature maps; / 10. generate pooling maps corresponding to feature map groups comprising a predetermined number of feature maps among the feature maps; (Leeman-Munk: [0130] An ESP application may embed an ESPE with its own dedicated thread pool or pools into its application space where the main application thread can do application-specific work and the ESPE processes event streams at least by creating an instance of a model into processing objects. [0196] The GUI can also include a node-link diagram 2508. The node-link diagram 2508 can visually represent a pooling layer (e.g., in 2510 of the convolutional neural network, a hidden layer 2512 of the convolutional neural network, an output layer 2514 of the convolutional neural network, or any combination of these. An example of the pooling layer 2510 can be a maxpooling layer that determines the maximum activation value in each column in the matrix of cells 2506. The pooling layer 2510, hidden layer 2512, and output layer 2514 may collectively form a fully connected region of the convolutional neural network.)
- 1. and outputting interaction force information. (Leeman-Munk: [0156] In the example shown in FIG. 11, the GUI 1100 includes a node-link diagram 1102 that visually represents nodes (neurons) in a deep neural network and connections (links) between the nodes. The nodes can be visually represented using circles or any other symbol. For example, the node-link diagram 1102 can visually represent an input layer 1104 of the deep neural network as one row of circles and an output layer 1110 of the deep neural network as another row of circles. The node-link diagram 1102 can visually represent hidden layers 1106, 1108 of the deep neural network as rows of circles between the input layer 1104 and the output layer 1110. The connections between the nodes in the deep neural network can be visually represented using lines or other symbols.)
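For illustration only, the convolution described in Leeman-Munk [0057] (a filter of weights slid across the input, with a dot product taken at each position to form a feature map, repeated for each filter) can be sketched as below. The function names `feature_map` and `feature_maps_for_sequence` are hypothetical, and stride 1 with no padding is an assumption; Leeman-Munk does not specify these details.

```python
def feature_map(image, kernel):
    """Slide `kernel` over `image` and return the dot product at every valid
    position. Stride 1 and no padding are assumptions. As in CNN libraries,
    this is cross-correlation (the kernel is not flipped)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    fmap = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # dot product between the filter weights and the image patch at (r, c)
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        fmap.append(row)
    return fmap


def feature_maps_for_sequence(images, kernels):
    """One feature map per filter for each image in a sequence of images."""
    return [[feature_map(img, k) for k in kernels] for img in images]


# Usage: a vertical-edge filter applied to a 3 x 4 image.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge = [[-1, 1],
        [-1, 1]]
print(feature_map(image, edge))  # [[0, 2, 0], [0, 2, 0]]
```

The filter responds only where the input changes from 0 to 1, which is the sense in which a filter "sees" a particular feature.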
Leeman-Munk does not teach: 
- 1. generating attention maps corresponding to the pooling maps; / 10. generate attention maps corresponding to the pooling maps; 

Woo teaches: 
- 1. A method of measuring an interaction force, the method comprising: / 10. An interaction force measuring apparatus comprising: a memory storing instructions; and at least one processor configured to execute the instructions to at least: (Woo: Abstract. We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.)
- 1. generating feature maps corresponding to a plurality of sequential images; / 10. generate feature maps corresponding to a plurality of sequential images; (Woo: page 2 Contribution. Our main contribution is three-fold. 1. We propose a simple yet effective attention module (CBAM) that can be widely applied to boost representation power of CNNs. 2. We validate the effectiveness of our attention module through extensive ablation studies. 3. We verify that performance of various networks is greatly improved on the multiple benchmarks (ImageNet-1K, MS COCO, and VOC 2007) by plugging our light-weight module. Page 4 section 3 Convolutional Block Attention Module Given an intermediate feature map F ∈ ℝ^(C×H×W) as input, CBAM sequentially infers a 1D channel attention map Mc ∈ ℝ^(C×1×1) and a 2D spatial attention map Ms ∈ ℝ^(1×H×W) as illustrated in Fig. 1.)
- 1. generating pooling maps respectively corresponding to feature map groups including a predetermined number of feature maps among the feature maps; / 10. generate pooling maps corresponding to feature map groups comprising a predetermined number of feature maps among the feature maps; (Woo: Page 4 section 3 Convolutional Block Attention Module page 5 Fig. 2: Diagram of each attention sub-module. As illustrated, the channel sub-module utilizes both max-pooling outputs and average-pooling outputs with a shared network; the spatial sub-module utilizes similar two outputs that are pooled along the channel axis and forward them to a convolution layer. Beyond the previous works, we argue that max-pooling gathers another important clue about distinctive object features to infer finer channel-wise attention. Thus, we use both average-pooled and max-pooled features simultaneously. We empirically confirmed that exploiting both features greatly improves representation power of networks rather than using each independently (see Sec. 4.1), showing the effectiveness of our design choice.)
- 1. generating attention maps corresponding to the pooling maps; / 10. generate attention maps corresponding to the pooling maps; (Woo: Page 4 section 3 Convolutional Block Attention Module page 5 We first aggregate spatial information of a feature map by using both average-pooling and max-pooling operations, generating two different spatial context descriptors: F^c_avg and F^c_max, which denote average-pooled features and max-pooled features respectively. Both descriptors are then forwarded to a shared network to produce our channel attention map Mc ∈ ℝ^(C×1×1). The shared network is composed of multi-layer perceptron (MLP) with one hidden layer. To reduce parameter overhead, the hidden activation size is set to ℝ^(C/r×1×1), where r is the reduction ratio. After the shared network is applied to each descriptor, we merge the output feature vectors using element-wise summation.)
- 1. and sequentially receiving feature maps modified based on the attention maps / 10. and receive feature maps modified based on the attention maps and output interaction force information. (Woo: Page 4 section 3 Convolutional Block Attention Module page 5 We first aggregate spatial information of a feature map by using both average-pooling and max-pooling operations, generating two different spatial context descriptors: F^c_avg and F^c_max, which denote average-pooled features and max-pooled features respectively. Both descriptors are then forwarded to a shared network to produce our channel attention map Mc ∈ ℝ^(C×1×1). The shared network is composed of multi-layer perceptron (MLP) with one hidden layer. To reduce parameter overhead, the hidden activation size is set to ℝ^(C/r×1×1), where r is the reduction ratio. After the shared network is applied to each descriptor, we merge the output feature vectors using element-wise summation.)
- 1. and outputting interaction force information. (Woo: page 7 Fig. 3: CBAM integrated with a ResBlock in ResNet[5]. This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block. page 8 Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments.)
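As a hedged illustration of the Woo passages quoted above (page 5), the channel attention map Mc can be sketched as follows: an average-pooled and a max-pooled descriptor pass through one shared MLP with a single hidden layer of size C/r, the two outputs are merged by element-wise summation, and a sigmoid gives one weight per channel, which then multiplies the input feature map. The ReLU after the first layer and the final sigmoid follow the CBAM paper's channel-attention equation; the helper names and the toy weights are hypothetical.

```python
import math


def _relu(v):
    return [max(0.0, x) for x in v]


def _matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(F, W0, W1):
    """F: C channels, each an H x W list of lists.
    W0: (C/r) x C and W1: C x (C/r) weights of the shared one-hidden-layer MLP.
    Returns Mc, one attention weight per channel."""
    flat = [[x for row in ch for x in row] for ch in F]
    f_avg = [sum(ch) / len(ch) for ch in flat]  # average-pooled descriptor
    f_max = [max(ch) for ch in flat]            # max-pooled descriptor

    def shared_mlp(v):
        return _matvec(W1, _relu(_matvec(W0, v)))

    # The same MLP is applied to both descriptors; outputs are summed element-wise.
    merged = [a + b for a, b in zip(shared_mlp(f_avg), shared_mlp(f_max))]
    return [_sigmoid(s) for s in merged]


def refine(F, Mc):
    """Multiply every value in channel c of F by Mc[c]."""
    return [[[m * x for x in row] for row in ch] for ch, m in zip(F, Mc)]


# Usage: C = 2 channels of size 1 x 2, reduction ratio r = 2 (hidden size 1).
F = [[[1.0, 3.0]], [[0.0, 0.0]]]
W0 = [[1.0, 1.0]]
W1 = [[1.0], [-1.0]]
Mc = channel_attention(F, W0, W1)
F_refined = refine(F, Mc)
```

With these toy weights the first channel receives a weight close to 1 and the second a weight close to 0, showing how the module emphasizes some channels over others.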
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network visualization method and system of Leeman-Munk with the convolutional block attention module taught by Woo, because both references are directed to the same field of endeavor, namely convolutional neural networks. The determination of obviousness is predicated upon the following findings. One skilled in the art would have been motivated to modify Leeman-Munk in order to improve the overall neural network architecture so that more important features are weighted more heavily. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface, and programming techniques, without changing a “fundamental” operating principle of Leeman-Munk, while the teaching of Woo continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of more efficiently applying neural network learning techniques to the features that are to be emphasized. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claims in question.

Consider Claims 2 and 11. 
The combination of Leeman-Munk and Woo teaches: 
2. The method of claim 1, wherein the pooling maps comprises a first type of pooling maps and a second type of pooling maps respectively corresponding to the feature map (Woo: Page 4 section 3 Convolutional Block Attention Module page 5 We first aggregate spatial information of a feature map by using both average-pooling and max-pooling operations, generating two different spatial context descriptors: F^c_avg and F^c_max, which denote average-pooled features and max-pooled features respectively. Both descriptors are then forwarded to a shared network to produce our channel attention map Mc ∈ ℝ^(C×1×1). The shared network is composed of multi-layer perceptron (MLP) with one hidden layer. To reduce parameter overhead, the hidden activation size is set to ℝ^(C/r×1×1), where r is the reduction ratio. After the shared network is applied to each descriptor, we merge the output feature vectors using element-wise summation. page 7 Fig. 3: CBAM integrated with a ResBlock in ResNet[5]. This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block. page 8 Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments.)
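The two pooling types discussed above (average pooling and max pooling) can also be applied across a group of feature maps rather than within each map, as in the channel-axis pooling of Woo's spatial sub-module (Fig. 2, page 5, quoted above for claims 1 and 10). The sketch below, assuming each feature map is an H x W list of lists, pools a group of N maps into an average map and a max map. The function name `pooling_maps` is hypothetical, and this is only one possible reading of a "first type" and "second type" of pooling map, not the applicant's disclosed method.

```python
def pooling_maps(group):
    """group: N feature maps, each H x W. Returns two H x W pooling maps:
    the average and the maximum over the N maps at each spatial position."""
    h, w = len(group[0]), len(group[0][0])
    avg_map = [[sum(m[r][c] for m in group) / len(group) for c in range(w)]
               for r in range(h)]
    max_map = [[max(m[r][c] for m in group) for c in range(w)]
               for r in range(h)]
    return avg_map, max_map


# Usage: a group of N = 2 feature maps of size 2 x 2.
group = [[[1, 2], [3, 4]],
         [[3, 0], [1, 8]]]
avg_map, max_map = pooling_maps(group)
```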

Consider Claims 3 and 12. 
The combination of Leeman-Munk and Woo teaches: 
3. The method of claim 2, wherein the first type of pooling map is generated by performing a convolution operation on a predetermined number of concatenated feature maps with a filter kernel of a predetermined size for each channel, and then linearly combining results of the convolution operation, and the second type of pooling map is generated by relocating a predetermined number of concatenated feature maps based on a spatial location, performing a convolution operation on the relocated feature maps with a filter kernel of a predetermined size for each channel, and then linearly combining results of the convolution operation. / 12. The interaction force measuring (Leeman-Munk: more color differentiation among nodes and connections. [0159] In the example shown in FIG. 11, the symbols in the node-link diagram 1102 are color coded to represent how the deep neural network responded to a user input (specifically, the word "animation"). A user may be able to provide any desired input via input box 1132. A computing device can receive the user input, feed the user input into the deep neural network, and update the color coding of the node-link diagram 1102 based on the results. For example, a representation of a node in the input layer 1104 can be color coded to indicate a weight of the node in the deep neural network (e.g., in response to user input). A representation of a connection between a node in the input layer 1104 and another node in the hidden layer 1106 can be color coded to indicate the result of multiplying a first weight of the connection (in the deep neural network) by a second weight of the node from the input layer 1104. A representation of a node in a hidden layer 1106, 1108 can be color coded to indicate a value determined by summing the weights of all of the connections to the node and passing the result through a rectified linear unit function. 
A representation of a connection between a node in the hidden layer 1108 and another node in the output layer 1110 can be color coded to indicate the result of multiplying a first weight of the connection by a second weight of the node from the hidden layer 1108. A representation of a node in the output layer 1110 can be color coded to indicate a value determined by summing the weights of all the connections coming into the node and then normalizing the result to represent the probability. The node-link diagram 1102 can be color coded to represent any number and combination of information, which can be generated in response to any number and combination of inputs to the deep neural network. Woo: page 8 Table 3: Combining methods of channel and spatial attention. Using both attention is critical while the best-combining strategy (i.e. sequential, channel-first) further improves the accuracy. Experimental results with various pooling methods are shown in Table 1. We observe that max-pooled features are as meaningful as average-pooled features, comparing the accuracy improvement from the baseline. In the work of SE [28], however, they only exploit the average-pooled features, missing the importance of max-pooled features. We argue that max-pooled features which encode the degree of the most salient part can compensate the average-pooled features which encode global statistics softly. Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments. 
Page 9 In addition, we investigate the effect of a kernel size at the following convolution layer: kernel sizes of 3 and 7. In the experiment, we place the spatial attention module after the previously designed channel attention module, as the final goal is to use both modules together. Table 2 shows the experimental results. We can observe that the channel pooling produces better accuracy, indicating that explicitly modeled pooling leads to finer attention inference rather than learnable weighted channel pooling (implemented as 1 × 1 convolution). In the comparison of different convolution kernel sizes, we find that adopting a larger kernel size generates better accuracy in both cases. It implies that a broad view (i.e. large receptive field) is needed for deciding spatially important regions. Considering this, we adopt the channel-pooling method and the convolution layer with a large kernel size to compute spatial attention.)
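The spatial attention computation discussed in the Woo passage quoted from page 9 (a convolution with a k × k kernel over the channel-pooled maps, where a kernel size of 7 performed better than 3) may be sketched as below. The sigmoid output and the zero ("same") padding are assumptions consistent with the CBAM paper's spatial attention; the kernel weights, the function name, and the use of k = 3 in the usage example (chosen for brevity) are illustrative.

```python
import math


def spatial_attention(avg_map, max_map, kernel):
    """Ms = sigmoid(conv([avg_map; max_map])) with one k x k kernel slice per
    pooled map (kernel is 2 x k x k, k odd). Zero padding (assumed) keeps
    the output at H x W."""
    k = len(kernel[0])
    pad = k // 2
    h, w = len(avg_map), len(avg_map[0])
    pooled = [avg_map, max_map]
    Ms = []
    for r in range(h):
        row = []
        for c in range(w):
            s = 0.0
            for ch in range(2):
                for i in range(k):
                    for j in range(k):
                        rr, cc = r + i - pad, c + j - pad
                        if 0 <= rr < h and 0 <= cc < w:  # out-of-range counts as zero
                            s += kernel[ch][i][j] * pooled[ch][rr][cc]
            row.append(1.0 / (1.0 + math.exp(-s)))
        Ms.append(row)
    return Ms


# Usage: a 3 x 3 kernel that only passes the centre of the average map.
centre_only = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
               [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
Ms = spatial_attention([[0.0, 2.0]], [[5.0, 5.0]], centre_only)
```

A larger k lets each attention value depend on a wider neighbourhood, which is the "broad view (i.e. large receptive field)" that Woo credits for the better accuracy of the 7 × 7 kernel.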

Consider Claims 4 and 13. 
The combination of Leeman-Munk and Woo teaches: 
4. The method of claim 1, wherein, after the attention maps and corresponding feature maps are multiplied, the feature maps are modified by adding feature maps to the multiplied result. / 13. The interaction force measuring apparatus of claim 10, wherein, after the attention maps and corresponding feature maps are multiplied, the feature maps are modified by adding feature maps to the multiplied result. (Leeman-Munk: more color differentiation among nodes and connections. [0159] In the example shown in FIG. 11, the symbols in the node-link diagram 1102 are color coded to represent how the deep neural network responded to a user input (specifically, the word "animation"). A user may be able to provide any desired input via input box 1132. A computing device can receive the user input, feed the user input into the deep neural network, and update the color coding of the node-link diagram 1102 based on the results. For example, a representation of a node in the input layer 1104 can be color coded to indicate a weight of the node in the deep neural network (e.g., in response to user input). A representation of a connection between a node in the input layer 1104 and another node in the hidden layer 1106 can be color coded to indicate the result of multiplying a first weight of the connection (in the deep neural network) by a second weight of the node from the input layer 1104. A representation of a node in a hidden layer 1106, 1108 can be color coded to indicate a value determined by summing the weights of all of the connections to the node and passing the result through a rectified linear unit function. A representation of a connection between a node in the hidden layer 1108 and another node in the output layer 1110 can be color coded to indicate the result of multiplying a first weight of the connection by a second weight of the node from the hidden layer 1108. 
A representation of a node in the output layer 1110 can be color coded to indicate a value determined by summing the weights of all the connections coming into the node and then normalizing the result to represent the probability. The node-link diagram 1102 can be color coded to represent any number and combination of information, which can be generated in response to any number and combination of inputs to the deep neural network. Woo: page 8 Table 3: Combining methods of channel and spatial attention. Using both attention is critical while the best-combining strategy (i.e. sequential, channel-first) further improves the accuracy. Experimental results with various pooling methods are shown in Table 1. We observe that max-pooled features are as meaningful as average-pooled features, comparing the accuracy improvement from the baseline. In the work of SE [28], however, they only exploit the average-pooled features, missing the importance of max-pooled features. We argue that max-pooled features which encode the degree of the most salient part can compensate the average-pooled features which encode global statistics softly. Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments. Page 9 In addition, we investigate the effect of a kernel size at the following convolution layer: kernel sizes of 3 and 7. In the experiment, we place the spatial attention module after the previously designed channel attention module, as the final goal is to use both modules together. Table 2 shows the experimental results. 
We can observe that the channel pooling produces better accuracy, indicating that explicitly modeled pooling leads to finer attention inference rather than learnable weighted channel pooling (implemented as 1 × 1 convolution). In the comparison of different convolution kernel sizes, we find that adopting a larger kernel size generates better accuracy in both cases. It implies that a broad view (i.e. large receptive field) is needed for deciding spatially important regions. Considering this, we adopt the channel-pooling method and the convolution layer with a large kernel size to compute spatial attention.)
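The modification recited in claims 4 and 13 (multiply the attention map and the corresponding feature map, then add the feature map to the product) amounts to (attention ⊙ feature) + feature. A minimal sketch, assuming the attention map and the feature map have the same H x W shape; the function name is hypothetical.

```python
def modify_with_residual(feature_map, attention_map):
    """Element-wise (attention * feature) + feature for two H x W maps."""
    return [[a * f + f for a, f in zip(a_row, f_row)]
            for a_row, f_row in zip(attention_map, feature_map)]


fmap = [[2.0, 4.0]]
attn = [[0.5, 0.0]]
modified = modify_with_residual(fmap, attn)  # [[3.0, 4.0]]
```

Because of the added identity term, an attention weight of 0 leaves a feature unchanged rather than suppressing it, which is similar in spirit to the skip connection around CBAM in the ResBlock of Woo Fig. 3.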

Consider Claims 5 and 14. 
The combination of Leeman-Munk and Woo teaches: 
5. The method of claim 1, wherein the outputting of the interaction force information comprises: inputting the modified feature maps to a recurrent neural network (RNN) in order; and obtaining the interaction force information output from a fully connected  (Leeman-Munk: [0184] Some or all of the abovementioned features may enable the GUI 1100 to scale for use with deep neural networks having varying numbers of layers and nodes. The GUI 1100 can work with any type of deep neural network, including but not limited to deep neural networks for determining parts-of-speech, image classification, and natural language processing. The GUI 1100 can work with convolutional neural networks, recurrent neural networks, etc. Woo: page 3 Network engineering. “Network engineering” has been one of the most important vision research, because well-designed networks ensure remarkable performance improvement in various applications. A wide range of architectures has been proposed since the successful implementation of a large-scale CNN [19]. An intuitive and simple way of extension is to increase the depth of neural networks. Woo: page 8 Table 3: Combining methods of channel and spatial attention. Using both attention is critical while the best-combining strategy (i.e. sequential, channel-first) further improves the accuracy. Experimental results with various pooling methods are shown in Table 1. We observe that max-pooled features are as meaningful as average-pooled features, comparing the accuracy improvement from the baseline. In the work of SE [28], however, they only exploit the average-pooled features, missing the importance of max-pooled features. We argue that max-pooled features which encode the degree of the most salient part can compensate the average-pooled features which encode global statistics softly. Thus, we suggest to use both features simultaneously and apply a shared network to those features. 
The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments. Page 9 In addition, we investigate the effect of a kernel size at the following convolution layer: kernel sizes of 3 and 7. In the experiment, we place the spatial attention module after the previously designed channel attention module, as the final goal is to use both modules together. Table 2 shows the experimental results. We can observe that the channel pooling produces better accuracy, indicating that explicitly modeled pooling leads to finer attention inference rather than learnable weighted channel pooling (implemented as 1 × 1 convolution). In the comparison of different convolution kernel sizes, we find that adopting a larger kernel size generates better accuracy in both cases. It implies that a broad view (i.e. large receptive field) is needed for deciding spatially important regions. Considering this, we adopt the channel-pooling method and the convolution layer with a large kernel size to compute spatial attention.)
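Claims 5 and 14 recite inputting the modified feature maps to an RNN in order and obtaining the interaction force information from a fully connected layer. Neither reference specifies the recurrent cell, so the sketch below assumes a plain tanh (Elman-style) RNN over flattened feature maps, followed by a single-output linear layer applied to the last hidden state. All weights and the function name are hypothetical.

```python
import math


def rnn_then_fc(sequence, Wx, Wh, b, w_fc, b_fc):
    """Feed flattened modified feature maps to a simple tanh RNN (assumed cell
    type) in order, then map the last hidden state to one value with a fully
    connected layer. Each input has length D; Wx is Hd x D, Wh is Hd x Hd,
    b and w_fc have length Hd; b_fc is a scalar."""
    h = [0.0] * len(b)  # initial hidden state
    for x in sequence:
        # new hidden state from the current input and the previous hidden state
        h = [math.tanh(sum(wx * xi for wx, xi in zip(Wx[k], x))
                       + sum(wh * hj for wh, hj in zip(Wh[k], h))
                       + b[k])
             for k in range(len(b))]
    return sum(w * hk for w, hk in zip(w_fc, h)) + b_fc


# Usage: D = 1, Hd = 1, two time steps.
force = rnn_then_fc([[0.0], [0.5]], Wx=[[1.0]], Wh=[[0.0]], b=[0.0],
                    w_fc=[2.0], b_fc=0.0)
```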

Consider Claims 6 and 15. 
The combination of Leeman-Munk and Woo teaches: 
6. The method of claim 5, wherein the RNN receives the modified feature maps sequentially from a first feature map to the last feature map and outputs a first output value corresponding thereto, and receives the modified feature maps sequentially from (Leeman-Munk: [0184] Some or all of the abovementioned features may enable the GUI 1100 to scale for use with deep neural networks having varying numbers of layers and nodes. The GUI 1100 can work with any type of deep neural network, including but not limited to deep neural networks for determining parts-of-speech, image classification, and natural language processing. The GUI 1100 can work with convolutional neural networks, recurrent neural networks, etc. Woo: page 3 Network engineering. “Network engineering” has been one of the most important vision research, because well-designed networks ensure remarkable performance improvement in various applications. A wide range of architectures has been proposed since the successful implementation of a large-scale CNN [19]. An intuitive and simple way of extension is to increase the depth of neural networks. Woo: page 8 Table 3: Combining methods of channel and spatial attention. Using both attention is critical while the best-combining strategy (i.e. sequential, channel-first) further improves the accuracy. Experimental results with various pooling methods are shown in Table 1. We observe that max-pooled features are as meaningful as average-pooled features, comparing the accuracy improvement from the baseline. In the work of SE [28], however, they only exploit the average-pooled features, missing the importance of max-pooled features. We argue that max-pooled features which encode the degree of the most salient part can compensate the average-pooled features which encode global statistics softly. Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. 
We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments. Page 9 In addition, we investigate the effect of a kernel size at the following convolution layer: kernel sizes of 3 and 7. In the experiment, we place the spatial attention module after the previously designed channel attention module, as the final goal is to use both modules together. Table 2 shows the experimental results. We can observe that the channel pooling produces better accuracy, indicating that explicitly modeled pooling leads to finer attention inference rather than learnable weighted channel pooling (implemented as 1 × 1 convolution). In the comparison of different convolution kernel sizes, we find that adopting a larger kernel size generates better accuracy in both cases. It implies that a broad view (i.e. large receptive field) is needed for deciding spatially important regions. Considering this, we adopt the channel-pooling method and the convolution layer with a large kernel size to compute spatial attention.)

Consider Claims 7 and 16. 
The combination of Leeman-Munk and Woo teaches: 
th image and n-1 feature maps corresponding to n-1 images before the t-th image. / 16. The interaction force measuring apparatus of claim 10, wherein the predetermined number of feature maps comprise a feature map corresponding to a t-th image and n-1 feature maps corresponding to n-1 images before the t-th image (Woo: page 7 section 4.1 Ablation studies In this subsection, we empirically show the effectiveness of our design choice. For this ablation study, we use the ImageNet-1K dataset and adopt ResNet-50 [5] as the base architecture. The ImageNet-1K classification dataset [1] consists of 1.2 million images for training and 50,000 for validation with 1,000 object classes. We adopt the same data augmentation scheme with [5,37] for training and apply a single-crop evaluation with the size of 224×224 at test time. The learning rate starts from 0.1 and drops every 30 epochs. We train the networks for 90 epochs. Following [5,37,38], we report classification errors on the validation set. Our module design process is split into three parts. We first search for the effective approach to computing the channel attention, then the spatial attention. Finally, we consider how to combine both channel and spatial attention modules. We explain the details of each experiment below. Pages 10-11, section 4.3 Network Visualization with Grad-CAM [18] For the qualitative analysis, we apply the Grad-CAM [18] to different networks using images from the ImageNet validation set. Grad-CAM is a recently proposed visualization method which uses gradients in order to calculate the importance of the spatial locations in convolutional layers.).

Consider Claims 8 and 17. 
The combination of Leeman-Munk and Woo teaches: 
(Leeman-Munk: [0156] In the example shown in FIG. 11, the GUI 1100 includes a node-link diagram 1102 that visually represents nodes (neurons) in a deep neural network and connections (links) between the nodes. The nodes can be visually represented using circles or any other symbol. For example, the node-link diagram 1102 can visually represent an input layer 1104 of the deep neural network as one row of circles and an output layer 1110 of the deep neural network as another row of circles. The node-link diagram 1102 can visually represent hidden layers 1106, 1108 of the deep neural network as rows of circles between the input layer 1104 and the output layer 1110. The connections between the nodes in the deep neural network can be visually represented using lines or other symbols. Woo: page 7 Fig. 3: CBAM integrated with a ResBlock in ResNet[5]. This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block. page 8 Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments.)

Consider Claims 9 and 18. 
The combination of Leeman-Munk and Woo teaches:
9. The method of claim 1, further comprising: when the interaction force is greater than or equal to a predetermined magnitude, outputting information indicating this. / 18. The interaction force measuring apparatus of claim 10, wherein the processor is configured to execute further to: when the interaction force is greater than or equal to a predetermined magnitude, output information indicating this. (Leeman-Munk: [0156], [0160] In some examples, visual clutter can accumulate when all the symbols representing nodes and connections are visible in the node-link diagram 1102, making it difficult to locate potentially useful information. To help reduce visual clutter, the GUI 1100 can include one or more threshold controls 1124a-d. The threshold controls 1124a-d can enable a user to select a number of nodes to color code for a particular layer of the deep neural network. For example, threshold control 1124a can enable a user to input a threshold number of nodes to color code for the input layer 1104 of the deep neural network. Threshold control 1124b can enable a user to input a threshold number of nodes to color code for the hidden layer 1106 of the deep neural network. Threshold control 1124c can enable a user to input a threshold number of nodes to color code for the hidden layer 1108 of the deep neural network. Threshold control 1124d can enable a user to input a threshold number of nodes to color code for the output layer 1110 of the deep neural network. Woo: page 7 Fig. 3: CBAM integrated with a ResBlock in ResNet[5]. This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block. page 8 Thus, we suggest to use both features simultaneously and apply a shared network to those features. The outputs of a shared network are then merged by element-wise summation. We empirically show that our channel attention method is an effective way to push performance further from SE [28] without additional learnable parameters. 
As a brief conclusion, we use both average- and max-pooled features in our channel attention module with the reduction ratio of 16 in the following experiments. Page 13 We conduct object detection on the Microsoft COCO dataset [3]. This dataset involves 80k training images (“2014 train”) and 40k validation images (“2014 val”). The average mAP over different IoU thresholds from 0.5 to 0.95 is used for evaluation. According to [39,40], we trained our model using all the training images as well as a subset of validation images, holding out 5,000 examples for validation.)

Conclusion
The prior art made of record on form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 
HWANG; Won Jun et al., US 20200380297 A1, METHOD AND APPARATUS FOR MEASURING INTERACTION FORCE BASED ON SEQUENTIAL IMAGES
RASH; William et al., US 20210089316 A1, DEEP LEARNING IMPLEMENTATIONS USING SYSTOLIC ARRAYS AND FUSED OPERATIONS
DESOLI; Giuseppe et al., US 20180189229 A1, DEEP CONVOLUTIONAL NETWORK HETEROGENEOUS ARCHITECTURE
SINGH; Surinder Pal et al., US 20190266784 A1, DATA VOLUME SCULPTOR FOR DEEP LEARNING ACCELERATION


If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.



2662
/Tahmina Ansari/

August 6, 2021
/TAHMINA N ANSARI/Primary Examiner, Art Unit 2662