Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED OFFICE ACTION

Status of Claims

Claims 1-20 are pending in this Office Action.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

1.	Claims 1, 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Soldevila et al. (USPUB 20170011280) in view of Weihua Chen (NPL DOC: " Beyond triplet loss: a deep quadruplet network for person re-identification," July 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 403-407).

As per Claim 1, Soldevila et al. teaches A computer-implemented method (Paragraphs [0043] and [0045]) for image categorization (image categories taught within Paragraph [0058]), comprising: training a network to determine features of a pair of training images based on respective channel weight matrixes of the pair of training images (Paragraphs [0055]-[0056]); constructing, via the network, a channel weight matrix of an unlabeled image (Paragraphs [0025] and [0034]); and predicting, based on the channel weight matrix of the unlabeled image, a class label for the unlabeled image (Paragraph [0034]: “… even though this is how the neural network is trained, it is not primarily used for label prediction in the exemplary embodiment, but rather to use the predictions 60 to generate a weight gradient-based representation of a new image….”).
Soldevila et al. does not explicitly teach the network being further trained by a contrastive constraint based on inter-sample channel correlations between the pair of training images; at least one of the respective channel weight matrixes being constructed based on intra-sample channel correlations within a training image of the pair of training images.
	However, within analogous art, Weihua Chen teaches the network being further trained by a contrastive constraint based on inter-sample channel correlations between the pair of training images (Page 406, Col. 1: “…With the help of this constraint, the minimum inter-class distance is required to be larger than the maximum intra-class distance regardless of whether pairs contain the same probe….” AND Page 407, Col. 2: “…The margin in the contrastive loss can partly enhance the generalization ability of the classifier from the training set to the testing set. Because in general the larger the margin, the lower the generalization error of the classifier [5]. So in this section, we mainly compare our quadruplet loss with the contrastive loss, which contains a margin threshold consistently with ours. The contrastive loss can be formulated as follows:…”); at least one of the respective channel weight matrixes being constructed based on intra-sample channel correlations within a training image of the pair of training images (Page 406, Col. 1, lines 10-13 and 23-25 AND Page 407, Col. 1, lines 13-15).
	One of ordinary skill in the art would have been motivated to combine the teaching of Weihua Chen with the teaching of Soldevila et al. (Extracting gradient features from neural networks) because Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) provides a system and method for achieving higher detection performance within images using a deep neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) within the teaching of Soldevila et al. (Extracting gradient features from neural networks) in order to achieve higher detection performance within images using a deep neural network.




As per Claim 8, the combination of Soldevila et al. and Weihua Chen teaches claim 1.
Soldevila et al. teaches wherein the constructing further comprises constructing the channel weight matrix of the unlabeled image based on semantically complementary channel information of the unlabeled image (Paragraph [0025]: “… the set of labeled training images 58 comprises a database of images of vehicles each labeled to indicate a vehicle type using a labeling scheme of interest (such as class labels corresponding to “passenger vehicle,” “light commercial truck,” “semi-trailer truck,” “bus,” etc., although finer-grained labels or broader class labels are also contemplated). The supervised layers of the neural network 56 are trained on the training images 58 and their labels 59 to generate a prediction 60 (e.g., in the form of class probabilities) for a new, unlabeled image, such as image 14….” AND Paragraph [0034]: “…At S102, the neural network 56 is optionally trained on the set of training images 58 and their labels 59 to predict labels for unlabeled images. The training images may include at least some images of objects that are labeled with labels which are expected to be encountered during the test phase. The training includes learning the weight matrices and tensors of the neural network through backward passes of the neural network…”).

As per Claim 9, the combination of Soldevila et al. and Weihua Chen teaches claim 1.
Soldevila et al. teaches wherein the unlabeled image comprises a product, and the class label comprises a product identifier corresponding to the product (Paragraph [0025]: “….labeling scheme of interest (such as class labels corresponding to “passenger vehicle,” “light commercial truck,” “semi-trailer truck,” “bus,” etc., although finer-grained labels or broader class labels are also contemplated). The supervised layers of the neural network 56 are trained on the training images 58 and their labels 59 to generate a prediction 60 (e.g., in the form of class probabilities) for a new, unlabeled image, such as image 14. In some embodiments, the neural network 56 may have already been pre-trained for this task and thus the training component 40 can be omitted….”).


2.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Soldevila et al. (USPUB 20170011280) in view of Weihua Chen (NPL DOC: " Beyond triplet loss: a deep quadruplet network for person re-identification," July 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 403-407) and further in view of Wu Zuobin (NPL DOC: " Effective Feature Fusion for Pattern Classification Based on Intra-Class and Extra-Class Discriminative Correlation Analysis," 15 August 2017, 2017 20th International Conference on Information Fusion (Fusion), July 10-13, 2017, pp. 1-5).

As per Claim 2, the combination of Soldevila et al. and Weihua Chen teaches claim 1.
The combination of Soldevila et al. and Weihua Chen does not explicitly teach further comprising: modeling the intra-sample channel correlations to emphasize discriminative features of the training image, wherein the training image has only an image-level label.
Within analogous art, Wu Zuobin teaches further comprising: modeling the intra-sample channel correlations to emphasize discriminative features of the training image (Page 3, Col. 1: “…INTRA-CLASS AND EXTRA-CLASS DISCRIMINATIVE CORRELATION ANALYSIS…” AND “…The basic idea of Intra-Class and Extra-Class Discriminative Correlation Analysis (IEDCA) is illustrated in Fig. 1. IEDCA attempts to find transforms where classes are separated within each feature set while the discriminating structure is inherited in the correlation analysis….”), wherein the training image has only an image-level label (Page 4, Col. 1: “…To utilize intra-class correlation, we introduce a class matrix …(c denotes the number of the classes, while n denotes the number of training samples). Each row of the class matrix represents the sample label….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Wu Zuobin with the combined teaching of Soldevila et al. (Extracting gradient features from neural networks) and Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) because Wu Zuobin (Effective Feature Fusion for Pattern Classification Based on Intra-Class and Extra-Class Discriminative Correlation Analysis) provides a system and method implementing a feature fusion algorithm for improved intra-class correlation analysis within neural networks.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Wu Zuobin (Effective Feature Fusion for Pattern Classification Based on Intra-Class and Extra-Class Discriminative Correlation Analysis) within the combined teaching of Soldevila et al. (Extracting gradient features from neural networks) and Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) in order to provide a feature fusion algorithm for improved intra-class correlation analysis within neural networks.

3.	Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Soldevila et al. (USPUB 20170011280) in view of Weihua Chen (NPL DOC: " Beyond triplet loss: a deep quadruplet network for person re-identification," July 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 403-407) and further in view of Jianlong Fu (NPL DOC: "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition," July 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4438-4445).

As per Claim 3, the combination of Soldevila et al. and Weihua Chen teaches claim 1.
The combination of Soldevila et al. and Weihua Chen does not explicitly teach further comprising: utilizing the intra-sample channel correlations as a soft attention mechanism to learn discriminative features of the training image, wherein the soft attention mechanism is applied to a first-order feature of the training image derived from a neural network of the network.
Within analogous art, Jianlong Fu teaches further comprising: utilizing the intra-sample channel correlations as a soft attention mechanism (Page 4439, Col. 1: “….the recurrent network is alternatively optimized by an intra-scale softmax loss for classification and an inter-scale pairwise ranking loss for attention proposal network….”) to learn discriminative features of the training image (Page 4439, Col. 1: “…RA-CNN can gradually attend on the most discriminative regions from coarse to fine (e.g., from body to head, then to beak for birds). Note that the accurate region localization can help discriminative region-based feature learning, and vice versa. Thus the proposed network can benefit from the mutual reinforcement between region localization and feature learning….”), wherein the soft attention mechanism is applied to a first-order feature of the training image derived from a neural network of the network (Page 4440, Figure 2, which shows the softmax operation within the first leg of the CNN (neural network)).
	One of ordinary skill in the art would have been motivated to combine the teaching of Jianlong Fu with the combined teaching of Soldevila et al. (Extracting gradient features from neural networks) and Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) because Jianlong Fu (Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition) provides a system and method implementing a neural network that optimizes region attention and fine-grained representation for object recognition within images.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Jianlong Fu (Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition) within the combined teaching of Soldevila et al. (Extracting gradient features from neural networks) and Weihua Chen (Beyond triplet loss: a deep quadruplet network for person re-identification) in order to provide a neural network that optimizes region attention and fine-grained representation for object recognition within images.
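For illustration only, the claim 3 limitation of using intra-sample channel correlations as a soft attention mechanism over a first-order feature may be sketched as below. This is a hypothetical sketch, not the applicant's or Jianlong Fu's implementation. It assumes the first-order feature is a C x N matrix (C channels, N spatial positions), that the intra-sample channel correlations are its C x C Gram matrix, and that a row-wise softmax serves as the soft attention; the function names are introduced here for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_soft_attention(feature: Matrix) -> Matrix:
    """Re-weight a C x N first-order feature with channel-correlation attention (hypothetical)."""
    # Intra-sample channel correlations: C x C Gram matrix of the channel vectors.
    corr = [[sum(x * y for x, y in zip(ci, cj)) for cj in feature] for ci in feature]
    # Soft attention: each row becomes a probability distribution over channels.
    attn = [softmax(row) for row in corr]
    channels, positions = len(feature), len(feature[0])
    # Each output channel is a convex combination of the input channels.
    return [
        [sum(attn[i][k] * feature[k][j] for k in range(channels)) for j in range(positions)]
        for i in range(channels)
    ]
```

Because each attention row sums to one, a feature whose channels are identical passes through unchanged, for example `channel_soft_attention([[1.0, 2.0], [1.0, 2.0]])` returns `[[1.0, 2.0], [1.0, 2.0]]`.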

As per Claim 6, the combination of Soldevila et al. and Weihua Chen teaches claim 1.
The combination of Soldevila et al. and Weihua Chen does not explicitly teach further comprising: modeling the inter-sample channel correlations between the pair of training images with an inter-sample channel weight matrix that emphasizes distinct channel relationships specific to the pair of training images.
Within analogous art, Jianlong Fu teaches further comprising: modeling the inter-sample channel correlations between the pair of training images with an inter-sample channel weight matrix that emphasizes distinct channel relationships specific to the pair of training images (Page 4440, Col. 1: “…fully-connected and softmax layers (c1 to c3) and a region attention by an attention proposal network (d1, d2). The proposed RA-CNN is optimized to convergence by alternatively learning a softmax classification loss at each scale and a pairwise ranking loss across neighboring scales…” AND Figure 2: “…alternatively optimized by classification losses Lcls between label prediction Y(s) and ground truth Y* at each scale, and pairwise ranking losses Lrank between p(s)t and p(s+1)t from neighboring scales,…”).


4.	Claims 10, 11, 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Soldevila et al. (USPUB 20170011280) in view of Yin Cui (NPL DOC: " Kernel Pooling for Convolutional Neural Networks," July 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2921-2928).

As per Claim 10, Soldevila et al. teaches A computer-readable storage device encoded (encoding taught within Paragraphs [0025], [0096] and [0108]) with instructions that, when executed, cause one or more processors of a computing system to perform operations of image categorization (Paragraphs [0065] and [0110]), comprising: extracting a first-order feature of an unlabeled image from a neural network (Paragraph [0025]: “…The NN training component 40 trains a neural network (NN) 56, such as a ConvNet. The neural network includes an ordered sequence of supervised operations (i.e., layers) that are learned on a set of labeled training objects 58, such as images and their true labels 59. The training image labels 59 are drawn from a set of two or more predefined class labels….”);
Soldevila et al. does not explicitly teach determining a high-order feature of the unlabeled image based on the first-order feature of the unlabeled image and intra-sample channel correlations within the unlabeled image; and predicting a class label for the unlabeled image based on the high-order feature of the unlabeled image.  
However, within analogous art, Yin Cui teaches determining a high-order feature of the unlabeled image based on the first-order feature of the unlabeled image and intra-sample channel correlations within the unlabeled image (Page 2928, Col. 2: “…fully-connected layers. So removing the fully-connected layers significantly degrade the original 1st order feature…. These experiments verify the effectiveness of using high-order feature interactions in the context of CNN.” AND Page 2922, Figure 2; Page 2922, Col. 1, lines 7-13 and 21-25); and predicting a class label for the unlabeled image based on the high-order feature of the unlabeled image (Page 2921, Col. 2: “…applying kernel pooling, the inner product between two features can capture high order feature interactions as in Eqn. 1. This makes the subsequent linear classifier highly discriminative…” AND Page 2922, Col. 1: “…When learning and applying the subsequent linear classifier on extracted features, kernel methods such as Gaussian RBF or exponential χ2 kernel are often adopted to capture higher order information and make linear classifier more discriminative….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Yin Cui with the teaching of Soldevila et al. (Extracting gradient features from neural networks) because Yin Cui (Kernel Pooling for Convolutional Neural Networks) provides a system and method implementing a general pooling framework that captures higher-order interactions of features for improved visual recognition accuracy.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Yin Cui (Kernel Pooling for Convolutional Neural Networks) within the teaching of Soldevila et al. (Extracting gradient features from neural networks) in order to provide a general pooling framework that captures higher-order interactions of features for improved visual recognition accuracy.

As per Claim 11, the combination of Soldevila et al. and Yin Cui teaches claim 10.
Soldevila et al. does not explicitly teach wherein the operations further comprising: applying a softmax loss to the high-order feature of the unlabeled image to predict the class label.
Within analogous art, Yin Cui teaches wherein the operations further comprising: applying a softmax loss to the high-order feature of the unlabeled image to predict the class label (Page 2925, Col. 2: “….Combined with a CNN, the loss from the softmax layer can go through the proposed kernel pooling layer and be propagated back to the preceding fully convolution layers…” AND Page 2928, Col. 1: “…We found that high order feature interactions, especially 2nd and 3rd order, are weighted more in VGG compared with ResNet….”).

As per Claim 16, Soldevila et al. teaches A system, comprising: a camera to capture an image of a product; and a neural network (Paragraph [0023]: “…The capture device 28 may include a camera, which supplies the image 14, such as a photographic image or frame of a video sequence, to the system 10 for processing. Hardware components 16, 20, 24, 26 of the system communicate via a data/control bus 32….”), operatively connected to the camera, trained to:
derive a first-order feature of the image (Paragraph [0025]: “…The NN training component 40 trains a neural network (NN) 56, such as a ConvNet. The neural network includes an ordered sequence of supervised operations (i.e., layers) that are learned on a set of labeled training objects 58, such as images and their true labels 59. The training image labels 59 are drawn from a set of two or more predefined class labels….”);
Soldevila et al. does not explicitly teach determine a high-order feature of the image based on the first-order feature of the image and a channel weight matrix having semantically complementary channel information of the image; and recognize the product based on the high-order feature of the image.  
However, within analogous art, Yin Cui teaches determine a high-order feature of the image based on the first-order feature of the image and a channel weight matrix having semantically complementary channel information of the image (Page 2928, Col. 2: “…fully-connected layers. So removing the fully-connected layers significantly degrade the original 1st order feature…. These experiments verify the effectiveness of using high-order feature interactions in the context of CNN.” AND Page 2927: “…pre-trained weights for the neural network. The initial weights of the convolutional layers are pretrained on ImageNet classification dataset, and the initial weights of the final linear classifier is obtained by training a logistic regression classifier on the compact kernel pooling of pre-trained CNN features….” AND Page 2928, Col. 2, lines 1-18); and recognize the product based on the high-order feature of the image (Page 2928, Col. 2: “…a novel deep kernel pooling method as a high-order representation for visual recognition. The proposed method captures high order and non-linear feature interactions via compact explicit feature mapping. The approximated representation is fully differentiable, thus the kernel composition can be learned together with a CNN in an end-to-end ….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Yin Cui with the teaching of Soldevila et al. (Extracting gradient features from neural networks) because Yin Cui (Kernel Pooling for Convolutional Neural Networks) provides a system and method implementing a general pooling framework that captures higher-order interactions of features for improved visual recognition accuracy.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Yin Cui (Kernel Pooling for Convolutional Neural Networks) within the teaching of Soldevila et al. (Extracting gradient features from neural networks) in order to provide a general pooling framework that captures higher-order interactions of features for improved visual recognition accuracy.

As per Claim 19, the combination of Soldevila et al. and Yin Cui teaches claim 16.
Soldevila et al. does not explicitly teach wherein the neural network is further trained to: apply a softmax loss to the high-order feature of the image to predict a class label of the image.
Within analogous art, Yin Cui teaches wherein the neural network is further trained to: apply a softmax loss to the high-order feature of the image to predict a class label of the image (Page 2925, Col. 2: “….Combined with a CNN, the loss from the softmax layer can go through the proposed kernel pooling layer and be propagated back to the preceding fully convolution layers…” AND Page 2928, Col. 1: “…We found that high order feature interactions, especially 2nd and 3rd order, are weighted more in VGG compared with ResNet….”).
  

It is noted that any citations to specific pages, columns, lines, or figures in the prior art references, and any interpretation of the references, should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123.

Allowable Subject Matter

5.          Claims 4, 5, 7, 12-15, 17, 18 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

6.         The following is an examiner's statement of reasons for the indication of allowable subject matter:

As to claim 4, prior art of record does not teach or suggest the limitation mentioned within claim 4: “…determining, based on the respective channel weight matrixes, the features of the pair of training images as respective high-order features from respective first-order features of the pair of training images, the respective first-order features being directly derived from a neural network of the network.”

As to claim 5, prior art of record does not teach or suggest the limitation mentioned within claim 5: “…modeling the inter-sample channel correlations between the pair of training images based on a subtraction operation between the respective weight matrixes of the pair of training images.” 
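For illustration only, the subtraction operation recited in claim 5 may be sketched as follows. This is a hypothetical sketch, not the applicant's implementation: it assumes that each training image's channel weight matrix is the C x C Gram matrix F F^T of its C x N first-order feature F, and that the inter-sample channel correlations are the element-wise difference of the two matrices. The helper names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def channel_weight_matrix(feature: Matrix) -> Matrix:
    """Assumed intra-sample channel weight matrix: F @ F^T (C x C)."""
    return matmul(feature, transpose(feature))


def inter_sample_weights(f_a: Matrix, f_b: Matrix) -> Matrix:
    """Assumed inter-sample channel matrix: element-wise W_a - W_b."""
    w_a, w_b = channel_weight_matrix(f_a), channel_weight_matrix(f_b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(w_a, w_b)]
```

With F_a = [[1, 0], [0, 1]] and F_b = [[1, 1], [0, 1]], W_a is the identity, W_b is [[2, 1], [1, 1]], and the difference is [[-1, -1], [-1, 0]].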
 
As to claim 7, prior art of record does not teach or suggest the limitation mentioned within claim 7: “…determining respective high-order features of the pair of training images based on respective inter-sample channel weight matrixes of the pair of training images; and constructing the contrastive constraint based on a contrastive loss, a triplet loss, or another loss of metric learning applied to the respective high-order features of the pair of training images.” 
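The contrastive loss named in claim 7 (and in claim 14) is commonly written, for a pair at distance d with y = 1 for a matching pair and margin m, as L = y·d² + (1 - y)·max(0, m - d)². A minimal sketch of that common form follows, assuming the high-order features have been flattened into vectors and compared with Euclidean distance; it is offered for illustration and is not necessarily the form used in the application or in Weihua Chen.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(feat_a: Sequence[float], feat_b: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Common margin-based contrastive loss for one pair of features."""
    d = euclidean(feat_a, feat_b)
    if same_class:
        # Matching pair: penalize any separation.
        return d ** 2
    # Non-matching pair: penalize only when closer than the margin.
    return max(0.0, margin - d) ** 2
```

For example, features [0, 0] and [3, 4] are 5 apart, giving a loss of 25 for a matching pair and 0 for a non-matching pair with margin 1.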

As to claim 12, prior art of record does not teach or suggest the limitation mentioned within claim 12: “…determining a second-order feature of the unlabeled image based on a matrix multiplication operation between the first-order feature and a transpose of the first-order feature; and determining a third-order feature of the unlabeled image based on a matrix multiplication operation between the first-order feature and the second-order feature.  ” 
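For illustration only, the two matrix-multiplication operations recited in claim 12 may be sketched as below. The sketch assumes a C x N first-order feature F, takes the second-order feature as S = F F^T (C x C), and takes the third-order feature as S F (C x N); the multiplication order for the third-order feature is an assumption chosen so that the dimensions agree, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def high_order_features(first: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (second-order, third-order) features of a C x N first-order feature."""
    second = matmul(first, transpose(first))  # F @ F^T, C x C
    third = matmul(second, first)             # (F @ F^T) @ F, C x N
    return second, third
```

For F = [[1, 2, 0], [0, 1, 1]] (two channels, three positions), the second-order feature is [[5, 2], [2, 2]] and the third-order feature is [[5, 12, 2], [2, 6, 2]].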

As to claim 13, claim 13 depends on claim 12, which is objected to as containing allowable subject matter; therefore, claim 13 is likewise objected to and would be allowable over the prior art of record.

As to claim 14, prior art of record does not teach or suggest the limitation mentioned within claim 14: “…determining respective high-order features of a pair of training images based on respective inter-sample channel weight matrixes of the pair of training images; constructing a contrastive constraint based on a contrastive loss applied to the respective high-order features of the pair of training images; and training the neural network based on the contrastive constraint. ” 

As to claim 15, claim 15 depends on claim 14, which is objected to as containing allowable subject matter; therefore, claim 15 is likewise objected to and would be allowable over the prior art of record.

As to claim 17, prior art of record does not teach or suggest the limitation mentioned within claim 17: “…wherein the neural network is further trained to: learn the semantically complementary channel information from channel-wise interactions in the image; and encode the semantically complementary channel information into the first-order feature of the image. ” 

As to claim 18, prior art of record does not teach or suggest the limitation mentioned within claim 18: “wherein the channel weight matrix has a first weight for a first channel that is negatively correlated to a reference channel, and a second weight for a second channel that is positively correlated to the reference channel, and the first weight is greater than the second weight. ” 
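One hypothetical way to obtain the weight ordering recited in claim 18 is a softmax over negated correlations with the reference channel, so that a negatively correlated channel receives a larger weight than a positively correlated one. The sketch below assumes Pearson correlation between channel activation vectors; it illustrates the claimed ordering only, and the function names are not drawn from the application.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def complementary_weights(channels: List[Sequence[float]], ref: int) -> List[float]:
    """Weight every channel relative to channel `ref` (hypothetical scheme).

    A softmax over negated correlations: the more negatively a channel is
    correlated with the reference, the larger its weight.
    """
    corr = [pearson(channels[ref], ch) for ch in channels]
    exps = [math.exp(-c) for c in corr]
    total = sum(exps)
    return [e / total for e in exps]
```

With channels [1, 2, 3], [3, 2, 1] and [2, 4, 6] and channel 0 as the reference, channel 1 (correlation -1) receives a larger weight than channel 2 (correlation +1).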

As to claim 20, prior art of record does not teach or suggest the limitation mentioned within claim 20: “wherein the camera is configured to capture the image of the product on a checkout machine of a store, and the system further comprising: a display to present product information based on the class label of the image. ” 




Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion

7. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S. ISMAIL whose telephone number is (571)272-9799 and Fax # (571)273-9799. The examiner can normally be reached on M-F: 9:00 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http:/ If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David C. Payne can be reached on (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OMAR S ISMAIL/Primary Examiner, Art Unit 2637