DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are pending.


Claim Objections
Claim 20 is objected to for the following reason:

Claim 20 must end with a period (“.”).

Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Examiner’s notes: the text descriptions corresponding to any figure(s) and table(s) cited from the prior art are incorporated herein to provide further detail for the examiner’s comments on the corresponding claims below.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US10269125) in view of Wang et al. (Self-supervised Sample Mining, 2018) and further in view of Fisher et al. (US20190043003).

Regarding claims 1, 8 and 15, Kim teaches a method for generating a neural network for detecting one or more objects in images, comprising:
(Kim, Fig. 2; updating ROI pooling and seed box selecting layers of a neural network NN (FCN branch + RPN branch) using the NN output “for tracking the object by using the CNN learned by an object detector”, c2:35-40; “Object tracking, also called visual tracking, is the process that detects, extracts, identifies and locates the target in a sequence of images or a video”, c1:45-50; object tracking using CNN+RPN requires object identifications/classifications) 
generating one or more region proposals that may contain objects for each image of a set of unlabeled images;
(Kim, Fig. 2; RPN 122, proposal boxes; during training, training video frames/images (unlabeled, e.g., as indicated in Wang, Fig. 1) are inputted to the FCN branch and the RPN branch for generating two separate regions, respectively; these two regions are compared to generate a loss function from which FCN parameters are learnt by backpropagation, c3:45-end)
determining one or more proposal features for each of the region proposals and corresponding proposal feature predictions;
(Kim, Figs. 2 and 4; “by referring to FIG. 2, the testing device 100 may instruct a seed box selecting layer 124 to select at least one specific proposal box among the proposal boxes PB1, PB2, PB3, and PB4 by referring to the estimated bounding box EBB, as the seed box SB”, c12:35-45)
generating one or more …
(Kim, Fig. 2; “generate at least one estimated bounding box EBB which is a bounding box, tracked from a previous bounding box, whose position is estimated as at least one position of the object in the current video frame VF by using the Kalman filter algorithm 123, where the previous bounding box is a bounding box corresponding to the object located in the previous video frame”, c10:50-60; “the testing device 100 may generate at least one estimated error covariance of the current video frame by referring to at least one previous error covariance of the previous video frame, by using the Kalman filter algorithm 123”, c11:1-10; the error (loss) is estimated based on the seed box SB (selected proposed region); Fig. 6)
Kim does not expressly disclose but Wang teaches:
… self-supervised proposal learning losses …
(Wang, Fig. 1, Self-supervised Sample Mining (SSM) process; Eq. (1) describes the overall learning loss for region proposals. It includes three contributing losses, p3:c2, p4:c1: the 1st loss term L(loc, W) denotes the bounding box regression loss based on object categories; the 2nd loss term L(cls, AL, W) denotes the classification loss for the AL (Active Learning) process; and the 3rd loss term L(cls, SSM, W, V) denotes the classification loss for the SSM (Self-supervised Sample Mining, or Self-supervised Learning) process; L(cls, SSM, W, V) => “self-supervised proposal learning losses”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang into the system or method of Kim in order to include a self-supervised learning process as an additional RPN learning process for quickly estimating a region proposal error (e.g., bounding box error) by considering object categories (W) together with latent variables (V) in a self-supervised fashion (SSM). The combination of Kim and Wang also teaches other enhanced capabilities.
generating one or more consistency-based proposal learning losses 
(Wang, Fig. 1, at least the 2nd and the 3rd loss terms in eq. (1), L(cls, AL, W) and L(cls, SSM, W, V), are related to ϕ(j, xi; W), the probability of belonging to the j-th category for each region proposal xi, eqs. (1) and (2); it is obvious that ϕ(j, xi; W) is generally noise dependent, meaning that L(cls, AL, W) and L(cls, SSM, W, V) are also noise dependent; these two loss terms are derived from cross image validation and prediction consistency evaluation, so they are consistency-based losses; the consistency score si is given by eq. (6)) 
	The combination of Kim and Wang does not expressly disclose but Fisher teaches:
… based on noisy proposal feature predictions and the corresponding proposal predictions without noise;
(Fisher, “The purpose of image augmentation is to diversify the training data resulting in better performance of models. The image augmentation includes… random Gaussian noise, random contrast changes … The augmented images are classified by WhatCNN 1506 during training. The classification is compared with ground truth and coefficients or weights of WhatCNN 1506 are updated by calculating gradient loss function and multiplying the gradient with a learning rate”, [0229]; noise is added/augmented to an image; a gradient loss function is calculated by comparing the augmented image (with noise) and the ground truth image (without noise); the three loss terms in eq. (1) of Wang, e.g., L(cls, AL, W), may be evaluated with or without added noise to test the susceptibility of an RPN to the added noise during learning)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Fisher into the modified system or method of Kim and Wang in order to test and learn the susceptibility of an RPN to added noise. The combination of Kim, Wang and Fisher also teaches other enhanced capabilities.
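Examiner’s illustrative note: the following minimal Python sketch shows one way predictions made with and without added noise could be compared. It is not taken from Kim, Wang, or Fisher. The function names (`add_gaussian_noise`, `toy_classifier`, `prediction_gap`), the toy classifier, and the noise level are hypothetical and are chosen only to make the concept concrete.

```python
import math
import random

def add_gaussian_noise(image, sigma, rng):
    """Return a copy of a 2D grayscale image with Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def toy_classifier(image):
    """Hypothetical stand-in for a detector head: two class probabilities
    derived from mean brightness via a softmax."""
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    logits = [4.0 * (mean - 0.5), -4.0 * (mean - 0.5)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_gap(p_clean, p_noisy):
    """Mean squared difference between clean and noisy class probabilities."""
    return sum((a - b) ** 2 for a, b in zip(p_clean, p_noisy)) / len(p_clean)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean_image = [[0.2, 0.8], [0.6, 0.9]]
noisy_image = add_gaussian_noise(clean_image, sigma=0.05, rng=rng)
gap = prediction_gap(toy_classifier(clean_image), toy_classifier(noisy_image))
```

A gap of zero would indicate that the added noise did not change the prediction; larger values indicate greater sensitivity to the noise.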
	The combination of Kim, Wang and Fisher further teaches:
generating a combined loss using the one or more self-supervised proposal learning losses and one or more consistency-based proposal learning losses; and
(Wang, Fig. 1; overall loss = L(loc, W) + L(cls, AL, W) + L(cls, SSM, W, V), eq. (1))
updating the neural network based on the combined loss.
(Wang, Fig. 1; updating the CNN detector with L(cls, AL, W) and L(cls, SSM, W, V))
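Examiner’s illustrative note: the two limitations above (combining loss terms and updating the network) can be sketched in a few lines of Python. This is a generic weighted sum followed by a plain gradient-descent step, offered only as an aid to reading; it is not the implementation of any cited reference. The names `combined_loss` and `sgd_step`, the weights, and the numeric values are assumptions.

```python
def combined_loss(self_supervised_losses, consistency_losses,
                  w_self=1.0, w_cons=1.0):
    """Weighted sum of two groups of scalar loss terms.
    The weights w_self and w_cons are illustrative assumptions."""
    return w_self * sum(self_supervised_losses) + w_cons * sum(consistency_losses)

def sgd_step(params, grads, learning_rate):
    """One plain gradient-descent update: p <- p - learning_rate * dL/dp."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy values. The gradients are supplied directly for illustration; in a real
# network they would come from backpropagating the combined loss.
total = combined_loss([0.2, 0.3], [0.1])
new_params = sgd_step([1.0, -2.0], [0.5, -0.5], learning_rate=0.1)
```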

Regarding claims 2, 9 and 16, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the determining the one or more self-supervised proposal learning losses includes:
generating one or more proposal location losses using the unlabeled images to learn context-aware features;
(Wang, see comments on claim 1; Fig. 1, eqs. (1) and (2), L(cls, SSM, W, V) => “self-supervised proposal learning losses”)
generating one or more contrastive losses using the unlabeled images to learn noise-robust proposal features; and
(Fisher, see comments on claim 1; gradient loss based on comparison between augmented image with random contrast changes and ground truth, “The image augmentation includes… random Gaussian noise, random contrast changes …”, [0229]; this type of loss may be one of the contributions to the self-supervised loss of Wang (Fig. 1, eqs. (1) and (2)) via probability ϕ(j, xi; W); from another point of view, adding/augmenting different levels of random noise to input images is a process to produce contrastive losses of object feature classifications due to the added noise)
generating a first self-supervised proposal learning loss based on the one or more proposal location losses and one or more contrastive losses.
(Wang, see comments on claim 1; Fig. 1, eq. (1); different losses may be considered together to generate an overall loss; Fisher, [0229])

Regarding claims 3, 10 and 17, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the generating the one or more contrastive losses include:
adding noise to the proposal features to generate noisy proposal features; and
generating a first contrastive loss using the noisy proposal features.
(Fisher, see comments on claims 1 and 2)
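Examiner’s illustrative note: the claim language “adding noise to the proposal features” and “generating a first contrastive loss” can be pictured with the sketch below. It uses an InfoNCE-style loss with cosine similarity, which is a technique swapped in here for illustration and is not stated in Kim, Wang, or Fisher. All names (`cosine`, `info_nce`), the temperature, the noise level, and the feature vectors are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(clean_feats, noisy_feats, temperature=0.1):
    """InfoNCE-style loss: noisy_feats[i] should match clean_feats[i]
    and differ from every other clean feature."""
    losses = []
    for i, noisy in enumerate(noisy_feats):
        sims = [cosine(noisy, clean) / temperature for clean in clean_feats]
        peak = max(sims)  # log-sum-exp with max subtraction for stability
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims))
        losses.append(log_denominator - sims[i])
    return sum(losses) / len(losses)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
proposal_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_feats = [[x + rng.gauss(0.0, 0.01) for x in feat] for feat in proposal_feats]

matched = info_nce(proposal_feats, noisy_feats)
mismatched = info_nce(proposal_feats, list(reversed(noisy_feats)))
```

In this toy setup the loss is small when each noisy feature is paired with its own clean feature and larger when the pairing is scrambled, which is the behavior a contrastive loss is meant to reward.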

Regarding claims 4, 11 and 18, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 3, wherein the generating the one or more contrastive losses include:
adding noise to the unlabeled images or intermediate features to generate noisy unlabeled images or noisy intermediate features respectively; and
generating a second contrastive loss using at least one of the noisy unlabeled images and noisy intermediate features.
(Fisher, see comments on claims 1 and 2; “In addition to image augmentation (used in training of WhatCNN), temporal augmentation is also applied to image frames during training of the WhenCNN. Some examples include mirroring, adding Gaussian noise,…”, [0232]; adding random noise to different frames as temporal noise augmentation creates different contrastive losses among the frames)

Regarding claims 5, 12 and 19, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the generating the one or more consistency-based proposal learning losses includes:
generating a first consistency loss using the unlabeled images for bounding box classification predictions;
generating a second consistency loss using the unlabeled images for bounding box regression predictions; and
generating the consistency-based proposal learning loss based on the first consistency loss and the second consistency loss.
(Wang, see comments on claim 1; Fig. 1, at least the 2nd and the 3rd loss terms are derived from cross image validation and prediction consistency evaluation, so they are consistency-based losses, with the consistency score si given by eq. (6); the various consistency losses are added together to produce an overall loss as given in eq. (1))
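Examiner’s illustrative note: a two-term consistency loss of the kind recited (one term for classification predictions, one for bounding box regression predictions) might be sketched as follows. The choice of KL divergence for the classification term and mean squared error for the regression term is an assumption made for illustration; neither the claims nor the cited references are asserted to use these particular measures. The names `kl_divergence`, `box_mse`, and `consistency_loss` are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class-probability vectors (eps avoids log(0))."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

def box_mse(box_a, box_b):
    """Mean squared error between two boxes given as [x1, y1, x2, y2]."""
    return sum((a - b) ** 2 for a, b in zip(box_a, box_b)) / len(box_a)

def consistency_loss(cls_clean, cls_noisy, box_clean, box_noisy):
    """Sum of a classification term and a regression term.
    Returns (total, classification_term, regression_term)."""
    cls_term = kl_divergence(cls_clean, cls_noisy)
    reg_term = box_mse(box_clean, box_noisy)
    return cls_term + reg_term, cls_term, reg_term

# Toy predictions for one proposal, without and with noise (values assumed).
total, cls_term, reg_term = consistency_loss(
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
    [10.0, 20.0, 50.0, 80.0], [11.0, 19.0, 50.0, 82.0])
```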

Regarding claims 6, 13 and 20, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, further comprising:
determining one or more fully-supervised losses of the neural network using a set of labeled images; and
generating the combined loss using the one or more self-supervised proposal learning losses, the one or more consistency-based proposal learning losses, and the one or more fully-supervised losses.
(Wang, see comments on claim 1; Fig. 1, two consistency losses on the right (“yes”: use active learning (AL) or fully-supervised learning; “no”: self-supervised (SSM)) are fed back to the CNN detector as a combined loss (eq. (1)))

Regarding claims 7 and 14, the combination of Kim, Wang and Fisher teaches its/their respective base claim(s).
The combination further teaches the method of claim 6, wherein each of the labeled images includes at least one of an image-level class label and a bounding box label.
(Kim, Fig. 5; box label EBB)
 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN (JAMES) YANG whose telephone number is (571)272-9874. The examiner can normally be reached on MON-FRI: 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at (571)272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/JIANXUN YANG/ Primary Examiner, Art Unit 2664
8/30/2022