Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 21-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,354,778. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the patent disclose all the features of the claims of the application.

Re claim 21, claim 1 of the patent discloses:

A computing system to perform contrastive learning of visual representations, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a base encoder neural network configured to process an input image to generate an intermediate representation of the input image; a projection head neural network configured to process the intermediate representation of the input image to generate a projected representation of the input image, wherein to generate the projected representation, the projection head neural network is configured to perform at least one non-linear transformation; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining a training image; 

See claim 1: “A computing system to perform contrastive learning of visual representations, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a base encoder neural network configured to process an input image to generate an intermediate representation of the input image; a projection head neural network configured to process the intermediate representation of the input image to generate a projected representation of the input image, wherein to generate the projected representation, the projection head neural network is configured to perform at least one non-linear transformation; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining a training image;”


performing one or more first augmentation operations on the training image to obtain a first augmented image, wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image;

 separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image; 

See claim 1: “performing a plurality of first augmentation operations on the training image to obtain a first augmented image, wherein the plurality of first augmentation operations comprise at least a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image; separate from performing the plurality of first augmentation operations, performing a plurality of second augmentation operations on the training image to obtain a second augmented image, wherein the plurality of second augmentation operations comprise at least a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image;”



respectively processing, with the base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with the projection head neural network, the first intermediate representation and the second intermediate representation to respectively obtain a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.

See claim 1 “respectively processing, with the base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image;  respectively processing, with the projection head neural network, the first intermediate representation and the second intermediate representation to respectively obtain a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.”
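By way of illustration only, the operations recited in claim 21 and mapped above (two views of one training image, a base encoder, a projection head with at least one non-linear transformation, a loss on the two projected representations, and a parameter update) can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's or the patentee's implementation: the function names, the toy dimensions, the cosine-based loss, and the finite-difference update (standing in for backpropagation so the example stays self-contained) are all hypothetical and are not specified by the claims.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def base_encoder(params, image_vec):
    """Hypothetical base encoder: a single linear map to an intermediate representation."""
    return matvec(params["encoder"], image_vec)


def projection_head(params, intermediate):
    """Hypothetical projection head: linear layer, ReLU (the non-linear step), linear layer."""
    hidden = [max(0.0, h) for h in matvec(params["head_in"], intermediate)]
    return matvec(params["head_out"], hidden)


def loss_fn(z1, z2):
    """Assumed loss: 1 - cosine similarity between the two projected representations."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm = math.sqrt(sum(a * a for a in z1)) * math.sqrt(sum(b * b for b in z2))
    return 1.0 - dot / (norm + 1e-12)


def forward_loss(params, view1, view2):
    """Encode and project both augmented views, then evaluate the loss."""
    z1 = projection_head(params, base_encoder(params, view1))
    z2 = projection_head(params, base_encoder(params, view2))
    return loss_fn(z1, z2)


def training_step(params, view1, view2, lr=0.1, eps=1e-5):
    """Modify encoder and head parameters using finite-difference gradients of the loss."""
    base = forward_loss(params, view1, view2)
    grads = {}
    for name, matrix in params.items():
        grads[name] = []
        for row in matrix:
            grad_row = []
            for j in range(len(row)):
                original = row[j]
                row[j] = original + eps
                grad_row.append((forward_loss(params, view1, view2) - base) / eps)
                row[j] = original
            grads[name].append(grad_row)
    for name, matrix in params.items():
        for i, row in enumerate(matrix):
            for j in range(len(row)):
                row[j] -= lr * grads[name][i][j]
    return base


rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed): 6-value flattened views, 4-dim intermediate, 2-dim projection.
params = {
    "encoder": random_matrix(4, 6),
    "head_in": random_matrix(4, 4),
    "head_out": random_matrix(2, 4),
}
# Stand-ins for the first and second augmented images, already flattened.
first_view = [rng.random() for _ in range(6)]
second_view = [rng.random() for _ in range(6)]

loss_before = training_step(params, first_view, second_view)
loss_after = forward_loss(params, first_view, second_view)
```

In this sketch the intermediate representation is the output of `base_encoder`, while the loss is evaluated only on the outputs of `projection_head`, which mirrors the separation between the two networks recited in the claim.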



Re claims 22-37, claims 2-17 of the patent disclose the features of claims 22-37 of the application, respectively.


Re claim 38, claim 18 discloses A computer-implemented method to perform contrastive learning of visual representations, method comprising: obtaining a training image; performing one or more first augmentation operations on the training image to obtain a first augmented image, wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image; 

See claim 18: “A computer-implemented method to perform contrastive learning of visual representations, method comprising: obtaining a training image; performing a plurality of first augmentation operations on the training image to obtain a first augmented image, wherein the plurality of first augmentation operations comprise at least a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image;”

separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image; 

See claim 18: “separate from performing the plurality of first augmentation operations, performing a plurality of second augmentation operations on the training image to obtain a second augmented image, wherein the plurality of second augmentation operations comprise at least a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image;”

respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.

See claim 18: “respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.”

Re claim 39, the additional features of claim 39 are taught by claim 13 of the patent.

Re claim 40, claim 20 of the patent discloses:

One or more non-transitory computer-readable media that collectively store a base encoder neural network that has been trained by a training method, the training method comprising: obtaining a training image; performing one or more first augmentation operations on the training image to obtain a first augmented image, wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image;

See claim 20 “One or more non-transitory computer-readable media that collectively store a base encoder neural network that has been trained by a training method, the training method comprising: obtaining a training image; performing a plurality of first augmentation operations on the training image to obtain a first augmented image, wherein the plurality of first augmentation operations comprise at least a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image;”

separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image;

See claim 20 “separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image;”


respectively processing, with the base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of the base encoder neural network based at least in part on the loss function.


See claim 20 “respectively processing, with the base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of the base encoder neural network based at least in part on the loss function.”




Claim 38 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,386,302 in view of Bachman et al., “Learning Representations by Maximizing Mutual Information Across Views,” 2019 (cited in the IDS).


A computer-implemented method to perform contrastive learning of visual representations, method comprising: (see claim 1 “A computer-implemented method for performing semi-supervised contrastive learning of visual representations, the method comprising:”)

obtaining a training image; (see claim 1 “obtaining a training image”)

performing one or more first augmentation operations on the training image to obtain a first augmented image, (see claim 1 “performing a plurality of first augmentation operations on the training image to obtain a first augmented image”)

separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, (see claim 1 “separate from performing the plurality of first augmentation operations, performing a plurality of second augmentation operations on the training image to obtain a second augmented image;”)

respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.

See claim 1 “respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network comprising a plurality of layers, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function;” 

Claim 1 does not disclose:
wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image 
wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image 

Bachman discloses:

wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space);
wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space; see also figure 1, element (a); note that two training images are used with different augmentations);
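As an illustrative sketch only, the two kinds of augmentation discussed above (random cropping and random jitter in the color space), each applied in two separate passes to the same training image, might look as follows in Python. The function names, the 8x8 toy image, the crop size, and the per-channel shift range are hypothetical choices made for illustration; they are not taken from Bachman, the patent, or the application.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a randomly positioned crop_h x crop_w window of the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def random_color_distortion(image, max_shift, rng):
    """Shift each color channel by one random offset, clamped to the range 0..255."""
    shifts = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [
        [tuple(min(255, max(0, c + s)) for c, s in zip(pixel, shifts)) for pixel in row]
        for row in image
    ]


def augment(image, rng, crop_h=4, crop_w=4, max_shift=40):
    """One augmentation pass: random crop followed by random color distortion."""
    cropped = random_crop(image, crop_h, crop_w, rng)
    return random_color_distortion(cropped, max_shift, rng)


# Hypothetical 8x8 RGB training image stored as rows of (r, g, b) tuples.
training_image = [[(10 * r, 10 * c, 5 * (r + c)) for c in range(8)] for r in range(8)]

rng = random.Random(0)
first_augmented = augment(training_image, rng)   # first augmentation pass
second_augmented = augment(training_image, rng)  # separate second pass on the same image
```

Because each pass draws its own crop position and color offsets, the two augmented images are generated independently from the one training image, which is the relationship the rejection relies on.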

One of ordinary skill in the art could have used the more detailed data augmentation of Bachman to modify the data augmentation of claim 1. The motivation to combine is that “Our model extends local DIM in three key ways: it predicts features across independently-augmented versions of each input, it predicts features simultaneously across multiple scales, and it uses a more powerful encoder. Each of these modifications provides improvements over local DIM. Predicting across independently-augmented copies of an input and predicting at multiple scales are two simple ways of producing multiple views of the context provided by a single image.” (see section 1, third paragraph).

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine claim 1 with Bachman to reach the aforementioned advantage. 


Claim 40 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,386,302 in view of Bachman et al., “Learning Representations by Maximizing Mutual Information Across Views,” 2019 (cited in the IDS), and further in view of Skala, US 2021/0049346.

Re claim 40, claim 1 of the patent discloses:

a base encoder neural network that has been trained by a training method, the training method comprising: (see claim 1 “modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function;” note that this step trains a base encoder; see also the remainder of the claim, which as a whole recites a method for training a base encoder)

obtaining a training image; (see claim 1 “obtaining a training image”)

performing one or more first augmentation operations on the training image to obtain a first augmented image, (see claim 1 “performing a plurality of first augmentation operations on the training image to obtain a first augmented image”)

separate from performing the one or more first augmentation operations, performing one or more second augmentation operations on the training image to obtain a second augmented image, (see claim 1 “separate from performing the plurality of first augmentation operations, performing a plurality of second augmentation operations on the training image to obtain a second augmented image;”)

respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.

See claim 1 “respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image; respectively processing, with a projection head neural network comprising a plurality of layers, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image; evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation; modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function;” 

Claim 1 does not disclose:
wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image 
wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image 

Bachman discloses:

wherein the one or more first augmentation operations comprise one or both of: a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space);
wherein the one or more second augmentation operations comprise one or both of: a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space; see also figure 1, element (a); note that two training images are used with different augmentations);

One of ordinary skill in the art could have used the more detailed data augmentation of Bachman to modify the data augmentation of claim 1. The motivation to combine is that “Our model extends local DIM in three key ways: it predicts features across independently-augmented versions of each input, it predicts features simultaneously across multiple scales, and it uses a more powerful encoder. Each of these modifications provides improvements over local DIM. Predicting across independently-augmented copies of an input and predicting at multiple scales are two simple ways of producing multiple views of the context provided by a single image.” (see section 1, third paragraph).

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine claim 1 with Bachman to reach the aforementioned advantage. 


Claim 1 and Bachman do not expressly disclose one or more non-transitory computer-readable media that collectively store a neural network. Skala discloses one or more non-transitory computer-readable media that collectively store a neural network (see paragraph 69; note that the neural network is stored on a computer readable medium). One of ordinary skill in the art could have readily used a computer readable medium as disclosed in Skala to store the neural network trained by the combination of claim 1 and Bachman, and the result would have been predictable: the neural network would merely be stored on a computer readable medium. Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine claim 1, Bachman, and Skala to reach the aforementioned predictable result.


Allowable Subject Matter
Claims 21-40 would be allowable if a terminal disclaimer were filed to overcome the double patenting rejections.


Re claim 38: Bachman et al., “Learning Representations by Maximizing Mutual Information Across Views,” 2019 (cited in the IDS).

Bachman discloses 
A computer-implemented method to perform contrastive learning of visual representations, method comprising: 

obtaining a training image; 
performing a plurality of first augmentation operations on the training image to obtain a first augmented image, wherein the plurality of first augmentation operations comprise at least a first random crop operation that randomly crops the training image and a first random color distortion operation that randomly modifies color values of the training image to the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space);
separate from performing the plurality of first augmentation operations, performing a plurality of second augmentation operations on the training image to obtain a second augmented image, wherein the plurality of second augmentation operations comprise at least a second random crop operation that randomly crops the training image and a second random color distortion operation that randomly modifies color values of the training image (see section 3.4, data augmentation; note that data augmentation can include random cropping and random jitter in the color space; see also figure 1, element (a); note that two training images are used with different augmentations);
respectively processing, with a base encoder neural network, the first augmented image and the second augmented image to respectively generate a first intermediate representation for the first augmented image and a second intermediate representation for the second augmented image (see section 3.6, encoder);



evaluating a loss function (see section 3.2);

and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function (see section 3.2 note that training is performed to minimize loss function).


The prior art of record does not disclose respectively processing, with a projection head neural network, the first intermediate representation and the second intermediate representation to respectively generate a first projected representation for the first augmented image and a second projected representation for the second augmented image;
evaluating a loss function that evaluates a difference between the first projected representation and the second projected representation;
and modifying one or more values of one or more parameters of one or both of the base encoder neural network and the projection head neural network based at least in part on the loss function.


The remaining claims contain similar allowable features.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T MOTSINGER whose telephone number is (571)270-1237. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at (571)272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SEAN T MOTSINGER/Primary Examiner, Art Unit 2669