Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the amendments and remarks filed on 02/24/2021.
Claims 1-9 and 11-21 are pending.
Claims 1-9 and 11-13 have been amended.

Response to Arguments
Applicant’s arguments, with respect to the rejection(s) of claim(s) 1 under 35 U.S.C. 112(b), have been fully considered and are persuasive. Therefore, the previous rejections set forth in the previous office action have been withdrawn.

Applicant’s arguments, with respect to the rejection(s) of claim(s) 1-9 and 11-21 under 35 U.S.C. 103, have been fully considered and are persuasive.  The rejection(s) under 35 U.S.C. 103 of claim(s) 1-9 and 11-21 have been withdrawn. 

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: 
The most similar prior art of record does not disclose the examiner amendments, and the combination of the prior art would not have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention.

More specifically, the prior art of record Shetty et al, “Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training”, 2017, hereinafter Shetty, discloses, in section 3.1, “Image features”, section 3.2, and Fig. 4, a CNN image encoder outputting image feature vectors as part of a “generator” and an LSTM sentence encoder outputting sentence feature vectors in a “discriminator”, where the “discriminator” uses both sets of feature vectors to determine “distances between” them as a measure of how well the captions match the image (see “Img to Sent Distance Kernals” in Fig. 4). However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Shetty, while distances between image and sentence encoder feature vectors are calculated, training both encoders on the same “semantic similarity loss”, constrained as required, is not taught in this way. Instead, Shetty, section 3.3, discusses adversarial training that calculates a discriminator loss to train the discriminator and then a generator l2 loss to train the generator. For at least these reasons, Shetty does not teach the amendments as claimed.
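
For illustration only, the distinguishing limitation quoted above can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation and not the method of any cited reference: the linear encoders, the cosine-based loss, the finite-difference gradients, and every name and value (encode, semantic_similarity_loss, train_step, the weights and inputs) are hypothetical. It shows a single loss, computed between caption vectors and the image vector in a shared space, driving the parameter update of both encoders, rather than separate discriminator and generator losses.

```python
import math

def encode(weights, features):
    """Hypothetical linear encoder: project raw features into the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def semantic_similarity_loss(img_w, sent_w, image, captions):
    """Assumed loss: mean of (1 - cosine) between each caption vector and the image vector."""
    img_vec = encode(img_w, image)
    total = sum(1.0 - cosine(encode(sent_w, c), img_vec) for c in captions)
    return total / len(captions)

def numerical_grad(loss_fn, weights, eps=1e-5):
    """Central finite-difference gradient of loss_fn with respect to weights."""
    grad = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            old = row[j]
            row[j] = old + eps
            up = loss_fn()
            row[j] = old - eps
            down = loss_fn()
            row[j] = old  # restore the original weight
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def train_step(img_w, sent_w, image, captions, lr=0.1):
    """Update BOTH encoders' parameters from the same loss."""
    def loss_fn():
        return semantic_similarity_loss(img_w, sent_w, image, captions)
    g_img = numerical_grad(loss_fn, img_w)
    g_sent = numerical_grad(loss_fn, sent_w)
    for w, g in ((img_w, g_img), (sent_w, g_sent)):
        for i, row in enumerate(w):
            for j in range(len(row)):
                row[j] -= lr * g[i][j]
    return loss_fn()

# Hypothetical toy data: one image feature vector and two training captions.
img_w = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.3, 0.1, -0.5, 0.2]]
sent_w = [[0.2, -0.3, 0.4, -0.5, 0.1], [0.6, 0.1, -0.2, 0.3, -0.4],
          [-0.1, 0.5, 0.2, 0.4, 0.3]]
image = [0.2, -0.1, 0.7, 0.4]
captions = [[1, 0, 0, 1, 0], [0, 1, 0, 1, 0]]

initial = semantic_similarity_loss(img_w, sent_w, image, captions)
final = initial
for _ in range(50):
    final = train_step(img_w, sent_w, image, captions)
print(f"loss before: {initial:.3f}, after: {final:.3f}")
```

In this sketch both weight matrices are adjusted from the one loss value, so the caption vectors and the image vector are pulled toward each other in the shared space; an alternating scheme of the kind attributed to Shetty above would instead compute one loss per network.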

Further, the prior art of record Soldevila et al, US Pub 20170011279, hereinafter Soldevila, discloses, in paragraphs 0014-0016, 0029, and 0045, a CNN embedding images into a semantic space. However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Soldevila, while a CNN embeds an image into a semantic space, the required use of a sentence encoder neural network and the training of both encoders on the same “semantic similarity loss” are not taught in the same way as claimed. The examiner notes that, even when in combination with Shetty, the combination still does not teach the amendments as claimed.

Further, the prior art of record Vosoughi et al, “Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder”, 2016, hereinafter Vosoughi, discloses, in section 1 paragraph 3, section 2 intro, and section 2.1, a CNN sentence encoder operating at the character level. However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Vosoughi, while sentences 

Further, the prior art of record Mao et al, US Pub 20170147910, hereinafter Mao, discloses, in paragraphs 0049, 0056, and Figs. 4-5, a CNN image encoding layer and an LSTM word encoding layer for mapping to the same feature space, and further a cost function for training the RNN. However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Mao, while the use of image and word encoders mapping into a feature space is taught, the required training of both encoders on the same “semantic similarity loss” is not taught in the same way as claimed. The examiner notes that, even when in combination with Shetty, the combination still does not teach the amendments as claimed.

Further, the prior art of record Chen et al, US Pub 20180225519, hereinafter Chen, discloses, in paragraphs 0072-0075 and Fig. 6, creating multiple captions for a video image through a CNN “625” processing the image and a series of RNNs “650”. However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Chen, while generating captions for a single image using encoders is taught, the required use of a caption encoder and the training of both encoders on the same calculated “semantic similarity loss” are not taught in the same way as claimed. The examiner notes that, even when in combination with Shetty, the combination still does not teach the amendments as claimed.

Further still, the prior art of record Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, 2016, hereinafter Karpathy, discloses, in sections 3.1.1-3.1.2, 3.2, and Fig. 3, an image embedding CNN and a word embedding BRNN that create representations in “the same h-dimensional embedding space” for calculating similarity scores between image and text; and a combined image caption generator in a CNN encoder-RNN decoder schematic trained on a “cost function”. However, the amended claims state “vectors based on the plurality of training captions with the predicted image feature vector within the semantic space to determine a semantic similarity loss representing a difference in semantic meaning between the plurality of predicted caption feature vectors and the predicted image feature vector; and modifying parameters of the image encoder neural network and the sentence encoder neural network based on the semantic similarity loss”. In Karpathy, while an image caption generator is trained on a cost function, the required use of a “semantic similarity loss” calculated between the predicted vectors of both encoders to train both of those encoders is not taught in the same way as claimed. The examiner notes that, even when in combination with Shetty, the combination still does not teach the amendments as claimed.

Finally, as described above, the prior art of record does not, alone or in combination, teach calculating loss and training image and sentence encoders in the manner set forth in the claims.

For at least these reasons, independent Claims 1, 7, and 13, and, by virtue of dependency, Claims 2-6, 8-9, 11-12, and 14-21, are allowable over the prior art of record.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Allowable Subject Matter
Claims 1-9 and 11-21 (rewritten as 1-20) are allowed over the prior art of record.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116