DETAILED ACTION
This office action is in response to Applicant’s submission filed on 16 February 2021. THIS ACTION IS NON-FINAL.

Status of Claims

Claims 1-9, 11-15, 18, 20, 22-25 are pending.
Claims 10, 16-17, 19, 21 are cancelled.
Claims 1, 4, 9, 12-15, 18, 22 were amended.
Claims 22-25 are new. 
Claims 1-9, 11-15, 18, 20, 22-25 are rejected for double patenting.
Claims 18 and 20 are rejected under 35 U.S.C. 103 as unpatentable.
There is no art rejection for claims 1-9, 11-15, and 22-25.



Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 5-9, 11-12, 14-15, 18 are rejected on the ground of nonstatutory double patenting as being obvious over claim 1 of U.S. Patent Appl. No. 14/997,011, in view of Mei, et al., U.S. Patent No. 7,890,512 B2 [hereafter Mei] and Kiros, et al., “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”, NIPS 2014 Deep Learning Workshop, 2014 [hereafter Kiros]. Although the claims at issue are not identical, they are not patentably distinct from each other because the functional steps are the same, and the method claim of the instant application, which recites essentially the same limitations as the co-pending application, would have been obvious to one of ordinary skill in the art at the time of invention.
Regarding claim 1, the table below shows how the limitations are taught by the claims of co-pending application 14/997,011 and the references.
Instant Application: 14/996,959
Co-pending Application: 14/997,011
References: Mei, Kiros

Instant Application, Claim 1: A method implemented by a

Instant Application, Limitation 1: generating an embedding space representing both images and text labels of a text vocabulary, including:
Co-pending Application, Limitation 1: processing a training image having multiple text labels to generate a set of image regions that correspond to the respective multiple text labels; embedding, within an embedding space that is configured to embed both text labels and image regions mapped to the text labels, the set of image regions based, in part, on positions at which the multiple text labels that correspond to the image regions of the training image are embedded in the embedding space;

Instant Application, Limitation 2: computing distributions representing semantic concepts in the embedding space rather than representing the semantic concepts as single points, the semantic concepts for which the distributions are computed being described by respective text labels of the text vocabulary and capable of being depicted in image content;
References: Mei, FIG.1, C12L56-58, ‘A probability density function (PDF) can be generated which estimates the visual features in the images in the cluster 108/112’, C16L7-10, ‘A set T of images can be given by the equation T = {x1, x2, …., xn}, where xi is a feature vector that describes the low-level visual features in the i-th image’, and Claim 2, ‘the vector of keyword annotations for each training image serves as a metadata tag for the image …. each keyword describes a different low-level visual feature in the image’ shows generating a PDF distribution over an embedding space representing semantic concepts (depicted as annotation keywords) in the images

Instant Application: positioning the distributions in the embedding space based on the semantic relationships determined for the respective text labels; and
References: Kiros, p.4, 2.2 Multimodal distributed representations, ‘Given an image description S = {w1, …, wn} with words w1,…, wn, … let x … be the image embedding … define a scoring function s(x, v) … where x and v are first scaled to have unit norm (making s equivalent to cosine similarity)’ shows calculating semantic closeness measure between image labels using a cosine function, and p.5, 2.4 Multiplicative neural language models, ‘models the distribution P(wn = i | w1:n-1, u) of a new word wn’ shows distribution of image label words in embedding space.

Instant Application, Limitation 4: mapping representative images to the distributions of the embedding space,
Co-pending Application, Limitation 2: learning a mapping function that maps image regions to the text labels embedded in the embedding space, said learning based, in part, on said embedding the set of image regions within the embedding space;

Instant Application, Limitation 5: wherein the image content depicted by the representative
References: Kiros, p.2, Figure 1 shows image examples depicting semantic concepts in the

Co-pending Application, Limitation 3: learning a mapping function that maps image regions to the text labels embedded in the embedding space, said learning based, in part, on said embedding the set of image regions within the embedding space; discovering text labels that correspond to image regions of a query image by mapping the image regions of the query image to the embedding space using the learned mapping function; and annotating the query image with at least two of the discovered text labels.

Instant Application, Claim 5: A method as described in claim 4, wherein processing the plurality of training images includes, for each training image: determining candidate image regions for a respective set of image regions of the training image; and reducing a number of the determined candidate image regions using at least one post-processing technique.
Co-pending Application, Claim 3: A method as described in claim 1, wherein processing the training image to generate the set of image regions that correspond to the respective multiple text labels includes: determining candidate image regions for the set of image regions; and reducing a number of the determined candidate image regions using at least one post-processing technique.

Instant Application, Claim 6: A method as described in claim 5, wherein the candidate image regions are determined using geodesic object proposal.
Co-pending Application, Claim 4: A method as described in claim 3, wherein the candidate image regions are determined using geodesic object proposal.

Co-pending Application, Claim 5: A method as described in claim 1, wherein the at least one predefined criterion includes a threshold size, and the processing is effective to discard the semantically meaningful image regions of the query image having less than a threshold size.

Instant Application, Claim 8: A method as described in claim 5, wherein the at least one post-processing technique involves enforcing an aspect ratio criterion by discarding candidate image regions having aspect ratios outside predefined allowable aspect ratios.
Co-pending Application, Claim 6: A method as described in claim 1, wherein the at least one predefined criterion includes a predefined set of allowable aspect ratios, and the processing is effective to discard the semantically meaningful image regions of the query image having aspect ratios outside the predefined set of allowable aspect ratios.

Instant Application, Claim 9: A method as described in claim 5, wherein the at least one post-processing technique includes assigning a single candidate image region to each of the respective multiple text labels of the training image based on a single label embedding model.
Co-pending Application, Claim 7: A method as described in claim 3, wherein the at least one post-processing technique assigns a single candidate image region to each of the multiple text labels of the training image using a single label embedding model.

Instant Application, Claim 11: A method as described in claim 1, wherein determining the at least one text label includes computing distances in the embedding space between embeddings of semantically meaningful image regions of the

Instant Application, Claim 12: A method as described in claim 11, wherein the distances are computed using vectors that represent respective semantically meaningful image regions of the input image, the vectors extracted from the semantically meaningful image regions of the input image with a Convolutional Neural Network (CNN).
Co-pending Application, Claim 10: A method as described in claim 1, wherein the distances are computed using vectors that represent respective image regions of the query image, the vectors extracted from the image regions of the query image with a Convolutional Neural Network (CNN).

Instant Application, Claim 14: A method as described in claim 1, further comprising presenting indications of image regions of the input image that correspond to the at least one text label.
Co-pending Application, Claim 12: A method as described in claim 1, further comprising presenting the image regions of the query image that correspond to the discovered text labels with which the query image is annotated.
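
For illustration only, the post-processing compared in the table above (discarding candidate image regions that fall below a threshold size or whose aspect ratios fall outside an allowable range) might be sketched as follows. The Region type, the function name filter_regions, and the numeric thresholds are hypothetical; they appear in neither application, and a continuous range is assumed here in place of a predefined set of allowable ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned candidate image region (hypothetical type)."""
    width: int
    height: int


def filter_regions(regions, min_area, min_ratio, max_ratio):
    """Keep regions that meet the size threshold and the allowed width/height ratio."""
    kept = []
    for r in regions:
        if r.width <= 0 or r.height <= 0:
            continue  # degenerate box: no meaningful area or aspect ratio
        ratio = r.width / r.height
        if r.width * r.height >= min_area and min_ratio <= ratio <= max_ratio:
            kept.append(r)
    return kept


# Hypothetical candidates: one acceptable, one too small, one too elongated.
candidates = [Region(200, 100), Region(5, 5), Region(400, 20)]
kept = filter_regions(candidates, min_area=100, min_ratio=0.5, max_ratio=2.0)
```

In this sketch only the 200 x 100 region survives: the 5 x 5 region fails the size threshold and the 400 x 20 region fails the aspect ratio criterion.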



Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Application 14/997,011 (directed to embedding space for images with multiple text labels), Mei (directed to automatic image annotation using semantic distance learning), and Kiros (directed to unifying visual-semantic embeddings with multimodal neural language models) and arrived at a system for modeling semantic concepts in an embedding space as distribution in 
Claims 15 and 18 are substantially similar to claim 1. The arguments given above for claim 1 apply, mutatis mutandis, to claims 15 and 18; the rejection of claim 1 is therefore applied to them accordingly.



Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Karpathy, et al., “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015, 2015 [hereafter Karpathy], in view of Mei, et al., U.S. Patent No. 7,890,512 B2 [hereafter Mei], Kiros, et al., “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”, NIPS 2014 Deep Learning Workshop, 2014 [hereafter Kiros], and Lin et al., U.S. Patent Application Publication No. 2013/0121600 A1 [hereafter Lin].

With regard to claim 18, Karpathy in view of Mei and Kiros teaches
“… perform operations comprising: maintaining an image annotated with at least one text label, … the at least one text label describing at least one respective semantic concept exhibited by image content of the query image (Karpathy, p.3129, 3. Our Model, ‘During training, the input to our model is a set of images and their corresponding descriptions (Figure 2). We first present a model that aligns sentence snippets to the visual regions that they describe through a multimodal embedding’, and p.3133, Figure 5, which shows example annotated images with semantic concepts exhibited by image content) …, the semantic concepts represented being described by the text labels of a text vocabulary and capable of being exhibited by the image content (Karpathy, Figure 1, Figure 5, and Figure 7, which show examples of semantic concepts depicted, and text labels exhibited, with image content)”.
Karpathy does not explicitly detail “One or more computer-readable storage media comprising instructions stored thereon that, responsive to execution by the computing device” and “the at least one text label discovered for the query image using an embedding space representing semantic concepts as distributions rather than representing the semantic concepts as single points”.
However, Mei teaches “One or more computer-readable storage media comprising instructions stored thereon that, responsive to execution by the computing device (Mei, FIG.4, C17L39-60, ‘an exemplary system for implementing the AIA technique and semantic RCS technique embodiment described herein includes one or more computing devices, such as computing device 400 … typically includes at least one processing unit 402 and memory 404 … computing device 400 can include additional storage … The computer storage media provides for storage of various information required to operate the device 400 such as computer readable instructions’)” and “the at least one text label discovered for the query image using an embedding space representing semantic concepts as distributions rather than representing the semantic concepts as single points (Mei, C12L56-58, ‘A probability density function (PDF) can be generated which estimates the visual features in the images in the cluster 108/112’, C16L7-10, ‘A set T of images can be given by the equation T = {x1, x2, …., xn}, where xi is a feature vector that describes the low-level visual features in the i-th image’, and Claim 2, ‘the vector of keyword annotations for each training image serves as a metadata tag for the image …. each keyword describes a different low-level visual feature in the image’ shows generating a PDF distribution over an embedding space representing semantic concepts (depicted as annotation keywords) in the images, and Figure 1 shows an example of text labels depicted in image content.)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Karpathy and Mei before him or her, to modify the visual-semantic annotation on embedding space taught by Karpathy to include generating a distribution of annotated image content over an embedding space as shown in Mei.   
The motivation for doing so would have been for automatic image annotation using semantic distance learning (Mei, Abstract). 

The combined teaching described above will be referred to as Karpathy + Mei hereafter.

Karpathy + Mei does not explicitly detail “the distributions in the embedding space positioned based on semantic relationships determined for the text labels, the determination of the semantic relationships being based on meanings of the text labels of the text vocabulary”.
However, Kiros teaches “the distributions in the embedding space positioned based on semantic relationships determined for the text labels, the determination of the semantic relationships being based on meanings of the text labels of the text vocabulary (Kiros, p.4, 2.2 Multimodal distributed representations, ‘Given an image description S = {w1, …, wn} with words w1,…, wn, … let x … be the image embedding … define a scoring function s(x, v) … where x and v are first scaled to have unit norm (making s equivalent to cosine similarity)’ shows calculating semantic closeness measure between image labels using a cosine function, and p.5, 2.4 Multiplicative neural language models, ‘models the distribution P(wn = i | w1:n-1, u) of a new word wn’ shows distribution of image label words in embedding space.)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Karpathy + Mei and Kiros before him or her, to modify the visual-semantic annotation on embedding space taught by Karpathy + Mei to include semantic relationship measure and distribution for image labeling words as shown in Kiros.   
The motivation for doing so would have been for image caption generation (Kiros, Introduction). 
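
For illustration only, the scoring function quoted from Kiros above (an inner product of an image embedding x and a text embedding v after both are scaled to unit norm, which makes it equivalent to cosine similarity) might be sketched as follows. The function names and the two-dimensional example vectors are hypothetical and are not taken from Kiros.

```python
import math


def unit(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in vec]


def score(x, v):
    """Inner product of the unit-normalized vectors, i.e. cosine similarity."""
    return sum(a * b for a, b in zip(unit(x), unit(v)))


# Hypothetical embeddings: the label points the same way as the image,
# while the unrelated vector is orthogonal to it.
image_embedding = [3.0, 4.0]
label_embedding = [6.0, 8.0]
unrelated_embedding = [-4.0, 3.0]

same_direction = score(image_embedding, label_embedding)
orthogonal = score(image_embedding, unrelated_embedding)
```

Under these assumptions the score is close to 1 for vectors pointing in the same direction and close to 0 for orthogonal vectors, regardless of their original magnitudes.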

The combined teaching described above will be referred to as Karpathy + Mei + Kiros hereafter.

Karpathy + Mei + Kiros does not explicitly detail “the annotating occurring in conjunction with indexing the query image for search”.
However, Lin teaches “the annotating occurring in conjunction with indexing the query image for search (Lin, FIG.1, [0044], ‘a set of images for indexing and a query image for search’, and FIG.5A-H.)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Karpathy + Mei + Kiros and Lin before him or her, to modify the visual-semantic annotation on embedding space taught by Karpathy + Mei + Kiros to include query image indexing for search as shown in Lin.   
The motivation for doing so would have been for visual search (Lin, Abstract). 

With regard to claim 20, Karpathy in view of Mei, Kiros and Lin teaches
“One or more computer-readable storage media as described in claim 18”.
However, Mei teaches “wherein the distributions representing the semantic concepts are at least one of Gaussian distributions or Gaussian mixtures (Mei, C19L17-19, ‘Gaussian mixture model algorithm can alternatively be employed to perform in this semantic clustering’ shows modeling semantic concept distribution with Gaussian mixture model, which is a collection of multiple Gaussian distributions.)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Karpathy and Mei before him or her, to modify the visual-semantic annotation on embedding space taught by Karpathy to include generating a distribution of annotated image content over an embedding space as shown in Mei.
The motivation for doing so would have been for automatic image annotation using semantic distance learning (Mei, Abstract). 
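
For illustration only, a Gaussian mixture of the kind referenced above (a weighted collection of Gaussian distributions) might be evaluated as sketched below. The sketch is one-dimensional for brevity, whereas an embedding space would be multi-dimensional; the function names, component weights, means, and standard deviations are hypothetical and are not taken from Mei.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components are (weight, mean, std) triples."""
    total_weight = sum(w for w, _, _ in components)
    if not math.isclose(total_weight, 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical semantic concept modeled as two equally weighted Gaussians.
concept = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
density_at_zero = mixture_pdf(0.0, concept)
```

A mixture with a single component of weight 1 reduces to an ordinary Gaussian distribution, which is why the claim language treats the two forms as alternatives.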



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSU-CHANG LEE whose telephone number is 571-272-3567.  The fax number is 571-273-3567.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TSU-CHANG LEE/
Examiner, Art Unit 2126