DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The IDS(s) has/have been considered and placed in the application file.

CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
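The three prongs above operate as a conjunction: a limitation is interpreted under 35 U.S.C. 112(f) only when all three are met. The following is a minimal sketch of that logic, offered for illustration only. The class name `Limitation`, its field names, and the function `invokes_112f` are hypothetical and do not come from the MPEP or from this application.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of how a claim limitation fares under each prong."""
    uses_means_or_placeholder: bool        # prong (A): "means", "step", or a generic placeholder
    modified_by_functional_language: bool  # prong (B): e.g. "for", "configured to", "so that"
    recites_sufficient_structure: bool     # prong (C) is met only when this is False


def invokes_112f(lim: Limitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all satisfied."""
    return (
        lim.uses_means_or_placeholder
        and lim.modified_by_functional_language
        and not lim.recites_sufficient_structure
    )


# A placeholder coupled with function and no supporting structure meets the test.
print(invokes_112f(Limitation(True, True, False)))
# Reciting sufficient structure defeats prong (C).
print(invokes_112f(Limitation(True, True, True)))
```

Whether a particular term is a generic placeholder, and whether recited structure is sufficient, remain legal determinations that this sketch simply takes as inputs.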
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each such limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. The claim limitation so interpreted is: “performed by a computing device” in claim 1.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation to avoid its being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation recites sufficient structure to perform the claimed function so as to avoid its being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 5, 7-8, 11, 13-14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (US 2017/0061250 A1 – hereinafter “Gao”) in view of Zhang et al. (CN-105760507-A – hereinafter “Zhang”).
Claims 1, 7, and 13:
Gao discloses a computer-implemented method for generating an image description performed by a computing device (¶33), the method comprising:
obtaining a target image (¶56 discloses “an image representation”); 
generating a first global feature vector and a first label vector set of the target image (¶57 discloses “mapping … image vector 306 and … text vector 308 into a semantic space 402. Using a DMSM such as DMSM 230 and/or 300, the image vector 306, represented as β image, is mapped into the semantic space 402.” [emphasis added]; ¶2 discloses “a deep multimodal similarity model ("DMSM")”; ¶58 discloses “the text vector 308, represented as β text+, is more relevant to the image vector 306 than other text vectors, such as the text vector 308, represented as β text-.” i.e. label vector set; ¶78 discloses “In some examples, the DMSM models global similarity between images and text.” [emphasis added]);
generating a first multi-mode feature vector of the target image through the [[matching]] similarity model (¶56 discloses “The cosine semantic similarity 310 of each of the sentences inputted into the DMSM 300 can be compared to determine a sentence having the highest similarity” [emphasis added]; ¶2 discloses “a deep multimodal similarity model ("DMSM")” [emphasis added]; where the similarity model is comparable to the matching model; ¶98 discloses “the deep multimodal similarity detector module… propagate forward the letter-trigram count vector through a deep convolutional neural network to produce a semantic vector.”; i.e. a first multi-mode feature vector), wherein the [[matching]] similarity model is a model obtained through training according to a training image and reference image description information of the training image (¶54 discloses “DMSM 230 uses a pair of neural networks, an image model 240 and a text model 242, one for mapping each input modality to a common semantic space, which are trained jointly.”); and
applying the first multi-mode feature vector, the first global feature vector, and the first label vector set to a computing model (¶98 discloses a first multi-mode feature vector; ¶78 discloses global similarity model or DMSM; ¶58 discloses the first label vector set; ¶98 discloses “the deep multimodal similarity detector module… propagate forward the letter-trigram count vector through a deep convolutional neural network to produce a semantic vector.”; i.e. the CNN is a computing model), to obtain the target image description information (¶98 discloses “semantic vector”; where this semantic vector is used for the purpose in ¶2 “generating captions for images” and ¶1 “captions can be used to "explain" or annotate a scene in an image”),
wherein the computing model is a model obtained through training according to image description information of the training image and the reference image description information (¶98 discloses “a deep convolutional neural network”; ¶59 discloses “semantic vectors can be mapped using a rich convolutional neural network. The neural network can be fine-tuned using various data sets, such as the training set of data 238.”; ¶¶40-41 disclose the description of the training set of data 238, which includes image/caption pairs for reference).
Gao discloses all of the subject matter as described above except for specifically teaching “matching.”  However, Zhang in the same field of endeavor teaches matching (Title discloses “correlation modeling”; where, correlation is a synonym for matching; ¶¶19, 98).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Gao and Zhang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to better describe the association between visual images and text descriptions (Zhang ¶¶4-7). This motivation for the combination of Gao and Zhang is supported by KSR exemplary rationale (G): some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. See MPEP § 2141(III).
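For illustration only, the comparison quoted from Gao ¶56 (cosine semantic similarity compared across sentences to find the one with the highest similarity) can be sketched as follows. This is a minimal sketch under stated assumptions, not Gao's implementation: the function names `cosine_similarity` and `most_similar_caption`, the two-dimensional vectors, and the example captions are all hypothetical, and the vectors are assumed to have already been mapped into a common semantic space.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_caption(image_vec, caption_vecs):
    """Return the caption whose vector has the highest cosine similarity to image_vec."""
    return max(caption_vecs, key=lambda c: cosine_similarity(image_vec, caption_vecs[c]))


# Hypothetical semantic-space vectors, assumed to be already computed.
image_vec = [1.0, 0.0]
caption_vecs = {
    "a dog on grass": [0.9, 0.1],
    "a car on a road": [0.1, 0.9],
}
print(most_similar_caption(image_vec, caption_vecs))
```

The sketch covers only the comparison step; how the image and text vectors are produced is outside its scope.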
Claims 2, 8, and 14:
The combination of Gao and Zhang discloses the method according to claim 1, wherein the first multi-mode feature vector (Gao ¶98 discloses “the deep multimodal similarity detector module… propagate forward the letter-trigram count vector through a deep convolutional neural network to produce a semantic vector.”; i.e. a first multi-mode feature vector) includes predicted text information of the target image (¶¶48-49 disclose “These features can form a "baseline" system.”; where Table 1 discloses that “Attribute” is “Predicted word is in the attribute set” and, similarly, N-gram+/- deal with predicted words).
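For illustration only, the “letter-trigram count vector” named in the quoted passage of Gao ¶98 can be pictured as a count of every three-character window in each word of a caption. The sketch below is an assumption about how such counts might be formed, not a description of Gao's module: the function name `letter_trigram_counts` is hypothetical, and the lowercasing, whitespace tokenization, and `#` word-boundary marker are choices made here for the example and are not taken from the reference.

```python
from collections import Counter


def letter_trigram_counts(text):
    """Count three-letter windows per word, with '#' as an assumed boundary marker."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        # Slide a window of width three across the padded word.
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


# The word "cat" yields the windows "#ca", "cat", and "at#".
print(letter_trigram_counts("cat"))
```

In the quoted passage, such a count vector is the input that is propagated through the convolutional neural network to produce the semantic vector; the network itself is not sketched here.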
Claims 5, 11, and 17:
The combination of Gao and Zhang discloses the method according to claim 1, further comprising: obtaining a second global feature vector and a second label vector set of the training image (Zhang ¶43 discloses feature vectors; Gao ¶58 discloses “the text vector 308, represented as β text+, is more relevant to the image vector 306 than other text vectors, such as the text vector 308, represented as β text-.” i.e. label vector set; Gao ¶97), and a text feature vector of the reference image description information of the training image (Gao ¶54 discloses “DMSM 230 uses a pair of neural networks, an image model 240 and a text model 242, one for mapping each input modality to a common semantic space, which are trained jointly.”; Gao ¶58 discloses “When trained, the DMSM can recognize that the text vector 308…The relevance space 404 can be determined through the training”; Gao ¶59 discloses “The neural network can be fine-tuned using various data sets, such as the training set of data 238”); and training the [[matching]] similarity model according to the second global feature vector and the text feature vector (Gao ¶58 discloses “When trained, the DMSM can recognize that the text vector 308…The relevance space 404 can be determined through the training”; Zhang ¶43 discloses feature vectors).

Allowable Subject Matter
Claims 3-4, 6, 9-10, 12, 15-16 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ross Varndell whose telephone number is (571)270-1922.  The examiner can normally be reached on M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571)270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Ross Varndell/
Primary Examiner, Art Unit 2666