Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This communication is a non-final Office action on the merits. Claims 1-20, as originally filed, are pending and have been considered below.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 1/3/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement has been considered by the examiner.

Claim Objections
Claims 1, 8-9, 14, and 18-20 are objected to because of the following informalities:
Claims 1, 8-9, 14, and 18-20 recite "an n-dimensional appearance feature vector for the object,…" in which neither the type of number for n nor its boundary and/or range is defined.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 1-5, 7, 11-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2020/0401835 A1, Zhao et al. (hereinafter Zhao), in view of US 10,198,671 B1, Yang et al. (hereinafter Yang).


As to claim 1, Zhao discloses a method comprising: 

analyzing, via the computing device, the digital content item to detect a plurality of objects depicted in the digital content item (Figs 3, 7; pars 0005, 0018, 0020, 0036-0037, analyzing digital images to detect objects and their relations within the digital images), the analysis comprising determining a bounding box for each object of the plurality (pars 0020, 0029, 0045, identifying the bounding boxes corresponding to the detected objects in the digital image); 
determining, via the computing device and for each object of the plurality, an n-dimensional appearance feature vector for the object, the appearance feature vector determination comprising analyzing content of the digital content item to generate the n-dimensional appearance feature vector (Figs 7-8; pars 0004, 0056, 0059-0060, 0122-0124, determining feature vectors representing the features of the plurality of object proposals and feature maps); 
determining, via the computing device and for each object of the plurality, a set of geometry features for the object using the bounding box determined for the object (pars 0020, 0029, 0032-0033, 0045, 0051-0053, 0076, identifying geometric features or locations of the objects based on coordinates of the respective bounding boxes of the object labels);
generating, via the computing device and using a trained transformer machine model, encoded output using an encoder of the trained transformer, the n-dimensional appearance feature vector and the set of geometry features determined for each object of the plurality (pars 0022, 0069-0070, 0123, claim 3, feeding the embedded vectors (a 
automatically creating, via the computing device and using the trained transformer machine model, a caption comprising a sequence of words (pars 0004-0005, 0018-0019, 0023, 0069-0070), the automatic caption creation comprising using a decoder component of the trained transformer to decode the encoded output to identify the sequence of words (pars 0004-0005, 0018-0019, 0023, 0069-0070).
Zhao does not expressly disclose using a decoder component of the trained transformer to decode the encoded output. Yang, in the same or a similar field of endeavor, teaches image captioning models utilizing a recurrent neural network (RNN) as a decoder for predicting a sentence (col 4, lines 9-24). Therefore, considering the teachings of Zhao and Yang as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yang's decoder into Zhao's method to decode the encoded output and predict a sequence of words (e.g., a sentence).

As to claim 2, Zhao as modified discloses the method of claim 1, further comprising: communicating, via the computing device, the digital content item and the automatically-generated caption to a user for display on a device of the user (Zhao: par 0137, I/O interface to provide graphical data/image content to a display for presentation to a user; Yang: col 19, lines 4-30).

As to claim 3, Zhao as modified discloses the method of claim 1, further comprising: communicating, via the computing device, the digital content item and the automatically-generated caption to a device of a user (see rejection of claim 2), the communicating causing the automatically-generated caption to be output as audio (Zhao: par 0137, presenting output to a user via a display or an audio speaker).

As to claim 4, Zhao as modified discloses the method of claim 1, the appearance feature vector determination further comprising analyzing a region of content within the bounding box determined for the object (Zhao: par 0052, bounding boxes that contain the visible portions of objects, i.e., content within the bounding box) and analyzing one or more regions of content outside the bounding box determined for the object (Yang: col 2, lines 31-46; col 3, lines 21-32, adjusting bounding boxes/offsets based on regions of content/objects).

As to claim 5, Zhao as modified discloses the method of claim 1, further comprising using the spatial relationships among the plurality of objects to determine a plurality of geometry-based attention weights, each geometry-based attention weight corresponding to a pair of objects and representing a measure of a spatial relationship between the pair of objects (Zhao: pars 0021, 0030, 0053, 0066, 0072).  

As to claim 7, Zhao as modified discloses the method of claim 1, further comprising: determining, via the computing device, a plurality of appearance-based attention weights, each appearance-based attention weight corresponding to a pair of objects and representing a measure of a visual relationship between the pair of objects (Zhao: pars 0021, 0030, 0053, 0066, 0072), the plurality of appearance-based weights being used by the encoder in the encoded output generation (Zhao: pars 0022, 0058, 0066, 0069-0070, 0072).  

As to claim 11, Zhao as modified discloses the method of claim 1, further comprising training the encoder and decoder of the trained transformer machine model using a training dataset comprising a plurality of digital content items and one or more captions corresponding to each digital content item of the plurality (Yang: col 4, lines 9-24).  

As to claim 12, Zhao as modified discloses the method of claim 1, the transformer comprising a trained neural network architecture (Zhao: Figs 5A-5B; pars 0018-0019, training/utilizing a neural network).  

As to claim 13, Zhao as modified discloses the method of claim 1, further comprising: determining, via the computing device, the bounding box for an object of the plurality (Zhao: pars 0020, 0045, identifying the bounding boxes corresponding to the detected objects); and determining, via the computing device, the set of geometry features for the object using the bounding box (Zhao: pars 0052, 0119).  

As to claim 14, it recites a non-transitory computer-readable medium (CRM) storing instructions that, when executed, perform the functions and features of claim 1. The rejection of claim 1 is therefore incorporated herein.

As to claim 15, it is rejected for the same reasons as set forth for claim 2.
As to claim 16, it is rejected for the same reasons as set forth for claim 5.
As to claim 17, it is rejected for the same reasons as set forth for claim 7.

As to claim 20, it is a device claim encompassing the limitations of claim 1. The rejection of claim 1 is therefore incorporated herein.

Allowable Subject Matter
Claims 6, 8-10, and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Reasons for Allowance
The prior art of record (Zhao and Yang), whether taken alone or in combination, neither discloses nor teaches the functions and features recited in claims 6, 8, and 9. Claim 10 depends from claim 9. Claims 18-19 recite limitations similar to those of claims 8-9, respectively.

Examiner’s Note
The examiner has cited particular columns, line numbers, paragraphs, and/or figures in the references as applied to the claims for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the examiner.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Qun Shen, whose telephone number is (571) 270-7927. The examiner can normally be reached Monday-Friday, 9:00-5:00. If attempts to

/QUN SHEN/
Primary Examiner, Art Unit 2661