Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This communication is a non-final Office action on the merits. Claims 21-41, after preliminary amendment, are presently pending and have been considered below.

Priority
This application discloses and claims only subject matter disclosed in prior Application No. 16/729,982, filed 12/30/2019, and names the inventor or at least one joint inventor named in the prior application. Accordingly, this application may constitute a continuation or division. Should applicant desire to claim the benefit of the filing date of the prior application, attention is directed to 35 U.S.C. 120, 37 CFR 1.78, and MPEP § 211 et seq.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/13/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement has been considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 21-24 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 4 of U.S. Patent No. 11,361,550 (the parent); claims 40-41 are rejected as being unpatentable over claims 14-15 and 20 of the parent. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims in the instant application recite limitations of similar scope to those in the parent.

Claim 30 is rejected under 35 U.S.C. 101 as claiming the same invention as claim 31, as they are duplicate claims. This is a statutory double patenting rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 21-32, 37-41 are rejected under 35 U.S.C. 103 as being unpatentable over US 2020/0401835 A1 Zhao et al. (hereinafter Zhao) in view of US 10,198,671 B1, Yang et al. (hereinafter Yang).


Claims 1-20 are canceled.
As to claim 21, Zhao discloses a method comprising: 
analyzing, via a computing device, a digital content item to detect a plurality of objects depicted in the digital content item (Figs 3, 7; pars 0005, 0018, 0020, 0036-0037, analyzing digital images to detect objects and their relations within the digital images), the analysis comprising determining a respective bounding box for each object of the plurality (pars 0020, 0029, 0045, identifying the bounding boxes corresponding to the detected objects in the digital image); 
determining, via the computing device, a set of geometry features for each object using the object's respective bounding box (Figs 7-8; pars 0004, 0056, 0059-0060, 0122-0124, determining feature vectors representing the features of the plurality of object proposals and feature maps); 
analyzing, via the computing device, the digital content item to determine an appearance vector for each of the plurality of objects (pars 0036, 0052, analyzing images to extract information/feature vector based on the content/object of the image); and
automatically creating, via the computing device and using a trained image captioning machine model (Fig 3A; par 0005), a caption comprising a sequence of words (pars 0004-0005, 0018-0019, 0023, 0069-0070, generating a label comprising sequence of words), the automatic caption creation comprising using the trained image captioning machine model to determine the sequence of words of the caption for the digital content item using the appearance feature vector and the set of geometry features determined for each object of the plurality (pars 0004-0005, 0018-0019, 0023, 0069-0070),
the automatic caption creation taking into account spatial relationships among the plurality of objects identified using each object's set of geometry features (Fig 3A; pars 0002-0004, 0018, 0021-0022, 0030, 0047, taking into account spatial relationship of the objects).  
Zhao does not expressly disclose determining and using an appearance vector for the captioning or labeling. 
Yang, in the same or similar field of endeavor, further teaches providing high-dimensional feature vectors (e.g., an appearance vector) using the content information in the image (col 6, line 42 - col 7, line 9, providing bounding boxes with different dimensions, including high dimensions corresponding to the context/content of the region).
Therefore, considering the teachings of Zhao and Yang as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yang’s teachings in Zhao’s method to identify/provide sufficient bounding box size and dimension based on image contents and features.

As to claim 22, Zhao as modified discloses the method of claim 21, the set of geometry features comprising height, width and center coordinate data determined using a respective object's bounding box (Zhao: par 0052).
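
For illustration only, the geometry features recited in claim 22 (height, width and center coordinates) can be derived from a bounding box as in the following minimal sketch. The corner-coordinate box format and the names `GeometryFeatures` and `geometry_features` are assumptions made for this example; they are not taken from the application, Zhao, or Yang.

```python
from typing import NamedTuple


class GeometryFeatures(NamedTuple):
    """Hypothetical container for the four geometry values named in claim 22."""
    center_x: float
    center_y: float
    width: float
    height: float


def geometry_features(x_min: float, y_min: float,
                      x_max: float, y_max: float) -> GeometryFeatures:
    """Derive center, width, and height from corner coordinates (assumed format)."""
    width = x_max - x_min
    height = y_max - y_min
    return GeometryFeatures(x_min + width / 2.0, y_min + height / 2.0, width, height)


# A box spanning x in [10, 50] and y in [20, 100].
features = geometry_features(10.0, 20.0, 50.0, 100.0)
```

Other box encodings (for example, a corner plus width and height) would yield the same four values after a simple conversion.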

As to claim 23, Zhao as modified discloses the method of claim 21, the appearance features for a respective object represent content of the digital content item within the respective object's box (Zhao: par 0052, the bounding boxes that contain the visible portions of objects (e.g., within the bounding box)).

As to claim 24, Zhao as modified discloses the method of claim 23, further comprising using at least one region outside the respective object's bounding box to determine the respective object's appearance features representing the content of the digital content item within the respective object's box (Yang: col 2, lines 31-46; col 3, lines 21-32, adjusting bounding box/offset based on regions of contents/objects).  

As to claim 25, Zhao as modified discloses the method of claim 24, further comprising: analyzing, via the computing device, content within the bounding box of the respective object and the at least one region outside the respective object's bounding box to determine intermediate features for the respective object (Yang: Figs 3-4; col 2, lines 31-46; col 3, lines 21-32; col 7, line 59 - col 8, line 2; claim 1); and using, by the computing device, the intermediate features in determining the appearance vector for the respective object (Figs 2-3, 10-14; col 2, lines 31-46; col 6, lines 24-58).

As to claim 26, Zhao as modified discloses the method of claim 21, the trained image captioning machine model comprising an encoder and a decoder (Zhao: pars 0022, 0058, 0066, 0069-0070, 0072, 0095, 0123, 0136, encoding and decoding processes; Yang: col 4, lines 9-17; col 9, lines 20-25; col 12, lines 35-45).  

As to claim 27, Zhao as modified discloses the method of claim 26, further comprising: using, by the encoder, the appearance feature vector and the set of geometry features determined for each object of the plurality to generate encoded output (Zhao: Figs 7-8; pars 0004, 0056, 0059-0060, 0122-0124, determining feature vectors representing the features of the plurality of object proposals and feature maps; Yang: col 2, lines 31-46; col 3, lines 21-32, adjusting bounding box/offset based on regions of contents/objects); and using, by the decoder, the encoded output from the encoder to generate the sequence of words of the caption (Zhao: Fig 3A; pars 0022, 0052, 0069-0070, 0119, 0123, claim 3; Yang: Figs 1, 4; col 4, line 59- col 5, line 2).  

As to claim 28, Zhao as modified discloses the method of claim 27, the encoded output for a respective object of the plurality of objects further comprising a feature vector generated using the appearance feature vector and the set of geometry features determined for the respective object (Zhao: pars 0022, 0069-0070, 0123, claim 3; Yang: col 6, line 42- col 7, line 9).  

As to claim 29, Zhao as modified discloses the method of claim 26, the encoder and decoder each comprising at least one trained neural network (Zhao: pars 0018-0020, 0022, 0034, 0069-0070, 0123).  

As to claim 30, Zhao as modified discloses the method of claim 26, the encoder and decoder each comprising at least one self-attention layer (Zhao: pars 0021, 0030, 0053, 0058, 0066, 0072).  

Claim 31 is rejected for the same reasons as claim 30.

As to claim 32, Zhao as modified discloses the method of claim 31, the at least one self-attention layer using an attention weight matrix comprising a combined attention weight for each ordered pair of objects from the plurality of objects, the combined attention weight for a respective ordered pair of first and second object being determined using a measure of a visual relationship between the first and second objects and a measure of a spatial relationship between the pair of objects (Zhao: pars 0021, 0030, 0053, 0066, 0072).  
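
For illustration only, one way to form a combined attention weight for each ordered pair of objects is sketched below. The combination rule used here (adding a visual measure and a spatial measure, then normalizing each row with a softmax) is an assumption chosen for the example; neither the claim language nor the cited paragraphs of Zhao are represented as using this formula, and the name `combined_attention` is hypothetical.

```python
import math


def combined_attention(visual, spatial):
    """Row-wise softmax over visual[i][j] + spatial[i][j] for each ordered pair (i, j).

    Assumed combination rule for illustration; both inputs are N x N lists of floats.
    """
    weights = []
    for v_row, s_row in zip(visual, spatial):
        logits = [v + s for v, s in zip(v_row, s_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Two objects: entry [i][j] measures how object i relates to object j.
visual = [[0.0, 1.0], [2.0, 0.0]]
spatial = [[0.0, -1.0], [0.0, 0.0]]
weights = combined_attention(visual, spatial)
```

Because the pairs are ordered, `weights[0][1]` and `weights[1][0]` generally differ, which matches the claim's reference to ordered pairs of first and second objects.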

As to claim 37, Zhao as modified discloses the method of claim 21, at least one trained neural network is used to analyze the digital content item to detect the plurality of objects and determine the set of geometry features for each object (Zhao: Figs 7-8; pars 0004, 0056, 0059-0060, 0122-0124).

As to claim 38, Zhao as modified discloses the method of claim 21, further comprising: causing, by the computing device, the sequence of words of the caption to be output at a user computing device (Zhao: Fig 9; pars 0034, 0137, output to computing device; Yang: Fig 31).

As to claim 39, Zhao as modified discloses the method of claim 38, the sequence of words of the caption are in an audible format, and the caption is caused to be output in the audible format (Zhao: par 0137; Yang: col 16, lines 14-54, providing audio output for the caption). Considering the teachings of Zhao as modified as a whole, it would have been obvious to one of ordinary skill in the art to convert word captions/labels to audio form, should the user choose to do so, as an alternative output format.

As to claim 40, it recites a non-transitory CRM with instructions executed to perform the functions and features of claim 21. The rejection of claim 21 is incorporated herein.

As to claim 41, it is a device claim encompassing the features of claim 21. The rejection of claim 21 is incorporated herein.

Claims 33-36 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Yang and further in view of US 2017/0061250 A1, Gao et al. (hereinafter Gao).


As to claims 33 and 34, Zhao as modified discloses the method of claim 26, but does not expressly disclose the decoder using a greedy left-to-right approach or a beam search technique to create the sequence of words of the caption. Gao, in the same or similar field of endeavor, further teaches using a greedy left-to-right approach or a beam search technique to create the sequence of words of the caption (pars 0002, 0015, 0051). Therefore, considering the teachings of Zhao as modified and Gao as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Gao’s teachings on different search techniques into the caption creation of Zhao as modified to utilize/provide different word search techniques.
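
For illustration only, a greedy left-to-right approach of the kind named in claim 33 can be sketched as follows: at each position the single most probable next word is kept. The lookup table `NEXT_WORD_PROBS`, its probabilities, and the `<s>` and `</s>` markers are invented for this example and stand in for a trained decoder; they do not come from the application or from Gao.

```python
# Toy next-word distribution standing in for a trained decoder (hypothetical values).
NEXT_WORD_PROBS = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.3, "</s>": 0.2},
    "the": {"dog": 0.9, "</s>": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def greedy_decode(table, max_len=10):
    """Build a caption left to right, always taking the most probable next word."""
    words, current = [], "<s>"
    for _ in range(max_len):
        candidates = table[current]
        current = max(candidates, key=candidates.get)
        if current == "</s>":  # end-of-caption marker
            break
        words.append(current)
    return words


caption = greedy_decode(NEXT_WORD_PROBS)
```

In this toy table the greedy choice commits to "a" at the first position, so it never considers the sequence "the dog", whose overall probability (0.4 x 0.9 = 0.36) exceeds that of "a dog" (0.6 x 0.5 = 0.30). A beam search keeps several partial captions alive to reduce this effect.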

As to claim 35, Zhao as modified discloses the method of claim 34, further comprising: using, by the computing device, the decoder implementing the beam search technique to generate multiple caption alternatives (Zhao: Figs 3A, 4; pars 0021, 0030, 0049, 0065, captions/labels being generated with different proposals, based on different relationships, or training dataset; Gao: pars 0049, 0051), each having a corresponding score (Zhao: Figs 3A, 4; pars 0021, 0030, 0055, 0120; Yang: Gao: par 0051); and using, by the computing device, the decoder to select one of the multiple caption alternatives based on each one's corresponding score (pars 0021, 0030, 0055, 0120, caption/labeling/proposal being determined based on the score).

As to claim 36, Zhao as modified discloses the method of claim 35, each word in the sequence of words determined for a respective caption alternative having an assigned probability, the corresponding score for the respective caption alternative being determined using the assigned probability of each word in the sequence of words determined for the respective caption alternative (Zhao: par 0063; Yang: col 11, lines 34-51; Gao: pars 0002, 0015, 0043, 0046).  
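
For illustration only, the limitations of claims 35 and 36 (multiple caption alternatives from a beam search, each scored from the probabilities assigned to its words, with one alternative selected by score) can be sketched as below. Scoring by the sum of log probabilities is a common convention assumed here, not a formula taken from the claims or the cited references. The table `NEXT_WORD_PROBS`, the `<s>` and `</s>` markers, and the function name `beam_search` are hypothetical.

```python
import math

# Toy next-word distribution standing in for a trained decoder (hypothetical values).
NEXT_WORD_PROBS = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.3, "</s>": 0.2},
    "the": {"dog": 0.9, "</s>": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def beam_search(table, beam_width=2, max_len=10):
    """Return finished captions as (score, words), best first.

    The score of a caption is the sum of the log probabilities of its words.
    """
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        expanded = []
        for score, words in beams:
            for word, prob in table[words[-1]].items():
                candidate = (score + math.log(prob), words + [word])
                if word == "</s>":
                    finished.append(candidate)  # caption is complete
                else:
                    expanded.append(candidate)
        if not expanded:
            break
        # Keep only the beam_width highest-scoring partial captions.
        expanded.sort(key=lambda c: c[0], reverse=True)
        beams = expanded[:beam_width]
    return sorted(finished, key=lambda c: c[0], reverse=True)


alternatives = beam_search(NEXT_WORD_PROBS)
best_score, best_words = alternatives[0]
best_caption = best_words[1:-1]  # strip the start and end markers
```

With a beam width of 2, the sketch retains both "a" and "the" after the first step and ultimately selects "the dog", whose word probabilities multiply to 0.36.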

Examiner’s Note
Examiner has cited particular columns, line numbers, paragraphs and/or figure(s) in the reference(s) as applied to the claims for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the reference(s) in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Qun Shen, whose telephone number is (571) 270-7927. The examiner can normally be reached Monday-Friday from 9:00-5:00. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vincent Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/QUN SHEN/
Primary Examiner, Art Unit 2661