DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see Remarks, filed 08/02/2021, with respect to the rejection of claims 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Lin and Lubbers.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lin (US PG Pub 20180260698) in view of Lubbers (US PG Pub 20200097604).
	As per claims 1, 11 and 20, Lin discloses a computer-implemented method, one or more non-transitory computer readable media and system, comprising: one or more memories storing instructions (Lin; Fig. 6, item 604 and 606; p. 0122-0123); and one or more processors that are coupled to the one or more memories and, when 

	As per claims 4 and 14, Lin in view of Lubbers discloses the computer-implemented method and system of claims 1 and 11, wherein determining that the first phrase matches the first region comprises: generating a first plurality of grounding decisions based on the source sentence, the source image, and a trained phrase grounding model that sequentially maps each phrase included in a sentence to a plurality of grounding decisions based on any unmapped phrases included in the sentence (Lin; p. 0018-0022 - The first set of ground truth phrases includes words describing image features in a first set of ground truth images and relationships of the image features in the first set of ground truth images); and performing one or more comparison operations on the plurality of grounding decisions to determine that a first grounding decision included in the plurality of grounding decisions indicates that the first phrase matches the first region (Lin; p. 0018-0022 - the first LSTM is trained to receive the image data as input and provide a skeletal phrase that describes the objects and relationships of objects in the image, without describing the attributes of the object).

	As per claims 6 and 16, Lin in view of Lubbers discloses the computer-implemented method and system of claims 4 and 14, further comprising: performing one or more pre-training operations on an untrained phrase encoder and an untrained visual encoder to generate a pre-trained phrase encoder and a pre-trained visual encoder (Lin; p. 0033 - An LSTM is a type of recurrent feed-forward neural network architecture which can be trained to classify input data, such as to identify a word describing feature data. An LSTM is trained on a training data set, such as a training data set having images and known-accurate respective phrases describing each of the images. Thus, the LSTM can receive input feature data and provide words which describe, with a high level of probability, the salient features of the input feature data; p. 0018 - The CNN is trained to extract the image features based on pixel values (e.g., color, grayscale value) of pixels within locations of the image); and performing one or more training operations on an untrained phrase grounding model that includes both the pre-trained phrase encoder and the pre-trained visual encoder to generate the trained phrase grounding 
	As per claims 7 and 17, Lin in view of Lubbers discloses the computer-implemented method and system of claims 1 and 11, wherein determining that the first phrase matches the first region comprises: performing one or more object detection operations on the source image to generate a plurality of bounding boxes, wherein a first bounding box included in the plurality of bounding boxes defines the first region (Lin; p. 0065 - The feature maps are created by processing the image with a convolutional neural network (CNN) which is trained to extract the image features (e.g., data describing objects) based on pixel values (e.g., color, grayscale value) of pixels (bounding boxes) within locations of the image. The CNN produces feature maps from the extracted image features); and determining that the first phrase matches the first bounding box based on the first phrase, the at least the second phrase, and the sequence of bounding boxes (Lin; p. 0066 - The first LSTM neural network is trained to determine the skeletal phrase based on the first set of ground truth phrases. The first LSTM analyzes the feature maps for objects and relationships between the objects, and provides skeletal words describing objects in the image data. A combination of skeletal words forms a skeletal phrase).

	As per claims 9 and 19, Lin in view of Lubbers discloses the computer-implemented method and system of claims 1 and 11, wherein determining that the first phrase matches the first region comprises applying a first state of a first recurrent neural network (RNN) and a first state of a second RNN to a first neural network (NN) to generate a grounding decision (Lin; p. 0016-0021 - The second LSTM provides, for each word in the skeletal phrase, associated attributes which further describe the respective skeletal word. The inputs to the second LSTM can include the feature maps and information from the first LSTM, such as the skeletal words, hidden states of the first LSTM that identify potential skeletal words, and/or attention maps that identify the portions of the image having a high probability of having a significant feature).
	As per claim 10, Lin in view of Lubbers discloses the computer-implemented method of claim 1, wherein determining that the first phrase matches the first region comprises applying a first state of a first recurrent neural network (RNN), a first state of a second RNN, and a first state of a bi-directional RNN to a first neural network (NN) to generate a grounding decision.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached on Monday - Friday 9-6 ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RODRIGO A CHAVEZ/ Examiner, Art Unit 2658

/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658