DETAILED ACTION
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/18/2020 and 04/18/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner. However, it is noted that all Non-Patent Literature (NPL) citations need at least a month and year of publication. MPEP 609.04(a): The date of publication supplied must include at least the month and year of publication, except that the year of publication (without the month) will be accepted if the applicant points out in the information disclosure statement that the year of publication is sufficiently earlier than the effective U.S. filing date and any foreign priority date so that the particular month of publication is not in issue. NPL cited without at least the month and year of publication has been labeled with “no date available”.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 7-10, and 14-17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chen et al. (US 20170124432).
Regarding claim 1, Chen discloses an image question answering method (abstract), comprising: 
extracting a question feature representing a semantic meaning of a question, a global feature of an image, and a detection frame feature of a detection frame encircling an object in the image (¶53-55: an LSTM model is employed to generate a dense question embedding to characterize the semantic meaning of questions; image feature extraction; ¶41-45: a question-guided attention map, m, reflecting the image regions queried by the question, is generated from each image-question pair using a configurable convolutional neural network; the question-guided attention map focuses on the regions asked about by the questions); 
obtaining a first weight of each of at least one area of the image and a second weight of each of at least one detection frame of the image according to the question feature, the global feature, and the detection frame feature (¶60: h = g(W_ih I + W_rh I_r + W_sh s + b_h); W_ih refers to the first weight and W_rh refers to the second weight; the weights are obtained according to the semantic meaning of questions, the image feature extraction, and the question-guided attention map);
performing weighting processing on the global feature by using the first weight to obtain an area attention feature of the image (¶59-60: W_ih I shows the image feature I weighted by W_ih); 
performing weighting processing on the detection frame feature by using the second weight to obtain a detection frame attention feature of the image (¶59-60: a 1x1 convolution is applied on the attention weighted feature map to reduce the number of channels, resulting in a reduced feature map I_r; W_rh I_r shows the reduced attention weighted feature map weighted by W_rh); and 
predicting an answer to the question according to the question feature, the area attention feature, and the detection frame attention feature (¶60-65: the question's or query's semantic information s, the image feature map I, and the reduced feature map I_r are fused by a nonlinear projection; an answer to the question is generated in step 430 based on a fusion of the image feature map, the deep question embedding, and the attention weighted image feature map).
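
For illustration only, the fusion relied upon from ¶60 of Chen, h = g(W_ih I + W_rh I_r + W_sh s + b_h), may be sketched as below. This is a minimal sketch and not Chen's implementation: the helper names `matvec` and `fuse` are hypothetical, the features I, I_r, and s are assumed to be flattened into plain vectors, and tanh is assumed for the nonlinear projection g.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fuse(W_ih, I, W_rh, I_r, W_sh, s, b_h, g=math.tanh):
    """Compute h = g(W_ih I + W_rh I_r + W_sh s + b_h), element-wise.

    g defaults to tanh, which is an assumption made for this sketch.
    """
    terms = zip(matvec(W_ih, I), matvec(W_rh, I_r), matvec(W_sh, s), b_h)
    return [g(a + b + c + d) for a, b, c, d in terms]

# Toy values, chosen only to exercise the formula.
I = [1.0, 0.0]        # flattened image feature (assumed shape)
I_r = [0.0, 1.0]      # flattened reduced attention weighted feature
s = [0.5, 0.5]        # question embedding
identity = [[1.0, 0.0], [0.0, 1.0]]
h = fuse(identity, I, identity, I_r, identity, s, [0.0, 0.0])
```

In this reading, W_ih I and W_rh I_r are the two weighted terms that the rejection maps to the area attention feature and the detection frame attention feature, respectively.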

Regarding claim 2, Chen discloses the image question answering method according to claim 1, wherein the extracting the question feature representing the semantic meaning of the question (¶34: a recurrent neural network is utilized to predict the next attention region based on the current attention region's location and visual features; ¶53: an LSTM model is employed to generate a dense question embedding to characterize the semantic meaning of questions). 

Regarding claim 3, Chen discloses the image question answering method according to claim 1, wherein the extracting the global feature of the image comprises: extracting the global feature by using a convolutional neural network, wherein the global feature comprises a plurality of area features associated with a plurality of areas of the image (¶56: the visual information in each image is represented as an N×N×D image feature map; the VGG-19 deep convolutional neural network extracts a D-dimension feature vector for each window).
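
For illustration only, the correspondence drawn above between an N×N×D image feature map and a plurality of area features may be sketched as below. This is a minimal sketch under stated assumptions: the map is held as a nested list, the helper name `area_features` is hypothetical, and the toy values do not come from Chen.

```python
def area_features(feature_map):
    """Flatten an N x N x D nested list into N*N area feature vectors of length D."""
    return [cell for row in feature_map for cell in row]

N, D = 2, 3
# Toy N x N x D map: each window holds a D-dimensional vector.
feature_map = [[[float(r * N + c)] * D for c in range(N)] for r in range(N)]
areas = area_features(feature_map)
```

Each entry of `areas` is one D-dimensional vector for one window, so the map as a whole supplies N*N area features.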

Regarding claim 7, Chen discloses the image question answering method according to claim 1, wherein the predicting the answer to the question according to the question feature, the area attention feature, and the detection frame attention feature comprises: fusing the question feature and the area attention feature to obtain a first predicted answer to the question (¶39 the answer generation part 220 answers the question using a multi-class classifier on refines of a fusion of the image feature map I 208, the attention weighted image feature map 222, and the dense question embedding 213; ¶59 the answer generation part is a multi-class classifier on the original image feature map, the dense question embedding, and the attention weighted feature map); fusing the question feature and the detection frame attention feature to obtain a second predicted answer to the question (¶39 the answer generation part 220 answers the question using a multi-class classifier on refines of a fusion of the image feature map I 208, the attention weighted image feature map 222, and the dense question embedding 213; ¶59 the answer generation part is a multi-class classifier on the original image feature map, the dense question embedding, and the attention weighted feature map); and obtaining the answer to the question by classifying the first predicted answer to the question and the second predicted answer to the question (¶39 the answer generation part 220 answers the question using a multi-class classifier on refines of a fusion of the image feature map I 208, the attention weighted image feature map 222, and the dense question embedding 213).
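
For illustration only, the three steps recited in claim 7 may be read as sketched below. This is one possible reading and is taken from neither the claims nor Chen: the element-wise product used for fusion, the summation of the two predictions, the linear classifier, and all names and values are assumptions made for this sketch.

```python
def fuse(question, attention):
    """Fuse two features by element-wise product (a hypothetical fusion operator)."""
    return [q * a for q, a in zip(question, attention)]

def classify(first, second, weights):
    """Score candidate answers on the summed predictions; return the best index."""
    combined = [f + s for f, s in zip(first, second)]
    scores = [sum(w * c for w, c in zip(row, combined)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

question = [1.0, 2.0]          # question feature
area_attention = [0.5, 0.1]    # area attention feature
frame_attention = [0.2, 0.4]   # detection frame attention feature

first = fuse(question, area_attention)     # first predicted answer
second = fuse(question, frame_attention)   # second predicted answer
weights = [[1.0, 0.0], [0.0, 1.0]]         # one row per candidate answer
answer_index = classify(first, second, weights)
```

The sketch keeps the two fusion branches separate until the final classification step, which is the structure the claim language recites.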

Regarding claims 8-10 and 14 (drawn to a device):
The proposed rejection over Chen, explained in the rejection of method claims 1-3 and 7, anticipates the steps of the device of claims 8-10 and 14 because these steps occur in the operation of the proposed rejection as discussed above. Thus, arguments similar to those presented above for claims 1-3 and 7 are equally applicable to claims 8-10 and 14.

Regarding claims 15-17 (drawn to a CRM):
The proposed rejection over Chen, explained in the rejection of method claims 1-3, anticipates the steps of the computer readable medium of claims 15-17 because these steps occur in the operation of the proposed rejection as discussed above. Thus, arguments similar to those presented above for claims 1-3 are equally applicable to claims 15-17.

Allowable Subject Matter
Claims 4-6, 11-13 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  

Regarding claim 5, and similarly regarding claims 12 and 19, the prior art of record, alone or in combination, fails to teach at least “wherein the extracting the detection frame feature of the detection frame encircling the object in the image comprises: obtaining a plurality of detection frames encircling the object in the image by using a faster-region convolutional neural network; determining at least one detection frame according to a difference between the object encircled by the plurality of detection frames and a background of the image; extracting at least one detection frame sub-feature according to the at least one detection frame; and obtaining the detection frame feature according to the at least one detection frame sub-feature”.
Claim 6 depends from claim 5, claim 13 depends from claim 12, and claim 20 depends from claim 19; these claims would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648.  The examiner can normally be reached on Monday-Friday 9-5PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/KEVIN KY/               Primary Examiner, Art Unit 2669