Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is a response to U.S. Patent Application No. 16/750,304 filed on 01/23/2020 in which Claims 1 – 12 were presented for examination. 
This application claims priority to Chinese Patent Application No. 201910185125.9 filed on March 12, 2019.
Status of the Claims
Claims 1 – 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. Claims 1, 2, 5, 6 and 9 – 12 are rejected under 35 U.S.C. 101. Claims 1, 2, 5, 6, 9 and 10 are rejected under 35 U.S.C. 102(a)(1). Claims 3, 4, 7, 8, 11 and 12 are rejected under 35 U.S.C. 103.

Examiner Note
 	The Examiner cites particular columns, line numbers and/or paragraph numbers in the references as applied to the claims below for the convenience of the Applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.  

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/27/2021 and 01/24/2022 have been entered and considered by the examiner.
The information disclosure statement filed 11/02/2020 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed. It has been placed in the application file, but the information referred to therein has not been considered.
Specifically, the provided copy of NPL Cite No. 7 is not legible.

Specification
The Abstract should be an adequate, clear and concise statement of the technical disclosure of the patent application.  Also, the Abstract should include that which is new in the art to which the invention pertains.  Additionally, as expressly stated in 37 C.F.R. 1.72(b), the purpose of the Abstract is to enable the reader thereof to quickly determine, from a cursory inspection, the nature and gist of the technical disclosure.
The Abstract of the disclosure for the present application is objected to because it fails to satisfy these guidelines.    
Correction is required.  See MPEP § 608.01(b).



Drawings
The drawings in Figures 1 and 2 are objected to because of one or more of the following reasons:
the text is small, unfocused and/or difficult to read.
Applicant should amend all figures in the drawings so that all text, icons, elements and/or GUIs are easily readable/seen and/or easily understood.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application.  Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of an amended drawing should not be labeled as “amended.”  If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency.  Additional replacement sheets may be necessary to show the renumbering of the remaining figures.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action.  The objection to the drawings will not be held in abeyance.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “text encoder is configured to perform…” and “image encoder is configured to extract an image feature…” in claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
The claim limitations “text encoder is configured to perform…” and “image encoder is configured to extract an image feature…” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
The written description fails to clearly link or associate the disclosed structure, material, or acts to the claimed function such that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function.
Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

		Due to at least their dependency upon Claim 1, Claims 2 – 4 are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph.

Claims 1 – 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
A broad limitation together with a narrow limitation that falls within the broad limitation (in the same claim) may be considered indefinite if the resulting claim does not clearly set forth the metes and bounds of the patent protection desired. See MPEP § 2173.05(c). In the present instance, claim 1 recites the broad recitation “wherein the text encoder is configured to perform pooling on a word vector sequence of a question text inputted”, and the claim also recites “so as to extract a semantic representation vector of the question text” which is the narrower statement of the limitation. The claim(s) are considered indefinite because there is a question or doubt as to whether the feature introduced by such narrower language is (a) merely exemplary of the remainder of the claim, and therefore not required, or (b) a required feature of the claims.
Claims 5 and 9 recite claim language similar to that of Claim 1. Accordingly, Claims 5 and 9 are also indefinite.
Due to at least their dependency upon Claims 1, 5 or 9; Claims 2 – 4, 6 – 8 and 10 – 12 are also indefinite.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 5, 6, 9 and 10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e. an abstract idea) without significantly more. The claims recite the analysis of data to obtain semantic representation of the data and analysis of an image to obtain an image feature associated with the semantic representation of the data.
Regarding Claim 1, the limitations “wherein the text encoder is configured to perform pooling on a word vector sequence of a question text inputted, so as to extract a semantic representation vector of the question text” and “the image encoder is configured to extract an image feature of a given image in combination with the semantic representation vector”, under the broadest reasonable interpretation, cover performance of these limitations in the mind and/or “by a human using a pen and paper.” See MPEP 2106.04(a)(2)(III). The claim limitations are analogous to analyzing text in order to extract a semantic representation of the received text, and analyzing an image to identify a feature of the image associated with the semantic representation of the received text. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls under the “mental processes” grouping of abstract ideas.
This judicial exception is not integrated into a practical application. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, when considered individually or as a whole.
Claims 5 and 9 are rejected under the same rationale as claim 1 above. 
Claims 5 and 9 recite the use of a processor and a storage device. However, these elements are recited in both claims at a high level of generality (i.e., as a generic processor and a generic memory) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea (See MPEP 2106.05(a)).
Regarding Claims 2, 6 and 10, the claims do not include any additional elements that are sufficient to amount to significantly more than the judicial exception when considered individually or as a whole. For example, Claims 2, 6 and 10 merely perform a type of analysis on the obtained data in order to obtain the semantic representation, which is an insignificant extra-solution activity in the form of manipulating the gathered data. See MPEP 2106.05(g).

Claims 9 – 12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. 
In summary, Claim 9 recites a “computer readable storage medium” having a computer program stored thereon to perform various functions. In the specification of the present application, the “computer readable storage medium” is expressly defined as including transmission media (See Page 11, lines 6 – 18: the computer readable storage medium may be, but is not limited to, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, component or any combination thereof). Thus, the broadest reasonable interpretation of “computer readable storage medium” encompasses nonstatutory subject matter (transmission media) that is unpatentable under 35 U.S.C. 101.
Accordingly, Claim 9 fails to recite statutory subject matter under 35 U.S.C. 101.
Claims 10 – 12 merely recite either additional functions performed by the instructions or additional descriptions of electronic data. Accordingly, Claims 10 – 12 also fail to recite statutory subject matter under 35 U.S.C. 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 5, 6, 9 and 10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang, P. et al.  “The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions” published on 16 December 2016 (cited in IDS dated 11/02/2020) (hereinafter, Wang).

Regarding Claim 1, Wang teaches a visual question answering model (See Wang’s Abstract and Section 3, VQA model), comprising
an image encoder (Wang in Section 3.1 teaches Encoding image regions: the input image is resized and divided into regions) and a text encoder (Wang in Section 3.1 teaches Hierarchy Question Encoding, which is used to effectively capture the information from a question at multiple scales, i.e., word, phrase and sentence level),
wherein the text encoder is configured to perform pooling on a word vector sequence of a question text inputted, so as to extract a semantic representation vector of the question text (Wang in Section 3.1 further teaches that one-hot vectors of question words are embedded individually to continuous vectors, using a linear transformation followed by a tanh function. Then 1-D convolutions with different filter sizes are applied to the word-level embeddings, followed by a max-pooling over different filters at each word location, to form the phrase-level features. Finally, the phrase-level features are further encoded by an LSTM, resulting in the question-level features); and
the image encoder is configured to extract an image feature of a given image (Wang in Section 3.1 teaches Encoding image regions: the input image is resized and divided into regions. The outputs V = [V1, …, VN] (N = 196) are taken as image features) in combination with the semantic representation vector (Wang in Section 3.2 further teaches that, in the co-attention approach, the encoded question/image/fact features are sequentially fed into the attention module as input sequences and weighted features. Finally, the weighted question/image/fact features are further used for answer prediction and the attention weights of the last attention module are used for reasons generation).

Regarding Claim 2, Wang teaches the limitations contained in parent Claim 1. Wang further teaches:
wherein the text encoder is configured to: perform maxPooling processing or avgPooling processing on the word vector sequence of the question text to extract the semantic representation vector of the question text (Wang in Section 3.1 further teaches that one-hot vectors of question words are embedded individually to continuous vectors, using a linear transformation followed by a tanh function. Then 1-D convolutions with different filter sizes are applied to the word-level embeddings, followed by a max-pooling over different filters at each word location, to form the phrase-level features).

Regarding Claim 5, this claim merely recites an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are configured to operate a visual question answering model as similarly disclosed in Claim 1. Accordingly, Wang discloses/teaches every limitation of Claim 5, as indicated in the above rejection of Claim 1.

Regarding Claim 6, this claim merely recites an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are configured to operate a visual question answering model as similarly disclosed in Claim 2. Accordingly, Wang discloses/teaches every limitation of Claim 6, as indicated in the above rejection of Claim 2.

Regarding Claim 9, this claim merely recites a computer readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the program operates a visual question answering model as similarly disclosed in Claim 1. Accordingly, Wang discloses/teaches every limitation of Claim 9, as indicated in the above rejection of Claim 1.

Regarding Claim 10, this claim merely recites a computer readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the program operates a visual question answering model as similarly disclosed in Claim 2. Accordingly, Wang discloses/teaches every limitation of Claim 10, as indicated in the above rejection of Claim 2.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 3, 4, 7, 8, 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Lee et al. (US 2020/0097604) (hereinafter, Lee).

Regarding Claim 3, Wang teaches the limitations contained in parent Claim 2. Wang further teaches:
Wang in Section 3.1 further teaches that one-hot vectors of question words are embedded individually to continuous vectors, using a linear transformation followed by a tanh function. Then 1-D convolutions with different filter sizes are applied to the word-level embeddings, followed by a max-pooling over different filters at each word location, to form the phrase-level features.
However, Wang does not specifically describe the max pooling function, thus Wang does not specifically disclose wherein the maxPooling processing is expressed by an equation of: 
f(w1, w2, . . . , wk)=max([w1, w2, . . . , wk], dim=1) 
where f represents a function of the maxPooling processing; k is a number of word vectors contained in the question text; wi is an ith word vector obtained by processing the question text with a pre-trained word vector model, and i is a natural number in [1, k]; and max([w1, w2, . . . , wk], dim=1) represents determining a maximum value from word vectors w1, w2, . . . , wk corresponding to dim=1, in which dim=1 represents determining a value by row. 
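For illustration only, and without limiting the interpretation of the claim, the recited maxPooling equation can be sketched as follows. The sketch assumes that the word vectors w1, . . . , wk are arranged as the columns of a matrix, so that determining a value "by row" yields one maximum per feature dimension. The function name max_pooling and the sample vectors are hypothetical and are taken neither from the application nor from the cited references.

```python
from typing import List


def max_pooling(word_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over k word vectors of equal length.

    Assumption: w1..wk form the columns of a matrix, so "by row"
    means one maximum per feature dimension.
    """
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if any(len(w) != dim for w in word_vectors):
        raise ValueError("all word vectors must have the same length")
    # zip(*word_vectors) groups, for each feature dimension, the k values
    # contributed by w1..wk; max() keeps the largest value of each group.
    return [max(values) for values in zip(*word_vectors)]


# Hypothetical 3-dimensional word vectors for a three-word question.
w1 = [0.1, 0.9, -0.3]
w2 = [0.4, 0.2, 0.5]
w3 = [-0.2, 0.7, 0.0]
pooled = max_pooling([w1, w2, w3])
print(pooled)  # [0.4, 0.9, 0.5]
```

Under this assumption the output has the same length as each word vector regardless of k, which is what allows a question of any length to be reduced to a single fixed-size semantic representation vector.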
Lee in par. 0017 teaches matching images and text to discover the full latent alignments between an image and a sentence, using both the regions in the image and the words in the sentence as context to infer overall similarity between the image and the sentence. Lee in pars. 0064 - 0066 further teaches pooling functions. The overall similarity between the image and the sentence may be scored by summarizing the plurality of region-sentence relevance scores Ri using a pooling method (e.g., averaging, taking the maximum, or any other pooling technique). A pooling module may pool the region-sentence relevance scores Ri to calculate an image-sentence similarity score S. The pooling module may aggregate or summarize the plurality of region-sentence scores Ri in various ways including using, for example, a summation function (SUM) or a maximum function (MAX). The image-sentence similarity score S(I, T) approximates $\max_{i=1}^{k} R(v_i, a_i^t)$.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the teachings of Lee with the teachings of Wang, in order to process the vector sequences of Wang with the pooling functions as described in Lee. The motivation for doing so would have been to use a function that allows the user to obtain a similarity score that indicates a degree of similarity between the first data and the second data (See Lee’s Abstract and par 0064).
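For illustration only, the SUM, MAX and averaging pooling options attributed to Lee above can be sketched as reductions over a list of region-sentence relevance scores Ri. This is a minimal sketch under stated assumptions, not Lee's implementation: the function name pool_relevance_scores, the method labels and the sample scores are hypothetical.

```python
from typing import Callable, Dict, List


def pool_relevance_scores(scores: List[float], method: str = "AVG") -> float:
    """Summarize region-sentence relevance scores into one similarity score."""
    if not scores:
        raise ValueError("at least one relevance score is required")
    # Each pooling option reduces the k scores to a single number.
    poolers: Dict[str, Callable[[List[float]], float]] = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda r: sum(r) / len(r),
    }
    if method not in poolers:
        raise ValueError(f"unknown pooling method: {method}")
    return poolers[method](scores)


# Hypothetical relevance scores for three image regions.
relevance = [0.25, 0.5, 0.75]
print(pool_relevance_scores(relevance, "SUM"))  # 1.5
print(pool_relevance_scores(relevance, "MAX"))  # 0.75
print(pool_relevance_scores(relevance, "AVG"))  # 0.5
```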

Regarding Claim 4, Wang teaches the limitations contained in parent Claim 2. Wang further teaches:
Wang in Section 3.1 further teaches that one-hot vectors of question words are embedded individually to continuous vectors, using a linear transformation followed by a tanh function. Then 1-D convolutions with different filter sizes are applied to the word-level embeddings, followed by a max-pooling over different filters at each word location, to form the phrase-level features.
However, Wang does not specifically describe the avgPooling function, thus Wang does not specifically disclose wherein the avgPooling processing is expressed by an equation of: 
$$p(w_1, w_2, \ldots, w_k) = \frac{\sum_{i=1}^{k} w_i}{k}$$
where p represents a function of the avgPooling processing; k is a number of word vectors contained in the question text; wi is an ith word vector obtained by processing the question text with a pre-trained word vector model, and i is a natural number in [1, k]; and $\sum_{i=1}^{k} w_i$ represents a sum of values of word vectors w1, w2, . . . , wk in each row.
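For illustration only, and without limiting the interpretation of the claim, the recited avgPooling equation can be sketched as follows. As with the maxPooling limitation, the sketch assumes that the word vectors w1, . . . , wk are arranged as the columns of a matrix, so that the sum "in each row" is taken per feature dimension and then divided by k. The function name avg_pooling and the sample vectors are hypothetical.

```python
from typing import List


def avg_pooling(word_vectors: List[List[float]]) -> List[float]:
    """Element-wise mean over k word vectors of equal length.

    Assumption: w1..wk form the columns of a matrix, so each row sum
    covers one feature dimension across all k words.
    """
    k = len(word_vectors)
    if k == 0:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if any(len(w) != dim for w in word_vectors):
        raise ValueError("all word vectors must have the same length")
    # For each feature dimension, sum the k values and divide by k.
    return [sum(values) / k for values in zip(*word_vectors)]


# Hypothetical 2-dimensional word vectors for a three-word question.
u1 = [1.0, 2.0]
u2 = [3.0, 6.0]
u3 = [5.0, 1.0]
averaged = avg_pooling([u1, u2, u3])
print(averaged)  # [3.0, 3.0]
```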
Lee in par. 0017 teaches matching images and text to discover the full latent alignments between an image and a sentence, using both the regions in the image and the words in the sentence as context to infer overall similarity between the image and the sentence. Lee in pars. 0064 - 0068 further teaches pooling functions. The overall similarity between the image and the sentence may be scored by summarizing the plurality of region-sentence relevance scores Ri using a pooling method (e.g., averaging, taking the maximum, or any other pooling technique). A pooling module may pool the region-sentence relevance scores Ri to calculate an image-sentence similarity score S. The pooling module may aggregate or summarize the plurality of region-sentence scores Ri in various ways including using, for example, a summation function (SUM) or a maximum function (MAX). The image-sentence similarity score S(I, T) approximates $\max_{i=1}^{k} R(v_i, a_i^t)$. The pooling module can calculate the image-sentence similarity score S by summarizing the region-sentence relevance scores $R(v_i, a_i^t)$ with an average pooling function (AVG).
                            S
                            a
                            v
                            g
                            
                                
                                    I
                                    ,
                                     
                                    T
                                
                            
                            =
                            
                                
                                    
                                        
                                            ∑
                                            
                                                i
                                                =
                                                1
                                            
                                            
                                                k
                                            
                                        
                                        
                                            R
                                            (
                                            
                                                
                                                    V
                                                
                                                
                                                    i
                                                
                                            
                                            ,
                                            
                                                
                                                    a
                                                
                                                
                                                    i
                                                
                                                
                                                    t
                                                
                                            
                                            )
                                        
                                    
                                
                                
                                    k
                                
                            
                        
                    
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the teachings as in Lee with the teachings as in Wang, in order to process the vector sequences of Wang with the pooling functions as described in Lee. The motivation for doing so would have been to use a function that allows the user to obtain a similarity score that indicates a degree of similarity between the first data and the second data (See Lee’s Abstract and par 0064).
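For illustration only, the SUM, MAX and AVG pooling functions attributed to Lee above can be sketched as follows. This is a minimal sketch and not Lee's implementation; the function name `pool_scores`, its `method` parameter, and the sample scores are hypothetical.

```python
def pool_scores(scores, method="avg"):
    """Summarize k region-sentence relevance scores R_i into one similarity score S."""
    if not scores:
        raise ValueError("need at least one relevance score")
    if method == "sum":
        return sum(scores)                # summation pooling (SUM)
    if method == "max":
        return max(scores)                # maximum pooling (MAX)
    if method == "avg":
        return sum(scores) / len(scores)  # average pooling (AVG): sum of R_i divided by k
    raise ValueError("unknown pooling method: " + method)


# Hypothetical relevance scores for k = 3 image regions.
relevance = [0.25, 0.5, 0.75]
s_avg = pool_scores(relevance, "avg")
s_sum = pool_scores(relevance, "sum")
s_max = pool_scores(relevance, "max")
```

Each variant reduces the k scores to a single scalar, which is the sense in which the cited passages describe an image-sentence similarity score.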

Regarding Claim 7, this Claim merely recites an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are configured to operate a visual question answering model as similarly disclosed in Claim 3. Accordingly, Wang in view of Lee discloses/teaches every limitation of Claim 7, as indicated in the above rejection of Claim 3.

Regarding Claim 8, this Claim merely recites an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are configured to operate a visual question answering model as similarly disclosed in Claim 4. Accordingly, Wang in view of Lee discloses/teaches every limitation of Claim 8, as indicated in the above rejection of Claim 4.

Regarding Claim 11, this claim merely recites a computer readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the program operates a visual question answering model as similarly disclosed in Claim 3. Accordingly, Wang in view of Lee discloses/teaches every limitation of Claim 11, as indicated in the above rejection of Claim 3.

Regarding Claim 12, this claim merely recites a computer readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the program operates a visual question answering model as similarly disclosed in Claim 4. Accordingly, Wang in view of Lee discloses/teaches every limitation of Claim 12, as indicated in the above rejection of Claim 4.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARIEL MERCADO VARGAS whose telephone number is (571)270-1701. The examiner can normally be reached M-F 8:00am - 4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached at 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ARIEL MERCADO/           Primary Examiner, Art Unit 2176