Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kesorn et al (“Enhanced Sports Image Annotation and Retrieval Based Upon Semantic Analysis of Multimodal Cues”, 3RD PACIFIC RIM SYMPOSIUM ON IMAGE AND VIDEO TECHNOLOGY, PSIVT 2009, vol. 5414, Chap. 71, no. 558, 13 January 2009 (2009-01-13) - 16 January 2009 (2009-01-16), pages 817-828).

As per claims 1 and 16, Kesorn et al (Enhanced Sports Image Annotation…) teaches:
 a multimodal content processing method, comprising: receiving a content processing request of a user, the content processing request being configured to request semantic understanding of multimodal content to be processed (see Fig. 1, HTML documents input with multimodal features – visual analysis, linguistic analysis, and LS algorithm);
 analyzing the multimodal content to obtain multimodal knowledge nodes corresponding to the multimodal content (Fig. 1, as generating color edge info, annotations, and an LS matrix);
 and determining a semantic understanding result of the multimodal content according to the multimodal knowledge nodes (as generating semantic metadata – p. 823, section 3.4, wherein the metadata is a fused learned matrix based on the semantic linguistic analysis and the semantic visual analysis, the linguistic and visual analyses representing two modes), a pre-constructed multimodal knowledge graph and the multimodal content, the multimodal knowledge graph comprising: the multimodal knowledge nodes and an association relationship between the multimodal knowledge nodes (as the generated semantic metadata noted above is a knowledge graph – see Fig. 1, semantic mode, the illustrated knowledge graph; the relationships come from a combination of the visual and linguistic analysis – see also the detailed mapping process in section 3.4 and Figure 3).

As per claim 2, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 1, wherein the determining a semantic understanding result of the multimodal content according to the multimodal knowledge nodes, a pre-constructed multimodal knowledge graph and the multimodal content comprises:
determining the association relationship between the multimodal knowledge nodes according to the multimodal knowledge nodes and the multimodal knowledge graph (as establishing ontology instances and their properties, with the metadata assigned to ontology instances – p. 823, section 3.4, numbers 2 and 3);
determining a basic semantic understanding result of the multimodal content according to the multimodal knowledge nodes and a preset semantic understanding method (as the created group record set metadata – p. 823, section 3.4, number 3);
and determining the semantic understanding result of the multimodal content according to the association relationship between the multimodal knowledge nodes, the basic semantic understanding result and the multimodal knowledge graph (as generating a semantic model via the RDBMS/RDF mapping – Fig. 3).

As per claim 3, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 2, wherein the basic semantic understanding result comprises:
at least one of a first semantic understanding result or a second semantic understanding result (as a first semantic understanding result based on a specific sport type, based on the image relationship to the query keyword – p. 825, section 4.3, Query 1; and a second semantic understanding result based on images semantically related to something using a knowledge base – p. 825, section 4.3, Query 2); the first semantic understanding result is obtained by performing semantic understanding on the multimodal content according to the multimodal knowledge nodes and a preset deep learning method (as Query 1 shows improvement in an interpolated average precision graph – p. 825, section 4.4, first 7 lines);
the second semantic understanding result is obtained by fusing multiple single-modal semantic understanding results corresponding to the multimodal knowledge nodes according to a preset fusion method (as combining/fusing image/context data – p. 826, first paragraph; reflecting back on the constructed semantic knowledge graph – Fig. 1, metadata storage).

As per claim 4, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 1, further comprising:
obtaining a multimodal data set which comprises multiple multimodal content samples; processing the multimodal data set to determine an ontology of the multimodal knowledge graph (as establishing ontology instances and their properties, with the metadata assigned to ontology instances – p. 823, section 3.4, numbers 2 and 3);
mining multimodal knowledge node samples of each of the multimodal content samples in the multimodal data set; establishing an association relationship between the multimodal knowledge node samples through knowledge graph representation learning (as the created group record set metadata – p. 823, section 3.4, number 3);
and constructing the multimodal knowledge graph based on the association relationship between the multimodal knowledge node samples and the ontology of the multimodal knowledge graph (as generating a semantic model via the RDBMS/RDF mapping – Fig. 3).

As per claim 5, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 1, further comprising: outputting a semantic understanding result of the multimodal content based on a semantic representation method of a knowledge graph (as evaluating the retrieval performance – p. 825, section 4.3, using the semantic representation of the knowledge graph – section 3.4).

As per claim 6, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 1, further comprising: obtaining a recommended resource of the same type as the multimodal content according to a vector representation of the semantic understanding result (as using Hypotheses/Query 2 – p. 825, Q2 – finding a source that has photographs that are semantically related to something); and pushing the recommended resource to the user (as using Q2 results to recommend a knowledge base that has both textual and visual information – p. 825, last 3 lines).

As per claim 7, Kesorn et al (Enhanced Sports Image Annotation…) teaches the multimodal content processing method according to claim 1, further comprising: determining a text understanding result of the multimodal content according to the vector representation of the semantic understanding result (as gleaning relationships between concepts which are not directly mentioned in the text – p. 824, section 4.1, hypothesis 3); and performing a search process to obtain a search result for the multimodal content according to the text understanding result (as performing the search using hypothesis 3, and generating an improved retrieval performance – p. 826, first paragraph, last 5 lines).


	Claims 8-14 and 15 are electronic device/non-transitory computer readable medium claims that perform the method steps of claims 1-7 above and as such, claims 8-14 and 15 are similar in scope and content to claims 1-7 above; therefore, claims 8-14 and 15 are rejected under similar rationale as presented against claims 1-7 above.  Further as to claims 8-14 and 15, Kesorn et al (Enhanced Sports Image Annotation…) teaches the execution of the method steps on a computer/computing device – end of Section 4, wherein MTLB is used to perform the steps, as well as Figure 5 operating on images – one of ordinary skill would easily recognize that a computer/computing device is needed to scan in the imagery, with the computing device further performing the disclosed multimodal processing.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see related art listed on the PTO-892 form.
The following references were found to calculate cross-semantic information on multimodal data sets:
Eledath (20160378861) teaches multimodal semantic analysis across images and text/natural language recognition (para 0037).

Howard (20200036810) teaches semantic analysis combining visual/image information with a knowledge base in linguistics (para 0053, 0076, 0083-0085).

Natarajan et al (20190325080) teaches processing multimodal input systems for assistant systems (para 0007). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
08/26/2022