DETAILED ACTION
1.	This communication is in response to the Arguments filed on 6/6/2022. Claims 1-20 are pending and have been examined. 
Response to Amendments and Arguments

2.	35 USC 101 rejections of Claims 10-11 are maintained. The examiner acknowledges Specification [0114] “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.” However, as discussed in the interview, the examiner still suggests that the applicant explicitly state “non-transitory computer readable storage media” in Claims 10-11, considering the fact that Claims 10-11 also recite “the one or more computer storage media” which does not exactly match “one or more computer readable storage media.” With the suggested amendment, the rejections will be withdrawn.
35 USC 101 rejections of Claims 12-20 are maintained. As discussed in the interview, the examiner suggested that the applicant consider amending “transmitting” to recite hardware terms. Without such terms, all processing steps are interpretable as mathematical manipulations performable by a human, with the support of at most a generic computer, and “transmitting” can also be interpreted broadly. With the suggested amendment, the rejections will be withdrawn.
Applicant's arguments with respect to claim rejections under 35 USC 103 have been fully considered, but they are not persuasive. First, with respect to the independent claims 1, 10, 12, the examiner had requested clarification of the difference and relationship among “caption sentence,” “textual content word,” and “textual content sentence.” This could be done very easily by citing a simple example; however, it appears quite difficult to find such a description in the Specification or in the applicant’s response (even with Fig-3). Without that clarification, these three terms are interpreted broadly as simply “text” associated with an image.
 Second, with regard to claims 1, 10-12, the examiner had requested clarification of “continuous semantic relationships.” While the applicant has cited Specification [0020], [0025], [0034], [0068], [0096] and [0100], the key term “continuous” is still insufficiently explained. As such, the term will not be given patentable weight.
Third, the applicant argues that the references do not teach all of the limitations cited in the independent claims. In response, the examiner respectfully disagrees. For such comprehensive arguments, the applicant is referred to the rejection rationale detailed in the 35 USC 103 section. 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
3. 	Claims 10-11 reciting a computer program product are rejected under 35 U.S.C. 101 because they are directed to non-statutory subject matter. Based on consideration of all of the relevant factors with respect to the claimed invention as a whole, the claims are held to claim a computer program or software per se. It is noted that a computer program (software, code, instruction, or module) per se does not fall within any of the four statutory classes and is not eligible for patent protection under 35 USC 101 (see “interim examination instructions for evaluating subject matter eligibility under 35 USC 101” effected on 08/24/2009, and related PTO public documents/guidelines at: http://www.uspto.gov/patents/announce/ bilski guidance.jsp). 
4. 	Claims 12-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
The Supreme Court has long held that “laws of nature, natural phenomena, and abstract ideas are not patentable.” Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (quoting Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 133 S. Ct. 2107, 2116 (2013) (internal quotation marks omitted)). The “abstract ideas” category embodies the longstanding rule that an idea, by itself, is not patentable. Alice Corp., 134 S. Ct. at 2355 (quoting Gottschalk v. Benson, 409 U.S. 63, 67 (1972)).
In Alice, the Supreme Court sets forth an analytical “framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas [or mental processes[1]] from those that claim patent-eligible applications of those concepts.” Id. at 2355 (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289, 1296–97 (2012)). The first step in the analysis is to “determine whether the claims at issue are directed to one of those patent-ineligible concepts.” If the claims are directed to a patent-ineligible concept, the second step in the analysis is to consider the elements of the claims “individually and ‘as an ordered combination’” to determine whether there are additional elements that “‘transform the nature of the claim’ into a patent-eligible application.” Id. (quoting Mayo, 132 S. Ct. at 1298, 1297). In other words, the second step is to “search for an ‘inventive concept’—i.e., an element or combination of elements that is ‘sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself’”. Id. (brackets in original) (quoting Mayo, 132 S. Ct. at 1294). The prohibition against patenting an abstract idea “‘cannot be circumvented by attempting to limit the use of the formula to a particular technological environment’ or adding ‘insignificant post-solution activity.’” Bilski v. Kappos, 561 U.S. 593, 610–11 (2010) (citation omitted).
In applying the framework set out in Alice, and as the first step of the analysis, the examiner finds that the Claims are directed to a patent-ineligible abstract idea which pertains to a human organizing of activities, particularly for converting image and text data into vector representation, where all the steps of the Claims can be performed by a human, with the support of at most a generic computer which does not tie the Claims to a practical application. CyberSource Corp. 654 F.3d at 1372.  “[M]ental processes–or processes of human thinking–standing alone are not patentable even if they have practical application.” In re Comiskey, 554 F.3d 967, 979 (Fed. Cir. 2009); see also Gottschalk v. Benson, 409 U.S. 63, 67 (1972) (“Phenomena of nature . . . , mental processes, and abstract intellectual concepts are not patentable, as they are basic tools of scientific and technological work.”).
For the second step of the Alice analysis, the examiner finds that the Claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Viewed as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea to a particular technological environment, such that the claim(s) amounts to significantly more than the abstract idea itself.
Furthermore, as recognized by the Federal Circuit in Ultramercial Inc. v. Hulu, LLC, 772 F.3d 709, 714 (Fed. Cir. 2014) (quoting Bilski v. Kappos, 561 U.S. at 594), the “machine-or-transformation” test can provide a “‘useful clue’ in the second step of the Alice framework.” In re Bilski, 545 F.3d 943, 954 (Fed. Cir. 2008), aff’d, Bilski v. Kappos, 561 U.S. 593 (2010). According to In re Bilski, the transformation (1) must involve an underlying article from one state to a different state or thing, and (2) must be central to the purpose of Appellants’ claimed process.  Id.  While an underlying article can be intangible, such as electrical signals, and transformation can include data transformation, data must represent a physical object or an article. See Arrhythmia Research Tech., Inc. v. Corazonix Corp., 958 F.2d 1053, 1059 (Fed. Cir. 1992); In re Abele, 684 F.2d 902, 905–06 (CCPA 1982). In this case, the examiner finds that the Claims are neither “tied to a particular machine or apparatus” nor do they “transform a particular article into a different state or thing.” In particular, none of the recited steps recite structural limitations on any apparatus or any specific operations. Although the Claims may have included a processor, memory, input, and output devices, these components are merely part of a generic computer system. Steps that are carried out for input and output are considered insignificant extra-solution activities such as data gathering and/or post solution activity. Therefore, the Claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.  
Claim Rejections - 35 USC § 103
5.	Claims 1-3, 6-12, 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jin, et al. (US 20170206435; hereinafter JIN) in view of Yu, et al. (US 20170127016; hereinafter YU).
As per claim 1, JIN (Title: Embedding Space for Images with Multiple Text Labels) discloses “A computer system comprising: a server comprising at least one processing device and at least one memory device operably coupled to the at least one processing device; and a knowledge base in operable communication with the server (JIN, [0028], the computing device 102 may range from full resource devices with substantial memory and processor resources (e.g., servers, personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., mobile devices); [0044], image database <read on knowledge base>), the server configured to: 
transmit a plurality of image vectors to a vector space resident within the knowledge base (JIN, [0058], each image can be represented by d-dimensional feature vector; [Abstract], Embedding space <read on vector space> for images with multiple text labels; [0113], modules .. may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the computing device 802);  
transmit a plurality of caption sentence vectors to the vector space, wherein a respective portion of the plurality of caption sentence vectors have one or more semantic relationships with a respective portion of the plurality of image vectors, thereby defining one or more combined image-word semantic relationships (JIN, [Abstract], The embedding space is trained to semantically relate the embedded text labels <read on caption sentence vectors> so that labels like “sun” and “sunset” are more closely related <read on semantic relationship> than “sun” and “bird”. Training the embedding space also includes mapping representative images, having image content which exemplifies the semantic concepts, to respective text labels <read on ‘combined image-word semantic relationships’>); 
transmit a plurality of textual content word vectors to the vector space, wherein a respective portion of the plurality of word vectors have one or more semantic relationships with the respective portion of the plurality of image vectors and the respective portion of caption sentence vectors, thereby further defining the one or more combined image-word semantic relationships (Examiner’s Note: The applicant is requested to clarify the relationship between ‘caption sentence’ and ‘textual content word’ and ‘textual content sentence.’ It appears they can be broadly interpreted as THE SAME text, e.g., Claim 6 indicates that the ‘textual content sentence’ is converted from the ‘textual content word.’ JIN, [0095], the MIE module 114 trains the joint image-text embedding space 302. To do so, the MIE module 114 semantically relates the text labels of the text vocabulary 306, e.g., by leveraging textual data available on the Internet to learn scalable and lexically distributed representations of words to capture semantic meaning among the text labels of the text vocabulary 306 .. the MIE module 114 leverages one or more text modeling architecture techniques to do so, such as the word2vec model, the Glove model, and so on); 
transmit a plurality of [ textual content sentence vectors ] to the vector space, wherein a respective portion of the plurality of textual content sentence vectors have one or more semantic relationships with the respective portion of the plurality of image vectors, the respective portion of caption sentence vectors, and the respective portion of the word vectors, thereby further defining the one or more combined image-word semantic relationships; and establish one or more continuous semantic relationships through the one or more combined image-word semantic relationships (JIN, [Abstract], Training the embedding space also includes mapping representative images, having image content which exemplifies the semantic concepts, to respective text labels <read on ‘combined image-word semantic relationships’>. Examiner’s Note: The applicant is requested to clarify ‘continuous semantic relationships’ which is not defined in the Specification).”
JIN does not expressly disclose “textual content sentence vectors ..” However, the feature is taught by YU (Title: Systems and methods for video paragraph captioning using hierarchical recurrent neural networks).
In the same field of endeavor, YU teaches: [0053] “during the generation of a sentence, an embedding average layer (see section (b) in FIG. 2) 260 accumulates all the word embeddings of the sentence and takes an average to get a compact embedding vector.” 
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of YU in the system (as taught by JIN) as one ready mechanism for producing sentence embedding vectors for subsequent mapping between image and text.
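For illustration only, the embedding-average operation quoted from YU [0053] can be sketched as follows. This is a minimal sketch under stated assumptions, not YU's implementation: the function name `average_embedding`, the dictionary-based lookup table, and the two-dimensional toy vectors are all hypothetical.

```python
from typing import Dict, List


def average_embedding(words: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of a sentence into one compact sentence vector.

    `table` is a hypothetical word-to-vector lookup; words missing from it are skipped.
    """
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    # Accumulate each coordinate over all word vectors, then divide by the count.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy values chosen for illustration; they do not come from any cited reference.
toy_table = {"sun": [2.0, 0.0], "sets": [0.0, 2.0], "slowly": [1.0, 1.0]}
sentence_vector = average_embedding(["sun", "sets", "slowly"], toy_table)
```

In this sketch the sentence vector has the same dimensionality as the word vectors, which is what would allow sentence and word vectors to be placed in a common vector space.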
As per claim 2 (dependent on claim 1), JIN in view of YU further discloses “wherein the computer system is a cognitive system (JIN, [0003], The text labels embedded in the embedding space are configured to describe semantic concepts exhibited in image content <read on cognition>, e.g., whether an image includes a beach or a sunset).”  
As per claim 3 (dependent on claim 2), JIN in view of YU further discloses “an artificial intelligence (AI) platform resident within the server, the AI platform in operable communication with the knowledge base (see Claims 1 and 2 rejections where cognition reads on AI), the AI platform comprising: an image manager configured to facilitate execution of one or more operations by the server comprising one or more of: extraction of a plurality of images from one or more image sources (YU, [0071], K image patches were extracted); a natural language processing (NLP) manager (YU, [0025], natural language processing); configured to facilitate execution of the one or more operations by the server comprising one or more of: extraction of at least a portion of textual content (JIN, [0002], Conventional visual-semantic embedding techniques leverage semantic information from unannotated text data to learn semantic relationships between text labels and explicitly map images into a rich semantic embedding space) from one or more text data objects from one or more textual sources (JIN, [0071], K image patches were extracted), wherein the extracted textual content includes a plurality of textual content words; 
a caption manager configured to facilitate the execution of the one or more operations by the server comprising one or more of: assign a caption to each image (YU, [0003], generation of captions for videos) of the plurality of images, thereby generate the plurality of captioned images; and a vector manager configured to facilitate the execution of the one or more operations by the server comprising one or more of: vectorize each image of the plurality of images, thereby generate the plurality of image vectors; vectorize each caption, thereby generate the plurality of caption sentence vectors; vectorize each textual content word, of the plurality of textual content words, thereby generate the plurality of textual content word vectors; generate the plurality of textual content sentence vectors; and populate the vector space with the pluralities of vectors (see Claim 1 rejections).”  
As per claim 6 (dependent on claim 3), JIN in view of YU further discloses “the vector manager further configured to: convert the plurality of textual content word vectors into the plurality of textual content sentence vectors (YU, [0053], during the generation of a sentence, an embedding average layer (see section (b) in FIG. 2) 260 accumulates all the word embeddings of the sentence and takes an average to get a compact embedding vector).” 
As per claim 7 (dependent on claim 3), JIN in view of YU further discloses “define the vector space as an image-word combined vector space with N-dimensional coordinates; define, within the image-word combined vector space, a word vectors domain with the plurality of textual content word vectors; define, within the image-word combined vector space, a sentence vectors domain with the plurality of caption sentence vectors and the textual content sentence vectors; and define, within the image-word combined vector space, an image vectors domain with the plurality of image vectors (see Claim 1 rejections).”
As per claim 9 (dependent on claim 3), JIN in view of YU further discloses “the vector manager further configured to: capture image-word linguistic regularities between the plurality of image vectors, the plurality of caption sentence vectors, the plurality of textual content word vectors, and the plurality of textual content sentence vectors (JIN, [Abstract], Training the embedding space also includes mapping representative images, having image content which exemplifies the semantic concepts, to respective text labels <read on ‘image-word linguistic regularities’ where ‘regularities’ mean ‘rules’ - see Specification>).”
 Claims 10, 11 (similar in scope to claims 1, 1 and 7 and 9) are rejected under the same rationale as applied above for claims 1, 1 and 7 and 9. 
Claims 12, 15, 16-17, 18, 20 (similar in scope to claims 1, 3, 6-7, 7, 9) are rejected under the same rationale as applied above for claims 1, 3, 6-7, 7, 9.
6.	Claims 4, 13 are rejected under 35 U.S.C. 103 as being unpatentable over JIN in view of YU, and further in view of Qing Yu, et al. (US 20080215561; hereinafter QYU).
As per claim 4 (dependent on claim 3), JIN in view of YU further discloses “filter the plurality of captioned images, wherein the caption manager is further configured to: [ execute a contextual review of each captioned image of the plurality of captioned images; determine, subject to the contextual review, a contextual relevance of each captioned image of the plurality of captioned images; compare each respective contextual relevance determination with a relevance threshold; and remove, subject to the comparison, those captioned images of the plurality of captioned images with a contextual relevance that does not meet or exceed the relevance threshold, thereby generating a plurality of relevant images and a respective plurality of relevant captions ] (Examiner’s Note: The claim is unclear without specifying for each captioned image, what ‘context’ is reviewed and compared to in order to determine ‘contextual relevance.’ The examiner considers the applicant’s response insufficient).”
JIN in view of YU does not expressly disclose “determine .. a contextual relevance of each captioned image .. compare each respective contextual relevance determination with a relevance threshold; and remove, subject to the comparison, those captioned images of the plurality of captioned images with a contextual relevance that does not meet or exceed the relevance threshold, thereby generating a plurality of relevant images and a respective plurality of relevant captions ..” However, the feature is taught by QYU (Title: Scoring relevance of a document based on image text).
In the same field of endeavor, QYU teaches: [Abstract] “determining relevance of a document having text and images to a text string .. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string” and [0022] “the scoring system may discard images whose importance is below a threshold” which reads on a ready mechanism to check any score (e.g., relevance) and to remove those with scores lower than a threshold as a system design choice.
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of QYU in the system (as taught by JIN and YU) to determine and retain only relevant images for subsequent mapping between image and text.
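For illustration only, the threshold comparison discussed above (score each item, then discard items scoring below a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from QYU or from the Specification: the function name `filter_by_relevance`, the file names, and the scores are hypothetical, and how a relevance score is computed is left out entirely.

```python
from typing import List, Tuple


def filter_by_relevance(
    scored_items: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep only the items whose score meets or exceeds the threshold."""
    return [(item, score) for item, score in scored_items if score >= threshold]


# Hypothetical captioned images paired with hypothetical relevance scores.
captioned = [("beach.jpg", 0.9), ("logo.png", 0.2), ("sunset.jpg", 0.5)]
kept = filter_by_relevance(captioned, threshold=0.5)
```

The comparison uses `>=` so that an item scoring exactly at the threshold is retained, matching the claim language “meet or exceed.”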
Claim 13 (similar in scope to claim 4) is rejected under the same rationale as applied above for claim 4.
7.	Claims 5, 14 are rejected under 35 U.S.C. 103 as being unpatentable over JIN in view of YU and QYU, and further in view of Pentheroudakis, et al. (US 7092871; hereinafter Pentheroudakis).
As per claim 5 (dependent on claim 4), JIN in view of YU and QYU further discloses “[ tokenize at least a portion of the plurality of relevant captions, thereby generate a plurality of caption tokens]; generate a plurality of caption word vectors from the plurality of caption tokens; and generate the plurality caption sentence vectors from the plurality of caption tokens.”
JIN in view of YU and QYU does not expressly disclose “tokenize at least a portion of the plurality of relevant captions, thereby generate a plurality of caption tokens ..” However, the feature is taught by Pentheroudakis (Title: Tokenizer for a natural language processing system).
In the same field of endeavor, Pentheroudakis teaches: [Abstract] “The segmenter segments a textual input string into tokens for further natural language processing.”
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of Pentheroudakis in the system (as taught by JIN, YU and QYU) for tokenizing text captions for subsequent converting and mapping between image and text vectors.
Claim 14 (similar in scope to claim 5) is rejected under the same rationale as applied above for claim 5.
8.	Claims 8, 19 are rejected under 35 U.S.C. 103 as being unpatentable over JIN in view of YU and QYU, and further in view of Kuznetsov (US 9495425; hereinafter Kuznetsov).
As per claim 8 (dependent on claim 3), JIN in view of YU further discloses “[ determine one or more sentiments ] associated with one or more image vectors of the plurality of image vectors; determine one or more sentiments associated with one or more caption sentence vectors of the plurality of caption sentence vectors; determine one or more sentiments associated with one or more textual content word vectors of the plurality of textual content word vectors; and determine one or more sentiments associated with one or more textual content sentence vectors of the plurality of textual content sentence vectors (see Claim 1 rejections).”
JIN in view of YU does not expressly disclose “determine one or more sentiments ..” However, the feature is taught by Kuznetsov (Title: Sentiment-based classification of media content).
In the same field of endeavor, Kuznetsov teaches: [Summary, Para 1] “generation and use of sentiment scores associated with media content, wherein the sentiment scores indicate different types of sentiment expressed in comments associated with the items of media content. The media content may be video, audio, text, still images or other types of media content.”
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of Kuznetsov in the system (as taught by JIN, YU) for determining user’s sentiment based on the media content, such as image and the associated text.
Claim 19 (similar in scope to claim 8) is rejected under the same rationale as applied above for claim 8.
Conclusion
9.	THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).   
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 		
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:30-5:00). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on 571-272-7799. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/		6/13/2022
Primary Examiner, Art Unit 2659

[1] See Gottschalk v. Benson, 409 U.S. 63, 67 (1972) (“Phenomena of nature, though just discovered, mental processes, and abstract intellectual concepts are not patentable, as they are the basic tools of scientific and technological work.”).  (Emphasis added).