DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


1.      Claims 1, 3-10 and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Mane et al., US PGPUB 2021/003445 A1 (“Mane”), in view of Yang et al., “Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards” (“Yang” - IDS), and Kiela et al., “Supervised Multimodal Bitransformers for Classifying Images and Text” (“Kiela”).
      Per Claim 1, Mane discloses a method comprising: 
             selecting, at a server, product corpus data stored in a storage device communicatively coupled to the server that includes at least one selected from the group consisting of: a product name, an image, text, audio, video, or metadata to generate a dataset for a product (The front-end system 24 may be any suitable system, such as, for example, a web server. The front-end system 24 is in communication with a plurality of back-end systems, such as, for example, an item recommendation system 26, a triplet network training system 28, and/or any other suitable system. The back-end systems may be in communication with one or databases…, para. [0037]; para. [0041]; The plurality of item descriptors 250a-250c may include, but are not limited to, text-based descriptors 250a (such as text descriptions of products), visual descriptors 250b (such as images or videos illustrating a product), product attribute descriptors 250c (such as, but not limited to, brand, color, finish, material, style, category-specific style, product type, primary price, room location, category, subcategory, title, product description, etc.), and/or any other suitable item descriptors…, para. [0042]-[0043], video as including audio); 
            clustering and filtering at the server using natural language processing, the dataset for valid descriptions of the product (para. [0041]-[0043]), 
           instantiating, training, performing, generating and outputting at the server (fig. 2; fig. 3; fig. 4; para. [0040]; At step 102, one or more item descriptors are received and preprocessed by a system, such as the item recommendation system 26. The item descriptors may be received from, for example, a product attributes database 30…, para. [0041]),
           outputting, the product description for an electronic product catalog (para. [0067])
           Mane does not explicitly disclose the product having a predetermined sentence length and normal natural language structure, instantiating, a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset, training, the instantiated transformer of the multi-modal conditioned natural language generator, performing, an evaluation of an output of the transformer of the multi-modal conditioned natural language generator or generating, a product description based on the evaluated transformer using the clustered and filtered dataset
            However, these features are taught by Yang:
           the product having a predetermined sentence length and normal natural language structure (sec. 1; sec. 4.1; sec. 5.1); 
           instantiating, a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset (sec. 4.1);
           training, the instantiated transformer of the multi-modal conditioned natural language generator (sec. 4.1); 
          performing, an evaluation of an output of the transformer of the multi-modal conditioned natural language generator (sec. 4.3); 
          generating, a product description based on the evaluated transformer using the clustered and filtered dataset (sec. 5.2)
            Mane in view of Yang does not explicitly disclose generating, a product description based on a multi-modal conditionality of the product
            However, this feature is taught by Kiela (For tasks that consist of a single text and single image input, we assign token inputs to one segment ID and image embeddings to another. We use 0-indexed positional coding for each segment, i.e., we start counting from 0 for each segment…, sec. 2.2).
           It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yang with the method of Mane in arriving at “the product having a predetermined sentence length and normal natural language structure, instantiating, a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset, training, the instantiated transformer of the multi-modal conditioned natural language generator, performing, an evaluation of an output of the transformer of the multi-modal conditioned natural language generator or generating, a product description based on the evaluated transformer using the clustered and filtered dataset”, as well as to combine the teachings of Kiela with the method of Mane in view of Yang in arriving at “generating, a product description based on a multi-modal conditionality of the product”, because such combinations would have resulted in generating accurate descriptions for online fashion items so as to enhance customers' shopping experiences (Yang, Abstract) as well as in improving textual classification tasks (Kiela, Abstract; sec. 2.2).
          Per Claim 3, Mane in view of Yang and Kiela discloses the method of claim 1, 
              Yang discloses wherein the clustering and filtering further comprises: removing one or more characters of the dataset based on a predetermined list of characters (We lowercase all sentences and discard non-alphanumeric characters..., sec. 5.1). 
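For illustration only, the character-removal step quoted from Yang (lowercasing and discarding non-alphanumeric characters) can be sketched as follows. The function name `clean_description` is hypothetical, and replacing a discarded character with a space (rather than deleting it outright) is an assumption; Yang's actual preprocessing code is not reproduced here.

```python
import re


def clean_description(text: str) -> str:
    """Lowercase a description and discard non-alphanumeric characters.

    Assumption: each discarded character is replaced by a space so that
    hyphenated words stay separated; runs of whitespace are then collapsed.
    """
    lowered = text.lower()
    # Anything other than a-z, 0-9, or whitespace is treated as a character to remove.
    stripped = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Collapse repeated whitespace left behind by the removals.
    return " ".join(stripped.split())
```

The character class in the regular expression plays the role of the "predetermined list of characters" in the claim language, expressed here as the complement of the characters that are kept.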
           Per Claim 4, Mane in view of Yang and Kiela discloses the method of claim 1, 
               Yang discloses wherein the training further comprises: weighting one or more parameters of the multi-modal conditioned natural language generator (sec. 4.1); and
               training, at the server, the transformer of the multi-modal conditioned natural language generator by updating the weighted parameters (We dynamically re-weight the input image features…, sec. 4.1). 
           Per Claim 5, Mane in view of Yang and Kiela discloses the method of claim 1, 
                Yang discloses wherein the performing the evaluation further comprises: scoring the performance of the multi-modal conditioned natural language generator (pg. 3); 
              quantitatively analyzing the multi-modal conditioned natural language generator based on the scored performance (we use the output probability of the generated sentence as the groundtruth category as the SLS reward..., pg. 3). 
           Per Claim 6, Mane in view of Yang and Kiela discloses the method of claim 1,
                Kiela discloses wherein the generating the product description further comprises: embedding tokens for the clustered and filtered dataset (sec. 2.2; sec. 3.1); 
              determining positional encoding for each of the embedded tokens (sec. 2.2); and
              combining the embedded tokens and the positional encoding for each of the tokens to generate the multi-modal conditionality (sec. 2.2). 
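For illustration only, the embedding scheme quoted from Kiela sec. 2.2 (one segment ID for text tokens, another for image embeddings, with positions counted from 0 within each segment) can be sketched in plain Python. All function names and the toy lookup tables are hypothetical, and combining the components by element-wise addition is an assumption in the style of BERT-like models, not a reproduction of Kiela's code.

```python
def segment_positions(segment_ids):
    """Return a 0-indexed position for each input, restarting at 0 per segment."""
    counters = {}
    positions = []
    for seg in segment_ids:
        positions.append(counters.get(seg, 0))
        counters[seg] = counters.get(seg, 0) + 1
    return positions


def combine_embeddings(input_vectors, segment_ids, position_table, segment_table):
    """Add the position and segment vectors to each token or image vector.

    input_vectors  : one embedding vector per text token or image feature
    segment_ids    : e.g. 0 for text tokens, 1 for image embeddings
    position_table : position index -> vector (hypothetical lookup table)
    segment_table  : segment ID -> vector (hypothetical lookup table)
    """
    positions = segment_positions(segment_ids)
    combined = []
    for vec, seg, pos in zip(input_vectors, segment_ids, positions):
        combined.append(
            [v + p + s for v, p, s in zip(vec, position_table[pos], segment_table[seg])]
        )
    return combined
```

With three text tokens followed by two image embeddings, `segment_positions([0, 0, 0, 1, 1])` gives positions 0, 1, 2 for the text segment and 0, 1 for the image segment.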
          Per Claim 7, Mane in view of Yang and Kiela discloses the method of claim 6,
              Yang discloses decoding, at the transformer, the multi-modal conditionality to the product description into a predetermined natural language (sec. 1; sec. 2). 
           Per Claim 8, Mane in view of Yang and Kiela discloses the method of claim 7, 
              Mane discloses determining, at the server (para. [0040]-[0041])
             Yang discloses determining a language modeling loss to determine whether there is a loss between the generated product description and the product description in the predetermined natural language (sec. 5.1). 
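For illustration only, a language modeling loss between a generated description and a reference description is commonly computed as the average negative log-likelihood the model assigns to the reference tokens. The sketch below assumes that formulation; the function name and argument layout are hypothetical, and it is not asserted that Yang sec. 5.1 uses exactly this form.

```python
import math


def language_modeling_loss(step_probabilities, reference_ids):
    """Average negative log-likelihood of the reference tokens.

    step_probabilities : one probability distribution over the vocabulary
                         per decoding step (a list of lists of floats)
    reference_ids      : index of the reference token at each step
    """
    if len(step_probabilities) != len(reference_ids):
        raise ValueError("need one distribution per reference token")
    if not reference_ids:
        raise ValueError("reference sequence must not be empty")
    total = 0.0
    for probs, ref in zip(step_probabilities, reference_ids):
        # A probability of 1.0 for the reference token contributes zero loss.
        total += -math.log(probs[ref])
    return total / len(reference_ids)
```

A loss of zero means the generator assigned probability 1 to every reference token; any larger value indicates a discrepancy between the generated and reference descriptions.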
          Per Claim 9, Mane in view of Yang and Kiela discloses the method of claim 1,   
             Mane discloses transmitting, at the server, one or more natural language words for the product description to a user interface based on at least one input received by the user interface (fig. 4; para. [0067]). 
        Per Claim 10, Mane discloses a system comprising:
             a server having a processor and memory (para. [0020]; para. [0037]) to: 
             select product corpus data stored in the memory that includes at least one selected from the group consisting of: a product name, an image, text, audio, video, or metadata to generate a dataset for a product (The front-end system 24 may be any suitable system, such as, for example, a web server. The front-end system 24 is in communication with a plurality of back-end systems, such as, for example, an item recommendation system 26, a triplet network training system 28, and/or any other suitable system. The back-end systems may be in communication with one or databases…, para. [0037]; para. [0041]; The plurality of item descriptors 250a-250c may include, but are not limited to, text-based descriptors 250a (such as text descriptions of products), visual descriptors 250b (such as images or videos illustrating a product), product attribute descriptors…, para. [0042], video as including audio);
            cluster and filter, using natural language processing, the dataset for valid descriptions of the product (para. [0041]-[0043]),
           output the product description for an electronic product catalog (para. [0067])
           Mane does not explicitly disclose the product having a predetermined sentence length and normal natural language structure, instantiate a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset, train the instantiated transformer of the multi-modal conditioned natural language generator, perform an evaluation of an output of the transformer of the multi-modal conditioned natural language generator or generate a product description based on the evaluated transformer using the clustered and filtered dataset and a multi-modal conditionality of the product
            However, these features are taught by Yang:
            the product having a predetermined sentence length and normal natural language structure (sec. 1; sec. 4.1; sec. 5.1);
           instantiate a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset (sec. 4.1);
           train the instantiated transformer of the multi-modal conditioned natural language generator (sec. 4.1); 
        perform an evaluation of an output of the transformer of the multi-modal conditioned natural language generator (sec. 4.3); 
           generate a product description based on the evaluated transformer using the clustered and filtered dataset (sec. 5.2) 
          Mane in view of Yang does not explicitly disclose to generate a product description based on a multi-modal conditionality of the product
          However, this feature is taught by Kiela (For tasks that consist of a single text and single image input, we assign token inputs to one segment ID and image embeddings to another. We use 0-indexed positional coding for each segment, i.e., we start counting from 0 for each segment…, sec. 2.2);    
         It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yang with the system of Mane in arriving at “the product having a predetermined sentence length and normal natural language structure, instantiate a transformer of a multi-modal conditioned natural language generator based on the clustered and filtered dataset, train the instantiated transformer of the multi-modal conditioned natural language generator, perform an evaluation of an output of the transformer of the multi-modal conditioned natural language generator or generate a product description based on the evaluated transformer using the clustered and filtered dataset and a multi-modal conditionality of the product”, as well as to combine the teachings of Kiela with the system of Mane in view of Yang in arriving at “generate a product description based on a multi-modal conditionality of the product”, because such combinations would have resulted in generating accurate descriptions for online fashion items so as to enhance customers' shopping experiences (Yang, Abstract) as well as in improving textual classification tasks (Kiela, Abstract; sec. 2.2).
          Per Claim 12, Mane in view of Yang and Kiela discloses the system of claim 10,
             Mane discloses the server (para. [0040]-[0041])
             Yang discloses clusters and filters by removing one or more characters of the dataset based on a predetermined list of characters (We lowercase all sentences and discard non-alphanumeric characters..., sec. 5.1).
          Per Claim 13, Mane in view of Yang and Kiela discloses the system of claim 10,
              Mane discloses the server (para. [0040]-[0041])
              Yang discloses: trains by weighting one or more parameters of the multi-modal conditioned natural language generator, and training the transformer of the multi-modal conditioned natural language generator by updating the weighted parameters (We dynamically re-weight the input image features…, sec. 4.1).  
           Per Claim 14, Mane in view of Yang and Kiela discloses the system of claim 10, 
             Mane discloses the server (para. [0040]-[0041])
             Yang discloses performs the evaluation by scoring the performance of the multi-modal conditioned natural language generator and quantitatively analyzing the multi-modal conditioned natural language generator based on the scored performance (we use the output probability of the generated sentence as the groundtruth category as the SLS reward..., pg. 3). 
          Per Claim 15, Mane in view of Yang and Kiela discloses the system of claim 10,
              Mane discloses the server (para. [0040]-[0041])
              Kiela discloses generates the product description by embedding tokens for the clustered and filtered dataset, determining positional encoding for each of the embedded tokens, and combining the embedded tokens and the positional encoding for each of the tokens to generate the multi-modal conditionality (sec. 2.2; sec. 3.1). 
          Per Claim 16, Mane in view of Yang and Kiela discloses the system of claim 15,
             Yang discloses wherein the transformer decodes the multi-modal conditionality to the product description into a predetermined natural language (sec. 1; sec. 2).
          Per Claim 17, Mane in view of Yang and Kiela discloses the system of claim 16, 
                Yang discloses wherein the server determines a language modeling loss to determine whether there is a loss between the generated product description and the product description in the predetermined natural language (sec. 5.1).
        Per Claim 18, Mane in view of Yang and Kiela discloses the system of claim 10, 
              Mane discloses wherein the server transmits one or more natural language words for the product description to a user interface based on at least one input received by the user interface (fig. 4; para. [0067]).

2.      Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Mane in view of Yang and Kiela as applied to claims 1 and 10 above, and further in view of Mane et al., “Product Title Generation for Conversational Systems using BERT” (“Mane2”).
           Per Claim 2, Mane in view of Yang and Kiela discloses the method of claim 1,        
              Mane in view of Yang and Kiela does not explicitly disclose wherein the clustering and filtering further comprises: translating one or more words of the dataset from a first natural language to a predetermined natural language
              However, this feature is taught by Mane2 (The following sections provides a summary of related work followed by a description of methods applied to convert web-based short titles of products (sequence of words in English) into more naturally spoken summary titles (sequence of words in English) for voice-based applications…, sec. 1)
           It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mane2 with the method of Mane in view of Yang and Kiela in arriving at “wherein the clustering and filtering further comprises: translating one or more words of the dataset from a first natural language to a predetermined natural language”, because such combinations would have resulted in providing grammatically correct sequences (Mane2, sec. 1).
           Per Claim 11, Mane in view of Yang and Kiela discloses the system of claim 10, 
             Mane discloses the use of a server (para. [0040]-[0041])
             Mane in view of Yang and Kiela does not explicitly disclose wherein the server clusters and filters by translating one or more words of the dataset from a first natural language to a predetermined natural language.
              However, this feature is taught by Mane2 (The following sections provides a summary of related work followed by a description of methods applied to convert web-based short titles of products (sequence of words in English) into more naturally spoken summary titles (sequence of words in English) for voice-based applications…, sec. 1)
           It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mane2 with the system of Mane in view of Yang and Kiela in arriving at “wherein the server clusters and filters by translating one or more words of the dataset from a first natural language to a predetermined natural language”, because such combinations would have resulted in providing grammatically correct sequences (Mane2, sec. 1).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See the PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUJIMI A ADESANYA whose telephone number is (571)270-3307. The examiner can normally be reached Monday-Friday 8:30-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUJIMI A ADESANYA/
Primary Examiner, Art Unit 2658