DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office Action mailed 8/16/2022, applicant has submitted an amendment filed 8/31/2022.
Claims 8-18 have been amended.
EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Anup Shrinivasan Iyer on 9/6/2022.

The application has been amended as follows: 

Amend “the one or more” in line 3 of claim 9 to recite --one or more-- (i.e. delete “the”).
Amend “modalities.” in line 5 of claim 9 to recite --modalities associated with the unseen multimodal communication.--.

Amend “comprising code, when executed, causing a first apparatus to:” in lines 3-4 of claim 10 to recite --comprising code that, when executed, causes a first apparatus to:--.

Delete “is further configured ” in line 2 of claim 11.
Delete “is further configured ” in line 2 of claim 12.
Delete “is further configured ” in line 2 of claim 13.
Delete “is further configured ” in line 2 of claim 14.
Delete “is further configured ” in line 2 of claim 15.
Delete “is further configured ” in line 2 of claim 16.
Delete “is further configured ” in line 2 of claim 17.

Delete “is further configured ” in line 2 of claim 18.
Amend “the one or more” in line 3 of claim 18 to recite --one or more-- (i.e. delete “the”).
Amend “modalities.” in line 5 of claim 18 to recite --modalities associated with the unseen multimodal communication.--.

Claim Interpretation
As per Claim 2 (and similarly claims 11 and 20): 
“for the one or more multimodal communications” is interpreted as referring to “electronically receive”, not to “the one or more class labels”.
“and the one or more class labels” is interpreted as referring to where the generating of the training dataset is also based on “the one or more class labels”.
As per Claim 5 (and similarly claim 14):
“from the text modality” in line 6 of claim 5 refers to where the one or more features are extracted from.
As per Claim 6 (and similarly claim 15): 
“from the audio modality” in line 6 of claim 6 refers to where the one or more features are extracted from.
As per Claim 7 (and similarly claim 16): 
“from the video modality” in line 6 of claim 7 refers to where the one or more features are extracted from.

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance:

As per Claim(s) 1 (and similarly claim[s] 10 and 19, and consequently claim[s] 2-9, 11-18, and 20 which depend on claim[s] 1, 10, and 19), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1, including (i.e. in combination with the remaining limitations in claim[s] 1) A system for intelligent multimodal classification in a distributed technical environment, the system comprising: at least one non-transitory storage device; and at least one processing device coupled to the at least one non-transitory storage device, wherein the at least one processing device is configured to: electronically retrieve one or more multimodal communications from a data repository, wherein the one or more multimodal communications comprises one or more communication modalities; initiate one or more feature extraction algorithms on the one or more communication modalities associated with the one or more multimodal communications to extract one or more features from the one or more communication modalities associated with the one or more multimodal communications; generate a training dataset based on at least the one or more features extracted from the one or more communication modalities associated with the one or more multimodal communications; initiate one or more machine learning algorithms on the training dataset to generate a first set of parameters; electronically receive an unseen multimodal communication; generate an unseen dataset based on at least the unseen multimodal communication; classify, using the first set of parameters, the unseen multimodal communication into one or more class labels; and initiate an execution of one or more actions on the unseen multimodal communication based on at least classifying the unseen multimodal communication into the one or more class labels (extracting feature[s] from one or more communication modalities associated with one or more multimodal communications from a data repository, generating a training dataset based on the extracted feature[s], generating a first set of parameters by initiating machine learning on the training dataset [generated based on the extracted feature[s]], classifying an unseen multimodal communication into one or more class labels using the first set of parameters, and initiating action[s] on the unseen multimodal communication based on classifying the unseen multimodal communication into the one or more class labels).
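For illustration only, the sequence of operations summarized in the parenthetical above (feature extraction per modality, training dataset generation, parameter generation, classification of an unseen communication, and initiation of an action) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or the prior art of record: the extractor functions, the nearest-centroid learner, the class labels "urgent" and "routine", and the action table are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict

# Hypothetical per-modality feature extractors (illustrative stand-ins only).
def extract_text_features(text):
    words = text.lower().split()
    return {"text:" + w: float(c) for w, c in Counter(words).items()}

def extract_audio_features(samples):
    n = len(samples) or 1
    return {"audio:mean_abs": sum(abs(s) for s in samples) / n}

EXTRACTORS = {"text": extract_text_features, "audio": extract_audio_features}

def extract_features(communication):
    """Run the matching extractor on each modality present and merge the results."""
    features = {}
    for modality, payload in communication.items():
        if modality in EXTRACTORS:
            features.update(EXTRACTORS[modality](payload))
    return features

def train(dataset):
    """Assumed learner: nearest centroid. The 'parameters' are per-label mean features."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for features, label in dataset:
        counts[label] += 1
        for key, value in features.items():
            sums[label][key] += value
    return {label: {k: v / counts[label] for k, v in s.items()}
            for label, s in sums.items()}

def classify(parameters, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        keys = set(centroid) | set(features)
        return sum((centroid.get(k, 0.0) - features.get(k, 0.0)) ** 2 for k in keys)
    return min(parameters, key=lambda label: distance(parameters[label]))

# Hypothetical mapping from class label to the action initiated on the communication.
ACTIONS = {"urgent": lambda c: "escalate", "routine": lambda c: "archive"}

def process_unseen(parameters, communication):
    label = classify(parameters, extract_features(communication))
    return label, ACTIONS[label](communication)

# Stand-in for the data repository of labeled multimodal communications.
repository = [
    ({"text": "server down outage now", "audio": [0.9, -0.8, 0.95]}, "urgent"),
    ({"text": "outage alarm critical", "audio": [0.85, -0.9, 0.8]}, "urgent"),
    ({"text": "weekly status report", "audio": [0.1, -0.1, 0.05]}, "routine"),
    ({"text": "meeting notes attached", "audio": [0.05, 0.1, -0.1]}, "routine"),
]
training_dataset = [(extract_features(c), label) for c, label in repository]
parameters = train(training_dataset)

unseen = {"text": "critical outage reported", "audio": [0.9, -0.9, 0.9]}
result = process_unseen(parameters, unseen)
```

In this sketch the final step, selecting an action from the predicted label, corresponds to the limitation that the discussion of Perronnin below identifies as not taught by that reference.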
2005/0131847 teaches “In inductive inference, which has been used thus far in the learning process, one is given data from which one builds a general model and then applies this model to classify new unseen (test) data” (paragraph 437).  This reference describes building a model from data and then applying the model to classify unseen data, and also describes unseen data as being synonymous with test data.  This reference does not appear to describe where the test data and the given data are multimodal communications or communication modalities.
2015/0294194 teaches “classifying a multimodal test object described according to at least one first and one second modality” (Abstract).
2018/0046721 teaches “extracting a plurality of features from the each of the plurality of modalities of content” (claim 4).  Paragraph 19 describes where multiple modalities can be text, image, audio, or video content.  This reference appears to be directed to searching multimodal content (not classifying an unseen multimodal input).
2017/0084295 teaches “In certain embodiments, a module execution interface 420 communicates extracted features produced by the speech feature extraction module 412 to an analytics module 422. The analytics module 422 may be configured to perform further processing on speech information provided by the speech feature extraction module 412. For example, the analytics module 422 may compute additional features (e.g., longitudinal features) using the features extracted from the speech signal by the speech feature extraction module 412. The analytics module 422 may subsequently provide as output information or data, e.g., raw analytics, for use in fusion 430, for example. The fusion module 430 may combine or algorithmically “fuse” speech features (or resultant analytics produced by the platform 400) with other multimodal data. For instance, speech features may be fused with features extracted from data sources of other modalities, such as visual features extracted from images or video, gesture data, etc. The fused multimodal features may provide a more robust indication of speaker state, in some instances” (paragraph 87).  This reference describes extracting features from data sources of multiple modalities.
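For illustration only, one simple way to combine features extracted from data sources of different modalities is to concatenate the per-modality feature vectors in a fixed order. The reference states only that features are combined or algorithmically fused; concatenation, the modality names, and the fixed dimensions below are assumptions made for this sketch.

```python
def fuse_features(per_modality, dims):
    """Concatenate per-modality feature vectors in the order given by dims.

    per_modality: dict mapping modality name to a list of floats.
    dims: dict mapping modality name to its expected vector length (assumed known).
    A modality that is absent is zero-filled so the fused vector length is constant.
    """
    fused = []
    for modality, dim in dims.items():
        vector = per_modality.get(modality, [0.0] * dim)
        if len(vector) != dim:
            raise ValueError(f"{modality}: expected {dim} features, got {len(vector)}")
        fused.extend(vector)
    return fused

# Hypothetical feature dimensions for two modalities.
DIMS = {"speech": 3, "visual": 2}

fused_full = fuse_features({"speech": [0.2, 0.5, 0.1], "visual": [0.7, 0.3]}, DIMS)
fused_speech_only = fuse_features({"speech": [0.2, 0.5, 0.1]}, DIMS)
```

Zero-filling an absent modality keeps every fused vector the same length, which a downstream classifier would typically require.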
2021/0192142 teaches “In another possible design of the embodiment of the disclosure, the processing module 802 is further configured to obtain a multimodal data set which includes multiple multimodal content samples, process the multimodal data set to determine an ontology of the multimodal knowledge graph, mine multimodal knowledge node samples of each of the multimodal content samples in the multimodal data set, establish an association relationship between the multimodal knowledge node samples through knowledge graph representation learning, and construct the multimodal knowledge graph based on the association relationship between the multimodal knowledge nodes and the ontology of the multimodal knowledge graph” (paragraph 10).  This reference describes a multimodal data set which includes multimodal content samples and processing the multimodal dataset to determine an ontology.  This reference does not appear to describe where the ontology is used to classify an unseen multimodal input.
2011/0213737 teaches “Training is further enhanced by exploiting the fact that each training dataset 60 is multi-modal, i.e., contains multiple classes or dimensions for a given feature data sample, e.g., feature data sample 42A. Enhanced training is implemented as follows. Once obtained, a given feature data sample 42A is passed into the feature correlation system 44, which finds exclusive groupings of features within the feature data sample 42A that have either the most characteristics in common or the least characteristics in common. Grouping criteria are generally determined a priori. For example, it may be known that financial data and travel data are commonly linked, or that a first health condition is common to a second health condition. The groupings of data become correlated features” (paragraph 25).  This reference describes where a training dataset is multi-modal.

	Upon further search (in response to the amendment filed 8/31/2022):
Perronnin et al. (US 2011/0040711) teaches “The N training images (or, more generally, the N training objects) are processed by a features extractor 42, which extracts low level features from the images, and a vector representation generation module 44, which generates a corresponding N training samples based on the extracted features, each training sample comprising a representative vector with a dimensionality D. In one embodiment, the vector representation generation module 44 generates Fisher vectors, which can be used as vector representations for images. Other vector representations can alternatively be employed, such as a "bag-of-visual-words" vector representation. The N representative vectors are labeled with the classification information of the corresponding N training images 40. The classifier learning system 26 includes an embedding function learning component 46, which, for each of the D dimensions, learns an embedding function for embedding the vectors in a new multi-dimensional space. In the exemplary embodiment, the learning of these functions is performed using a subset of the training samples. The learned embedding functions are then used by an embedding component 48 to embed all the vectors in the new multi-dimension space. The embedded vectors are input to a classifier training module 50. The classifier training module 50 learns values of the classifier parameters using the N labeled embedded representative vectors as a training set in order to generate a trained linear classifier 30” (paragraph 48).  This reference describes extracting features from training objects and then generating training samples based on the extracted features (where the training samples are suggested to be a training dataset), and using the training dataset to train a classifier (which can be interpreted as being defined by a first set of parameters).  Figure 1 depicts where the trained classifier is used to label objects.  
Paragraph 17 describes where a digital object can be a combination of image, text, and audio (i.e. multimodal).  Paragraphs 30-31 describe predicting labels of a test sample (a vector representation of an input unlabeled image [which can be interpreted as an image that is “unseen”/has-not-been-seen-and-classified-by-the-system, see also 2005/0131847 which indicates that unseen can be synonymous with test]).  This reference does not appear to teach or suggest where one or more actions are executed on the unseen multimodal communication based on classifying the unseen multimodal communication into one or more class labels (the method in Perronnin appears to end at labeling).
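For illustration only, the flow described in the quoted passage of Perronnin (vector representations, a per-dimension embedding, and training of a linear classifier whose learned values are its parameters, followed by labeling of a test sample) can be sketched as follows. This is a simplified sketch and not Perronnin's method: the fixed signed square root stands in for the learned embedding functions, a perceptron stands in for the classifier training module, and the training vectors are invented for the example.

```python
import math

def embed(vector):
    """Fixed per-dimension signed square root (assumed stand-in for a learned embedding)."""
    return [math.copysign(math.sqrt(abs(x)), x) for x in vector]

def train_linear_classifier(samples, labels, epochs=20):
    """Perceptron training on labeled vectors; labels are +1 or -1.

    Returns (weights, bias), i.e. the parameters that define the linear classifier.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the boundary: update parameters
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias

def predict(params, x):
    """Label a vector with the trained linear classifier."""
    weights, bias = params
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Invented labeled representative vectors (N = 4, D = 2).
train_vectors = [[4.0, 0.0], [9.0, 1.0], [0.0, 4.0], [1.0, 9.0]]
train_labels = [1, 1, -1, -1]

embedded = [embed(v) for v in train_vectors]
params = train_linear_classifier(embedded, train_labels)

# Label a test sample that was not part of the training set.
test_label = predict(params, embed([16.0, 1.0]))
```

The sketch stops at labeling, consistent with the observation above that the method in Perronnin appears to end at labeling.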
	7689418 teaches “A system for verifying user identity comprising: a conversational system that receives multi-modal inputs from a user interacting with the conversational system during a user session and transforms the received multi-modal inputs into formal commands executable by a program of instructions executable by a processor; and a behavior verifier coupled to the conversational system to extract features from the multi-modal inputs and formal commands, wherein the extracted features include a combination of input modalities representative of the user's current interaction behavior for performing a task during the user session, and wherein the behavior verifier compares the combination of input modalities representative of the user's current interaction behavior for performing the task to a behavior model representative of the user's past interaction behavior comprising a known combination of input modalities for performing the task used by the user during one or more previous user sessions to determine the identity of the user” (claim 1).  This reference describes extracting features from multi-modal inputs and formal commands, but does not appear to perform feature extraction on multi-modal inputs that are stored in a repository.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F, 12:00 PM - 8:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EY 9/6/2022
/ERIC YEN/Primary Examiner, Art Unit 2658