DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the submission filed April 15, 2019.  Claims 1-48 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on July 22, 2019, is being considered by the examiner.

Claim Objections
Claims 37-40 are objected to because of the following informalities:  claim 37 currently depends from itself.  For prosecution purposes, the claim will be treated as if the claim depends from claim 24.  Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


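Claims 11, 14-15, 35, and 39-40 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.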
Claim 11 recites the limitation "step d" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim.
Claim 14 recites the limitation "step d" in line 4.  There is insufficient antecedent basis for this limitation in the claim.
Claim 35 recites the limitation "step d" in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 39 recites the limitation "step d" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-48 are rejected under 35 U.S.C. 103 as being unpatentable over Maes et al (US Patent Application Publication No. 2002/0135618) in view of Dimitriadis et al (US Patent Application Publication No. 2015/0058004).  
Maes discloses a system and method for multi-modal focus detection and mood classification using multi-modal input.  Regarding claim 1, Maes teaches a method for multimodal classification (abstract; para [0012]), comprising the steps of: a) extracting fundamental frequency information from an audio input (para [0193], [0230]); b) extracting other feature information from one or more other inputs (para [0070], [0073]); and c) classifying the fundamental frequency information and the other feature information (para [0161], [0185], [0192], [0213], [0216], [0224]-[0225], [0230]-[0232]).  Maes fails to teach using a multimodal neural network.  In a similar field of endeavor, Dimitriadis teaches a multi-tier classifier for multi-modal voice activity detection that performs multi-modal classification of features [para 0016] and is implemented via a neural network [para 0022-0023].  Dimitriadis teaches that the system is advantageous because it provides classifications that account for interactions between the multimodal features [para 0006].  One having ordinary skill in the art before the effective filing date of the claimed invention would have been able to implement the multimodal neural network processing suggested by Dimitriadis in the classification system of Maes; the results would have been predictable and would have yielded a more accurate classification process by utilizing interactions between the multimodal features, as suggested by Dimitriadis.
Regarding claim 24, Maes teaches a system for multimodal classification (abstract; para [0012]), comprising: a processor (para [0010]); memory (para [0010]); a computer readable medium with non-transitory instructions embodied thereon (para 
Regarding claims 2 and 25, the combination of Maes and Dimitriadis teaches that the other feature information includes a video feature vector (Maes para [0007], [0073], [0154]) generated by a neural network [Dimitriadis’ neural network].
Regarding claims 3 and 26, Maes teaches that extracting the other feature information includes facial parts and location tracking (para [0154]), or blink detection, or pulse rate detection.

Regarding claims 5 and 28, Maes teaches the other feature information includes auditory attention features (abstract; para [0056], [0070], [0074]).
Regarding claims 6 and 29, Maes teaches the other feature information includes text (para [0119], [0179], [0232]).
Regarding claims 7 and 30, Maes teaches generating a text representation of the audio input and wherein d) further comprises classifying the text representation of the audio (para [0179]).
Regarding claims 8 and 31, Maes teaches that classifying the text representation of the audio comprises classifying an intent from the text representation (para [0010], [0049], [0161]).
Regarding claims 9 and 32, Maes teaches classifying the text representation of the audio comprises extracting a part of speech vector (para [0039], [0072], [0119]) and/or sentiment lexical feature vector.
Regarding claims 10 and 33, Maes teaches the fundamental frequency information and the other feature information is classified for each word (para [0119], [0114], [0118], [0175]) or viseme (para [0119], [0114], [0118], [0175]).

Regarding claim 12, the combination of Maes and Dimitriadis teaches generating sentence level embedding (Maes para [0119], [0232]), identifying attention features before generating a single fusion vector (Maes para [0059]), and classifying the fundamental frequency information and the other feature information (Maes para [0161], [0230], [0231]) using a multimodal neural network [Dimitriadis’ neural network].
Regarding claims 13 and 36, Maes teaches the fundamental frequency information and the other feature information is classified for each sentence (para [0119], [0232]).
Regarding claim 14, the combination of Maes and Dimitriadis teaches the fundamental frequency information and the other feature information is classified with a neural network [Dimitriadis’ neural network] and wherein the classification of the fundamental frequency information and the other feature information is further classified in step d) (Maes para [0119], [0161], [0232]).
Regarding claim 37, the combination of Maes and Dimitriadis teaches the fundamental frequency information and the other feature information is classified with a neural network [Dimitriadis’ neural network] and wherein the classification of the 
Regarding claims 15 and 38, Maes fails to teach using a multimodal neural network, and therefore fails to teach that the multimodal neural network of c) is a weighting neural network.  Dimitriadis teaches that the multimodal neural network of c) is a weighting neural network [para 0025; 0032].  One having ordinary skill in the art before the effective filing date of the claimed invention would have been able to implement the weighting multimodal neural network processing suggested by Dimitriadis in the classification system of Maes; the results would have been predictable and would have yielded a more accurate classification process by utilizing interactions between the multimodal features, as suggested by Dimitriadis.
Regarding claim 16, Maes teaches the fundamental frequency information and the other feature information is fused to generate a single fusion vector before classification in c) (para [0122], [0129], [0136], [0231], [0232]).
Regarding claim 17, Maes teaches the fundamental frequency information and the other feature information are mapped to a new representation space (para [0071], [0084]) and attention features are identified using one or more neural networks before concatenation (para [0007], [0047], [0048], [0056]).
Regarding claims 18 and 41, the combination of Maes and Dimitriadis teaches that the multimodal classifier c) is configured to classify an emotional state or mood from the audio and other input (Maes para [0010], [0060]), in combination with [Dimitriadis’ neural network].

Regarding claims 20 and 43, the combination of Maes and Dimitriadis teaches that the multimodal neural network in c) is configured to classify an internal state of a person in the audio and other input (Maes para [0034], [0214]), in combination with [Dimitriadis’ neural network].
Regarding claims 21 and 44, the combination of Maes and Dimitriadis teaches that the multimodal neural network in c) is configured to classify a personality of a person in the audio and other input (Maes para [0197], [0222]), in combination with [Dimitriadis’ neural network].
Regarding claims 22 and 45, the combination of Maes and Dimitriadis teaches that the multimodal neural network in c) is configured to classify an identity of a person in the audio and other input (Maes para [0028], [0043]), in combination with [Dimitriadis’ neural network].
Regarding claims 23 and 46, the combination of Maes and Dimitriadis teaches that the multimodal neural network in c) is configured to classify a mood of a person in the audio and other input (Maes para [0010], [0060]), in combination with [Dimitriadis’ neural network].

Regarding claim 35, Maes teaches generating sentence level embedding (para [0119], [0232]) and identifying attention features in the fusion vector before classification in step d) (para [0007], [0047], [0048], [0056]).
Regarding claim 39, Maes teaches that the fundamental frequency information and the other feature information is fused to generate a single fusion vector before classification in step d) (para [0122], [0129], [0136], [0231], [0232]).
Regarding claim 40, Maes teaches that the fundamental frequency information and the other feature information are mapped to a new representation space (para [0071], [0084]) and that attention features are identified using one or more neural networks before concatenation (para [0007], [0047], [0048], [0056]).
Regarding claim 47, Maes teaches a method for multimodal classification (abstract; para [0012]), comprising the steps of: a) extracting video feature information from a video stream (para [0012], [0116], [0130]); b) extracting other feature information from one or more other inputs associated with the video stream (para [0073], [0154]); c) generating a first set of viseme-level feature vectors from the video feature information and a second set of viseme-level feature vectors from the other feature information 
Regarding claim 48, Maes teaches a method for multimodal classification (abstract; para [0012]), comprising the steps of: a) extracting audio feature information from an audio stream (para [0012]); b) extracting other feature information from one or more other inputs associated with the audio stream (para [0070]); c) generating a first set of word-level feature vectors from the audio feature information and a second set of word-level feature vectors from the other feature information (para [0119], [0228], [0232]); d) fusing the first and second sets of word-level feature vectors to generate fused word-level feature vectors (para [0122], [0136]); e) classifying the audio feature information by applying the fused word-level feature vectors (para [0119],   Maes fails to teach using a multimodal neural network.  In a similar field of endeavor, Dimitriadis teaches a multi-tier classifier for multi-modal voice activity detection that performs multi-modal classification of features [para 0016] and is implemented via a neural network [para 0022-0023].  Dimitriadis teaches that the system is advantageous because it provides classifications that account for interactions between the multimodal features [para 0006].  One having ordinary skill in the art before the effective filing date of the claimed invention would have been able to implement the multimodal neural network processing suggested by Dimitriadis in the classification system of Maes; the results would have been predictable and would have yielded a more accurate classification process by utilizing interactions between the multimodal features, as suggested by Dimitriadis.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Heyl et al (US Patent Application Publication No. 2018/0018970) discloses neural network recognition of signals in multiple sensory domains.
Marcheret et al (US Patent Application Publication No. 2017/0061966) discloses audio-visual speech recognition using neural networks.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598.  The examiner can normally be reached on M,T,TH,F 11:30-8:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659