Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1, 2, 4-6, 8-12, 14-16, 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bui et al (20200160042).

As per claim 1, Bui et al (20200160042) teaches a computer implemented system for conducting machine learning using partially-observed data (examiner notes that “data” in applicant's specification, when discussing a level of ‘observability’, is directed toward attribute data – applicant's specification, para 0006; Bui et al (20200160042) – Fig. 2, using differing modalities of verbal (voice/text) and gesture input, and selecting/choosing the appropriate neural network; see also para 0068, where, in the example of ‘lighten the background’, the system chooses the applicable neural network – the selection process determines the present/observable attribute in the inputs, and performs the selection; see also para 0144), the system including a processor operating in conjunction with computer memory (as processor – para 0139, and memory – para 0154), the system comprising:
 the processor configured to provide: 
a data receiver adapted to receive one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data (as, using the CRF network to determine observable visible features and unobservable network features – para 0091),
the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved (as, using a mask to identify observable objects – para 0023, and determining an object mask and a command selection from the input – para 0029);
and a machine learning data architecture engine adapted to: maintain a attributive proposal network for processing the one or more data sets (as, generating a multimodal selection system taking into account memory speech, processing capabilities, etc. – para 0030; wherein the network comprises neural networks performing the image/voice modalities – para 0031-0032, so that the architecture engine is downloaded to the computing device – para 0033, without changing the core applications – end of para 0029, and para 0030);
maintain a collective proposal network for processing the corresponding mask data structure (as, maintaining a network dataset of entries, to compare to the derived command – para 0147; and in detail, see para 0092, wherein in the verbal (speech/text) modality, the words “remove” and “tree” are recognized as being understood in the language domain; then the derived data from the verbal modality is cross-checked into the image domain as relevant entity labels – i.e., matching object identifiers such as “tree” and the command “remove” to be applicable to the image);
and maintain a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein (as, using a verbal decoder to determine the speech/command/text and an image/gesture decoder to determine the image, and generating cross-probabilities via a neural network to reduce the cumulative error of the combination of the verbal object class and the gesture input – para 0070; in other words, separate neural networks are used for the verbal object class and for the image/gesture object class, and the combination of the two is then compared, as a predicted object, against predetermined classifications – para 0070),
for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution (as, if one of the modalities is not observed or recognized – para 0080, unrecognized commands; or unrecognized image – para 0093; calculating a closest classification vector according to the object class – para 0094, last 7 lines, reflecting back to the first 7 lines of para 0094, wherein classification vectors are calculated for the verbal section and the computer vision section).
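
For illustration only, the claim 1 limitations of a data set having an observed subset and an unobserved subset, together with an extracted mask recording which modalities are observed, can be sketched as follows. The sketch is not taken from the application or from Bui et al (20200160042); the modality names, the use of None to mark an unobserved modality, and the helper name extract_mask are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical modality names, chosen only for this illustration.
MODALITIES: List[str] = ["verbal", "gesture", "image"]


def extract_mask(
    record: Dict[str, Optional[object]],
) -> Tuple[List[bool], Dict[str, object], List[str]]:
    """Split one record into a Boolean mask, observed data, and unobserved names.

    Assumption: a modality is treated as unobserved when its value is None
    or when it is absent from the record.
    """
    # One Boolean per modality: True means observed, False means unobserved.
    mask = [record.get(name) is not None for name in MODALITIES]
    observed = {name: record[name] for name, seen in zip(MODALITIES, mask) if seen}
    unobserved = [name for name, seen in zip(MODALITIES, mask) if not seen]
    return mask, observed, unobserved


# Example record in which the gesture modality was not captured.
record = {"verbal": "remove the tree", "gesture": None, "image": [0.2, 0.7]}
mask, observed, unobserved = extract_mask(record)
```

In this sketch the mask is an array of Boolean values, one per modality, which is one possible reading of the mask data structure addressed under claim 10.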

As per claim 2, Bui et al (20200160042) teaches the system of claim 1, wherein the attributive proposal network, the collective proposal network, and the generative network are trained together jointly (as training the LSTM network for the verbal – para 0089, collective probabilities – para 0078-0079 for the verbal and visual, and training these networks – para 0093).

As per claim 4, Bui et al (20200160042) teaches the system of claim 1, wherein the partially-observed data is heterogeneous data (as, the observed data can be a mix of data types – observable visible features and unobservable network features – para 0091).

As per claim 5, Bui et al (20200160042) teaches the system of claim 1, wherein the output estimated data includes estimated values corresponding to at least one unobserved modality (as, using a closest vector estimate when there is no match/unobserved state – para 0094) and the output estimated data can be combined with the partially-observed data (as, combining the results of the image estimate – para 0094, for the image object class ‘jackal’ using the image class ‘dog’ as the closest estimate, combined with the observable verbal class ‘jackal’).

As per claim 6, Bui et al (20200160042) teaches the system of claim 1, wherein the output estimated data is a new set of generated data sets (as, the neural networks used for the verbal/gesture/image recognition are trained and updated continuously – para 0036-0037).



As per claims 8, 9, Bui et al (20200160042) teaches the data sets covering the range of low and high dimensionality data (para 0035, defining that gesture input can be a swipe, drag, click, or location within the screen – clearly, a low dimensionality dataset; vs verbal/speech input, which includes context/intention – para 0041; it is old and notoriously well known in the art of speech recognition and context information to have on the order of thousands of speech features and context/intent categories).

As per claim 10, Bui et al (20200160042) teaches the system of claim 1, wherein the mask data structure is an array of Boolean variables, each Boolean variable having a corresponding modality (as using binary masks – para 0028, wherein the binary mask uses the verbal modality and the gesture modality neural networks).

Claims 11,12,14-16,18,19 are method claims whose steps are performed by the system claims 1,2,4-6,8,9 above and as such, claims 11,12,14-16,18,19 are similar in scope and content to claims 1,2,4-6,8,9; therefore, claims 11, 12, 14-16,18,19 are rejected under similar rationale as presented against claims 1,2,4-6,8,9.

Claim 20 is a non-transitory computer readable medium claim performing steps common to claims 1, 2, 4-6, 8-12, 14-16, 18-19 above and as such, claim 20 is similar in scope and content to claims 1, 2, 4-6, 8-12, 14-16, 18-19 and therefore, claim 20 is rejected under similar rationale as presented against claims 1, 2, 4-6, 8-12, 14-16, 18-19.  Furthermore, Bui et al (20200160042) teaches a computer readable storage medium storing computer executable instructions to perform the method steps (para 0139).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 3,7,13,17 are rejected under 35 U.S.C. 103 as being unpatentable over Bui et al (20200160042) in view of Li et al (20190236450).

As per claims 3, 13, Bui et al (20200160042) teaches the system of claim 1, but does not explicitly teach “wherein the machine learning data architecture engine is further adapted to maintain a second generative network including a second set of one or more decoders, each decoder of the second set of the one or more decoders configured to generate new masks that can be applied to the output estimated data such that the masked output estimated data approximates a level of masking in the received one or more data sets” (Bui et al (20200160042), as applied to the claims above, teaches probability thresholds and, when probability thresholds are not met in a particular modality, a next best vector is chosen – see the explanation in claim 5 above).  Li et al (20190236450) teaches a multimodal selector using an additive/second layer (para 0071), generating different mixtures (para 0071) of the different modalities (the different modalities can be visual, post data vectors, profile data including gender and age – para 0070), and using a multiplicative layer function (i.e., masking) that emphasizes the mixture of good modalities and de-emphasizes bad modalities (last 5 lines of para 0071).  Therefore, it would have been obvious to one of ordinary skill in the art of machine learning applications for modal determination to modify the modality masking of Bui et al (20200160042) with an additional layer of masking/modality mixture calculations, as taught by Li et al (20190236450), because it would advantageously emphasize more informative information while discarding noise information (see Li et al (20190236450), end of para 0071).
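
As a rough illustration of the kind of multiplicative weighting discussed above, the following sketch scales each modality's feature vector by a weight between 0 and 1 before summing, so that a low-weight modality contributes little to the resulting mixture. This is not the implementation of Li et al (20190236450); the function name weighted_mixture, the weights, and the feature vectors are assumptions chosen only for this example.

```python
from typing import Dict, List


def weighted_mixture(
    features: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Sum per-modality feature vectors after multiplying each by its weight.

    Assumptions: all vectors share the same length, and every modality in
    `features` has a weight in `weights` (1.0 keeps it, 0.0 suppresses it).
    """
    dim = len(next(iter(features.values())))
    mixture = [0.0] * dim
    for name, vector in features.items():
        w = weights[name]
        for i, value in enumerate(vector):
            # Multiplicative gating: the weight scales the modality's contribution.
            mixture[i] += w * value
    return mixture


# Hypothetical inputs: the "profile" modality is treated as uninformative.
features = {"visual": [1.0, 2.0], "profile": [10.0, 10.0]}
weights = {"visual": 1.0, "profile": 0.0}
mixture = weighted_mixture(features, weights)
```

With a weight of 0.0 the suppressed modality drops out of the mixture entirely, which corresponds to a hard mask; intermediate weights would de-emphasize a modality without removing it.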

As per claims 7, 17, the combination of Bui et al (20200160042) in view of Li et al (20190236450) teaches the concept of good modalities (i.e., observable modalities) and bad modalities (non-informative or noisy vector representations; i.e., unobservable) – Li et al (20190236450), para 0071.  Examiner notes that observability/unobservability is also discussed in Bui et al (see claim 4 above).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see related art listed on the PTO-892 form.
Zadeh (20180204111) teaches distinguishing between modalities (human vs machine oriented) – para 1021, with binary constraints – para 1183, as part of a comprehensive media source containing images/video/speech/voice – para 1334.
Deasy et al (20210383538) teaches distinguishing between a first and second modality (para 0021, albeit both imagery).
Polak et al (20180032845) teaches a multimodal classification system (para 0070) with an aggregating score representing the probabilities of differing modes (abstract).
Chang (20050265607) teaches a combining classifier with multiple modalities (see fig. 4).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
07/30/2022