DETAILED ACTION
Claims 1-20 are presented for examination.
Information Disclosure Statement
The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the Examiner. 
Election/Restrictions
Applicant's election of Group I without traverse is acknowledged by the Primary Examiner.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with ERIC V. FIGUEROA on 07/19/2022.
The application has been amended as follows:
(Currently Amended) A computer-implemented method, comprising:
receiving first inputs associated with a first modality, and second inputs associated with a second modality;
processing the received first inputs and second inputs in a convolutional neural network (CNN), wherein a first set of weights are assigned to the first inputs and a second set of weights are assigned to the second inputs[[ ]], wherein text labeling is not performed, the first inputs are not converted to the second mode, and the second inputs are not converted to the first mode;
determining a loss for each of the first inputs and the second inputs based on a loss function that applies the first set of weights, the second set of weights, and a presence of a co-occurrence, wherein the co-occurrence is associated with the first inputs and the second inputs in sequence within a common time window;
generating a shared feature space as an output of the CNN, wherein a distance between cells associated with the first inputs and the second inputs in the shared feature space is determined based on the loss associated with each of the first inputs and the second inputs; and
based on the shared feature space, providing an output indicative of a classification or probability of a classification.

(Original) The computer-implemented method of claim 1, wherein a first anchor channel is associated with the first modality and a second anchor channel is associated with the second modality.

(Original) The computer-implemented method of claim 1, wherein the first modality comprises a visual mode and the second modality comprises an audio mode.

(Original) The computer-implemented method of claim 1, wherein the first inputs and the second inputs are received from one or more sensors.
(Original) The computer-implemented method of claim 4, wherein the one or more sensors comprise at least one of a camera associated with the first inputs and a microphone associated with the second inputs.

(Cancelled)


(Cancelled)


(Original) The computer-implemented method of claim 1, wherein the computer-implemented method is executed in a neural processing unit of a processor.

(Original) The computer-implemented method of claim 1, wherein the computer-implemented method is executed in a mobile communications device, a home management device, and/or a processor of a robotic device.

(Currently Amended) A non-transitory computer readable medium having a storage that stores instructions, the instructions executed by a processor, the instructions comprising:
receiving first inputs associated with a first modality, and second inputs associated with a second modality;
processing the received first inputs and second inputs in a convolutional neural network (CNN), wherein a first set of weights are assigned to the first inputs and a second set of weights are assigned to the second inputs[[ ]], wherein text labeling is not performed, the first inputs are not converted to the second mode, and the second inputs are not converted to the first mode;
determining a loss for each of the first inputs and the second inputs based on a loss function that applies the first set of weights, the second set of weights, and a presence of a co-occurrence, wherein the co-occurrence is associated with the first inputs and the second inputs in sequence within a common time window;
generating a shared feature space as an output of the CNN, wherein a distance between cells associated with the first inputs and the second inputs in the shared feature space is determined based on the loss associated with each of the first inputs and the second inputs; and
based on the shared feature space, providing an output indicative of a co-occurrence or not.


(Currently Amended) The computer-implemented method of claim [[1]] 10, wherein the first modality comprises a visual mode and the second modality comprises an audio mode.

(Currently Amended) The computer-implemented method of claim [[1]] 10, wherein the first inputs and the second inputs are received from one or more sensors, and wherein the one or more sensors comprise at least one of a camera associated with the first inputs and a microphone associated with the second inputs.

(Cancelled)


(Cancelled)


(Currently Amended) The computer-implemented method of claim [[1]] 10, wherein the computer-implemented method is executed in a mobile communications device, a home management device, and/or a processor of a robotic device.

16-20. (Cancelled)
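
The loss and shared feature space recited in amended claims 1 and 10 can be illustrated with a minimal sketch. The claims do not specify the form of the loss function; the sketch assumes a contrastive-style loss. The names embed, co_occurs, and contrastive_loss, the linear projection standing in for the CNN, the window length, and the margin are all hypothetical and do not come from the application.

```python
import math


def embed(inputs, weights):
    """Project one input vector into the shared feature space.

    Assumption: a plain linear projection stands in for the CNN; each row
    of `weights` produces one coordinate of the embedding.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def co_occurs(t_first, t_second, window):
    """Return True when the two timestamps fall within a common time window."""
    return abs(t_first - t_second) <= window


def contrastive_loss(emb_first, emb_second, co_occurrence, margin=1.0):
    """Contrastive-style loss (assumed form, not taken from the claims).

    Co-occurring pairs are penalized by their squared distance, which pulls
    them together; other pairs are penalized only when closer than `margin`,
    which pushes them apart.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_first, emb_second)))
    if co_occurrence:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical example: a visual input at t = 10.0 s and an audio input at t = 10.4 s.
visual_weights = [[1.0, 0.0], [0.0, 1.0]]  # first set of weights
audio_weights = [[0.0, 1.0], [1.0, 0.0]]   # second set of weights
visual_emb = embed([1.0, 0.0], visual_weights)
audio_emb = embed([0.0, 1.0], audio_weights)
pair_loss = contrastive_loss(visual_emb, audio_emb, co_occurs(10.0, 10.4, 0.5))
```

In this sketch no text label is used and neither input is converted to the other modality; the only supervisory signal is whether the two inputs co-occur within the window.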

Allowable Subject Matter
Claims 1-5, 8-12, and 15 are allowed. 
	The following is an Examiner’s statement of reasons for allowance:
	US 2018/0225822 A1 discloses that a neural network is trained to perform the plurality of medical imaging analyses based on the datasets of input training medical imaging data and the corresponding output training medical imaging data. The neural network is represented as a plurality of nodes each associated with a set or vector of weights. The weights in the set of weights correspond to the hierarchical structure of networks, such that each vector of weights includes a weight WH for the HNet, weight WM for UNet.M, weight WA for UNet.A, weight WMA for SNet.MA, and weight WMAT for the remaining part of Net.MAT. The hierarchical structure allows for weights at a top level of the hierarchy (i.e., WH) to be used for learning weights further down the hierarchy (e.g., WM, WA, and WMAT). Accordingly, datasets of training medical imaging data associated with one medical imaging analysis can be used for learning weights associated with a different medical imaging analysis.

	US 10943154 B2 discloses that the method 200 can include training a neural network based on the multi-modal data at least in part by using a triplet loss computed for the driving events as a regression loss to determine an embedding of driving event data. In an aspect, training component 110, e.g., in conjunction with processor 102, memory 104, etc., can train the neural network based on the multi-modal data at least in part by using the triplet loss computed for the driving events as the regression loss to determine the embedding of driving event data (e.g., in the trained dataset 116). For example, training component 110 can train the neural network based on a variety of input data to generate trained dataset 116. In an example, training component 110 can obtain input data in the form of objects for identification, where the data can include images of the objects and associated object labels. In one example, training component 110 can train the neural network with multiple images to facilitate detecting events represented by the images based on association with one or more of the images. Training component 110, for example, can determine events with which the multiple images are likely associated (e.g., based on comparing aspects of the images), and can identify an association between the images based on identifying similar properties of the images. In one example, the training component 110 can label the events in the image to improve retrieval performance.
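
For context, the triplet loss referred to in US 10943154 B2 can be sketched in its commonly used textbook form. This is an assumption about the general technique, not a reproduction of the reference's implementation; the function name triplet_loss, the Euclidean distance, and the margin value are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (Euclidean distance assumed).
    """
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


# Hypothetical embeddings of three events.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Unlike the claimed loss, which depends on co-occurrence within a common time window, this formulation requires the positive and negative examples to be chosen in advance, for example from labeled events.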

	US 2021/0097401 A1 discloses that, according to a first aspect, a network system to generate output data values from input data values according to one or more learned data distributions comprises an input to receive a set of observations, each comprising a respective first data value for a first variable and a respective second data value for a second variable dependent upon the first variable. The system may comprise an encoder neural network system configured to encode each observation of the set of observations to provide an encoded output for each observation. The system may further comprise an aggregator configured to aggregate the encoded outputs for the set of observations and provide an aggregated output. The system may further comprise a decoder neural network system configured to receive a combination of the aggregated output and a target input value and to provide a decoder output. The target input value may comprise a value for the first variable and the decoder output may predict a corresponding value for the second variable.
	However, the cited prior art of record fails to disclose the following limitations of claims 1 and 10: “… receiving first inputs associated with a first modality, and second inputs associated with a second modality; processing the received first inputs and second inputs in a convolutional neural network (CNN), wherein a first set of weights are assigned to the first inputs and a second set of weights are assigned to the second inputs, wherein text labeling is not performed, the first inputs are not converted to the second mode, and the second inputs are not converted to the first mode; determining a loss for each of the first inputs and the second inputs based on a loss function that applies the first set of weights, the second set of weights, and a presence of a co-occurrence, wherein the co-occurrence is associated with the first inputs and the second inputs in sequence within a common time window; generating a shared feature space as an output of the CNN, wherein a distance between cells associated with the first inputs and the second inputs in the shared feature space is determined based on the loss associated with each of the first inputs and the second inputs; and based on the shared feature space, providing an output indicative of a classification or probability of a classification.” (or similar limitations)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	US 11301995 B2 - concepts for feature identification in medical imaging of a subject. One such concept processes a medical image with a Bayesian deep learning network to determine a first image feature of interest and an associated uncertainty value, the first image feature being located in a first sub-region of the image. It also processes the medical image with a generative adversarial network to determine a second image feature of interest within the first sub-region of the image and an associated uncertainty value. Based on the first and second image features and their associated uncertainty values, the first sub-region of the image is classified.
	
	US 2020/0034948 A1 - The present disclosure describes a computer-implemented method of transforming a low-resolution MR image to a high-resolution MR image using a deep CNN-based MRI SR network and a computer-implemented method of transforming an MR image to a pseudo-CT (sCT) image using a deep CNN-based sCT network. The present disclosure further describes an MR image-guided radiation treatment system that includes a computing device to implement the MRI SR and CT networks and to produce a radiation plan based on the resulting high-resolution MR images and sCT images.
Inquiries 
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to PAKEE FANG whose telephone number is (571)270-3633.  The Examiner can normally be reached on Mon-Fri 9:00AM-5:00PM.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, PÉREZ-GUTIÉRREZ RAFAEL can be reached on 571-272-7915.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PAKEE FANG/
Primary Examiner, Art Unit 2642