DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and §103 (or as subject to pre-AIA 35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Priority
Examiner acknowledges Applicant's claim for priority based on KR10-2017-0153971 filed 11/17/2017 in the Republic of Korea. A non-English copy of this priority document is in the file. An English translation has not been received and may be required in the future, but it is not required at this time. 

Title
The title of the invention is not descriptive; Examiner believes that it is imprecise. A new title is required that is clearly indicative of the invention to which the claims are directed. A descriptive title indicative of the invention will help in proper indexing, classifying, searching, etc. See MPEP §606.01. However, the title of the invention should be limited to 500 characters. Examiner suggests including the aspect(s) of the claims which Applicant believes to be novel or nonobvious over the prior art.

Claim Interpretation
During patent examination, pending claims must be “given their broadest reasonable interpretation consistent with the specification.”  MPEP 2111; See also, MPEP 2173.02.  Limitations appearing in the specification but not recited in the claim are not read into the claim.  In re Prater, 415 F.2d 1393, 1404-05, 162 USPQ 541, 550-551 (CCPA 1969).  See also, In re Zletz, 893 F.2d 319, 321-22, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989) (“During patent examination the pending claims must be interpreted as broadly as their terms reasonably allow”).  The reason is simply that during patent prosecution when claims can be amended, ambiguities should be recognized, scope and breadth of language explored, and clarification imposed.  An essential purpose of patent examination is to fashion claims that are precise, clear, correct, and unambiguous.  Only in this way can uncertainties of claim scope be removed, as much as possible, during the administrative process.

Claim Objections
Claim 12 is objected to because of the following informality:
Claim 12 lines 2-3: Change “wherein the further comprising training” to – further comprising training –.
Appropriate correction is required.
	
35 USC § 101
Response to Arguments
Applicant’s arguments, see page 8, filed 1/3/2022, with respect to §101 have been fully considered and are persuasive.  The claims were not rejected under §101 in the previous Office Action, and no rejection under §101 is being made in this Office Action.

Claim Rejections - 35 USC § 112(a)
Response to Arguments
Applicant’s arguments, see page 9, filed 1/3/2022, with respect to rejections under §112(a) have been fully considered and are persuasive.  The rejection of claims 1-15 under §112(a) has been withdrawn. 

Claim Rejections - 35 USC § 112(b)
Response to Arguments
Applicant’s arguments, see page 9, filed 1/3/2022, with respect to rejections under §112(b) have been fully considered and are persuasive.  The rejection of claims 1-15 under §112(b) has been withdrawn. 

PRIOR ART
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. §103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. §102(b)(2)(C) for any potential 35 U.S.C. §102(a)(2) prior art against the later invention.
Claims 1-15 are rejected under 35 U.S.C. §103 as being unpatentable over Chen (“Audio-Visual Integration in Multimodal Communication”) in view of Andrew (“Deep Canonical Correlation Analysis”).

Claims 1, 6, and 11 (Independent)
Chen discloses: 
obtaining target modality signals of a first domain aligned in a time order (e.g. §II ¶1: human speech is bimodal in nature or §II ¶IV: acoustic … speech or §V ¶1: acoustic signal or §V ¶3: studio dialog; EN: met by the described acoustic speech signals in an audio domain with associated timing) and auxiliary modality signals of a second domain that are not aligned in the time order (e.g. §II ¶1: human speech is bimodal in nature or §II ¶IV: visible speech or §V ¶1: video … speaker’s mouth image or §V ¶3: image of the speaker … lip movement … video codec often skips some frames to meet bandwidth requirement … loss of lip synchronization or §V ¶5: loss of lip synchronization in video … transmission delay; EN: met by the described video images of lip movement which do not have “the time order” of the audio domain because they have skipped frames or delayed transmission); 
obtaining, by using a first neural network model, characteristic information of the target modality signals from the target modality signals (e.g. §VII.B ¶1: Neural networks … used to convert acoustic parameters into visual parameters; EN: This corresponds to a neural network that produces characteristic visual parameters as output in response to receiving acoustic signals); 
obtaining … time order information of the auxiliary modality signals from the characteristic information and the auxiliary modality signals (e.g. §V ¶3: time warp the video … to make the lip movement fit the studio dialog … process the mouth image accordingly to achieve lip synchronization or §VII ¶1: key issue in bimodal speech analysis and synthesis is the establishment of the mapping between acoustic parameters and mouth-shape parameters or §VII.B ¶1: into visual parameters; EN: Fixing the lip synchronization by matching the image to the audio, as discussed in §V, requires using the determined visual parameters derived as corresponding to the acoustic parameters (which are mapped to the characteristic information above) and the existing mouth images that need to be synchronized (which are mapped to the auxiliary modality signals above)); and 
training the first neural network model by updating weights of the first neural network model … (e.g. §VII.B ¶1: train the network weights), 
wherein the characteristic information is used for identifying at least one object included in the target modality signals (e.g. §VII.B ¶1: Neural networks … used to convert acoustic parameters into visual parameters; EN: The identified visual parameters are identified objects (e.g. positioning of lips, etc.) corresponding to the input acoustic signal).  
Chen fails to explicitly recite:
a neural network analyzing input visual signals;
using a first loss signal obtained based on the time order information and a second loss signal obtained based on the characteristic information.
Andrew discloses: 
obtaining target modality signals of a first domain aligned in a time order (e.g. §1 ¶4: views (modalities) or §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or §4.3 ¶1: speech data … acoustic … recordings; EN: The reference discloses obtaining modality signals from two domains of different modalities, and specifically in §4.3 where the first domain is acoustic recordings of speech data which are temporal sequences aligned in time order) and auxiliary modality signals of a second domain … (e.g. §1 ¶4: views (modalities) or §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or §4.3 ¶1: speech data … articulatory recordings; EN: The reference discloses obtaining modality signals from two domains of differing modalities, and specifically in §4.3 where the second domain is articulatory recordings of positions of the speaker’s lips, tongue, and jaws); 
obtaining, by using a first neural network model, characteristic information of the target modality signals from the target modality signals (e.g. §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or Figure 1 and the associated discussion or §4.3 ¶1: acoustic recordings; EN: The broadest reasonable interpretation of the term “characteristic information” includes any information which broadly “characterizes” the input, and since any produced output is being produced in response to the input, it is necessarily characteristic of that input; This is met by the first neural network processing the first modality, i.e. the DCCA network of View 1 in Figure 1, the first modality comprising the speech acoustic recordings of §4.1, the network producing output which is necessarily information characterizing the input by virtue of being produced in response to the input, as explained above; Examiner encourages applicant to amend the claim to stipulate more narrowly what “characteristic information” is intended to cover, such as in what manner the information is characteristic of the input); 
obtaining, by using a second neural network model, … information of the auxiliary modality signals … and the auxiliary modality signals (e.g. §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or Figure 1 and the associated discussion or §4.3 ¶1: articulatory data consisting of horizontal and vertical displacements; EN: This is met by the second neural network processing the second modality, i.e. the DCCA network of View 2 in Figure 1, the second modality comprising the articulation position recordings); and 
training the first neural network model by updating weights of the first neural network model using a first loss signal obtained based on the time order information and a second loss signal obtained based on the characteristic information (e.g. §3: jointly learn parameters for both views … To find (theta1, theta2), we follow the gradient of the correlation objective as estimated on the training data or Equations (9), (10), (11), (12) and the associated discussion). 
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen to incorporate the neural network structure and joint correlation training as taught by Andrew for the benefit of selecting a suitable topology for the network (Chen §VII.B) and simultaneously learning two deep nonlinear mappings of two views that are maximally correlated by jointly learning to maximize the correlation of parameters of both transformations (Andrew, especially e.g. Abstract or §1). 
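EN: The joint correlation training cited from Andrew §3 can be illustrated with a minimal sketch. This is not Andrew's implementation and makes several simplifying assumptions: each "network" is a single hypothetical linear layer producing one scalar per sample (so the correlation objective reduces to a Pearson correlation), gradients are estimated by finite differences rather than backpropagation, and the two views are made-up data that share a common underlying sequence.

```python
import math

def project(view, weights):
    """Map each sample (a list of floats) to one scalar via a linear layer."""
    return [sum(w * x for w, x in zip(weights, sample)) for sample in view]

def correlation(a, b):
    """Pearson correlation of two equal-length scalar sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def objective(theta1, theta2, view1, view2):
    """Correlation between the outputs of the two (hypothetical) view networks."""
    return correlation(project(view1, theta1), project(view2, theta2))

def numeric_gradient(f, theta, eps=1e-6):
    """Forward-difference estimate of the gradient of f at theta."""
    base = f(theta)
    grad = []
    for i in range(len(theta)):
        bumped = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        grad.append((f(bumped) - base) / eps)
    return grad

def joint_step(theta1, theta2, view1, view2, lr=0.1):
    """One gradient-ascent step that updates the parameters of both views."""
    g1 = numeric_gradient(lambda t: objective(t, theta2, view1, view2), theta1)
    g2 = numeric_gradient(lambda t: objective(theta1, t, view1, view2), theta2)
    new_theta1 = [t + lr * g for t, g in zip(theta1, g1)]
    new_theta2 = [t + lr * g for t, g in zip(theta2, g2)]
    return new_theta1, new_theta2

# Made-up data: both views carry the same sequence t, mixed with unrelated values.
T = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
VIEW1 = [[t, n] for t, n in zip(T, [2.0, -1.0, 0.0, 1.0, -2.0, 0.0])]
VIEW2 = [[m, 2.0 * t] for t, m in zip(T, [1.0, -1.0, 1.0, -1.0, 1.0, -1.0])]

def train(steps=200):
    """Return (initial correlation, correlation after joint training)."""
    theta1, theta2 = [1.0, 1.0], [1.0, 1.0]
    before = objective(theta1, theta2, VIEW1, VIEW2)
    for _ in range(steps):
        theta1, theta2 = joint_step(theta1, theta2, VIEW1, VIEW2)
    return before, objective(theta1, theta2, VIEW1, VIEW2)
```

Calling `train()` returns the correlation before and after training; because both parameter sets follow the gradient of the same objective, the outputs of the two views become more strongly correlated.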


Claims 2, 7, and 12
In the combination above, Chen discloses: 
a first loss signal obtained based on the time order information (e.g. §V or §VII; EN: This is the error/shift in the video image frames compared to where they need to be to synchronize with the audio). 
Chen fails to explicitly recite:
further comprising training the second neural network model updating weights of the second neural network model.
Andrew discloses: 
further comprising training the second neural network model updating weights of the second neural network model using a first loss signal obtained (e.g. §3: jointly learn parameters for both views … To find (theta1, theta2), we follow the gradient of the correlation objective as estimated on the training data or Equations (9), (10), (11), (12) and the associated discussion; EN: Training the second neural network using the joint correlation based on the errors/losses of both networks, as is done for the first neural network, is within the broadest reasonable interpretation of the open “comprising” language of this claim).  
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen to incorporate the neural network structure and joint correlation training as taught by Andrew for the benefit of selecting a suitable topology for the network (Chen §VII.B) and simultaneously learning two deep nonlinear mappings of two views that are maximally correlated by jointly learning to maximize the correlation of parameters of both transformations (Andrew, especially e.g. Abstract or §1). 


Claims 3, 8, and 13
Chen discloses: 
wherein the at least one of the first neural network … comprises sequence puzzle network (e.g. §VII.B: neural network; EN: Neither the claims nor the specification define a sequence puzzle network, and it is not a well-known term of art but rather appears to be applicant’s own nomenclature naming the neural network used, so any neural network meets the broadest reasonable interpretation of the claim).  
Andrew also discloses: 
wherein the at least one of the first neural network or the second neural network comprises sequence puzzle network (e.g. §3 or Figure 1; EN: Neither the claims nor the specification define a sequence puzzle network, and it is not a well-known term of art but rather appears to be applicant’s own nomenclature naming the neural network used, so any neural network meets the broadest reasonable interpretation of the claim).  
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen to incorporate the neural network structure and joint correlation training as taught by Andrew for the benefit of selecting a suitable topology for the network (Chen §VII.B) and simultaneously learning two deep nonlinear mappings of two views that are maximally correlated by jointly learning to maximize the correlation of parameters of both transformations (Andrew, especially e.g. Abstract or §1). 


Claims 4, 9, and 14
Chen discloses: further comprising: 
obtaining first target modality signals of the first domain and first auxiliary modality signals of the second domain associated with an object, wherein the first target modality signals are aligned in the time order (e.g. §II ¶IV: acoustic … speech or §V ¶1: acoustic signal or §V ¶3: studio dialog; EN: met by the described acoustic speech signals in an audio domain with associated timing) and the first auxiliary modality signals that are not aligned in the time order (e.g. §V ¶1: video … speaker’s mouth image or §V ¶3: image of the speaker … lip movement … video codec often skips some frames to meet bandwidth requirement … loss of lip synchronization or §V ¶5: loss of lip synchronization in video … transmission delay; EN: met by the described video images of lip movement which do not have “the time order” of the audio domain because they have skipped frames or delayed transmission); and 
obtaining second target modality signals of the second domain and second auxiliary modality signals of the first domain, wherein the second target modality signals are aligned in the time order (e.g. §V ¶2: mouth movement … original recording … original lip movement; EN: Here the reference discusses where the video is used as the reference time order that needs to be referenced) and the second auxiliary modality signals that are not aligned in the time order (e.g. §V ¶2: warp the acoustic signal to make it sound synchronized with the person’s mouth movement … dialog recorded in a studio to replace the dialogue recorded while filming a scene … studio dialogue … studio dialogue can be made in synchronization with the original lip movement; EN: Here the reference discusses where the audio needs to be warped to have a modified time scale to align with the original video recording, so here it is the audio that is not time-aligned with the reference).  
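EN: The "time-warping path" referred to in Chen §V can be illustrated with a minimal sketch of dynamic time warping, a standard alignment technique. Chen is not asserted to use this exact algorithm; the function name `time_warping_path` and the per-frame feature values are hypothetical and chosen only to show how one sequence is aligned to the time order of a reference sequence.

```python
def time_warping_path(reference, signal):
    """Align `signal` to `reference` by dynamic time warping.

    Both arguments are lists of per-frame scalar features (hypothetical).
    Returns (total_cost, path), where path is a list of (ref_index, sig_index).
    """
    n, m = len(reference), len(signal)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with signal[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - signal[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reference frame advances alone
                                 cost[i][j - 1],      # signal frame advances alone
                                 cost[i - 1][j - 1])  # both advance together
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path

# Made-up example: the signal repeats one frame relative to the reference.
REFERENCE = [0, 1, 2, 3, 2, 1]
SIGNAL = [0, 1, 1, 2, 3, 2, 1]
TOTAL_COST, PATH = time_warping_path(REFERENCE, SIGNAL)
```

In this example the repeated frame in `SIGNAL` is absorbed by mapping two signal frames onto one reference frame, so the path runs from the first frame pair to the last frame pair at zero total cost.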


Claims 5, 10, and 15
Chen discloses:
obtaining characteristic information of the first target modality signals … from the first target modality signals (e.g. §VII.B ¶1: Neural networks … used to convert acoustic parameters into visual parameters; EN: This corresponds to a neural network that produces characteristic visual parameters as output in response to receiving acoustic signals), and characteristic information of the second target modality signals … from the second target modality signals (e.g. §V ¶2: find the best ‘time-warping path’ that is required to modify the time scale of the studio audio to align the original recording).
Chen fails to explicitly recite:
discussion of explicitly using neural networks to align audio with reference video timing.
Andrew discloses: further comprising: 
obtaining characteristic information of the first target modality signals, using a third neural network model from the first target modality signals (e.g. §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or Figure 1 and the associated discussion or §4.3 ¶1: articulatory data consisting of horizontal and vertical displacements; EN: This is the reapplication of the same multimodal technique taught by Andrew for the similar application on different data in the domains in need of analysis and synchronization as suggested by Chen above) and characteristic information of the second target modality signals, using a fourth neural network model from the second target modality signals (e.g. §3 ¶1: Deep CCA computes representations of the two views by passing them through multiple stacked layers of nonlinear transformation or Figure 1 and the associated discussion or §4.3 ¶1: acoustic recordings; EN: This is the reapplication of the same multimodal technique taught by Andrew for generating the reverse solution that Chen puts forth as needing to be done above); and 
determining a category of the object by aggregating the characteristic information of the first target modality signals and the characteristic information of the second target modality signals (e.g. §5: classification … test the representations produced by deep CCA in the context of prediction tasks and compare against other nonlinear multi-view representation learning).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen to incorporate the neural network structure and joint correlation training as taught by Andrew for the benefit of selecting a suitable topology for the network (Chen §VII.B) and simultaneously learning two deep nonlinear mappings of two views that are maximally correlated by jointly learning to maximize the correlation of parameters of both transformations (Andrew, especially e.g. Abstract or §1). 


Examiner’s Note
The Examiner respectfully requests of the Applicant in preparing responses, to fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention.  It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned.  They are part of the literature of the art, relevant for all they contain.”  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).  The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any prior art made of record on the attached PTO-892 and not relied upon is considered pertinent to applicant's disclosure.
Applicant is reminded that in amending in response to a rejection of claims, the patentable novelty must be clearly shown in view of the state of the art disclosed by the references cited and the objections made.  Applicant must also show how the amendments avoid such references and objections.  See 37 CFR §1.111(c).  Additionally when amending, in their remarks Applicant should particularly cite to the supporting paragraphs in the original disclosure for the amendments.

Correspondence Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN J BUSS whose telephone number is (571)272-5831.  The examiner can normally be reached on Monday, Tuesday, Thursday 9A-5P ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on 571-270-3169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
As detailed in MPEP 502.03, communications via Internet e-mail are at the discretion of the applicant.  Without a written authorization by applicant in place, the USPTO will not respond via Internet e-mail to any Internet correspondence which contains information subject to the confidentiality requirement as set forth in 35 U.S.C. 122. A paper copy of such correspondence will be placed in the appropriate patent application. Examiner suggests filing PTO/SB/439 if applicant desires the examiner to be able to communicate by email.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 


/B.B./
Examiner, Art Unit 2127

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121