DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The Amendment filed 21 January 2021 (hereinafter “the Amendment”) has been entered and considered. Claims 1, 10, and 19 have been amended. Claims 9 and 27 have been canceled. Claims 1-5, 8, 10-14, 19, 21-22, 24, 26, and 28-29, all the claims pending in the application, are rejected. All new grounds of rejection set forth in the present action were necessitated by Applicants’ claim amendments; accordingly, this action is made final.

Response to Amendment
Double Patenting
In view of the amendments to independent claims 1, 10, and 19, the double patenting rejections are withdrawn.

Prior Art Rejections
In view of the amendments to independent claims 1, 10, and 19, the previously applied prior art rejections are withdrawn. Applicants’ arguments are rendered moot in view of the new grounds of rejection set forth below.
The Examiner notes that the previously applied Chandar reference is replaced by an earlier publication by the same author (see rejection below and the attached PTO-892). Applicants acknowledge that the previously applied Chandar reference discloses the cross-reconstruction of views (for example, image to text or vice versa), but does not teach or suggest that such a cross-reconstructed view is used as a single source for a classifier. However, the 
“In some applications, one view might be easy to get while the second view might be hard to get or costly to compute online. Unfortunately, the second view might be most representative of the classes. In such scenario, a common representation which is predictive of the second view can improve the performance of the classifier even though it has access to the first view only” (Chapter 1.2; emphasis added).

In another example, Chandar discloses that a common representation predictive of a second view can improve the performance of a classifier if there is not enough training data in the first view (Chapter 1.2). 
Thus, Chandar discloses the newly added feature of the amended claims of single-view, cross-reconstructed classification, and contemplates scenarios in which such a feature would be useful. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
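A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.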

Claims 1-2, 4-5, 8, 10-11, 13-14, 19, 22, 24, 26, and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2001/0042085 to Peairs et al. (cited in the IDS filed 11 February 2019; hereinafter “Peairs”) in view of “Correlational Neural Networks for Common Representation Learning” to Chandar et al. (hereinafter “Chandar”) and further in view of U.S. Patent Application Publication No. 2016/0364419 to Stanton et al. (cited in the IDS filed 11 February 2019; hereinafter “Stanton”).
As to independent claim 1, Peairs discloses a method comprising: receiving, at a computer, an electronic document ([0040] discloses receiving a document); using raw text extracted from the electronic document to determine a textual data representation of the electronic document ([0042] discloses performing textual analysis on extracted text to determine a textual representation of the document; see also Fig. 5 and [0049-0052]); using a raster image extracted from the electronic document to determine a visual data representation of the electronic document ([0043] discloses performing graphical analysis on an image of the document to determine a visual representation of the document; see also Fig. 6 and [0053-0057]); and applying, by the computer, the machine learning classifier based on the textual data representation and visual data representation to classify a new electronic document ([0038] discloses classifying newly received documents based on the model formed by the text and image representations). 
Peairs discloses that the system is trained to recognize the types of documents based on the above described analysis including both textual analysis and graphical analysis of the documents ([0037]). Accordingly, Peairs contemplates performing the above analysis on a received document which is used to train the document classifier. However, Peairs does not expressly detail the training. That is, Peairs does not expressly disclose that the received electronic document is one on which to train a machine learning classifier. Peairs also does not expressly disclose generating, by the computer, a fusion representation based on the textual data representation and the visual data representation of the electronic document or that the classification of the new documents is based on the fusion representation or applied using a single input source of either text reconstructed from image or image reconstructed from text of the new electronic document. 
Chandar, like Peairs, is directed to learning features over multiple modalities (used interchangeably with “views”) including audio, video, images, and text (see at least Chapter 1.2 first and last paragraphs). Chandar proposes the concept of learned common representations which can help in learning across views. For example, a common representation predictive of a second view can improve the performance of a classifier if it only has access to a first view, particularly if: 1) the second view is more representative of the classes but is less accessible or 2) if there is not enough training data in the first view (Chapter 1.2). Thus, Chandar discloses forming learned common representations of multi-modal data, and performing training and testing of a model based on the learned common representations, the testing involving cross-reconstruction of one view from the other view for improving the performance of a classifier (Chapters 1.2, 3 and 5). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Peairs to generate a shared representation of the multi-modal representations in order to train a model, and to use the trained model based on the shared representation to reconstruct a second view (e.g., image or text) from a first view (e.g., the other of image or text) for classification, as taught by Chandar, to arrive at the claimed invention 
The proposed combination of Peairs and Chandar does not expressly disclose that the textual data representation is determined by applying, by the computer, a first neural network to the raw text or that the visual data representation is determined by applying, by the computer, a second neural network to the raster image. 
Stanton, like Peairs, is directed to document classification based on text and images (Abstract and [0022]). Stanton discloses applying a neural network to the text of the dataset, applying a neural network to the images of the dataset, and combining the results to be input to a joint processing neural network ([0035-0040]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Peairs and Chandar to apply separate neural networks to text and visual data, respectively, as taught by Stanton, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to better characterize the text and images.
As to claim 2, the proposed combination of Peairs, Chandar, and Stanton further teaches that the generating is further based on a label associated with the electronic document, the label denoting a document category 
As to claim 4, the proposed combination of Peairs, Chandar, and Stanton further teaches that the generating further comprises: generating, by the computer, the fusion representation based on a correlation between the textual data representation and the visual data representation, the textual data representation, and the visual data representation (Section 3.3 of Chandar discloses that the generation of the shared representation is based on a correlation between the multimodal data). 
As to claim 5, the proposed combination of Peairs, Chandar, and Stanton further teaches that the first neural network is different from the second neural network ([0037-0039] of Stanton contemplates using different neural networks for the text and visual data processing). 
As to claim 8, the proposed combination of Peairs, Chandar, and Stanton further teaches that the generating is based, at least in part, on a cost function which maximizes a correlation between the textual data representation and the visual data representation (Section 3.3 of Chandar discloses a cost function which maximizes the correlation between representations of the two views).
As to claim 28, the proposed combination of Peairs, Chandar, and Stanton further teaches generating, by the computer prior to applying the second neural network, the raster image from the electronic document ([0022-0026, 0038-0040] of Stanton discloses generating the image data prior to applying the second neural network; similarly, [0043, 0053-0057] of Peairs discloses generating the image of the documents prior to graphical analysis thereof; the reasons for combining the references are analogous to those discussed above in conjunction with claim 1). 
As to claim 29, the proposed combination of Peairs, Chandar, and Stanton further teaches weighting, by the computer during training of the machine learning classifier, a cost function used for the generating of the fusion representation to direct a learning process for the training (Section 3.3 of Chandar discloses generating the shared representation of the multimodal data for training by using a cost function which is weighted). 
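For illustration only, the kind of cost function characterized above (an error of reconstruction, an error of cross-reconstruction, and a weighted correlation term) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the formulation of Chandar or of the claims: the function names, the squared-error measure, the per-unit Pearson correlation, and the weight `lam` are all hypothetical choices made for the example.

```python
import math

def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def correlation(h_x, h_y):
    """Sum, over hidden units, of the Pearson correlation between the
    hidden codes of the two views (one hidden vector per sample)."""
    n, dims = len(h_x), len(h_x[0])
    total = 0.0
    for d in range(dims):
        xs = [h[d] for h in h_x]
        ys = [h[d] for h in h_y]
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        denom = math.sqrt(var_x * var_y)
        total += cov / denom if denom > 0 else 0.0
    return total

def corr_objective(targets, self_recon, cross_recon, h_x, h_y, lam=1.0):
    """Hypothetical objective: lower is better.

    targets     -- original view vectors
    self_recon  -- each view reconstructed from itself
    cross_recon -- each view reconstructed from the other view
    h_x, h_y    -- hidden codes computed from each view alone
    lam         -- assumed weight on the correlation term
    """
    recon_err = sum(squared_error(t, r) for t, r in zip(targets, self_recon))
    cross_err = sum(squared_error(t, r) for t, r in zip(targets, cross_recon))
    # Subtracting the correlation means minimizing the objective maximizes it.
    return recon_err + cross_err - lam * correlation(h_x, h_y)
```

In this sketch, increasing `lam` places more emphasis on correlated hidden codes relative to the two reconstruction errors, which is one way a weighted cost function could direct the learning process.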

As to independent claim 10, Peairs discloses a computing device comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method of multi-modal electronic document classification; a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor ([0017] discloses a non-transitory computer-readable medium storing executable instructions; [0021] discloses a processor coupled to memory) to: receive, as input, an electronic document ([0040] discloses receiving a document); using raw text extracted from the electronic document to determine a textual data representation ([0042] discloses performing textual analysis on extracted text to determine a textual representation of the document; see also Fig. 5 and [0049-0052]); using an image extracted from the electronic document to determine a visual data representation ([0043] discloses performing graphical analysis on an image of the document to determine a visual representation of the document; see also Fig. 6 and [0053-0057]); and apply the machine learning classifier based on the textual data representation and the visual data representation to classify a new electronic document ([0038] discloses classifying newly received documents based on the model formed by the text and image representations). 
Peairs discloses that the system is trained to recognize the types of documents based on the above described analysis including both textual analysis and graphical analysis of the documents ([0037]). Accordingly, Peairs contemplates performing the above analysis on a received document which is used to train the document classifier. However, Peairs does not expressly detail the training. That is, Peairs does not expressly disclose that the received electronic document is one on which to train a machine learning classifier for the multi-modal electronic document classification. Peairs also does not expressly disclose that the processor is configured to calculate a correlation between the textual data representation and the visual data representation; generate a fusion representation based on the correlation, the textual data representation, and the visual data representation, using a cost function that minimizes an error of reconstruction, minimizes an error of cross-reconstruction, and maximizes the correlation or that the classification of the new document is based on the fusion representation or applied using a single input source of either text reconstructed from image or image reconstructed from text of the new electronic document. 
Chandar, like Peairs, is directed to learning features over multiple modalities (used interchangeably with “views”) including audio, video, images, and text (see at least Chapter 1.2 first and last paragraphs). Chandar proposes the concept of learned common representations which can help in learning across views. For example, a common representation predictive of a second view can improve the performance of a classifier if it only has access to a first view, particularly if: 1) the second view is more representative of the classes but is less accessible or 2) if there is not enough training data in the first view (Chapter 1.2). Thus, Chandar discloses forming learned common representations of multi-modal data, and performing training and testing of a model based on the learned common representations, the testing involving cross-reconstruction of one view from the other view for improving the performance of a classifier (Chapters 1.2, 3 and 5). Chandar further discloses that the common representation of the views is performed by calculating a correlation between the views (Chapter 3.3). Chapter 3.3 discloses an objective function which minimizes an error of reconstruction, minimizes an error of cross-reconstruction, and maximizes the correlation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Peairs to generate a shared representation of the multi-
The proposed combination of Peairs and Chandar does not expressly disclose that the textual data representation is determined by applying, by the computer, a first neural network to the raw text or that the visual data representation is determined by applying, by the computer, a second neural network to the raster image. 
Stanton, like Peairs, is directed to document classification based on text and images (Abstract and [0022]). Stanton discloses applying a neural network to the text of the dataset, applying a neural network to the images of the dataset, and combining the results to be input to a joint processing neural network ([0035-0040]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Peairs and Chandar to apply separate neural networks to text and visual data, respectively, as taught by Stanton, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to better characterize the text and images.
Claims 11 and 13 recite features nearly identical to those recited in claims 2 and 5, respectively. Accordingly, claims 11 and 13 are rejected for reasons analogous to those discussed above in conjunction with claims 2 and 5, respectively.
As to claim 14, the proposed combination of Peairs, Chandar, and Stanton further teaches that the first neural network is the same as the second neural network ([0037-0039] of Stanton contemplates using the same neural networks for the text and visual data processing).
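For illustration only, the arrangement attributed to Stanton above (one network applied to text, one network applied to images, and the combined results input to a joint processing network) can be sketched as follows. This is a minimal sketch under stated assumptions and not Stanton's implementation: the single tanh layer per branch, the concatenation step, and the `params` dictionary with its key names are hypothetical.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def classify(text_features, image_features, params):
    """Run a text branch and an image branch, then a joint layer.

    Returns (index of the highest-scoring class, list of class scores).
    """
    # First (text) network and second (image) network; they may share
    # an architecture or differ, depending on the parameters supplied.
    text_repr = dense(text_features, params["text_w"], params["text_b"])
    image_repr = dense(image_features, params["image_w"], params["image_b"])
    # Combine the two branch outputs by concatenation (an assumption).
    joint_input = text_repr + image_repr
    scores = dense(joint_input, params["joint_w"], params["joint_b"])
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Whether the two branches are the same network or different networks is, in this sketch, purely a matter of which weights are passed in `params`.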


As to independent claim 19, Peairs discloses a machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one machine ([0017] discloses a non-transitory computer-readable medium storing executable instructions; [0021] discloses a processor coupled to memory), causes the machine to: extract raw text and an image from an electronic document; use the raw text to determine a textual data representation of the electronic document ([0040] discloses receiving a document; [0042] discloses performing textual analysis on text extracted from the received document to determine a textual representation of the document; see also Fig. 5 and [0049-0052]); use the image to determine a visual data representation of the electronic document ([0043] discloses performing graphical analysis on an image of the document to determine a visual representation of the document; see also Fig. 6 and [0053-0057]); and apply the machine learning classifier based on the textual data representation and the visual data representation to classify a new electronic document ([0038] discloses classifying newly received documents based on the model formed by the text and image representations). 
Peairs discloses that the system is trained to recognize the types of documents based on the above described analysis including both textual analysis and graphical analysis of the documents ([0037]). Accordingly, Peairs contemplates performing the above analysis on a received document which is used to train the document classifier. However, Peairs does not expressly detail the training. That is, Peairs does not expressly disclose that the received electronic document is one on which to train a machine learning classifier. Peairs also does not expressly disclose that the processor is configured to generate a fusion representation based on the textual data representation, the visual data representation, and a correlation between the textual data representation and the visual data representation or that the classification of the new documents is based on the fusion representation or applied using a single input source of either text reconstructed from image or image reconstructed from text of the new electronic document. 
Chandar, like Peairs, is directed to learning features over multiple modalities (used interchangeably with “views”) including audio, video, images, and text (see at least Chapter 1.2 first and last paragraphs). Chandar proposes the concept of learned common representations which can help in learning across views. For example, a common representation predictive of a second view can improve the performance of a classifier if it only has access to a first view, particularly if: 1) the second view is more representative of the classes but is less accessible or 2) if there is not enough training data in the first view (Chapter 1.2). Thus, Chandar discloses forming learned common representations of multi-modal data, and performing training and testing of a model based on the learned common representations, the testing involving cross-reconstruction of one view from the other view for improving the performance of a classifier (Chapters 1.2, 3 and 5). Chandar further discloses that the common representation of the views is performed by calculating a correlation between the views (Chapter 3.3). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Peairs to generate a shared representation of the multi-modal representations based on a correlation between multimodal data in order to train a model, and to use the trained model based on the shared representation to reconstruct a second view (e.g., image or text) from a first view (e.g., the other of image or text) for classification, as taught by Chandar, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The 
The proposed combination of Peairs and Chandar does not expressly disclose that the textual data representation is determined by applying, by the computer, a first neural network to the raw text or that the visual data representation is determined by applying, by the computer, a second neural network to the raster image. 
Stanton, like Peairs, is directed to document classification based on text and images (Abstract and [0022]). Stanton discloses applying a neural network to the text of the dataset, applying a neural network to the images of the dataset, and combining the results to be input to a joint processing neural network ([0035-0040]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Peairs and Chandar to apply separate neural networks to text and visual data, respectively, as taught by Stanton, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to better characterize the text and images.
Claim 22 recites features nearly identical to those recited in claim 5. Accordingly, claim 22 is rejected for reasons analogous to those discussed above in conjunction with claim 5.
As to claim 24, the proposed combination of Peairs, Chandar, and Stanton further teaches machine executable code which when executed by the at least one machine causes the machine to: calculate the correlation between the textual data representation and the visual data representation (Sections 2-3 of Chandar disclose calculating the correlation between the multimodal representations of data). 
Claim 26 recites features nearly identical to those recited in claim 8. Accordingly, claim 26 is rejected for reasons analogous to those discussed above in conjunction with claim 8.


Claims 3, 12, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Peairs in view of Chandar and Stanton and further in view of “Recurrent Neural Network for Text Classification with Multi-Task Learning” to Liu et al. (cited in the IDS filed 11 February 2019; hereinafter “Liu”).
As to claim 3, the proposed combination of Peairs, Chandar, and Stanton does not expressly disclose that the applying the first neural network further comprises: generating, by the computer, the textual data representation of said extracted text as a fixed length vector. 
Liu, like Peairs, is directed to document classification (Abstract). Section 2 of Liu discloses applying a recurrent neural network to the extracted text to represent the text as a fixed-length vector. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Peairs, Chandar, and Stanton to represent the textual data as a fixed-length vector, as taught by Liu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to better characterize the text in the document.

Claims 12 and 21 recite features nearly identical to those recited in claim 3. Accordingly, claims 12 and 21 are rejected for reasons analogous to those discussed above in conjunction with claim 3.
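For illustration only, the notion of applying a recurrent network to extracted text so that text of any length is represented as a fixed-length vector can be sketched as follows. This is a minimal sketch under stated assumptions and not the network of Liu: the toy `embed` function, the hidden size `HIDDEN`, and the simplified elementwise recurrence with weight `recur` are hypothetical stand-ins for learned parameters.

```python
import math

HIDDEN = 4  # assumed length of the fixed-length vector

def embed(token):
    """Hypothetical toy embedding derived from character codes."""
    code = sum(ord(c) for c in token)
    return [((code * (i + 1)) % 7) / 7.0 for i in range(HIDDEN)]

def encode(tokens, recur=0.5):
    """Fold a token sequence of any length into one HIDDEN-length vector.

    Uses a simplified elementwise recurrence:
    h_t = tanh(x_t + recur * h_{t-1}), starting from a zero state.
    """
    h = [0.0] * HIDDEN
    for tok in tokens:
        x = embed(tok)
        h = [math.tanh(x_i + recur * h_i) for x_i, h_i in zip(x, h)]
    return h  # the final hidden state is the fixed-length representation
```

Regardless of how many tokens are supplied, the returned vector has the same length, which is the property at issue in claims 3, 12, and 21.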

Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ramamurthy (U.S. Patent Application Publication No. 2019/0324781) discloses a Deep Auto Encoder which classifies an input document in image format after converting the image of the input document into a text document using Optical Character Recognition.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486.  The examiner can normally be reached from noon to 8:30 PM, Monday through Thursday and Saturday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571) 270-1051.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/SEAN M CONNER/Primary Examiner, Art Unit 2663