Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-3, 5, 7-10, 12, 14-17 and 19 are presented for examination.
Claims 4, 6, 11, 13, 18 and 20 were cancelled. 
Claims 1, 8 and 15 were amended.
This is a Non-Final Action.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/04/2020 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claims 1-3, 5, 7-10, 12, 14-17 and 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Drawings
The drawings are objected to under 37 CFR 1.83(a) because they fail to show elements 302, 304, 208 and 210 of Fig. 3a as described in the specification.  Any structural detail that is essential for a proper understanding of the disclosed invention should be shown in the drawing. MPEP § 608.02(d). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 5, 7-10, 12, 14-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sanchez et al. (US 2018/0032874 – IDS) in view of Haas et al. (US 7,836,356) and further in view of Shmelev (US 2009/0116758).

1. Sanchez teaches, A computer-implemented method for document vectorization (Abstract), the method comprising:
receiving, by a processor, a plurality of documents each having a plurality of words (Fig 4:400 – receiving first document, containing sub-documents);
generating, by the processor utilizing a vector embedding engine, a plurality of vectors, wherein each vector in the plurality of vectors represent each word of the plurality of words in the plurality of documents, wherein each vector corresponds to an encoded representation of a word within the plurality of documents (Fig 4:405 – each of the sub-documents has a vector; Paragraph [0032] – teaches the Word2vec algorithm, which is utilized to produce an encoded representation of a word in a plurality of documents);
generating a word probability for each of the plurality of words in the plurality of documents (Abstract, Paragraph [0032] - Probability distribution);
Sanchez does not teach or disclose, 

performing a compression operation on each of the document images to produce a compact representation for the plurality of documents, wherein the compression operation is performed by a convolutional auto-encoder and wherein the compact representation for the plurality of documents compresses a document vector for each document in the plurality of documents.
However, Haas teaches, creating a document image representation for each document in the plurality of documents by creating  a QxQ matrix comprising a plurality of cells; plotting, by the processor using machine learning algorithm, each vector in the plurality of vectors in the QxQ matrix, wherein at least one cell in the plurality of cells includes a word probability associated with the one word in a document (Col 3: lines 53-64 – teaches for each stream m (document, words), utilizing a QxQ table wherein each cell corresponds to a rectangular region of space defining a bin); and
Shmelev teaches, 
performing a compression operation on each of the document images to produce a compact representation for the plurality of documents, wherein the compression operation is performed by a convolutional auto-encoder and wherein the compact representation for the plurality of documents compresses a document vector for each document in the plurality of documents (Abstract, Paragraphs 12 & 34-37 – teaches for each document generating n-dimensional code vector which are input to a corresponding image compression unit, which compresses the 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify Sanchez to incorporate the teachings of Haas and Shmelev, because all of the prior art references are in the same field of endeavor of data processing, and incorporating the teachings of Haas and Shmelev would enhance Sanchez’s invention by allowing it to further analyze documents based on word association and to compress documents by utilizing document images.
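
For illustration only, the limitations of claim 1 directed to the word probability and the QxQ matrix can be sketched as follows. The sketch is not taken from Sanchez, Haas, Shmelev or the application. It assumes that the word probability is a relative frequency within one document and that each word vector has already been reduced to two coordinates in the range [0, 1); the function names and sample vectors are hypothetical. The convolutional auto-encoder compression step is not shown.

```python
from collections import Counter


def word_probabilities(words):
    """Relative frequency of each word in one document (an assumed definition)."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def document_image(words, vectors, q):
    """Build a q x q grid in which each word's 2-D vector selects a cell
    and the cell accumulates that word's probability."""
    probs = word_probabilities(words)
    grid = [[0.0] * q for _ in range(q)]
    for word, p in probs.items():
        x, y = vectors[word]  # hypothetical 2-D coordinates in [0, 1)
        row = min(int(y * q), q - 1)
        col = min(int(x * q), q - 1)
        grid[row][col] += p
    return grid


# Hypothetical document and word vectors, for demonstration only.
doc = ["patent", "claim", "patent", "vector"]
vectors = {"patent": (0.1, 0.1), "claim": (0.6, 0.2), "vector": (0.9, 0.9)}
image = document_image(doc, vectors, q=4)
```

In this sketch, words whose vectors fall in the same cell have their probabilities summed, which is one possible design choice among several.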

2.    Sanchez, Haas and Shmelev teach document processing and vectorization (Abstract, Sanchez and Paragraph [0008], Chandramouli).  Sanchez, Haas and Shmelev do not explicitly teach removing stop words from each of the plurality of documents prior to producing the plurality of vectors.  However, Official notice is taken that removal of stop words during data processing is a common practice that would be utilized, explicitly or implicitly, in any situation in which data is processed for clustering, classification or analysis.  It would have been obvious before the effective filing date of the claimed invention, to a person having ordinary skill in the art, to utilize stop word removal during data analysis.  This common practice provides sufficient rationale to support a conclusion of obviousness (See MPEP 2141 Section III citing KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385, 1396 (U.S. 2007)).
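
For illustration only, stop word removal prior to vectorization can be sketched as a simple filter. The stop word list and the function name below are hypothetical and are not taken from any cited reference or from the application.

```python
# Hypothetical stop word list; real systems typically use a much longer one.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "to"}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words of a document with stop words filtered out."""
    return [w for w in words if w.lower() not in stop_words]


tokens = "The removal of stop words is a common practice".split()
filtered = remove_stop_words(tokens)
# filtered == ["removal", "stop", "words", "common", "practice"]
```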


3.    Sanchez, Haas and Shmelev teach, The computer-implemented method of claim 1, wherein each vector corresponds to an encoded representation of a word within the plurality of documents (Paragraph [0003], Sanchez).

5.    Sanchez, Haas and Shmelev teach, The computer-implemented method of claim 1, wherein generating the plurality of vectors is performed by a Word2Vec model (Paragraph [0032], Sanchez).

7.    Sanchez, Haas and Shmelev teach, The computer-implemented method of claim 1, wherein: each image comprises a plurality of cells; and the position for each word corresponds to a cell in the plurality of cells (Col 3: lines 53-64, Haas – teaches for each stream m (document, words), utilizing a QxQ table wherein each cell corresponds to a rectangular region of space defining a bin).

Claims 8-10, 12, 14-17 and 19 are similar to claims 1-3, 5 and 7 and are hence rejected similarly.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMRESH SINGH whose telephone number is (571)270-3560.  The examiner can normally be reached on Monday-Friday 8am-5pm.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Amresh Singh/
Primary Examiner, Art Unit 2159