DETAILED ACTION
This action is responsive to the application filed 12/11/2020.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 2, 4-7, 9-11, 13-16, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yih, et al., U.S. PGPUB No. 2012/0323968 (“Yih”).
Yih teaches a system and method for identifying similar documents. With regard to Claim 1, Yih teaches a duplicate document detection method of a computer apparatus including processing circuitry, the method comprising: 
extracting, by the processing circuitry, a similar document pair set and a dissimilar document pair set from a document database, the similar document pair set including a plurality of similar document pairs having a common attribute, and the dissimilar document pair set including a plurality of dissimilar document pairs extracted randomly ([0021] describes that known pairs of documents can be labeled to indicate a degree of similarity, including similar and dissimilar pairs); 
calculating, by the processing circuitry, a mathematical similarity for each of the plurality of similar document pairs and each of the plurality of dissimilar document pairs using a mathematical measure to obtain a first plurality of mathematical similarities based on the plurality of the similar document pairs and a second plurality of mathematical similarities based on the plurality of dissimilar document pairs ([0021] describes that a projection matrix is trained using known pairs of documents. [0045]-[0046] describe that the training of parameters for the process of mapping text vectors to concept vectors uses text object label data, which is an evaluation of the similarity of the text objects of the pair of documents); 
calculating, by the processing circuitry, a semantic similarity for each of the plurality of similar document pairs and each of the plurality of dissimilar document pairs to obtain a first plurality of semantic similarities based on the plurality of similar document pairs and a second plurality of semantic similarities based on the plurality of dissimilar document pairs, each of the first plurality of semantic similarities being higher than a corresponding one of the first plurality of mathematical similarities, and each of the second plurality of semantic similarities being lower than a corresponding one of the second plurality of mathematical similarities ([0045]-[0046] describe that the text vectors are mapped to concept vectors, for which a similarity score is determined. A loss function between the label and similarity score is calculated, indicating that the two values differ for each of the pairs of documents); 
training, by the processing circuitry, a similarity model based on the plurality of similar document pairs, the plurality of dissimilar document pairs, the first plurality of semantic similarities and the second plurality of semantic similarities to obtain a trained similarity model ([0047] describes that model parameters are adjusted to minimize the error value calculated by the loss function); and 
detecting, by the processing circuitry, a duplicate document using the trained similarity model ([0048] describes that the optimized set of parameters can then be used in comparing a plurality of text objects to identify duplicate or near-duplicate documents).
Claim 10 recites a medium storing instructions which are executed to perform the method of Claim 1, and is similarly rejected. Claim 11 recites a computer apparatus that carries out the method of Claim 1, and is likewise rejected.
With regard to Claim 2, Yih teaches that the common attribute comprises at least one of an author of a document, a post section of the document, or a registration time range of the document. [0021] teaches that documents can have similarity scores indicating any potential level of similarity. Therefore, document pairs used to train the system can have the same author as well as a same section of content.
With regard to Claim 4, Yih teaches that the training the similarity model comprises: sequentially inputting each of the plurality of similar document pairs and each of the plurality of dissimilar document pairs to the similarity model as a respective input document pair; and training the similarity model to minimize a mean squared error (MSE) between each respective output value of the similarity model and a corresponding semantic similarity among the first plurality of semantic similarities and the second plurality of semantic similarities corresponding to the respective input document pair. 
[0038]-[0039] describe that training the model can use a loss function calculated for parameters used to map text objects, where loss can be calculated and minimized using a mean squared error function. Fig. 3 shows that model parameters can be updated and new similarity scores calculated for use in each iteration of training the model parameters.
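For illustration only, the mean squared error minimization recited in Claim 4 can be sketched as follows. This is a minimal sketch and not code from Yih or from the application: the one-parameter model (a single scale factor `w` applied to a raw score), the function names, the learning rate, and the sample values are all assumptions chosen to keep the example short.

```python
def mean_squared_error(outputs, targets):
    """Average of squared differences between model outputs and target similarities."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and of equal length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def train_scale(raw_scores, targets, learning_rate=0.1, steps=200):
    """Fit a hypothetical one-parameter model, output = w * raw_score,
    by gradient descent on the MSE against the target similarities."""
    w = 0.0
    n = len(raw_scores)
    for _ in range(steps):
        # Gradient of mean((w * x - t)^2) with respect to w.
        grad = 2.0 * sum((w * x - t) * x for x, t in zip(raw_scores, targets)) / n
        w -= learning_rate * grad
    return w


# Hypothetical data: two similar pairs (target 1.0) and two dissimilar pairs (target 0.0).
raw = [0.9, 0.8, 0.2, 0.1]
targets = [1.0, 1.0, 0.0, 0.0]
w = train_scale(raw, targets)
loss_after = mean_squared_error([w * x for x in raw], targets)
```

Each pass adjusts the parameter in the direction that lowers the error between the model output and the target similarity for the input pairs, which is the general shape of the training loop discussed above.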
Claim 13 recites a computer apparatus that carries out the method of Claim 4, and is similarly rejected. Claim 18 recites a computer apparatus that carries out the method of Claim 4, and is likewise rejected.
With regard to Claim 5, Yih teaches that the detecting the duplicate document comprises: extracting a plurality of candidate document pairs from a document set in which a duplicate is to be detected; calculating a respective semantic similarity of each of the plurality of candidate document pairs by sequentially inputting each of the plurality of candidate document pairs to the similarity model; and determining one of the plurality of candidate document pairs for which the respective semantic similarity is greater than or equal to a threshold as including duplicate documents. 
[0048]-[0050] describe that an input document and documents which are being compared are converted from raw text vectors to concept vectors using the model and trained parameters. Similarity scores are compared and ranked for each of the input documents, and matching documents can be determined as duplicates or near-duplicates.
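The threshold comparison recited in Claim 5 can be illustrated with the short sketch below. It is an assumption-laden example rather than the method of Yih or of the application: `find_duplicate_pairs`, the `score` callable standing in for the trained similarity model, the threshold of 0.9, and the precomputed scores are all hypothetical.

```python
def find_duplicate_pairs(candidate_pairs, score, threshold):
    """Return (first, second, similarity) for each pair scoring at or above threshold.

    `score` is any callable taking two documents and returning a similarity;
    here it is a stand-in for a trained similarity model.
    """
    duplicates = []
    for first, second in candidate_pairs:
        similarity = score(first, second)
        if similarity >= threshold:
            duplicates.append((first, second, similarity))
    return duplicates


# Hypothetical precomputed similarities for three candidate pairs.
scores = {("d1", "d2"): 0.93, ("d1", "d3"): 0.41, ("d2", "d3"): 0.90}
found = find_duplicate_pairs(scores.keys(), lambda a, b: scores[(a, b)], 0.9)
```

The comparison uses "greater than or equal to", matching the claim language, so a pair scoring exactly at the threshold is reported.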
Claim 14 recites a computer apparatus that carries out the method of Claim 5, and is likewise rejected.
With regard to Claim 6, Yih teaches that the extracting the plurality of candidate document pairs comprises extracting a plurality of subsets among subsets of the document set as the plurality of candidate document pairs, each of the plurality of subsets having two elements. [0048] describes that sets of documents are extracted, and pairs of documents compared in sets of two for determining similarity.
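As an illustration of the two-element subsets recited in Claim 6, the sketch below enumerates every unordered pair from a document set using the standard library. The function name and document identifiers are hypothetical, and the reference does not state that pairs are generated this way.

```python
from itertools import combinations


def candidate_pairs(document_ids):
    """Return every two-element subset of the document set, in input order."""
    return list(combinations(document_ids, 2))


pairs = candidate_pairs(["d1", "d2", "d3"])
```

A set of n documents yields n(n-1)/2 such pairs, so three documents produce three candidate pairs.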
With regard to Claim 7, Yih teaches extracting a plurality of candidate document pairs that each comprise a respective document among a plurality of registered documents and a new document in response to a request for registering the new document; calculating a respective semantic similarity of each of the plurality of candidate document pairs by sequentially inputting each of the plurality of candidate document pairs to the similarity model; determining one or more of the plurality of candidate document pairs for which the respective semantic similarity is greater than or equal to a first threshold as representing one or more duplicate document determinations; and determining the new document to be the duplicate document based on a number of the one or more duplicate document determinations being greater than or equal to a second threshold.
[0041] describes that once the model parameters are trained, the model can be used to compare unknown text objects. [0048]-[0050] describe that an input document and documents which are being compared are converted from raw text vectors to concept vectors using the model and trained parameters. Similarity scores are compared and ranked for each of the input documents, and matching documents can be determined as duplicates or near-duplicates.
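The two-threshold determination recited in Claim 7 can be sketched as follows. This is a hypothetical illustration, not an implementation drawn from Yih or the application: the function name, the `score` callable standing in for the trained model, and the sample values are assumptions.

```python
def is_duplicate_submission(new_doc, registered_docs, score,
                            first_threshold, second_threshold):
    """Decide whether a new document is a duplicate of the registered set.

    A registered document counts as a match when its similarity to the new
    document is at least `first_threshold`; the new document is treated as a
    duplicate when the number of matches is at least `second_threshold`.
    """
    matches = sum(
        1 for doc in registered_docs if score(doc, new_doc) >= first_threshold
    )
    return matches >= second_threshold


# Hypothetical similarities between each registered document and the new one.
similarity_to_new = {"r1": 0.95, "r2": 0.91, "r3": 0.30}
flagged = is_duplicate_submission(
    "new", similarity_to_new.keys(),
    lambda registered, new: similarity_to_new[registered],
    first_threshold=0.9, second_threshold=2,
)
```

In this example two of the three registered documents meet the first threshold, which satisfies a second threshold of two.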
Claim 15 recites a computer apparatus that carries out the method of Claim 7, and is likewise rejected.
With regard to Claim 9, Yih teaches that the mathematical measure is at least one of a cosine similarity, a Euclidean distance, or a Jaccard similarity. [0046] describes that label similarity data can be determined automatically. [0019] describes cosine or Jaccard similarity as ways to determine similarity.
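For reference, the three mathematical measures named in Claim 9 can be written out as below. These are the standard textbook definitions, given as a minimal sketch; the function names and the choice to return 0.0 for a zero-length vector or 1.0 for two empty sets are assumptions, not details taken from Yih.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_similarity(a, b):
    """Size of the intersection of two sets divided by the size of their union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

Cosine and Jaccard similarity grow with resemblance, whereas Euclidean distance shrinks, so a distance would need to be inverted or rescaled before being compared against a similarity threshold.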
With regard to Claim 16, Yih teaches that training the similarity model to minimize the MSE comprises inputting the respective output value of the similarity model and the corresponding semantic similarity to a loss function using the MSE. [0039] describes that loss can be calculated using a mean squared error function. [0046] describes that a loss function is applied to the similarity score calculated using the concept vectors output by the model.
Claim 19 recites a computer apparatus that carries out the method of Claim 16, and is likewise rejected.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 8, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yih, in view of Stein, U.S. PGPUB No. 2011/0055332 (“Stein”).
With regard to Claim 8, Yih, in view of Stein, teaches displaying a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) instead of registering the new document in response to determining the new document is the duplicate document. Yih teaches detecting duplicate documents, as described above. Stein teaches at [0042]-[0043] that a similarity of a message to existing document data can be calculated. If similarity is higher than a threshold, the message is determined to match another document, and a CAPTCHA can be sent to the message sender, instead of immediately publishing the message.
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine Yih with Stein. Stein teaches a specific useful application for duplicate document detection. Therefore, one of skill in the art would seek to combine Stein with Yih, in order to improve the system of Yih by applying duplicate document detection to a practical application.
Claim 20 recites a computer apparatus that carries out the method of Claim 8, and is likewise rejected.
With regard to Claim 17, Stein teaches blocking registration of the duplicate document. [0043] describes that a message determined as matching another document can be blocked from routing, publishing, or access by applications. It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine Yih with Stein. Stein teaches a specific useful application for duplicate document detection. Therefore, one of skill in the art would seek to combine Stein with Yih, in order to improve the system of Yih by applying duplicate document detection to a practical application.

Allowable Subject Matter
Claims 3 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH D BLOOMQUIST whose telephone number is (571)270-7718. The examiner can normally be reached M-F, 8:30-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached on 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEITH D BLOOMQUIST/ Primary Examiner, Art Unit 2178

9/8/2022