DETAILED ACTION
	Claims 1-22 are rejected under 35 U.S.C. § 103.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Ramezani et al., U.S. PG-Publication No. 2022/0147814 A1, in view of Wu et al., U.S. PG-Publication No. 2019/0287685 A1.

Claim 1
	Ramezani discloses a training method of a document sentence concept labeling system. Ramezani discloses "a method for training a computer implemented neural network for performing a processing task on regulatory content." The method comprises steps of "fine-tuning [a] language model using regulatory content training data to generate a regulatory content language embedding output" and "configuring at least one task specific output layer to generate task specific results in response to receiving the regulatory content language embedding output from the language model." Ramezani, ¶ 4. One task specific output layer is "a requirement classification output layer" for classifying identifying text. Id. at ¶ 19. Ramezani discloses a "requirement extraction system 700" comprising a "requirement classifier 702" configured to "generate a classification output 704 based on the output of the language model 102." The classification output 704 labels a sentence with "REQ" if it includes a non-optional requirement, "ORR" if it includes an optional or recommended requirement, or "DSC" if it includes descriptive language related to a requirement. Id. at ¶ 73. Accordingly, the output of requirement extraction system 700 comprises sentences labeled with the concepts REQ, ORR, or DSC.
	Ramezani discloses the method, comprising: receiving a plurality of labeled documents, each of which is labeled one or more sentence sets corresponding to one or more sentence concepts. The neural network comprising a "requirement classification output layer" is trained "using training data including text sequences that are labeled." Id. at ¶ 19. Ramezani discloses that training requirement extraction system 500 uses "a set of sentences that are labeled as REQ, ORR, or DSC" that "are input as the labeled task specific training data set 510." Id. at ¶ 73.
	Ramezani discloses generating a start position and an end position of each of the sentence sets in each of the labeled documents. Ramezani discloses that requirements system 700 "receives sentences of regulatory content 706" as an input, wherein the input "includes sequences of tokens corresponding to sentences or text sequences." In a BERT implementation of "regulatory content language model 102," a special token [CLS] "is used to denote the start of each sequence" (i.e. start position of each sentence set) and a special token [SEP] "is used to indicate separation between sentences or text sequences" (i.e. end position of each sentence set). Id. at ¶ 72.
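	For illustration only, the mapping of the [CLS] and [SEP] tokens to a start position and an end position may be sketched as follows. This is a minimal sketch and not code from Ramezani: whitespace tokenization, the one-sequence-per-call layout, and the helper name mark_sequence are assumptions made for the example.

```python
def mark_sequence(sentence):
    """Wrap one whitespace-tokenized text sequence in [CLS] ... [SEP].

    Hypothetical helper: a real BERT tokenizer uses subword tokens,
    so the positions below are illustrative only.
    """
    tokens = ["[CLS]"] + sentence.split() + ["[SEP]"]
    start = tokens.index("[CLS]")  # marks the start of the sequence
    end = tokens.index("[SEP]")    # marks the end of the sequence
    return tokens, start, end


tokens, start, end = mark_sequence("The operator shall maintain records")
```

	In this sketch the start position is the index of [CLS] and the end position is the index of [SEP] within the token sequence.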
	Ramezani discloses inputting each of the [labeled] documents into a pre-trained language model to obtain a set of word embeddings of each of the generated documents. Ramezani discloses that "a regulatory content language model 102 … receives [an] input of regulatory content data 104 and generates a language embedding output 106 representing the semantic and syntactic meaning of words in the regulatory content," wherein "the meaning of each word may be expressed as a vector having a plurality of values." Id. at ¶¶ 38-39. Ramezani discloses a "process for training the regulatory content language model 102 … by configuring a generic language model on the training system 300." The generic language model may be "implemented using a pre-trained language model, such as Google's BERT." Id. at ¶ 46.
	Ramezani discloses inputting the sets of word embeddings, the start positions and the end positions of the generated documents into a document analysis model for performing a training procedure of the document analysis model, wherein the document analysis model is used to label the sentence concepts in an unlabeled document. [FUNCTIONAL LANGUAGE: INTENDED USE] This limitation employs functional language because it recites a feature by what it does rather than by what it is. MPEP 2173.05(g). The broadest reasonable interpretation of this limitation is a step of "inputting the sets of word embeddings, the start positions and the end positions of the generated documents into a document analysis model for performing a training procedure of the document analysis model." The recitation of an intention for using the document analysis model to "label the sentence concepts in an unlabeled document" has no patentable weight because it merely states an intended use for the "document analysis model." See MPEP 2111.04.
	Nonetheless, Ramezani discloses that the "training configuration 500 … includes one or more task specific neural network layers 504, which are configured to receive the language embedding output 502 and generate a task specific result" (i.e. input the sets of word embeddings). Id. at ¶ 53. Further, Ramezani discloses training requirement extraction system 700 using the training configuration 500 using "a set of sentences that are labeled as REQ, ORR, or DSC" as the input labeled task specific training data set 510 (i.e. input documents with labeled sentences, the start and end positions denoted by [CLS] and [SEP] tokens). The trained requirements classifier 702 "is configured as a softmax classifier, which receives the regulatory model 102 output and generates classification output probabilities 704 that add up to 1.00." The classification output 704 comprises sentences labeled (i.e. classified) as REQ, ORR, or DSC. Id. at ¶¶ 73-74.
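	For illustration only, a softmax classifier whose output probabilities over REQ, ORR, and DSC add up to 1.00 may be sketched as follows. This is a minimal sketch and not code from Ramezani: the raw scores are made-up inputs standing in for the output of the task specific layers, and the function names are assumptions.

```python
import math

LABELS = ("REQ", "ORR", "DSC")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable label and the full probability map."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Hypothetical raw scores for one sentence.
label, probs = classify([2.0, 0.5, -1.0])
```

	The sentence is labeled with the concept having the highest probability, here the first of the three hypothetical scores.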
	Ramezani does not expressly disclose changing orders of the sentence sets in each of the labeled documents, and updating the start positions and the end positions in each of the labeled documents, to obtain a plurality of generated documents, each of which is labeled the sentence sets.
	Wu discloses changing orders of the sentence sets in each of the labeled documents, and updating the start positions and the end positions in each of the labeled documents, to obtain a plurality of generated documents, each of which is labeled the sentence sets. Wu discloses a method of classifying a "history of present illness" (HPI) text string comprising one or more words organized into sentences. Wu, ¶ 6. In one embodiment, an HPI classifier 104 comprises a preprocessor 202 for pre-processing text prior to classification. Preprocessor 202 further comprises an example sentence reorderer 209 that "shuffles the order of each sentence of the HPI 108 around into a random order." Id. at ¶ 38. Particularly, the sentence reorderer 209 "parses the input HPI 108 to determine sentences (e.g., by punctuation, capital, or any other suitable method to parse a text string into sentences)" and then "randomly shuffles the ordering of the sentences in the HPI 108." Id. at ¶ 73.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the text classification method of Ramezani to incorporate randomizing sentence order as taught by Wu. One of ordinary skill in the art would be motivated to integrate randomizing sentence order into Ramezani, with a reasonable expectation of success, in order to "remove any potential unintentional effects that sentence ordering may have on the [text] classification." See Wu, ¶ 38.
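	For illustration only, the combination of reordering sentence sets and updating their start and end positions may be sketched as follows. This is a minimal sketch and not code from Ramezani, Wu, or the application: it assumes a document is a list of sentences, that each labeled sentence set is a contiguous span given as (concept, start, end) with inclusive sentence indices, and that the spans do not overlap. The function name is hypothetical.

```python
import random


def shuffle_sentence_sets(sentences, labeled_sets, seed=None):
    """Reorder labeled sentence sets and recompute their positions.

    sentences    -- list of sentence strings (the labeled document)
    labeled_sets -- list of (concept, start, end), inclusive indices
    Returns the generated document and its updated labeled sets.
    """
    rng = random.Random(seed)
    order = list(range(len(labeled_sets)))
    rng.shuffle(order)  # change the order of the sentence sets

    new_sentences, new_sets = [], []
    for i in order:
        concept, start, end = labeled_sets[i]
        new_start = len(new_sentences)  # updated start position
        new_sentences.extend(sentences[start:end + 1])
        new_sets.append((concept, new_start, len(new_sentences) - 1))
    return new_sentences, new_sets


doc = ["s0", "s1", "s2", "s3", "s4"]
sets = [("REQ", 0, 1), ("ORR", 2, 2), ("DSC", 3, 4)]
generated_doc, generated_sets = shuffle_sentence_sets(doc, sets, seed=0)
```

	Each generated document keeps the same labeled sentence sets as the original, with only their order and positions changed.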

Claim 2
	Wu discloses wherein one of the sentence sets contains more than one sentence. Wu discloses an embodiment wherein "the sentence reorderer 209 can … concatenate the parsed sentences into a single text string." Wu, ¶ 73.
	Further, Ramezani discloses that the [CLS] and [SEP] tokens are used to designate the start and end of "sentences or text sequences." Ramezani, ¶ 72. One of ordinary skill in the art would recognize that the "text sequence" could comprise a text string generated by concatenating at least two sentences, as taught by Wu.

Claim 3
	Wu discloses wherein the document analysis model receives full text of the generated documents. Wu discloses that a "data source 102 outputs an unprocessed HPI 108 which is transferred to … the HPI classifier 104." Wu, ¶ 34. Accordingly, the unprocessed HPI 108 is the full text document. Further, sentence reorderer 209 "shuffles the order of each sentence of the HPI 108 around into a random order." Id. at ¶ 38. Accordingly, the shuffled texts are generated documents comprising the full text HPI 108.

Claim 4
	Ramezani discloses wherein the document analysis model predicts the start positions and the end positions of the sentence sets in the unlabeled document. Ramezani discloses that requirements system 700 "receives sentences of regulatory content 706" as an input, wherein the input "includes sequences of tokens corresponding to sentences or text sequences." In a BERT implementation of "regulatory content language model 102," a special token [CLS] "is used to denote the start of each sequence" (i.e. start position of each sentence set) and a special token [SEP] "is used to indicate separation between sentences or text sequences" (i.e. end position of each sentence set). Ramezani, ¶ 72.

Claim 5
	Wu discloses wherein the labeled documents are not inputted into the document analysis model, when performing the training procedure. Wu discloses a method of classifying a "history of present illness" (HPI) text string comprising one or more words organized into sentences. Wu, ¶ 6. In one embodiment, an HPI classifier 104 comprises a preprocessor 202 for pre-processing text prior to classification. Preprocessor 202 further comprises an example sentence reorderer 209 that "shuffles the order of each sentence of the HPI 108 around into a random order." Id. at ¶ 38. Particularly, the sentence reorderer 209 "parses the input HPI 108 to determine sentences (e.g., by punctuation, capital, or any other suitable method to parse a text string into sentences)" and then "randomly shuffles the ordering of the sentences in the HPI 108." Id. at ¶ 73. Accordingly, after the random reordering, the re-ordered generated documents are input into the model, instead of the original ordered labeled documents (i.e. the labeled documents are not input into the model).

Claim 6
	Ramezani discloses wherein the pre-trained language model is a BERT model, an ALBERT model, an XLNet model, a RoBERTa model, a DeBERTa model, or a compressed, a simplified or a pruned version of any of the above model. Ramezani discloses that "a regulatory content language model 102 … receives [an] input of regulatory content data 104 and generates a language embedding output 106 representing the semantic and syntactic meaning of words in the regulatory content," wherein "the meaning of each word may be expressed as a vector having a plurality of values." Ramezani, ¶¶ 38-39. Ramezani discloses a "process for training the regulatory content language model 102 … by configuring a generic language model on the training system 300." The generic language model may be "implemented using a pre-trained language model, such as Google's BERT." Id. at ¶ 46.

Claim 7
	Ramezani discloses wherein the document analysis model contains … a Softmax layer. In one embodiment, "a final output layer … may be configured as a softmax layer." Ramezani, ¶¶ 54-55; 74.
	Wu discloses wherein the document analysis model contains a dense layer. Wu discloses that the method for classifying a text string comprises an embedding layer "to embed the hashes into dense vectors" and an LSTM layer "to convert the dense vectors into an activated output vector." Wu, ¶¶ 5; 46; 58.

Claim 8
	Ramezani discloses a labeling method of a document sentence concept labeling system. Ramezani discloses a "requirement extraction system 700" comprising a "requirement classifier 702" configured to "generate a classification output 704 based on the output of the language model 102." The classification output 704 labels a sentence with "REQ" if it includes a non-optional requirement, "ORR" if it includes an optional or recommended requirement, or "DSC" if it includes descriptive language related to a requirement. Ramezani, ¶ 73. Accordingly, the output of requirement extraction system 700 comprises sentences labeled with the concepts REQ, ORR, or DSC.
	Ramezani discloses the method, comprising: inputting an unlabeled document and one or more sentence concepts into a pre-trained language model to obtain a set of word embeddings of the unlabeled document. The regulatory text "in the plurality of documents may include unlabeled regulatory text." Id. at ¶ 8. Ramezani discloses that "a regulatory content language model 102 … receives [an] input of regulatory content data 104 and generates a language embedding output 106 representing the semantic and syntactic meaning of words in the regulatory content," wherein "the meaning of each word may be expressed as a vector having a plurality of values." Id. at ¶¶ 38-39. Ramezani discloses a "process for training the regulatory content language model 102 … by configuring a generic language model on the training system 300." The generic language model may be "implemented using a pre-trained language model, such as Google's BERT." Id. at ¶ 46.
	Ramezani discloses inputting the set of word embeddings of the unlabeled document into a document analysis model to obtain a start position and an end position of a sentence set corresponding to each of the sentence concepts in the unlabeled document. Ramezani discloses that requirements system 700 "receives sentences of regulatory content 706" as an input, wherein the input "includes sequences of tokens corresponding to sentences or text sequences." In a BERT implementation of "regulatory content language model 102," a special token [CLS] "is used to denote the start of each sequence" (i.e. start position of each sentence set) and a special token [SEP] "is used to indicate separation between sentences or text sequences" (i.e. end position of each sentence set). Id. at ¶ 72.
	Ramezani discloses obtaining each of the sentence sets. The trained requirements classifier 702 "is configured as a softmax classifier, which receives the regulatory model 102 output and generates classification output probabilities 704 that add up to 1.00." The classification output 704 comprises sentences labeled (i.e. classified) as REQ, ORR, or DSC. Id. at ¶¶ 73-74.
	Ramezani does not expressly disclose obtaining each of the sentence sets according to each of the start positions and each of the end positions. Wu discloses a method of classifying a "history of present illness" (HPI) text string comprising one or more words organized into sentences. Wu, ¶ 6. In one embodiment, an HPI classifier 104 comprises a preprocessor 202 for pre-processing text prior to classification. Preprocessor 202 further comprises an example sentence reorderer 209 that "shuffles the order of each sentence of the HPI 108 around into a random order." Id. at ¶ 38. Particularly, the sentence reorderer 209 "parses the input HPI 108 to determine sentences (e.g., by punctuation, capital, or any other suitable method to parse a text string into sentences)" and then "randomly shuffles the ordering of the sentences in the HPI 108." Id. at ¶ 73. Accordingly, the sentence reorderer 209 obtains each sentence according to the sentence determination parsing (i.e. determining start and end of sentence).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the text classification method of Ramezani to incorporate randomizing sentence order as taught by Wu. One of ordinary skill in the art would be motivated to integrate randomizing sentence order into Ramezani, with a reasonable expectation of success, in order to "remove any potential unintentional effects that sentence ordering may have on the [text] classification." See Wu, ¶ 38.

Claim 9
	Wu discloses wherein one of the sentence sets contains more than one sentence. Wu discloses an embodiment wherein "the sentence reorderer 209 can … concatenate the parsed sentences into a single text string." Wu, ¶ 73.
	Further, Ramezani discloses that the [CLS] and [SEP] tokens are used to designate the start and end of "sentences or text sequences." Ramezani, ¶ 72. One of ordinary skill in the art would recognize that the "text sequence" could comprise a text string generated by concatenating at least two sentences, as taught by Wu.

Claim 10
	Ramezani discloses wherein the document analysis model receives full text of the unlabeled document. Ramezani discloses that requirement extraction system 700 "receives sentences of regulatory content 706 as an input." Ramezani, ¶ 71. The regulatory text "in the plurality of documents may include unlabeled regulatory text." Id. at ¶ 8.

Claim 11
	Ramezani discloses wherein the pre-trained language model is a BERT model, an ALBERT model, an XLNet model, a RoBERTa model, a DeBERTa model, or a compressed, a simplified or a pruned version of any of the above model. Ramezani discloses that "a regulatory content language model 102 … receives [an] input of regulatory content data 104 and generates a language embedding output 106 representing the semantic and syntactic meaning of words in the regulatory content," wherein "the meaning of each word may be expressed as a vector having a plurality of values." Id. at ¶¶ 38-39. Ramezani discloses a "process for training the regulatory content language model 102 … by configuring a generic language model on the training system 300." The generic language model may be "implemented using a pre-trained language model, such as Google's BERT." Id. at ¶ 46.

Claim 12
	Ramezani discloses wherein the document analysis model contains … a Softmax layer. In one embodiment, "a final output layer … may be configured as a softmax layer." Ramezani, ¶¶ 54-55; 74.
	Wu discloses wherein the document analysis model contains a dense layer. Wu discloses that the method for classifying a text string comprises an embedding layer "to embed the hashes into dense vectors" and an LSTM layer "to convert the dense vectors into an activated output vector." Wu, ¶¶ 5; 46; 58.

Claims 13-20
	Claims 13, 14, 15, 16, 17, 18, 19, and 20 recite a system configured to perform the steps of the method recited in claims 1, 8, 2, 3, 10, 5, 6, and 7. Accordingly, claims 13-20 are rejected as indicated in the rejection of claims 1, 8, 2, 3, 10, 5, 6, and 7.

Claim 21
	Ramezani discloses wherein the unlabeled document is inputted into a relation extraction system to identify whether an entity relation pair holds in the sentence sets corresponding to the sentence concepts in the unlabeled document. Ramezani discloses a "regulatory conjunction classifier system 800" comprising a "regulatory language model 102" that "receives pairs of extracted requirements 802 as input" (as extracted from requirement extraction system 700). System 800 determines whether the pair of extractions has a parent-child relationship. Specifically, the classification output of a conjunction classifier 804 comprises "not_conjunction" representing that the pair "do not share a parent-child relationship," or "conjunction_single" and "conjunction_multiple" representing that the pair has a parent-child relationship with the child having either a single or multiple requirement, respectively. Ramezani, ¶¶ 75-79.


Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Ramezani, in view of Wu, further in view of Warren et al., U.S. Patent No. 9,645,988 B1.

Claim 22
	Warren discloses wherein the unlabeled document is inputted into a document retrieval system to identify whether a query condition is met based on the sentence sets corresponding to the sentence concepts in the unlabeled document. Warren discloses "a method for searching an electronic document for passages related to a concept being searched for, where the concept is expressed as a … plurality of words." The method comprises the steps of "deconstructing … text … into a stream of features … wherein the features include the text of sentences" and "executing … a conditional random field algorithm to label sentences in the electronic document as either being relevant to the concept being searched for … or as background information." Warren, 1:53-2:10. Warren discloses that all sentences are labeled as either "forming part of the concept being searched" or as relating "to background information." Further, the "features and text of the sentence (as derived from natural language processing (NLP) extraction of individual words and tokens of the text) may be used as inputs to the CRF model to arrive at the label." A concept search is performed "which would return those sentences … labelled as forming part of state A" (state A representing part of the concept being searched). Id. at 6:12-25. Figure 3 illustrates an example with a document containing sentences S1-S5, wherein the user enters a concept query of "intellectual property," which is met by a sentence set of S1 and S4 as presented in the "results" section. Id. at 6:26-55; FIG. 3.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the sentence concept labeling method of Ramezani-Wu to incorporate the searching of documents using sentence concept labels as taught by Warren. One of ordinary skill in the art would be motivated to integrate searching of documents using sentence concept labels into Ramezani-Wu, with a reasonable expectation of success, in order to increase search result accuracy when identifying sentences and passages relating to a concept, because the method will "return search results that do not include the specific word or words used to express the concept." Warren, 7:16-24.
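	For illustration only, returning the sentences labeled as forming part of state A in the Figure 3 example may be sketched as follows. This is a minimal sketch and not code from Warren: the labels are hard-coded rather than produced by a CRF model, the background label name "B" is an assumption, and the function name is hypothetical.

```python
# Hypothetical per-sentence labels for the S1-S5 example:
# "A" = part of the concept being searched, "B" = background.
LABELED_SENTENCES = [
    ("S1", "A"),
    ("S2", "B"),
    ("S3", "B"),
    ("S4", "A"),
    ("S5", "B"),
]


def concept_search(labeled_sentences, concept_state="A"):
    """Return identifiers of sentences labeled with the concept state."""
    return [sid for sid, state in labeled_sentences if state == concept_state]


results = concept_search(LABELED_SENTENCES)
query_met = bool(results)  # the query condition is met if any sentence matches
```

	Here the query condition is treated as met when at least one sentence carries the concept label, which yields the sentence set S1 and S4.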


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN, can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANK D MILLS/
Primary Examiner, Art Unit 2176
September 29, 2022