DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4, 10, and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 4:
Claim 4 recites the limitation "wherein each of the different threshold values are generated by using grid search" in lines 1-2. There is insufficient antecedent basis for "the different threshold values" in the claim, and therefore claim 4 is rejected as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is not clear to which threshold values the limitation refers. For the purposes of this Office action, “the different threshold values” is interpreted as the generated threshold values of claim 2.

	Regarding claims 10 and 17, taking claim 10 as exemplary:
	Claims 10 and 17 both recite “wherein the dataset is a group of documents and the first, second, and third classification models are configured to assign labels to each document based on natural language processing performed in connection with each one of the group of documents.” However, it is unclear whether “each one of the group of documents” refers to the documents of the dataset, which is a single group of documents, or whether claims 10 and 17 are reciting other groups of documents. Therefore, claims 10 and 17 are rejected as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is not clear to which group of documents the limitation refers. For the purposes of this Office action, “each one of the group of documents” is interpreted as each document within the single group of documents recited in “wherein the dataset is a group of documents.”

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-3, 5-8, 10-14, and 16-19 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Rusk et al., (US 2022/0058496 A1, hereinafter Rusk).
Regarding claims 1 and 12, taking claim 1 as exemplary:
Rusk shows:
“A computer system comprising: electronic data storage configured to store a plurality of classification models each configured to predict whether one or more labels applies to one or more members of a dataset;” (Paragraph [0068]: “The decision in step 104 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the first subset of classifiers, out of a plurality of classifiers in a first mashup 110, may compute a predicted label according to the classifiers' various methodologies. During the first mashup 110, the predetermined number of classifiers may be a majority of classifiers. For example, given the neural network classifier, the elastic model, and the XGBoost model, the labels of two classifiers may be used in determining whether or not the classifiers agree to a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” In paragraph [0069]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 106. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 112.” In paragraph [0070]: “The decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. 
Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document. In some embodiments, a classifier may be unable to classify a document, returning a label that is not meaningful. In one example, a label that may be returned that is not meaningful is the label “Unknown.” In response to a label that may not be meaningful, the process proceeds to step 112.” And in paragraph [0151]: “The computing device 700 may further include a storage device, such as one or more hard disk drives or redundant arrays of independent disks, for storing an operating system and other related software, and for storing application software programs such as any program or software 720 for implementing (e.g., configured and/or designed for) the systems and methods described herein.” – The labels assigned to documents in Rusk are the labels applied to members of a dataset.)
“at least one hardware processor configured to: retrieve the dataset; execute a first processing instance that runs a first classification model of the plurality of classification models against the dataset, the first classification model configured to assign a first label to members of a first portion of the dataset;” (Paragraph [0129]: “System 606 may receive the stack of documents 604 from client 602 and classify the documents 604. A processor 608 may be the logic in a device that receives software instructions. A central processing unit (“CPU”) may be considered any logic circuit that responds to and processes instructions.” And in paragraph [0068]: “The decision in step 104 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the first subset of classifiers, out of a plurality of classifiers in a first mashup 110, may compute a predicted label according to the classifiers' various methodologies.” And in paragraph [0070]: “The decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document. In some embodiments, a classifier may be unable to classify a document, returning a label that is not meaningful. 
In one example, a label that may be returned that is not meaningful is the label “Unknown.” In response to a label that may not be meaningful, the process proceeds to step 112.” And in paragraph [0075]: “In step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110.” – The labels from the first mashup of Rusk are the first label assigned to members.)
“execute a second processing instance that runs a second classification model of the plurality of classification models against the first portion of the dataset, the second classification model configured to assign at least one of a second and a third label to each of the members of the first portion of the dataset;” (Paragraph [0075]: “In step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110. Thus, a second mashup may be performed. The second mashup may be considered a second iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the second mashup 120 may be different from the number of classifiers employed to label the document in the first mashup 110. Further, the classifiers employed in the second mashup 120 may be different from the classifiers employed in the first mashup 110. For example, the classifiers employed in the second mashup 120 may be a second subset of classifiers, the second subset of classifiers including a neural network, as discussed above, an elastic model, as discussed above, an XGBoost model, as discussed above, an Automated machine learning model, and a Regular Expression (“RegEx”) classifier, or any combination of these or other third party models.” In paragraph [0088]: “The decision in step 114 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the second subset of classifiers, out of a plurality of classifiers in a second mashup 120, may compute a predicted label according to the classifiers' various methodologies. During the second mashup 120, the predetermined number of classifiers may be a minority of classifiers, the number of minority classifiers being at least greater than one classifier. 
For example, given the neural network classifier, the elastic search model, the XGBoost model, the automatic machine learning model, and the RegEx classifier, the labels of two classifiers may be used in determining whether or not the classifiers agree on a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” And in paragraph [0089]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 116. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 124.” – The second mashup of different labels that may or may not agree are the second and third labels.)
“and execute a third processing instance that runs a third classification model of the plurality of classification models against those members of the first portion of the dataset that are assigned the second label, the third classification model configured to assign at least a fourth label to those members of the first portion of the dataset that are also assigned the second label,” (Paragraph [0095]: “In step 124, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the second mashup 120 or the first mashup 110. Thus, a third mashup 130 may be performed. The third mashup 130 may be considered a third iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the third mashup 130 may be different from the number of classifiers employed to label the document in second mashup 120 and the first mashup 110. Further, the classifiers employed in the third mashup 130 may be different from the classifiers employed in the second mashup 120 and the first mashup 110. For example, the classifiers employed in the third mashup 130 may be a third subset of classifiers, the third subset of classifiers including a neural network, as discussed above, an elastic search model, as discussed above, an XGBoost model, as discussed above, an automated machine learning model, as discussed above, and a Regular Expression (RegEx) classifier, as discussed above.” – The label determined by the third mashup of Rusk, which is performed when the first and second mashups do not predict the same label or a meaningful label, is the fourth label.)
“wherein assignment of the first, second, third, and/or fourth labels to a member of the dataset is based on a classification probability value for the member being greater than a threshold value.” (Paragraph [0071]: “In addition to a classifier returning a document label, a classifier may return a confidence score. The confidence score may be used to indicate the classifier's confidence in the classifier's label classification. In some embodiments, the classifier's confidence score may be determined based on the classification. For example, as discussed herein, classifiers may employ a softmax classifier to transform a numerical output produced by a model into a classification and subsequent label. The softmax classifier may produce a classification label based on a probability distribution utilizing the predicted numerical values, over several output classes. A label may be chosen based on the probability distributions such that the label selected may be the label associated with the highest probability in the probability distribution. In one embodiment, the confidence score may be the probability, from the probability distribution, associated with the selected label.” And in paragraph [0072]: “The confidence score associated with the selected label may be compared to a threshold. In response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150. In response to the confidence score not exceeding the threshold,” – The comparison of a probability-based confidence score to a threshold in Rusk is the assignment of a label based on a classification probability value being greater than a threshold value.)
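For illustration only, the confidence-score comparison quoted from paragraphs [0071] and [0072] can be sketched as follows. This is a minimal sketch under stated assumptions, not Rusk's implementation: the function names `softmax` and `label_with_confidence`, the example labels, and the threshold values are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw model scores into a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(scores, labels, threshold):
    """Pick the most probable label and compare its probability to a threshold.

    Returns (label, confidence). The label is "Unknown" when the confidence
    does not exceed the threshold (hypothetical fallback for illustration).
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence > threshold:
        return labels[best], confidence
    return "Unknown", confidence
```

With scores of `[2.0, 0.0, 0.0]`, the highest probability is roughly 0.79, so the first label is returned under a threshold of 0.5 and "Unknown" is returned under a threshold of 0.9.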

Regarding claims 2 and 13, taking claim 2 as exemplary:
Rusk shows the system and method of claims 1 and 12 as claimed and specified above.
And Rusk shows “wherein each of the plurality of classification models is associated with a generated threshold value that is based on the corresponding one of the plurality of classification models.” (Paragraph [0091]: “In addition to a classifier returning a document label, as discussed above, a classifier may return a confidence score. The confidence scores may be compared to a threshold.” In paragraph [0092]: “As discussed above, each classifier in a plurality of classifiers may have their own threshold value. In some embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have the same threshold value. In alternate embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have different threshold values.” And in paragraph [0099]: “In some embodiments, the selected classifiers for the third mashup 130 may be the same as the selected classifiers in the preceding mashups. In alternate embodiments, the same selected classifiers for the third mashup 130 may be retrained because of the incorporation of historic data.” – The threshold values for the classifiers of Rusk are the generated threshold values associated with the plurality of classification models.)

Regarding claims 3 and 14, taking claim 3 as exemplary:
Rusk shows the system and method of claims 2 and 13 as claimed and specified above.
And Rusk shows “wherein all of the plurality of classification models are associated with different threshold values.” (Paragraph [0092]: “As discussed above, each classifier in a plurality of classifiers may have their own threshold value. In some embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have the same threshold value. In alternate embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have different threshold values.” – The different threshold values of Rusk are the different threshold values.)

Regarding claim 5:
Rusk shows the system of claim 1 as claimed and specified above.
And Rusk shows “wherein the first, second, and third processing instances are arranged in a hierarchical manner.” (Paragraph [0075]: “In step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110. Thus, a second mashup may be performed. The second mashup may be considered a second iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the second mashup 120 may be different from the number of classifiers employed to label the document in the first mashup 110. Further, the classifiers employed in the second mashup 120 may be different from the classifiers employed in the first mashup 110. For example, the classifiers employed in the second mashup 120 may be a second subset of classifiers, the second subset of classifiers including a neural network, as discussed above, an elastic model, as discussed above, an XGBoost model, as discussed above, an Automated machine learning model, and a Regular Expression (“RegEx”) classifier, or any combination of these or other third party models.” In paragraph [0088]: “The decision in step 114 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the second subset of classifiers, out of a plurality of classifiers in a second mashup 120, may compute a predicted label according to the classifiers' various methodologies. During the second mashup 120, the predetermined number of classifiers may be a minority of classifiers, the number of minority classifiers being at least greater than one classifier. 
For example, given the neural network classifier, the elastic search model, the XGBoost model, the automatic machine learning model, and the RegEx classifier, the labels of two classifiers may be used in determining whether or not the classifiers agree on a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” In paragraph [0089]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 116. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 124.” And in paragraph [0095]: “In step 124, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the second mashup 120 or the first mashup 110. Thus, a third mashup 130 may be performed. The third mashup 130 may be considered a third iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the third mashup 130 may be different from the number of classifiers employed to label the document in second mashup 120 and the first mashup 110. Further, the classifiers employed in the third mashup 130 may be different from the classifiers employed in the second mashup 120 and the first mashup 110. 
For example, the classifiers employed in the third mashup 130 may be a third subset of classifiers, the third subset of classifiers including a neural network, as discussed above, an elastic search model, as discussed above, an XGBoost model, as discussed above, an automated machine learning model, as discussed above, and a Regular Expression (RegEx) classifier, as discussed above.” – The sequential use of a first, then a second, then a third mashup in Rusk is the first, second, and third processing instances arranged in a hierarchical manner.)
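For illustration only, the staged flow quoted from paragraphs [0075], [0088], [0089], and [0095] can be sketched as a cascade in which each stage runs only when the preceding stage yields no agreed, meaningful label. This is a minimal sketch under stated assumptions, not Rusk's implementation: the names `agreed_label` and `run_cascade`, the representation of a stage as a pair of classifiers and a required agreement count, and the use of "Unknown" as the only non-meaningful label are hypothetical.

```python
from collections import Counter


def agreed_label(predictions, required):
    """Return the label predicted by at least `required` classifiers, else None."""
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= required else None


def run_cascade(document, stages):
    """Try each stage in order and stop at the first meaningful agreed label.

    `stages` is a list of (classifiers, required_agreement) pairs, where each
    classifier is a callable that maps a document to a label string.
    Returns (label, number_of_the_last_stage_run).
    """
    for stage_number, (classifiers, required) in enumerate(stages, start=1):
        predictions = [classify(document) for classify in classifiers]
        label = agreed_label(predictions, required)
        if label is not None and label != "Unknown":
            return label, stage_number
    return "Unknown", len(stages)
```

In this sketch a later stage is reached only for documents that an earlier stage could not label, which is the ordering the quoted passages describe for the first, second, and third mashups.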

Regarding claim 6:
Rusk shows the system of claim 1 as claimed and specified above.
And Rusk shows “wherein at least one of first and third models assigns labels that are mutually exclusive to members of the dataset.” (Paragraph [0095]: “In step 124, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the second mashup 120 or the first mashup 110. Thus, a third mashup 130 may be performed. The third mashup 130 may be considered a third iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the third mashup 130 may be different from the number of classifiers employed to label the document in second mashup 120 and the first mashup 110. Further, the classifiers employed in the third mashup 130 may be different from the classifiers employed in the second mashup 120 and the first mashup 110. For example, the classifiers employed in the third mashup 130 may be a third subset of classifiers, the third subset of classifiers including a neural network, as discussed above, an elastic search model, as discussed above, an XGBoost model, as discussed above, an automated machine learning model, as discussed above, and a Regular Expression (RegEx) classifier, as discussed above.” – The assigning of different labels by the first and third mashups of Rusk is the at least one of the first and third models assigning labels that are mutually exclusive to members of the dataset.)

Regarding claim 7:
Rusk shows the system of claim 1 as claimed and specified above.
And Rusk shows “wherein at least the second classification model assigns labels that are non-mutually exclusive to members of the dataset, wherein at least one of the members of the first portion of the dataset is assigned both the second and third labels.” (Paragraph [0070]: “The decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document. In some embodiments, a classifier may be unable to classify a document, returning a label that is not meaningful. In one example, a label that may be returned that is not meaningful is the label “Unknown.” In response to a label that may not be meaningful, the process proceeds to step 112.” In paragraph [0088]: “The decision in step 114 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the second subset of classifiers, out of a plurality of classifiers in a second mashup 120, may compute a predicted label according to the classifiers' various methodologies. During the second mashup 120, the predetermined number of classifiers may be a minority of classifiers, the number of minority classifiers being at least greater than one classifier. 
For example, given the neural network classifier, the elastic search model, the XGBoost model, the automatic machine learning model, and the RegEx classifier, the labels of two classifiers may be used in determining whether or not the classifiers agree on a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” And in paragraph [0089]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 116. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 124.” – The predicting of the same labels in Rusk is the assigning of labels that are non-mutually exclusive to members of the dataset, wherein at least one of the members of the first portion of the dataset is assigned both the second and third labels.)

Regarding claims 8 and 16, taking claim 8 as exemplary:
Rusk shows the system and method of claims 1 and 12 as claimed and specified above.
And Rusk shows “wherein the at least on hardware processor is further configured to store: validated data that has been labeled based on execution of the first, second, and/or third processing instance; and retrain at least one of the first, second, and/or third processing models based on at least some of the validated data.” (Paragraph [0098]: “In addition to the features learned from the document, features from a parent document may be learned and input into the various classifiers in the third mashup 130. In some embodiments, the parent document may be a document that has been classified and labeled. For example, document 1 may successfully be classified and labeled as document 1. Document 2, immediately following document 1, may have not been successfully classified during either the first or second mashups 110 and 120 respectively. Thus, features from document 1 may be learned from document 1 to help improve the classification of document 2 in the third mashup 130. In a simplified example, page t of a book may provide a classifier context as to what is on page t+1 of a book. In other words, features from a parent document may be considered historic inputs. Historic inputs may improve the classification likelihood of the document being classified in the third mashup 130. Thus, a time series analysis may be performed by incorporating the features of the parent document. Incorporating historic data may provide improve the ability of the third mashup 130 to classify and label the document because it is assumed that, for example, pages within the same document are serially autocorrelated. In other words, there may be correlations between the same features over time.” In paragraph [0099]: “In some embodiments, the selected classifiers for the third mashup 130 may be the same as the selected classifiers in the preceding mashups. 
In alternate embodiments, the same selected classifiers for the third mashup 130 may be retrained because of the incorporation of historic data.” In paragraph [0100]: “For example, as described herein, RegEx classifiers operate based on pattern matching. Thus, historic data, such as successful RegExes, about a previously classified image and that image's associated classification, may help the RegEx classifier in mashup 130 to classify the document.” – The third mashup using historic data from parent documents or pages within the same document of Rusk is the retraining of a third model using validated data. Note that the claim is written in the alternative, and not all claim elements (i.e., retraining and validating of the first and second models) need to be described by the reference for the claim to be taught.)

Regarding claims 10 and 17, taking claim 10 as exemplary:
Rusk shows the system and method of claims 1 and 12 as claimed and specified above.
And Rusk shows “wherein the dataset is a group of documents and the first, second, and third classification models are configured to assign labels to each document based on natural language processing performed in connection with each one of the group of documents.” (Paragraph [0070]: “The decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document. In some embodiments, a classifier may be unable to classify a document, returning a label that is not meaningful. In one example, a label that may be returned that is not meaningful is the label “Unknown.” In response to a label that may not be meaningful, the process proceeds to step 112.” And in paragraph [0079]: “In some embodiments, a RegEx classifier may be used to classify the image. A RegEx classifier is a classifier that searches for, and matches, strings. Typically, RegEx classifiers apply a search pattern to alphanumeric characters, and may include specific characters or delimiters (e.g. quotes, commas, periods, hyphens, etc.) to denote various fields or breaks, wildcards or other dynamic patterns, similarity matching, etc.” – The use of RegEx classifiers on documents of Rusk is the use of NLP within documents.)
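For illustration only, the string-matching behavior quoted from paragraph [0079] can be sketched with Python's standard `re` module. This is a minimal sketch under stated assumptions, not Rusk's implementation: the `PATTERNS` table, its labels, and its search patterns are hypothetical examples.

```python
import re

# Hypothetical label-to-pattern table; the patterns are illustrative only.
PATTERNS = {
    "signature page": re.compile(r"\bsigned\s+by\b|\bsignature\b", re.IGNORECASE),
    "title page": re.compile(r"\btitle\b", re.IGNORECASE),
}


def regex_classify(text, patterns=PATTERNS):
    """Return the first label whose search pattern matches the text, else "Unknown"."""
    for label, pattern in patterns.items():
        if pattern.search(text):
            return label
    return "Unknown"
```

Patterns are tried in insertion order, so a document matching several patterns receives the label listed first in the table.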

Regarding claim 11:
Rusk shows the system of claim 1 as claimed and specified above.
And Rusk shows “at least one hardware processor configured to: calculate a first threshold based on the first classification model, with assignment of the first label being additionally based on the calculated first threshold.” (Paragraph [0071]: “In addition to a classifier returning a document label, a classifier may return a confidence score. The confidence score may be used to indicate the classifier's confidence in the classifier's label classification. In some embodiments, the classifier's confidence score may be determined based on the classification. For example, as discussed herein, classifiers may employ a softmax classifier to transform a numerical output produced by a model into a classification and subsequent label. The softmax classifier may produce a classification label based on a probability distribution utilizing the predicted numerical values, over several output classes. A label may be chosen based on the probability distributions such that the label selected may be the label associated with the highest probability in the probability distribution. In one embodiment, the confidence score may be the probability, from the probability distribution, associated with the selected label.” In paragraph [0072]: “The confidence score associated with the selected label may be compared to a threshold. In response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150.” in paragraph [0129]: “System 606 may receive the stack of documents 604 from client 602 and classify the documents 604. A processor 608 may be the logic in a device that receives software instructions. A central processing unit (“CPU”) may be considered any logic circuit that responds to and processes instructions. Thus, CPUs provide flexibility in performing different applications because various instructions may be performed by the CPU. 
One or more algorithmic logic units (“ALU”) may be incorporated in processors to perform necessary calculations in the event an instruction requires a calculation be performed.” And in paragraph [0131]: “As illustrated, processor 608 may include a neural network engine 610 and parser 612. A neural network engine 610 is an engine that utilizes the inherent parallelisms in a neural network to improve and speed up the time required for calculations.” – The use of a threshold based on probability to determine a label of Rusk is the calculating of a first threshold based on the first classification model, with assignment of the first label being additionally based on the calculated first threshold.)
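For illustration only, and not as part of the cited disclosure, the thresholding described in paragraphs [0071]-[0072] of Rusk may be sketched as follows. The function names, the example labels, and the example threshold values are assumptions chosen for this sketch; Rusk does not recite any particular code.

```python
import math

def softmax(scores):
    """Convert raw model scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_threshold(scores, labels, threshold):
    """Pick the most probable label; fall back to "Unknown" unless the
    confidence score exceeds the threshold (hypothetical helper)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]  # probability associated with the selected label
    if confidence > threshold:
        return labels[best], confidence
    return "Unknown", confidence

labels = ["title page", "signature page", "end page"]
print(label_with_threshold([2.0, 0.1, 0.1], labels, threshold=0.5))
print(label_with_threshold([2.0, 0.1, 0.1], labels, threshold=0.9))
```

In this sketch the same scores yield a meaningful label under the lower threshold and “Unknown” under the higher one, which is the behavior paragraph [0072] describes.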

Regarding claim 18:
Rusk shows:
“A non-transitory computer readable storage medium configured to store computer-executable instructions for use with a computer system, the stored computer-executable instructions comprising instructions that cause the computer system to perform operations” (Paragraph [0151]: “Referring again to FIG. 7A, the computing device 700 may support any suitable installation device 716, such as a disk drive, a CD-ROM drive, a CD-R/RW drive, a DVD-ROM drive, a flash memory drive, tape drives of various formats, USB device, hard-drive, a network interface, or any other device suitable for installing software and programs. The computing device 700 may further include a storage device, such as one or more hard disk drives or redundant arrays of independent disks, for storing an operating system and other related software, and for storing application software programs such as any program or software 720 for implementing (e.g., configured and/or designed for) the systems and methods described herein. Optionally, any of the installation devices 716 could also be used as the storage device. Additionally, the operating system and the software can be run from a bootable medium.” – The disks and CD-ROM for implementing the systems and methods of Rusk is the non-transitory computer readable storage medium configured to store computer-executable instructions for use with a computer system, the stored computer-executable instructions comprising instructions that cause the computer system to perform operations.)
“comprising: storing, to electronic data storage, a plurality of classification models each configured to predict whether one or more labels applies to a respective input dataset;” (Paragraph [0068]: “The decision in step 104 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the first subset of classifiers, out of a plurality of classifiers in a first mashup 110, may compute a predicted label according to the classifiers' various methodologies. During the first mashup 110, the predetermined number of classifiers may be a majority of classifiers. For example, given the neural network classifier, the elastic model, and the XGBoost model, the labels of two classifiers may be used in determining whether or not the classifiers agree to a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” In paragraph [0069]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 106. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 112.” In paragraph [0070]: “The decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. 
Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document. In some embodiments, a classifier may be unable to classify a document, returning a label that is not meaningful. In one example, a label that may be returned that is not meaningful is the label “Unknown.” In response to a label that may not be meaningful, the process proceeds to step 112.” Paragraph [0075]: “In step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110. Thus, a second mashup may be performed. The second mashup may be considered a second iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the second mashup 120 may be different from the number of classifiers employed to label the document in the first mashup 110. Further, the classifiers employed in the second mashup 120 may be different from the classifiers employed in the first mashup 110. 
For example, the classifiers employed in the second mashup 120 may be a second subset of classifiers, the second subset of classifiers including a neural network, as discussed above, an elastic model, as discussed above, an XGBoost model, as discussed above, an Automated machine learning model, and a Regular Expression (“RegEx”) classifier, or any combination of these or other third party models.” And in paragraph [0151]: “The computing device 700 may further include a storage device, such as one or more hard disk drives or redundant arrays of independent disks, for storing an operating system and other related software, and for storing application software programs such as any program or software 720 for implementing (e.g., configured and/or designed for) the systems and methods described herein.” – The use of mashups to determine labels, and agreement of labels, for documents using classifiers of Rusk is the plurality of classification models each configured to predict whether one or more labels applies to a respective input dataset.)
“executing a plurality of processing instances that each run a different one of the plurality of classification models, wherein at least two of the processing instances take, as input for the corresponding classification model, labeled output from another of the plurality of processing instances;” (Paragraph [0075]: “In step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110. Thus, a second mashup may be performed. The second mashup may be considered a second iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the second mashup 120 may be different from the number of classifiers employed to label the document in the first mashup 110. Further, the classifiers employed in the second mashup 120 may be different from the classifiers employed in the first mashup 110. For example, the classifiers employed in the second mashup 120 may be a second subset of classifiers, the second subset of classifiers including a neural network, as discussed above, an elastic model, as discussed above, an XGBoost model, as discussed above, an Automated machine learning model, and a Regular Expression (“RegEx”) classifier, or any combination of these or other third party models.” In paragraph [0088]: “The decision in step 114 depends on a computing device determining whether a predetermined number of classifiers predict the same label. As described herein, the second subset of classifiers, out of a plurality of classifiers in a second mashup 120, may compute a predicted label according to the classifiers' various methodologies. During the second mashup 120, the predetermined number of classifiers may be a minority of classifiers, the number of minority classifiers being at least greater than one classifier. 
For example, given the neural network classifier, the elastic search model, the XGBoost model, the automatic machine learning model, and the RegEx classifier, the labels of two classifiers may be used in determining whether or not the classifiers agree on a label. In some embodiments, a first number of the selected subset of classifiers may classify the document with a first classification. In alternate embodiments, a second number of the selected subset of classifiers may classify the document with a second classification.” In paragraph [0089]: “The classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 116. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 124.” And in paragraph [0095]: “In step 124, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the second mashup 120 or the first mashup 110. Thus, a third mashup 130 may be performed. The third mashup 130 may be considered a third iteration. Several classifiers may be employed out of the plurality of classifiers. The number of classifiers employed to label the document in the third mashup 130 may be different from the number of classifiers employed to label the document in second mashup 120 and the first mashup 110. Further, the classifiers employed in the third mashup 130 may be different from the classifiers employed in the second mashup 120 and the first mashup 110. 
For example, the classifiers employed in the third mashup 130 may be a third subset of classifiers, the third subset of classifiers including a neural network, as discussed above, an elastic search model, as discussed above, an XGBoost model, as discussed above, an automated machine learning model, as discussed above, and a Regular Expression (RegEx) classifier, as discussed above.” – The use of different mashups to determine labels of documents and to determine agreement of labels of Rusk is the plurality of processing instances that each run a different one of the plurality of classification models, wherein at least two of the processing instances take, as input for the corresponding classification model, labeled output from another of the plurality of processing instances.)
“and as part of each executed processing instance that is running a corresponding classification model of the plurality of classification models: retrieving a threshold value that is linked to the corresponding classification model, assigning, based on the probability value generated by the corresponding classification model and the retrieved threshold value, a label to a corresponding document.” (Paragraph [0071]: “The softmax classifier may produce a classification label based on a probability distribution utilizing the predicted numerical values, over several output classes. A label may be chosen based on the probability distributions such that the label selected may be the label associated with the highest probability in the probability distribution. In one embodiment, the confidence score may be the probability, from the probability distribution, associated with the selected label.” In paragraph [0072]: “The confidence score associated with the selected label may be compared to a threshold. In response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150. In response to the confidence score not exceeding the threshold, the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown.” In response to the label not being a meaningful label, the process may proceed to step 112.” And in paragraph [0092]: “As discussed above, each classifier in a plurality of classifiers may have their own threshold value. In some embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have the same threshold value. 
In alternate embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have different threshold values.” – The classifiers with their own threshold values of Rusk is the retrieving a threshold value that is linked to the corresponding classification model.)
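For illustration only, the successive mashups of paragraphs [0068]-[0075] and the per-classifier thresholds of paragraph [0092] of Rusk may be sketched as follows. This is a simplified sketch under stated assumptions: the dictionary layout, the helper names, and the choice to apply each classifier's threshold before counting agreement are hypothetical and are not taken from Rusk.

```python
from collections import Counter

UNKNOWN = "Unknown"

def thresholded_label(classifier, document):
    """Run one classifier and apply the threshold stored with it."""
    label, confidence = classifier["predict"](document)
    return label if confidence > classifier["threshold"] else UNKNOWN

def run_mashup(classifiers, document, required_votes):
    """Return the label that at least `required_votes` classifiers agree on,
    or "Unknown" when no meaningful label reaches that count."""
    votes = Counter(thresholded_label(c, document) for c in classifiers)
    votes.pop(UNKNOWN, None)  # "Unknown" is not a meaningful label
    if not votes:
        return UNKNOWN
    label, count = votes.most_common(1)[0]
    return label if count >= required_votes else UNKNOWN

def classify(document, mashups):
    """Try each (classifiers, required_votes) mashup in order until one
    yields a meaningful label."""
    for classifiers, required_votes in mashups:
        label = run_mashup(classifiers, document, required_votes)
        if label != UNKNOWN:
            return label
    return UNKNOWN
```

A later mashup is reached only when the earlier one fails to produce a meaningful label, mirroring the progression from step 104 to step 112 in the quoted passages.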

Regarding claim 19:
Rusk shows the non-transitory computer readable storage medium of claim 18 as claimed and specified above.
And Rusk shows “wherein the operations further comprise: calculating, for each of the plurality of classification models, a threshold value that is based on how the corresponding classification model performs.” (Paragraph [0071]: “The softmax classifier may produce a classification label based on a probability distribution utilizing the predicted numerical values, over several output classes. A label may be chosen based on the probability distributions such that the label selected may be the label associated with the highest probability in the probability distribution. In one embodiment, the confidence score may be the probability, from the probability distribution, associated with the selected label.” In paragraph [0072]: “The confidence score associated with the selected label may be compared to a threshold. In response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150. In response to the confidence score not exceeding the threshold, the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown.” In response to the label not being a meaningful label, the process may proceed to step 112.” And in paragraph [0092]: “As discussed above, each classifier in a plurality of classifiers may have their own threshold value. In some embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have the same threshold value. In alternate embodiments, classifiers employed in both the first mashup 110 and the second mashup 120 may have different threshold values.” – The choosing of a label and threshold based on the probability and confidence score of Rusk is the wherein the operations further comprise: calculating, for each of the plurality of classification models, a threshold value that is based on how the corresponding classification model performs.)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 4, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rusk in view of Thampy et al. (US 2020/0092159 A1, hereinafter Thampy).
Regarding claim 4:
Rusk shows the system of claim 2 as claimed and specified above.
But Rusk does not appear to explicitly recite “wherein each of the different threshold values are generated by using grid search.”
However, Thampy teaches “wherein each of the different threshold values are generated by using grid search.” (Paragraph [0071]: “Another strategy that service 302 could use for purposes of reporting network anomalies can leverage percentile level thresholds.” In paragraph [0084]: “To solve the optimization function and find the parameter values, threshold optimizer 410 may employ any number of optimization methods. For example, in some embodiments, threshold optimizer 410 may apply a Nelder-Mead optimization or a Limited Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to the optimization function. Alternatively, threshold optimizer 410 may employ a brute force grid search, to find the optimal parameter values for its optimization function.” And in paragraph [0086]: “A prototype of the techniques herein was constructed and tested against real network data, to identify the KPI thresholds using a grid search approach. The results for two test networks are shown below in Tables 1-2 for the first network and Tables 3-4 for the second network” – The use of grid search to determine threshold values for classifying/reporting anomalies of Thampy is the generating of threshold values using grid search.)
Rusk and Thampy are analogous art because both Rusk and Thampy describe classification using thresholds.
Therefore, it would be obvious to one of ordinary skill in the art at the filing date of the instant application, having the teachings of both Rusk and Thampy before him or her, to modify the teachings of Rusk to include the teachings of Thampy in order to include an additional determination of threshold for classification and thereby expand the capabilities and accuracy of the threshold-based classification of Rusk using the grid search of Thampy.
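For illustration only, a brute-force grid search for a classification threshold of the kind referred to in paragraph [0084] of Thampy may be sketched as follows. The use of the F1-score as the objective, the evenly spaced grid, and all function names are assumptions made for this sketch; Thampy's own optimization function is not reproduced here.

```python
def f1_at_threshold(scores, truths, threshold):
    """F1-score obtained when scores above the threshold are predicted positive."""
    tp = sum(1 for s, t in zip(scores, truths) if s > threshold and t)
    fp = sum(1 for s, t in zip(scores, truths) if s > threshold and not t)
    fn = sum(1 for s, t in zip(scores, truths) if s <= threshold and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def grid_search_threshold(scores, truths, steps=99):
    """Evaluate every candidate on an evenly spaced grid in (0, 1) and
    return the first threshold that maximizes the objective."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda th: f1_at_threshold(scores, truths, th))

# Hypothetical validation data: model confidence scores and true labels.
scores = [0.10, 0.40, 0.35, 0.80, 0.90]
truths = [False, False, True, True, True]
best = grid_search_threshold(scores, truths)
print(best, f1_at_threshold(scores, truths, best))
```

Running the search separately on each classifier's validation scores would give each classifier its own threshold value.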

Regarding claim 15:
Rusk shows the method of claim 14 as claimed and specified above.
But Rusk does not appear to explicitly recite “wherein each of the different threshold values are generated by using grid search.”
However, Thampy teaches “wherein each of the different threshold values are generated by using grid search.” (Paragraph [0071]: “Another strategy that service 302 could use for purposes of reporting network anomalies can leverage percentile level thresholds.” In paragraph [0084]: “To solve the optimization function and find the parameter values, threshold optimizer 410 may employ any number of optimization methods. For example, in some embodiments, threshold optimizer 410 may apply a Nelder-Mead optimization or a Limited Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to the optimization function. Alternatively, threshold optimizer 410 may employ a brute force grid search, to find the optimal parameter values for its optimization function.” And in paragraph [0086]: “A prototype of the techniques herein was constructed and tested against real network data, to identify the KPI thresholds using a grid search approach. The results for two test networks are shown below in Tables 1-2 for the first network and Tables 3-4 for the second network” – The use of grid search to determine threshold values for classifying/reporting anomalies of Thampy is the generating of threshold values using grid search.)
Rusk and Thampy are analogous art because both Rusk and Thampy describe classification using thresholds.
Therefore, it would be obvious to one of ordinary skill in the art at the filing date of the instant application, having the teachings of both Rusk and Thampy before him or her, to modify the teachings of Rusk to include the teachings of Thampy in order to include an additional determination of threshold for classification and thereby expand the capabilities and accuracy of the threshold-based classification of Rusk using the grid search of Thampy.

Regarding claim 20:
Rusk shows the non-transitory computer readable storage medium of claim 19 as claimed and specified above.
But Rusk does not appear to explicitly recite “wherein the operations further comprise: as part of calculating the threshold value for each classification model, executing a grid search optimization process.”
However, Thampy teaches “wherein the operations further comprise: as part of calculating the threshold value for each classification model, executing a grid search optimization process.” (Paragraph [0071]: “Another strategy that service 302 could use for purposes of reporting network anomalies can leverage percentile level thresholds.” In paragraph [0084]: “To solve the optimization function and find the parameter values, threshold optimizer 410 may employ any number of optimization methods. For example, in some embodiments, threshold optimizer 410 may apply a Nelder-Mead optimization or a Limited Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to the optimization function. Alternatively, threshold optimizer 410 may employ a brute force grid search, to find the optimal parameter values for its optimization function.” And in paragraph [0086]: “A prototype of the techniques herein was constructed and tested against real network data, to identify the KPI thresholds using a grid search approach. The results for two test networks are shown below in Tables 1-2 for the first network and Tables 3-4 for the second network” – The use of grid search to determine threshold values for classifying/reporting anomalies of Thampy is the generating/calculating of threshold values using grid search for each classification model.)
Rusk and Thampy are analogous art because both Rusk and Thampy describe classification using thresholds.
Therefore, it would be obvious to one of ordinary skill in the art at the filing date of the instant application, having the teachings of both Rusk and Thampy before him or her, to modify the teachings of Rusk to include the teachings of Thampy in order to include an additional determination of threshold for classification and thereby expand the capabilities and accuracy of the threshold-based classification of Rusk using the grid search of Thampy.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Rusk in view of Chowdhary et al. (US 2021/0158041 A1, hereinafter Chowdhary).
Regarding claim 9:
Rusk shows the system of claim 1 as claimed and specified above.
But Rusk does not appear to explicitly recite “wherein a recall metric for labeling the dataset by using at least the first, second, or third processing instances is at least 0.90.”
However, Chowdhary teaches “wherein a recall metric for labeling the dataset by using at least the first, second, or third processing instances is at least 0.90.” (Chowdhary [0215]: “Reference will now be made to SVM Training according to one or more embodiments. The best hyper-parameters for the SVM found in the grid search (of an example) are listed in Table III(B) with their performance metrics on the test data. Accuracy represents the probability of a sample being correctly labeled. Precision represents the probability of a predicted positive (t_p + f_p) sample being a true positive (t_p), whereas recall is the probability of a true positive sample (t_p) being identified amongst all positive samples (t_p + f_n), and F1-score is the harmonic mean of precision and recall. Their formulae are given in Equation (8B). ... \sum_{m = k/2 + 1}^{k} \binom{k}{m} P^{m} (1 - P)^{k - m}, ... where P is the accuracy for single frame. On average a corn stalk remains in the ROI for about 10 frames, so a value of 5 is chosen for k. Then the accuracy of recognition is increased to as high as 99.69%.
TABLE III(B): Best hyper-parameters found in the grid search and their corresponding performance metrics on the test data
              V4        VT        R2
kernel        rbf       rbf       rbf
C             10        100       100
γ             0.001     0.001     0.001
accuracy      91.75%    94.38%    92.79%
precision     0.91      0.91      0.91
recall        0.95      0.94      0.95
F1-score      0.95      0.93      0.94
” – The recall values of 0.94 to 0.95 that Chowdhary reports for the classifiers tuned by grid search are the recall metric for labeling of at least 0.90.)
Rusk and Chowdhary are analogous art because both Rusk and Chowdhary describe classification.
Therefore, it would be obvious to one of ordinary skill in the art at the filing date of the instant application, having the teachings of both Rusk and Chowdhary before him or her, to modify the teachings of Rusk to include the teachings of Chowdhary in order to determine the classification parameters of Rusk using the grid search and recall metric of Chowdhary and thereby increase the accuracy and capabilities of Rusk.
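For illustration only, the metrics defined in the quoted paragraph [0215] of Chowdhary, together with the majority-vote accuracy over k frames, may be sketched as follows. The function names and the example counts are hypothetical, and the sketch assumes that the quoted sum runs from m = floor(k/2) + 1 to k; no attempt is made to reproduce the figures reported by Chowdhary.

```python
from math import comb

def precision(tp, fp):
    """Share of predicted positives that are true positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of all positive samples that are identified."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def majority_vote_accuracy(k, p):
    """Probability that more than half of k independent frames, each
    correct with probability p, are labeled correctly."""
    return sum(comb(k, m) * p**m * (1 - p)**(k - m)
               for m in range(k // 2 + 1, k + 1))

# Hypothetical counts: 95 true positives, 5 false negatives, 9 false positives.
r = recall(95, 5)
p = precision(95, 9)
print(r >= 0.90, round(f1_score(p, r), 3))
print(majority_vote_accuracy(5, 0.9))
```

A recall limitation such as “at least 0.90” reduces to a comparison of the computed recall against that value, as in the first printed result.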

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Raveh (US 2021/0295213 A1), part of the prior art made of record, teaches classifications with labels, the use of thresholds, and the use of multiple classification models, of claims 1, 12 and 18 in paragraph [0008] through a classification model used to predict a label of an image and to also use multiple classification models in paragraph [0043] to determine labels.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANE D WOOLWINE whose telephone number is (571)272-4138. The examiner can normally be reached M-F, 9:30 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHANE D. WOOLWINE
Primary Examiner
Art Unit 2124



/SHANE D WOOLWINE/Primary Examiner, Art Unit 2124