DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-6, 8-10, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Basu et al., hereinafter referred to as Basu (US 2015/0044659 A1 – already of record) in view of Brownlee (“A Gentle Introduction to k-fold Cross-Validation”, Pgs. 1-19, Pub. 10/19/2018; URL: http://web.archive.org/web/20181019052743/https://machinelearningmastery.com/k-fold-cross-validation/). 

	As per claim 1, Basu discloses a method for creating compact example subsets for intent classification (Basu: Abstract.), by a processor (Basu: Fig. 1 & Para. [0044] disclose a computer 102 that includes a processor executing instructions stored in a memory device.), comprising: 
receiving a set of content used for training an intent classifier (Basu: Fig. 1 & Paras. [0021], [0044] disclose the computer’s processor receives short answers to particular questions [i.e., a set of content] used for training an intent classifier.); 
separating entries within the set of content into a first subset and a second subset (Basu: Paras. [0021], [0050]-[0051], [0063] disclose grouping the short answers into a first and second cluster or subcluster.); 
performing a cross-validation operation on the first and second subsets to identify a correctly labeled portion and an incorrectly labeled portion of the set of content (Basu: Paras. [0053], [0070] disclose performing a cross-validation operation on the grouped clusters which can be displayed as a report that identifies correctly labeled and incorrectly labeled portions.), wherein the cross-validation operation further comprises: 
performing an initial training of the intent classifier utilizing the first subset to form a first subset trained classifier (Basu: Para. [0070] discloses for example, a ten-fold cross-validation in which training was performed on grouping labels for nine of the ten training questions [i.e., forms first subset trained].); 
utilizing the first subset trained classifier against the second subset to identify a correctly labeled subset and an incorrectly labeled subset of the second subset (Basu: Para. [0070] discloses the trained 9 folds first subset is tested on the tenth [i.e., second subset, which is either a test or validation subset].); 
performing a secondary training of the intent classifier, subsequent to the initial training, to form a second subset trained classifier (Basu: Para. [0070] discloses the tenth test subset is used as a secondary training to form a second subset trained classifier.); and 
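utilizing the second subset trained classifier against the first subset to identify a correctly labeled subset and an incorrectly labeled subset of the first subset (Basu: Paras. [0053], [0070] disclose performing a cross-validation operation on the grouped clusters to identify a correctly labeled subset and an incorrectly labeled subset of the first subset.); and 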
forming a reduced content used for performing a final training of the intent classifier by combining (i.e., clustering) a first number of the entries from the correctly labeled portion and a second number of the entries from the incorrectly labeled portion of the set of content (Basu: Paras. [0053], [0070] disclose the grouped clusters are displayed as a report that forms a reduced content used for performing a final training of the intent classifier.).
However, Basu does not explicitly disclose “… utilizing the second subset trained classifier against the first subset to identify a … subset of the first subset …”.
Further, Brownlee is in the same field of endeavor and teaches utilizing the second subset trained classifier (i.e., Fold1 + Fold2, together with the tested Fold3, make up the second subset trained classifier) against the first subset (i.e., the first subset can be (Fold1 + Fold2), (Fold2 + Fold3), or (Fold1 + Fold3)) to identify a subset of the first subset (Brownlee: Pg. 5, under “Worked Example”, discloses the concept of k-fold cross-validation, wherein multiple trained models or classifiers are generated based on the number of folds (k). For example, the K-Fold class can be used directly to split up a dataset prior to modeling such that all models will use the same data splits. In other words, data from the second subset trained classifier/model is used against data of the first subset to identify or validate subset data of the first subset.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Basu and Brownlee before him or her, to modify the cross-validation method of Basu to include the feature of utilizing the second subset trained classifier, as described in Brownlee. The motivation for doing so would have been to improve machine-learning accuracy by providing a more robust training method.
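For illustration only, the two-pass cross-validation recited in claim 1 (train on one subset, test on the other, then swap roles) can be sketched in Python as follows. The sketch is not taken from Basu, Brownlee, or the application: the toy token-voting classifier and the names train, predict, and cross_label are hypothetical and were chosen only to keep the example self-contained.

```python
from collections import Counter, defaultdict

def train(examples):
    """Toy classifier (an assumption): count which labels each token appears with."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for token in text.lower().split():
            counts[token][label] += 1
    return counts

def predict(model, text):
    """Sum the per-token label counts and return the label with the most votes."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None

def cross_label(first, second):
    """Train on one subset, test on the other, then swap the roles.

    An entry counts as correctly labeled when the classifier trained on the
    other subset predicts the label the entry already carries.
    """
    correct, incorrect = [], []
    for train_set, held_out in ((first, second), (second, first)):
        model = train(train_set)
        for text, label in held_out:
            target = correct if predict(model, text) == label else incorrect
            target.append((text, label))
    return correct, incorrect

# Hypothetical data; the last entry of `first` is deliberately mislabeled.
first = [("book flight paris", "travel"), ("book hotel rome", "travel"),
         ("check account balance", "banking"), ("transfer account funds", "travel")]
second = [("book flight rome", "travel"), ("book hotel paris", "travel"),
          ("check balance", "banking"), ("account balance statement", "banking")]

correct, incorrect = cross_label(first, second)
print(incorrect)  # [('transfer account funds', 'travel')]
```

Any real intent classifier could be substituted for the toy one; only the swap of training and testing roles between the two subsets is the point of the sketch.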

	As per claim 3, Basu discloses the method of claim 1, further comprising organizing the correctly labeled subset of the first subset and the correctly labeled subset of the second subset into the correctly labeled portion; and organizing the incorrectly labeled subset of the first subset and the incorrectly labeled subset of the second subset into the incorrectly labeled portion (Basu: Paras. [0053], [0070] disclose the ten-fold cross-validation operation organizes the grouped clusters into the correctly labeled and incorrectly labeled portions.).

As per claim 5, Basu discloses the method of claim 1, wherein the reduced content is selected utilizing an anti-clustering procedure (Basu: Para. [0073] discloses any of a number of different clustering techniques may be used to group the items into clusters and subclusters.).

As per claim 6, Basu discloses the method of claim 5, wherein the anti-clustering procedure further comprises: 
utilizing a vector representation of each of the entries of the set of content (Basu: Para. [0068] discloses vector representation of the content.); and 
utilizing a k-means clustering process on sets of vectors of the vector representation to choose the entries from the correctly labeled portion and the entries from the incorrectly labeled portion (Basu: Paras. [0068], [0070], [0073] disclose any of a number of different clustering techniques may be used to group the items into clusters and subclusters which are consistent with vector representation of the content.).
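As a hedged illustration of the limitation discussed for claim 6, the following Python sketch runs a plain k-means process over vector representations and then chooses one entry per cluster. It is an assumption-laden sketch, not the procedure of Basu or of the application: the function names kmeans and pick_representatives, the seeding of centroids with the first k vectors, and the rule of picking the entry nearest each centroid are all hypothetical.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means. The first k vectors seed the centroids, which is
    an assumption made here for determinism, not a recommendation."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

def pick_representatives(entries, vectors, k):
    """Choose one entry per cluster: the entry closest to the cluster centroid."""
    centroids, assignment = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
            chosen.append(entries[best])
    return chosen

# Hypothetical two-dimensional vectors for six entries forming two groups.
entries = ["a", "b", "c", "d", "e", "f"]
vectors = [(0, 0), (10, 10), (0, 2), (10, 12), (0, 1), (10, 11)]
print(pick_representatives(entries, vectors, 2))  # ['e', 'f']
```

Taking one entry from each cluster spreads the selection across dissimilar groups, which is one plausible reading of how clustering could be used to keep a reduced set diverse.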

As per claim 8, Basu discloses a system for creating compact example subsets for intent classification (Basu: Abstract.), comprising: 
a processor executing instructions stored in a memory device (Basu: Fig. 1 & Para. [0044] disclose a computer 102 that includes a processor executing instructions stored in a memory device.); 
wherein the processor (Basu: Fig. 1):
receives a set of content used for training an intent classifier (Basu: Fig. 1 & Paras. [0021], [0044] disclose the computer’s processor receives short answers to particular questions [i.e., a set of content] used for training an intent classifier.); 
separates entries within the set of content into a first subset and a second subset (Basu: Paras. [0021], [0050]-[0051], [0063] disclose grouping the short answers into a first and second cluster or subcluster.); 
performs a cross-validation operation on the first and second subsets to identify a correctly labeled portion and an incorrectly labeled portion of the set of content (Basu: Paras. [0053], [0070] disclose performing a cross-validation operation on the grouped clusters which can be displayed as a report that identifies correctly labeled and incorrectly labeled portions.) wherein the cross-validation operation further comprises: 
performing an initial training of the intent classifier utilizing the first subset to form a first subset trained classifier (Basu: Para. [0070] discloses for example, a ten-fold cross-validation in which training was performed on grouping labels for nine of the ten training questions [i.e., forms first subset trained].); 
utilizing the first subset trained classifier against the second subset to identify a correctly labeled subset and an incorrectly labeled subset of the second subset (Basu: Para. [0070] discloses the trained 9 folds first subset is tested on the tenth [i.e., second subset, which is either a test or validation subset].); 
performing a secondary training of the intent classifier, subsequent to the initial training, to form a second subset trained classifier (Basu: Para. [0070] discloses the tenth test subset is used as a secondary training to form a second subset trained classifier.); and 
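utilizing the second subset trained classifier against the first subset to identify a correctly labeled subset and an incorrectly labeled subset of the first subset (Basu: Paras. [0053], [0070] disclose performing a cross-validation operation on the grouped clusters to identify a correctly labeled subset and an incorrectly labeled subset of the first subset.); and 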
forms a reduced content used for performing a final training of the intent classifier by combining (i.e., clustering) a first number of the entries from the correctly labeled portion and a second number of the entries from the incorrectly labeled portion of the set of content (Basu: Paras. [0053], [0070] disclose the grouped clusters are displayed as a report that forms a reduced content used for performing a final training of the intent classifier.).
However, Basu does not explicitly disclose “… utilizing the second subset trained classifier against the first subset to identify a … subset of the first subset …”.
Further, Brownlee is in the same field of endeavor and teaches utilizing the second subset trained classifier (i.e., Fold1 + Fold2, together with the tested Fold3, make up the second subset trained classifier) against the first subset (i.e., the first subset can be (Fold1 + Fold2), (Fold2 + Fold3), or (Fold1 + Fold3)) to identify a subset of the first subset (Brownlee: Pg. 5, under “Worked Example”, discloses the concept of k-fold cross-validation, wherein multiple trained models or classifiers are generated based on the number of folds (k). For example, the K-Fold class can be used directly to split up a dataset prior to modeling such that all models will use the same data splits. In other words, data from the second subset trained classifier/model is used against data of the first subset to identify or validate subset data of the first subset.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Basu and Brownlee before him or her, to modify the cross-validation method of Basu to include the feature of utilizing the second subset trained classifier, as described in Brownlee. The motivation for doing so would have been to improve machine-learning accuracy by providing a more robust training method.
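As a hedged illustration of the k-fold procedure attributed to Brownlee above, in which each fold is held out once while the remaining folds are used for training, a minimal Python sketch follows. It does not reproduce Brownlee's code; the function name k_fold_splits and the interleaved assignment of items to folds are assumptions made for this example.

```python
def k_fold_splits(items, k):
    """Yield (train, test) pairs so that each of the k folds is held out once."""
    # Interleaved fold assignment (an assumption): item i goes to fold i % k.
    folds = [items[i::k] for i in range(k)]
    for held_out in range(k):
        train = [x for j, fold in enumerate(folds) if j != held_out for x in fold]
        yield train, folds[held_out]

# Hypothetical data: six items split into three folds.
for train, test in k_fold_splits(list(range(6)), 3):
    print(train, test)
# [1, 4, 2, 5] [0, 3]
# [0, 3, 2, 5] [1, 4]
# [0, 3, 1, 4] [2, 5]
```

With k = 2 the generator reduces to the two-pass case, in which each subset serves once as training data and once as test data.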

As per claims 10, 12, and 17, the claims recite limitations analogous to claims 3 and 5 above, and are therefore rejected on the same premise.

As per claims 13 and 20, the claims recite limitations analogous to claims 4 and 6 above, and are therefore rejected on the same premise.

As per claim 15, the claim recites limitations analogous to claims 1 & 8 above, and is therefore rejected on the same premise.

Claims 4, 11, 13, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Basu in view of Brownlee in further view of Dalyac et al., hereinafter referred to as Dalyac (US 2018/0300576 A1 – already of record). 

	As per claim 4, Basu discloses the method of claim 1, 
	However, Basu-Brownlee do not explicitly disclose “… wherein, commensurate with the combining, the first number of entries is larger than the second number of entries.”
Further, Dalyac is in the same field of endeavor and teaches wherein, commensurate with the combining, the first number of entries is larger than the second number of entries (Dalyac: Para. [0104] discloses that the scaling of the circles representing the clusters, which in turn represent the datasets, can be optimized and/or adjusted by a user. In other words, a first number of entries of the dataset can be larger or smaller than the second number of entries.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Basu-Brownlee and Dalyac before him or her, to modify the clustering algorithm of Basu-Brownlee to include the number of entries size feature as described in Dalyac. The motivation for doing so would have been to improve efficient use of computational resources by providing representative data via dimension reduction. 
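For illustration only, the combining step with the size relationship recited in claim 4 can be sketched in Python as follows. The sketch is not drawn from Dalyac or from the application; the function name form_reduced_content and the choice of simply taking the leading entries of each portion are assumptions.

```python
def form_reduced_content(correct, incorrect, n_correct, n_incorrect):
    """Combine n_correct entries from the correctly labeled portion with
    n_incorrect entries from the incorrectly labeled portion.

    The check below enforces the claim 4 relationship (first number larger
    than second number); taking the leading entries is an assumption.
    """
    if n_correct <= n_incorrect:
        raise ValueError("first number of entries must exceed the second")
    return correct[:n_correct] + incorrect[:n_incorrect]

# Hypothetical portions produced by an earlier cross-validation step.
reduced = form_reduced_content(["a", "b", "c"], ["x", "y"], 2, 1)
print(reduced)  # ['a', 'b', 'x']
```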

As per claims 11 and 18, the claims recite limitations analogous to claim 4 above, and are therefore rejected on the same premise.

Claims 7, 14, 21 are rejected under 35 U.S.C. 103 as being unpatentable over Basu in view of Brownlee in further view of Kummamuru (US 2008/0154579 A1 – already of record).

	As per claim 7, Basu discloses the method of claim 1, 
	However, Basu does not explicitly disclose “… wherein the content comprises utterances received from a conversational corpus.”
	Further, Kummamuru discloses wherein the content comprises utterances received from a conversational corpus (Kummamuru: Para. [0015] discloses performing clustering of the corpus of call transcripts [i.e., conversational corpus] in order to group related text segments.).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Basu and Kummamuru before him or her, to modify the clustering system of Basu to include the conversational corpus utterances feature as described in Kummamuru. The motivation for doing so would have been to improve analysis of datasets by providing methods that effectively recognize text segments.  

As per claims 14 & 21, the claims recite limitations analogous to claim 7 above, and are therefore rejected on the same premise.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and can be viewed in the list of cited references.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEET DHILLON whose telephone number is (571)270-5647. The examiner can normally be reached M-F: 5am-1:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath V. Perungavoor, can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PEET DHILLON/
Primary Examiner, Art Unit 2488
Date: 05-24-2022