DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This application is a CON of 16/379,978 filed on 04/10/2019 (now Patent No. 10679100), which is a CON of 16/143,773 filed on 09/27/2018 (now Patent No. 10303978), which claims benefit of 62/648,318 filed on 03/26/2018.
Claim 1 is pending and has been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/21/2020.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities: Specification paragraph [0085] ends with “represented as follows:”, but does not appear to present any further information.  
Appropriate correction is required.

Claim Interpretation
The preamble of claim 1 recites “A system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service” (emphasis added). Examiner notes that while the body of the claim describes calculating an efficacy metric of a corpora of training data and identifying whether to train a machine learning classifier based on the efficacy metric, the body of 

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Instant claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over reference claim 1 of U.S. Patent No. US 10,679,100 B2 (reference patent). Although the claims at issue are not identical, they are not patentably distinct from each other because instant claim 1 (the claim being examined) is “generic to a species or sub-genus claimed in a conflicting patent or application, i.e., the entire scope of the reference claim falls within the scope of the examined claim.”  See MPEP 804(II)(B)(1). 


Instant application, claim 1 (compared below with claim 1 of U.S. Patent No. US 10,679,100 B2 (reference patent)):

A system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service, the system comprising:

one or more sources of machine learning training data;

one or more hardware computing servers implementing a machine learning-based dialogue service that:

calculates, using the one or more hardware computing servers, one or more efficacy metrics of a corpora of raw machine learning training data; and

identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data.

Reference claim 1 (U.S. Patent No. US 10,679,100 B2):

A system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service, the system comprising:

one or more sources of machine learning training data;

one or more hardware computing servers implementing a machine learning-based dialogue service that:

constructs a corpora of machine learning test corpus that comprise a plurality of historical queries and/or historical commands test-sampled from one or more production logs of a deployed dialogue system, the corpora of machine learning test corpus relates to a baseline set of user queries and/or user commands used in calculating efficacy metrics of raw machine learning data;

configures one or more training data sourcing parameters to source a corpora of raw machine learning training data from the one or more sources of machine learning training data;

obtains, from the one or more sources of machine learning training data, the corpora of raw machine learning training data based on the one or more training data sourcing parameters;

calculates, using the one or more hardware computing servers, efficacy metrics of the corpora of raw machine learning training data, 

wherein calculating the efficacy metrics includes:

using the corpora of machine learning test corpus to calculate a coverage metric value that indicates a degree to which the corpora of raw machine learning training data represents possibilities of expressing a target classification 
calculating the coverage metric value for each of a plurality of distinct corpus of machine learning training data within the corpora of raw machine learning training data, wherein calculating the coverage metric value for each of the plurality of distinct corpus of machine learning training data includes:
[i] selecting a subject test corpus datum from within a subject distinct machine learning test corpus of the corpora of machine learning test corpus;
[ii] constructing a plurality of diversity pairwise comprising the subject test corpus datum and each training data within a subject distinct corpus of machine learning training data of the corpora of raw machine learning training data;
[iii] calculating a semantic similarities value of each of the plurality of diversity pairwise involving the subject test corpus training datum;
[iv] identifying a minimum diversity metric value for the subject test corpus datum based on the semantic similarities value of each of the plurality of diversity pairwise involving the subject test corpus training datum;
[v] calculating a minimum diversity metric value for each remaining test corpus datum within the subject distinct machine learning test corpus; and
[vi] calculating the coverage metric value for the subject distinct corpus of machine learning training data based on the minimum diversity metric value for the subject test corpus datum and for each of the remaining test corpus datum of the subject distinct machine learning test corpus;
calculating the coverage metric value for the corpora of raw machine learning training data based on the coverage metric value for each of the plurality of distinct corpus of machine learning training data within the corpora;

identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system using the corpora of raw machine learning training data based on whether the calculated coverage metric value satisfies a predetermined coverage metric threshold.


Reference claim 1 recites “identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system using the corpora of raw machine learning training data based on whether the calculated coverage metric value satisfies a predetermined coverage metric threshold”. This feature anticipates instant claim 1’s “identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data” because reference claim 1 recites that calculating the “efficacy metrics” includes performing a plurality of steps to calculate the “coverage metric” (in other words, calculating a coverage metric is a specific calculation of an efficacy metric). Therefore, reference claim 1’s identifying whether to train the classifier based on a coverage metric reads on instant claim 1’s identifying whether to train the classifier based on an efficacy metric.
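
For illustration only, the genus/species relationship can be sketched in code. The sketch below follows steps [i]-[vi] of reference claim 1 to arrive at a coverage metric value and then applies a threshold test. Reference claim 1 does not recite any formula, so every concrete choice here is an assumption made for the sketch and is not taken from the reference patent: the function names, the use of word-set Jaccard overlap as a stand-in for semantic similarity, diversity taken as 1 minus similarity, averaging as the aggregation, and “satisfies” read as “greater than or equal to”.

```python
from __future__ import annotations

from statistics import mean


def similarity(a: str, b: str) -> float:
    """Assumed stand-in for semantic similarity: Jaccard overlap of word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def min_diversity(test_datum: str, training_corpus: list[str]) -> float:
    """Steps [ii]-[iv]: pair the test datum with every training datum and keep
    the smallest diversity value (assumed here to be 1 - similarity)."""
    return min(1.0 - similarity(test_datum, t) for t in training_corpus)


def corpus_coverage(test_corpus: list[str], training_corpus: list[str]) -> float:
    """Steps [i], [v], [vi]: coverage of one distinct training corpus, assumed
    here to be 1 minus the mean minimum diversity over all test data."""
    return 1.0 - mean(min_diversity(q, training_corpus) for q in test_corpus)


def corpora_coverage(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Coverage of the whole corpora: assumed mean of the per-corpus values.
    Each pair is (test corpus, training corpus); both must be non-empty."""
    return mean(corpus_coverage(test, train) for test, train in pairs)


def should_train(pairs: list[tuple[list[str], list[str]]], threshold: float) -> bool:
    """Identify whether to train: coverage must meet the assumed threshold."""
    return corpora_coverage(pairs) >= threshold


if __name__ == "__main__":
    example = [
        (["check my balance", "what is my balance"],
         ["check my balance", "show my account balance"]),
        (["transfer money"], ["send cash to a friend"]),
    ]
    print(corpora_coverage(example), should_train(example, threshold=0.5))
```

Under these assumptions, `corpora_coverage` returns one particular efficacy metric, so a decision made by `should_train` is necessarily a decision made “based on the one or more efficacy metrics”, which is the sense in which the narrower reference claim falls within the scope of the broader instant claim.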

Instant claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over reference claim 1 of U.S. Patent No. US 10,303,978 B1 (reference patent). Although the claims at issue are not identical, they are not patentably distinct from each other because instant claim 1 (the claim being examined) is “generic to a species or sub-genus claimed in a conflicting patent or application, i.e., the entire scope of the reference claim falls within the scope of the examined claim.”  See MPEP 804(II)(B)(1). 



Instant application, claim 1 (compared below with claim 1 of U.S. Patent No. US 10,303,978 B1 (reference patent)):

A system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service, the system comprising:

one or more sources of machine learning training data;

one or more hardware computing servers implementing a machine learning-based dialogue service that:

calculates, using the one or more hardware computing servers, one or more efficacy metrics of a corpora of raw machine learning training data; and

identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data.

Reference claim 1 (U.S. Patent No. US 10,303,978 B1):

A system for intelligent formation and acquisition of machine learning training data for implementing an artificially intelligent dialogue system, the system comprising:

one or more remote sources of machine learning training data;

one or more hardware computing servers implementing an artificially intelligent dialogue platform that:

constructs a corpora of machine learning test corpus that comprise a plurality of historical queries and/or historical commands test sampled from one or more production logs of a deployed dialogue system;

configures one or more training data sourcing parameters to source a corpora of raw machine learning training data from one or more remote sources of machine learning training data;
transmits, via one or more communication networks, the one or more training data sourcing parameters to the one or more remote sources of machine learning training data and collects, via the one or more communication networks, the corpora of raw machine learning training data;

calculates, using the one or more hardware computing servers, one or more efficacy metrics of the corpora of raw machine learning training data, wherein calculating the one or more efficacy metrics includes calculating one or more of a coverage metric value and a diversity metric value of the corpora of raw machine learning training data;

identifies whether to train at least one machine learning classifier of the artificially intelligent dialogue system based on one or more the coverage metric value and the diversity metric value of the corpora of raw machine learning;

responsive to training the at least one machine learning classifier using the corpora of raw machine learning training data, deploys the at least one machine learning classifier into a live implementation of the artificially intelligent dialogue system.


As indicated in the table above, all the claimed features in instant claim 1 are disclosed in reference claim 1 (see underlined elements). While the two claims are not identical, instant claim 1 is anticipated by reference claim 1. It is evident from the table that all limitations in instant claim 1 are linguistically comparable to the underlined limitations in reference claim 1 except for the last limitation of instant claim 1, for which explanation is provided below:
Reference claim 1 recites “identifies whether to train at least one machine learning classifier of the artificially intelligent dialogue system based on one or more the coverage metric value and the diversity metric value of the corpora of raw machine learning”. This feature anticipates instant claim 1’s “identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data” because reference claim 1 recites that calculating the “efficacy metrics” includes calculating one or more of the “coverage metric” and the “diversity metric” (in other words, calculating a coverage metric and calculating a diversity metric are specific calculations of an efficacy metric). Therefore, reference claim 1’s identifying whether to train the classifier based on a coverage metric and a diversity metric reads on instant claim 1’s identifying whether to train the classifier based on an efficacy metric.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the machine learning-based dialogue system" in lines 8-9. There is insufficient antecedent basis for this limitation in the claim. It is recommended that "the machine learning-based dialogue system" be amended to “the machine learning-based dialogue service” (the interpretation adopted for examination purposes).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 


Regarding Claim 1,
Step 1 Analysis: Claim 1 is directed to a system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
that: calculates,..., one or more efficacy metrics of a corpora of raw machine learning training data; and
identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of generic computer components language (“using the one or more hardware computing servers”) and extra-solution activity language (“the system comprising: one or more sources of machine learning training data;”; “one or more hardware computing servers implementing a machine learning-based dialogue service”). The above limitations in the context of this claim encompass calculating efficacy metrics of a corpora of machine learning data, which corresponds to mathematical calculations of specific metrics associated with data, and identifying whether to train a classifier based on the calculated metric, which corresponds to evaluating whether to perform an action or not based on a metric, a mental step that can be performed in the human mind.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “using the one or more hardware computing servers”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the additional element of “the system comprising: one or more sources of machine learning training data” describes that the system has one or more sources of data, which is analogous to the insignificant extra-solution activity of storing and retrieving information in memory. The additional element of “one or more hardware computing servers implementing a machine learning-based dialogue service” describes that the machine learning-based dialogue service is implemented by hardware servers, which is analogous to an insignificant extra-solution activity of determining the type of hardware used to implement the service. Limitations that amount to insignificant extra-solution activity cannot integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the additional elements of “the system Goth, III et al. (US 2016/0253596 A1) in pg. 3 [0036]: “Typically, each client or server machine is a data processing system such as illustrated in FIG. 2 comprising hardware and software, and these entities communicate with one another over a network, such as the Internet, an intranet, an extranet, a private network, or any other communications medium or link” and pg. 4 [0041]: “The machine learning model used in the Q&A system typically is trained with question and answer "pairs."” (emphasis added) teach that it is typical (routine or conventional) that machine-learning based question and answer systems (machine-learning based dialogue service) are implemented by hardware computing servers (also see pg. 2 [0024]-[0025]). Therefore, the claim is not patent eligible.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Allen et al. (US 2016/0048514 A1) in view of Goth, III et al. (US 2016/0253596 A1).
Regarding Claim 1,
Allen et al. teaches A system for intelligently identifying machine learning training data for implementing a machine learning-based dialogue service (as discussed above, for the purpose of examination, the claim preamble should not be considered as having significance for claim construction; nevertheless, Allen et al. in Fig. 1 and 6 teaches a computer-based system that intelligently identifies (see also pg. 8 [0064]) new training answers to the training questions using the updated information source, which corresponds to identifying machine learning training data (see Fig. 6 steps 604-605); pg. 4-5 [0042] teaches using training data to create or refine machine learning models for a dialogue service (pg. 4 [0038])), 
the system comprising: one or more sources of machine learning training data (pg. 8 [0064] “Per block 604, the selected updated information source may be ingested by the QA system. The documents of the updated information source may be structured or unstructured data. In some embodiments, hundreds, thousands, or millions or more of documents can be ingested by the system as part of one new information source ingestion and these documents may be ingested at substantially the same time (i.e., during a single ingestion cycle) or may be ingested at different times” teaches that one or more sources (“hundreds...of documents”) are ingested as sources of machine learning training data, also see Fig. 6 steps 604-605);
one or more hardware computing servers implementing a machine learning-based dialogue service that (pg. 4-5 [0042] teaches using training data to create or refine machine learning models for a Question-Answer System with a dialogue component (see pg. 4 [0038]), which teaches a machine learning-based dialogue service; pg. 2 [0022]: “the computer systems may include servers, desktops, laptops, and hand-held devices. In addition, the answer module 132 may include one or more modules or units to perform the various functions of embodiments as described below (e.g., receiving an input question, assigning the input question to a question category, determining a set of candidate answers, comparing confidence scores and user feedback to confidence criteria, etc.), and may be implemented by any combination of any quantity of software and/or hardware modules or units” teaches that the computers utilized to implement the system can be computing servers that contain hardware modules, and thus teaches that the system has hardware computing servers; also see Fig. 6):
calculates, using the one or more hardware computing servers, one or more efficacy metrics of a corpora of raw machine learning training data (pg. 2 [0022] teaches using hardware computing servers; Fig. 6 Step 605: “Identify new training answers to the training questions using the updated information source” and Fig. 6 Step 606: “Calculate confidence scores and accuracy rates for the new training answers” teach calculating confidence scores and accuracy rates (correspond to efficacy metrics) of a corpora of training answers (correspond to a corpora of raw machine learning training data) identified from updated information source; as the training answers (training data) are identified directly from the source, these answers/data are considered raw training data).
Allen et al. does not appear to explicitly teach identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data.
However, Goth, III et al. teaches identifies whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data (pg. 1 [0009]: “an active learning framework is operative to identify informative questions that should be added to existing question-answer (Q&A) pairs that comprise a training dataset for a learning model. In this approach, the question-answer pairs (that will be labeled as "true" or "false" and used in the learning model) are automatically selected from a larger pool of unlabeled data... After the questions are labeled, an additional re-sampling is performed to assure high quality of the training data. Preferably, and with respect to a particular question, this additional re-sampling is based on a distance measure between correct and incorrect answers” teaches curating labeled questions (correspond to a corpora of raw machine learning data because the labeled question-answer pairs represent collections of information) by re-sampling based on a distance measure (corresponds to efficacy metric); pg. 6 [0064]-[0065]: “Once the selected questions are labeled, preferably an additional re-sampling is then performed. This is step 522. Generally, the purpose of this re-sampling is to eliminate questions that are not likely to contribute to learning (and thus further assure high quality of the training data). This is achieved during re-sampling by randomly selecting a subset from the majority class (step 518), and applying standard classifiers (e.g., decision tree, naive onto the re-balanced data sets comparing the accuracies, preferably based on a distance measure....At step 524, the remaining selected questions are then added to the training set T and, at step 526, used to train a new classifier. This completes the processing” teaches that the re-sampling includes eliminating questions that do not meet the distance measure (efficacy metric) and training the machine learning classifier with the remaining questions that meet the distance measure; pg. 7 [0073]: “The subject matter described herein has significant advantages over the prior art. It facilitates the selection of highly discriminative questions from unlabeled data. The questions can then be used to train a statistical machine learning model, for example, in a question-answering system” teaches that the classifier in the present systems is a machine learning classifier within a machine-learning based question-answering (dialogue) system; also see pg. 7 [0072]).
Allen et al. and Goth, III et al. are analogous art to the claimed invention because they are directed to curating training data for machine learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature of identifying whether to train at least one machine learning classifier of the machine learning-based dialogue system based on the one or more efficacy metrics of the corpora of raw machine learning training data, as taught by Goth, III et al., into the disclosed invention of Allen et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[facilitate] the selection of highly discriminative questions from unlabeled data” that “can then be used to train a statistical machine learning model...in a question-answering system”, which is an “approach [that provides] significantly improved results as compared to existing solutions (to the class imbalance problem)” (Goth, III et al. pg. 7 [0073]).





Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Dutta et al. (US 10,339,470 B1) teaches utilizing a classification engine to improve a classification model in which the classification engine may derive a statistical model based on a synthetic data set.
Bastide et al. (US 2017/0004204 A1) teaches identifying changes, within a corpus of information, to answers to questions provided within the corpus of information.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact 






/YING YU CHEN/               Examiner, Art Unit 2125