DETAILED ACTION
Claims 1-6 are pending in the present application and are under examination on the merits. This communication is the first action on the merits (FAOM).
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
Applicant filed Information Disclosure Statements (IDS) on 8/3/2020 and 4/11/2021. Each of these filings is in compliance with 37 C.F.R. 1.97.
As required by M.P.E.P. 609(C), the applicant's submission of the Information Disclosure Statements is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. Also as required by M.P.E.P. 609(C), copies of the respective PTOL-1449s, initialed and dated by the examiner, are attached to the instant Office action.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference numerals not mentioned in the description: 527 in Fig. 5. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication Number 2019/0354544 to Hertz et al. (hereafter referred to as Hertz) in view of U.S. Patent Application Publication Number 2018/0144243 to Hsieh et al. (hereafter referred to as Hsieh).
As per claim 1, Hertz teaches: 
A system for creation and expansion of high quality data set collections for training of machine learning algorithms via crowdsourced curation, comprising: a reputation scoring engine comprising a first plurality of programming instructions stored in a memory of, and operating on a processor of, a computing device, wherein the first plurality of programming instructions, when operating on the processor, cause the computing device to: receive a data set (Paragraph Number [0065] teaches the system 10 includes a server device 12 configured to include a processor 14, such as a central processing unit (‘CPU’), random access memory (‘RAM’) 16, one or more input-output 
score a data entry within the data set, wherein the score is calculated from a plurality of scoring metrics (Paragraph Number [0066] teaches the non-volatile memory 20 is configured to include an identification module 24 for identifying entities from one or more sources. The entities identified may include, but are not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topic codes, subject codes, unique identifiers, social tags, industry terms, general terms, metadata elements, and classification codes. An association module 26 is also provided for computing a significance score for an association between entities, the significance score being an indication of the level of significance a second entity to a first entity. Paragraph Number [0067] teaches a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to, a cluster module 30 for clustering (e.g., 
sum all of the data entry scores within the data set combining to form an overall reputation score  (Paragraph Number [0026] teaches identify a first entity having a relationship or an association with a second entity, apply a plurality of relationship or association criteria to the relationship/association, weight each of the criteria based on defined weight values, and compute a significance score for the first entity with respect to the second entity based on a sum of a plurality of weighted criteria values.  The system identifies text representing or signifying a connection between two or more entities and in particular in the context of a supply chain environment. Paragraph Number [0095] teaches once the plurality of criteria are applied to the first association, at step 50, the association module 26 weights each of the plurality of criteria values assigned to the first association. In one embodiment, the association module 26 multiplies a user-configurable value associated with each of the plurality of criteria with each of the plurality of criteria values, and then sums the plurality of multiplied criteria values to compute a significance score. As discussed previously, the significance score indicates a level of significance of the second entity to the first entity. In another embodiment, the association module 26 multiplies a pre-defined system value associated with each of the plurality of criteria, and then sums the plurality of multiplied criteria values to compute the significance score).
flag an erroneous data entry which may not be resolved through the machine learning algorithm (Paragraph Number [0169] teaches the algorithm is precision-oriented to avoid introducing too many false positives into the knowledge graph. In one manner of 
compare the overall reputation score with a numerical threshold for reputability; (Paragraph Number [0031] teaches the system may further comprise a machine learning-based algorithm adapted to detect relationships between entities in an unstructured text document. The classifier may predict a probability of a relationship based on an extracted set of features from a sentence. The extracted set of features may include context-based features comprising one or more of n-grams and patterns. The system may further comprise wherein updating the Knowledge Graph is based on the aggregate evidence score satisfying a threshold value).
send the flagged erroneous data entry and the data sets not meeting the threshold for reputability to a verification queue (Paragraph Number [0086] teaches the association module 26 may apply the validation criteria to the first association. In one embodiment, the association module 26 determines whether the first entity and the second entity co-exist as an entity pair in the set of entity pairs 40. As described previously, each of the entity pairs defined in the set of entity pairs 40 may be previously identified as having a relationship with one another. Based on the determination, the association module 26 
store the data sets that meet the threshold for reputability to a data store as a reputable data set collection (Paragraph Number [0089] teaches the association module 26 applies interestingness criteria to the first association as determined by a first portion of the set of documents and/or a first portion of a structured data store. The first portion is associated with a first time interval. The association module 26 then applies interestingness criteria to the first association as determined by a second portion of the set of documents and/or a second portion of the structured data store. Paragraph Number [0156] teaches while performing the above-discussed services, with our RDF model, we store our knowledge graph 712, i.e., the recognized entities and their relations, in an inverted index for efficient retrieval with keyword queries (i.e., the Keyword Search Service 716 in FIG. 7) and also in a triple store in order to support complex query needs).
a verification queue comprising a second plurality of programming instructions stored in the memory of, and operating on the processor of, the computing device, wherein the second plurality of programming instructions, when operating on the processor, cause the computing device to: receive the flagged erroneous data entry and the data sets not meeting the threshold for reputability (Paragraph Number [0065] teaches the system 10 includes a server device 12 configured to include a processor 14, such as a central processing unit (‘CPU’), random access memory (‘RAM’) 16, one or more input-output devices 18, such as a display device (not shown) and keyboard (not shown), and non-volatile memory 20, all of which are interconnected via a common bus 22 and controlled by the processor 14. Paragraph Number [0066] teaches the non-volatile 
Hertz teaches gathering data sets and determining if the data reaches a validity threshold for use, but does not explicitly teach a supervised machine learning process to validate data through human curation to be used in iterative machine learning training. That process is taught by the following citations from Hsieh:
assign the data in the verification queue to a data steward for human curation (Paragraph Number [0056] teaches semi-supervised learning is a deep learning training method in which the machine is provided a small amount of classified data from human sources compared to a larger amount of unclassified data available to the machine. 
send the curated and resolved data back to the reputation scoring engine for an additional iteration (Paragraph Number [0056] teaches semi-supervised learning is a deep learning training method in which the machine is provided a small amount of classified data from human sources compared to a larger amount of unclassified data available to the machine. Paragraph Number [0151] teaches certain examples provide 
Both Hertz and Hsieh are directed to data analysis through machine learning processes. Hertz teaches gathering data sets and determining if the data reaches a validity threshold for use. Hsieh improves upon Hertz by disclosing a supervised machine learning process to validate data through human curation to be used in iterative machine learning training. One of ordinary skill in the art would be motivated to further include a supervised machine learning process to validate data through human curation to be used in iterative machine learning training, to efficiently utilize human and artificial intelligence to better refine data in data sets.
Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the system and method of gathering data sets and 
As per claim 4, claim 4 recites a method that is substantially similar to the method performed by the system in claim 1 and is rejected for the same reasons put forth in regard to claim 1.
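For orientation only, the scoring and routing steps recited in claim 1 can be sketched as below. This is a minimal illustration under stated assumptions and is not drawn from Hertz, Hsieh, or the application: the metric names, the weights, the threshold value, and the names Entry, entry_score, and route are all hypothetical. The per-entry score is assumed to be a weighted sum of metric values, in the manner of the weighted sum of criteria values described in the cited paragraph [0095] of Hertz; the claim itself says only that the score is calculated from a plurality of scoring metrics.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One data entry; all field names are hypothetical."""
    metrics: dict          # metric name -> metric value
    flagged: bool = False  # True if the entry is erroneous and unresolved


def entry_score(entry, weights):
    # Assumption: each entry score is a weighted sum of its metric values.
    return sum(weights[name] * value for name, value in entry.metrics.items())


def route(data_set, weights, threshold):
    # Sum the entry scores to form the overall reputation score.
    overall = sum(entry_score(e, weights) for e in data_set)
    # Flagged entries go to the verification queue regardless of the total.
    flagged = [e for e in data_set if e.flagged]
    # Data sets meeting the threshold are stored; the rest are queued.
    destination = "data_store" if overall >= threshold else "verification_queue"
    return overall, destination, flagged


if __name__ == "__main__":
    weights = {"completeness": 0.5, "accuracy": 0.5}  # hypothetical weights
    data_set = [
        Entry({"completeness": 1.0, "accuracy": 0.5}),
        Entry({"completeness": 0.5, "accuracy": 0.5}, flagged=True),
    ]
    print(route(data_set, weights, threshold=1.0))
```

In this sketch the threshold applies to the data set as a whole, while flagged entries are collected separately, mirroring the claim's distinction between "the flagged erroneous data entry" and "the data sets not meeting the threshold for reputability."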
As per claims 2 and 5, the combination of Hertz and Hsieh teaches each of the limitations of claims 1 and 4 respectively.
In addition, Hertz teaches:
further comprising a synthetic data generator comprising a third plurality of programming instructions stored in the memory of, and operating on the processor of, the computing device, wherein the third plurality of programming instructions, when operating on the processor, cause the computing device to: retrieve a reputable data set collection stored within the data store (Paragraph Number [0068] teaches server 12 may include in non-volatile memory 20 a Supply Chain Analytics & Risk “SCAR” (aka “Value Chains”) engine 23, as discussed in detail hereinbelow, in connection with determining supply chain relationships among companies and providing other enriching data for use by users. SCAR 23 includes, in this example, a training/classifier module 25, Natural Language Interface/Knowledge Graph Interface Module 27 and Evidence Scoring Module 29 for generating and updating Knowledge Graphs associated with 
generate a synthetic data set from the reputable data set (Paragraph Number [0102] teaches the machine learning (ML)-based classifier may involve use of positive and negative labeled documents for training purposes. Training may involve nearest neighbor type analysis based on computed similarity of terms or words determined as features to determine positiveness or negativeness. Inclusion or exclusion may be based on threshold values. A training set of documents and/or feature sets may be used as a basis for filtering or identifying supply-chain candidate documents and/or sentences. Training may result in models or patterns to apply to an existing or supplemented set(s) of documents).
send the synthetic data set to the reputation scoring engine; and wherein the reputation scoring engine further merges the synthetic data with the reputable data set collection where the synthetic data set passes the threshold for reputability (Paragraph Number [0102] teaches the machine learning (ML)-based classifier may involve use of 
As per claims 3 and 6, the combination of Hertz and Hsieh teaches each of the limitations of claims 1 and 2, and 4 and 5 respectively.
Hertz teaches gathering data sets and determining if the data reaches a validity threshold for use, but does not explicitly teach a supervised machine learning process to validate data through human curation, in addition to a generative adversarial network, to be used in iterative machine learning training. That combination is taught by the following citations from Hsieh:
wherein the synthetic data generator is a generative adversarial network. (Paragraph Number [0114] teaches an AI methodology (e.g., deep learning network model and/or other machine learning model, etc.) is selected from an AI catalog 1326 of available models, for example. For example, a deep learning model can be imported, the model can be modified, transfer learning can be facilitated, an activation function can be selected and/or modified, machine learning selection and/or improvement can occur (e.g., support vector machine (SVM), random forest (RF), etc.), an optimization algorithm (e.g., stochastic gradient descent (SGD), AdaG, etc.) can be selected and/or modified, etc. The AI catalog 1326 can include one or more AI models such as good old fashioned 
A person of ordinary skill would be motivated to combine these references for the same reasons put forth in regard to claim 1.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW H. DIVELBISS whose telephone number is (571) 270-0166. The fax phone number is 571-483-7110. The examiner can normally be reached on M-Th, 7:00 - 5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor, can be reached at (571) 272-6787.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 

/MATTHEW H DIVELBISS/
Examiner, Art Unit 3624