DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	This is in response to the applicant's response filed on 07/12/2022. In the applicant’s response, the Specification was amended; claims 1, 16, 10, 12-13, 15, 18, and 20 were amended. Accordingly, claims 1-20 are pending and being examined. Claims 1, 12, and 20 are in independent form.
Claim Rejections - 35 USC § 103
3.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-3, and 5-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (“Clustering-based Missing Value Imputation for Data Preprocessing”, IEEE, 2006, hereinafter “Zhang”) in view of Zhao et al (“HICCUP: Hierarchical Clustering Based Value Imputation using Heterogeneous Gene Expression Microarray Datasets”, IEEE, 2007, hereinafter “Zhao”). 

Regarding claim 1, Zhang discloses a computer-implemented method (the method for missing data imputation based on data clustering; see the title and fig.1) comprising:
receiving a data missing record with a plurality of fields, wherein the data missing record is missing a value for a missing field out of the plurality of fields (receiving instance I having a missing value Ik; see “missing valued instance” Ik in step 4 of the CRI algorithm in Section II-C, on page 1083); 
identifying a closest record to the data missing record, wherein the closest record is a single member of a hierarchical cluster (identifying its nearest cluster; see “its nearest cluster Ci” in step 4.2 of the CRI algorithm in Section II-C, on page 1083); 
determining a mean estimate for the missing field based on records in the hierarchical cluster (calculating the mean value based on the instances in cluster Ci; see steps 5.1-5.2 of the CRI algorithm; see Eq(2). It should be noted that the kernel deterministic imputation m(Xi) defined by Eq(2) is a mean estimate of the observed (complete) attributes Xi in cluster Ci; see Section II-B); 
determining a record similarity between the data missing record and the closest record (wherein the nearest cluster Ci is determined by the distance/similarity between instance Ik and cluster Ci; see “its nearest cluster Ci” in step 4.2 of the CRI algorithm in Section II-C, on page 1083); and 
imputing an imputed value for the missing field of the data missing record (“use m(Xi) to fill missing-value in Ik”; see steps 5.3-5.5 of the CRI algorithm in Section II-C), wherein the imputing comprises adjusting an observed value for the missing field of the closest record based on the record similarity (computing the imputed value m(Xi) using Eq.(2) only for those instances that are close/similar to its nearest cluster Ci. It should be noted that Xi is an observed value string (see Section II-B-1) and Ci is the nearest cluster determined based on the distance/similarity to the missing-valued instance Ik).

Zhang is silent as to using “hierarchical” clustering to build “hierarchical” clusters as recited in the claim. However, hierarchical clustering is a well-known technique that is widely used in the field of missing data imputation. As evidence, Zhao teaches a value imputation method based on “hierarchical clustering” for gene expression microarray datasets (see “hierarchical clustering” in fig.1(a) and fig.2). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Zhao into the teachings of Zhang by replacing the general clustering taught by Zhang with the hierarchical clustering taught by Zhao, in order to select appropriate subsets of the most relevant samples for better value imputation (Zhao, see Section I-A).

Regarding claims 3 and 13, the combination of Zhang and Zhao discloses: completing the data missing record with the imputed value, whereby a completed record is created; and adding the completed record to an augmented dataset (Zhang: “use m(Xi) to fill missing-value in Ik”; see steps 5.3-5.5 of the CRI algorithm in Section II-C).

Regarding claims 5 and 14, the combination of Zhang and Zhao discloses: partitioning an input dataset (Zhang: see S of fig.1) into a data missing subset and a data complete subset (Zhang: see Sc and Sic of fig.1); and selecting the data missing record from the data missing subset (Zhang: see Sic → Cluster of fig.1; see “the CRI algorithm” in Sec.II-C. Zhao: see fig.2); wherein the hierarchical cluster comprises members of the data complete subset (Zhao: see “hierarchical cluster” in fig.2).

Regarding claims 6 and 15, the combination of Zhang and Zhao discloses: clustering the data complete subset into a plurality of hierarchical clusters (Zhang: fig.1; Zhao: fig.2), wherein the hierarchical cluster is a given hierarchical cluster out of the hierarchical clusters (Zhao: fig.2).

Regarding claim 7, the combination of Zhang and Zhao discloses the method of claim 1 wherein: determining the mean estimate for the missing field comprises determining a trimmed mean of records in a cluster for the missing field (this is an obvious variation of the method in Zhang since “a trimmed mean” is an obvious variation of other means). As a further rationale, one of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to replace the mean calculation of Zhang with the trimmed mean calculation of the claimed invention, since doing so would amount to a simple substitution of one known method (the direct mean calculation defined by Eq(4) in Zhang) for another (the trimmed mean calculation, after discarding the smallest and largest values, defined by the claimed invention) to obtain predictable results.

Regarding claims 8 and 16, the combination of Zhang and Zhao discloses: determining the record similarity comprises calculating a cosine similarity between the data missing record and the closest record (Zhao: see “match score” and “cosine similarity” in Sec.I-B, the last paragraph; Zhang: see “the nearest cluster” in “the CRI algorithm” in Sec.II-C).

Regarding claims 9 and 17, the combination of Zhang and Zhao discloses: adjusting an observed value comprises multiplying the observed value by the cosine similarity (Zhang: see Eq(2)).

Regarding claims 10 and 18, the combination of Zhang and Zhao discloses: imputing the value comprises subtracting the observed value adjusted by the cosine similarity from a trimmed mean or adding the observed value adjusted by the cosine similarity to a trimmed mean (Zhang: see the CRI algorithm in Sec.II-C).

Regarding claims 11 and 19, the combination of Zhang and Zhao discloses: choosing between adding and subtracting based on the cosine similarity (Zhao: see “cosine similarity” in Sec.I-B, the last paragraph).
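
For illustration only, the operations recited in claims 8-10 and 16-18 can be sketched as follows. This is a minimal sketch and is not taken from Zhang, Zhao, or applicant's specification: the function name, the variable names, and the sample numbers are hypothetical, and the rule of claims 11 and 19 for choosing between adding and subtracting is not detailed in this Office action, so both candidate results are simply computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical sample data (assumptions, not values from the record):
missing_record_fields = [3.0, 4.0]   # observed fields of the data missing record
closest_record_fields = [6.0, 8.0]   # the same fields of the closest record
closest_observed_value = 10.0        # closest record's value for the missing field
cluster_trimmed_mean = 12.0          # trimmed mean of the missing field in the cluster

# Claims 8 and 16: record similarity as a cosine similarity.
similarity = cosine_similarity(missing_record_fields, closest_record_fields)

# Claims 9 and 17: adjust the observed value by multiplying by the similarity.
adjusted_value = closest_observed_value * similarity

# Claims 10 and 18: add the adjusted value to, or subtract it from, the trimmed mean.
candidate_added = cluster_trimmed_mean + adjusted_value
candidate_subtracted = cluster_trimmed_mean - adjusted_value
```

In this sample the two field vectors point in the same direction, so the similarity is 1 and the adjusted value equals the closest record's observed value.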

Regarding claims 2 and 12, each of them is an inherent variation of claim 1; thus each is interpreted and rejected for the reasons set forth above in the rejection of claim 1.

6.	Claims 4 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Zhao and further in view of Purwar et al (“Hybrid prediction model with missing value imputation for medical data”, 2015, hereinafter “Purwar”).

Regarding claim 4, the combination of Zhang and Zhao does not explicitly disclose “training a neural network using the augmented dataset” as recited in the claim. However, in the same field of endeavor, Purwar teaches “supervised learning”, such as a “multilayer perceptron with backpropagation”, i.e., an artificial neural network, after performing missing data imputation (see the procedure shown in fig.1; see Sec. 2.1.1.1, para.3). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Purwar into the teachings of the combination of Zhao and Zhang, that is, to train a neural network using the imputed dataset as taught by Purwar, in order to improve the accuracy of predictive classification (Purwar, see Abstract).

Regarding claim 20, the combination of Zhang, Zhao, and Purwar discloses the claimed invention since claim 20 is a combination of claims 1 and 3-11; thus it is interpreted and rejected for the reasons set forth above in the rejections of claims 1 and 3-11.

Response to Arguments
7.	Applicant’s arguments, filed on 07/12/2022, have been fully considered but they are not persuasive. 

7-1.	On pages 13-14 of applicant’s response, regarding claim 1, applicant traverses:
“Applicant is unable to find where Zhang imputes a value for a missing field of a data missing record based on a record similarity between the data missing record and a single closest record, as now recited by claim 1.  A further example Zhang's calculation of missing values based on all the records is shown at figure 2 and page 1083…Thus, Zhang's kernel calculation does not calculate based on a single closest record.”

(Emphases added by the examiner.)

The examiner respectfully notes that the calculation of a missing value in the method of Zhang is not based on all the records/clusters, but instead is based on its nearest record/cluster Ci. As explained in the rejection of the claims, the CRI algorithm recites:
4.1. compute distances between Ik and each cluster
4.2. assign Ik to its nearest cluster Ci
5. For each cluster Ci,
5.1. For each missing-valued instance Ik in cluster Ci,
5.2. use (2) to compute m(Xi)
5.3. For each missing-valued instance Ik in cluster Ci,
5.4. if using deterministic imputation method
5.5. use m(Xi) to fill missing-value in Ik

It is apparent that m(Xi) for filling the missing value in Ik is determined only by the instances that belong to the nearest cluster Ci of Ik, rather than by all the records/clusters. Fig.2 of Zhang shows only a “special case”, i.e., only one cluster, which is the nearest cluster, as mentioned by Zhang. See Sec.III, 1st para: “In the previous discussions of our strategy for handling missing values, we know that the situation of K=1 (i.e., only one cluster) is the special case, which is equal to the situation of without clustering on the whole dataset.” Fig.2 does not mean that Zhang's calculation of missing values is based on all the records/clusters. Therefore, the argument is unpersuasive.
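
For illustration only, the steps of the CRI algorithm quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not Zhang's implementation: the function names and sample data are hypothetical, clusters are assumed to be given as lists of complete instances, the distance in step 4.1 is assumed to be Euclidean distance to the cluster centroid over the observed attributes, and a plain cluster mean stands in for the kernel estimator m(Xi) of Eq(2).

```python
import math

def centroid(cluster):
    """Attribute-wise mean of the complete instances in one cluster."""
    n = len(cluster)
    return [sum(row[j] for row in cluster) / n for j in range(len(cluster[0]))]

def distance_observed(instance, center):
    """Euclidean distance using only the attributes present in the instance."""
    return math.sqrt(sum((v - c) ** 2
                         for v, c in zip(instance, center) if v is not None))

def impute_from_nearest_cluster(instance, clusters):
    # Step 4.1: compute the distance between the instance and each cluster.
    centers = [centroid(c) for c in clusters]
    # Step 4.2: assign the instance to its nearest cluster.
    nearest = min(range(len(clusters)),
                  key=lambda i: distance_observed(instance, centers[i]))
    # Steps 5.1-5.5: fill each missing value from the nearest cluster only
    # (assumption: the cluster mean replaces the kernel estimate of Eq(2)).
    filled = [centers[nearest][j] if v is None else v
              for j, v in enumerate(instance)]
    return nearest, filled

# Hypothetical sample data: two clusters of complete instances.
clusters = [
    [[1.0, 2.0, 3.0], [1.2, 2.2, 3.4]],
    [[10.0, 20.0, 30.0], [11.0, 21.0, 31.0]],
]
nearest, filled = impute_from_nearest_cluster([1.1, None, 3.1], clusters)
```

In this sample the missing second attribute is filled from the first cluster alone; the instances of the second cluster do not contribute to the imputed value.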

7-2.	On page 15 of applicant’s response, regarding claim 7, applicant traverses the “obviousness rejection”. 

The examiner respectfully points out that the difference between the prior art and the claimed invention is how the mean value of the records is determined. The former directly calculates the mean value of the records by Eq(4). The latter (1) discards the smallest and largest records, and then (2) calculates the mean value of the remaining records. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to replace the mean calculation of Zhang with the trimmed mean calculation of the claimed invention, since doing so would amount to a simple substitution of one known mean calculation method (the direct mean calculation defined by Eq(4) in Zhang) for another known mean calculation method (the trimmed mean calculation, after discarding the smallest and largest values, defined by applicant’s specification) to obtain predictable results. Therefore, the argument is unpersuasive.
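
For illustration only, the two mean calculations contrasted above can be sketched as follows. This is a minimal sketch: the function names, the `trim` parameter, and the sample values are hypothetical and are not taken from Zhang or from applicant’s specification, and the number of values discarded at each end is assumed to be one.

```python
def direct_mean(values):
    """Plain arithmetic mean of all values."""
    return sum(values) / len(values)

def trimmed_mean(values, trim=1):
    """Mean after discarding the `trim` smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values left after trimming")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)

# Hypothetical sample values for one field within a cluster.
field_values = [2.0, 4.0, 6.0, 8.0, 100.0]
plain = direct_mean(field_values)     # uses all five values
trimmed = trimmed_mean(field_values)  # uses 4.0, 6.0, and 8.0 only
```

With these sample values the direct mean is 24.0 and the trimmed mean is 6.0; the two calculations differ only in which values are passed to the averaging step.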

7-3.	On pages 16-18 of applicant’s response, regarding claims 8-12 and 16-20, the arguments focus on “a single closest record”, which is similar to the argument regarding claim 1; thus they are unpersuasive as set forth in Sec. 7-1 of this Office action.
Conclusion
8.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to RUIPING LI whose telephone number is (571)270-3376. The examiner can normally be reached 8:30am--5:30pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, EMILY TERRELL can be reached on (571)270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov; https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center, and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RUIPING LI/ Primary Examiner, Ph.D., Art Unit 2666    7/19/2022