Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s election without traverse of Species IV (Claims 13-18) in the reply filed on 03/24/2022 is acknowledged.
Claims 1-12 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to nonelected species, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 03/24/2022.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Berezin et al. (US Patent 5,539,752) and Cao, Su-Qun (AU 2013100982).
As to claim 13, Berezin discloses a method for detecting a defect pattern on a wafer [Each wafer die, such as a die 32a, is scanned and the information derived therefrom is converted to digital form and processed by the computer 36. In operation the scan mechanism 40 compares one die of the wafer 32 to the next. Presumably, every die will be identical to the next and any differences indicate an error, i.e., a defect. As will be described further with reference to FIG. 2, the computer 36 constructs files comprising records of scanned defects (Fig. 1, Col. 5, lines 58-64)], the method being performed by a computing device (Fig. 1, item 36), the method comprising:
obtaining binarized inspection data including data indicating defectiveness of each of a plurality of chips formed on the wafer [The wafer 32 may be moved relative to scan mechanism 40 by a transport mechanism (not shown) in the x and y directions perpendicular to the scan mechanism 40 to cause the microscope to sweep over the surface of the wafer in a series of adjacent, possibly overlapping, scan lines. Each wafer die, such as a die 32a, is scanned and the information derived therefrom is converted to digital form (binarized data) and processed by the computer 36. In operation the scan mechanism 40 compares one die of the wafer 32 to the next. Presumably, every die will be identical to the next (no-defect) and any differences indicate an error, i.e., a defect (no-defect/defect or defectiveness of the plurality of dies) (Col. 5, lines 54-64)];
calculating a defective chip density in each of quadrats, the quadrats formed on a plane on the wafer and partitioned to include a plurality of chips in each quadrat [The number of defects to be chosen (MaxDefects) can be expressed as a total number, a percentage of defects on a wafer, or a combination thereof. Any number can be chosen so long as it represents a statistically meaningful sample of defects (col. 9, lines 24-28). Further, the subpopulations can alternatively define, instead of size ranges, defect locations expressed, for example, as boxes, quadrants, radial zones or the like (col. 9, lines 36-39). The choice of the randomizing function ("RandomDie" or "RandomDefect") determines how the defects are randomly chosen from all of the defects in the subpopulation (quadrant) (col. 9, lines 40-42). The advantage of using the RandomDie function is that it offers a better chance of selecting meaningful defects that might exist in less populated die (low defect density), as opposed to statistically concentrating on defects in areas of high defect density (col. 9, lines 48-52) (i.e., defective die density is calculated in each quadrant (subpopulation) from the density of the dies in the quadrat)];
extracting a feature for the binarized inspection data, wherein the feature comprises a feature calculated based on the defective chip density for each quadrat [When the defects to be reviewed are randomly selected (extracted) from the three subpopulations A, B and C, the defect types are found in different proportions in each of the subpopulations, thereby accurately reflecting the number of different defect types for different defect size ranges (defect feature for inspection data) on the layer (col. 11, lines 59-64). The random selection of defects is based on meaningful defects that might exist in less populated die (low defect density) (col. 9, lines 48-52) (i.e., defects of different sizes (features) are extracted based on low defect density areas). The review process described herein may be performed manually or by machine vision recognition of defect attributes (features) (col. 14, lines 5-7)].
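For illustration only, the quadrat-based density and feature limitations of claim 13 can be sketched as follows. This sketch is not taken from Berezin, Cao, or the application. The function names `quadrat_densities` and `flatten_features`, the encoding of the binarized inspection data as a rectangular grid of 0/1 values, and the 2-by-2 quadrat size are all assumptions made for the example.

```python
def quadrat_densities(wafer_map, rows_per_quadrat, cols_per_quadrat):
    """Return the fraction of defective chips in each quadrat.

    wafer_map is a list of equal-length rows; 1 marks a defective chip and
    0 a good chip (a hypothetical encoding of binarized inspection data).
    """
    n_rows = len(wafer_map)
    n_cols = len(wafer_map[0])
    densities = []
    for top in range(0, n_rows, rows_per_quadrat):
        row_of_quadrats = []
        for left in range(0, n_cols, cols_per_quadrat):
            # Collect every chip that falls inside this quadrat.
            chips = [
                wafer_map[r][c]
                for r in range(top, min(top + rows_per_quadrat, n_rows))
                for c in range(left, min(left + cols_per_quadrat, n_cols))
            ]
            # Density = defective chips / chips in the quadrat.
            row_of_quadrats.append(sum(chips) / len(chips))
        densities.append(row_of_quadrats)
    return densities


def flatten_features(densities):
    """Flatten per-quadrat densities into one feature vector per wafer."""
    return [d for row in densities for d in row]


# Hypothetical 4 x 4 wafer map partitioned into four 2 x 2 quadrats.
example_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
example_densities = quadrat_densities(example_map, 2, 2)
example_features = flatten_features(example_densities)
```

In this hypothetical map the upper-right quadrat holds three defective chips out of four, so its density is 0.75, while the lower-left quadrat holds one out of four, giving 0.25.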
Berezin further discloses that a defect is either part of a cluster or is discardable. In particular, some defects are assigned a classification code indicating they are part of a cluster, i.e., defects in close proximity to one another on the wafer 32, and other defects of the same cluster are assigned a code that indicates them to be "discardable." The discardable defects therefore can be removed from consideration in later classification and yield determining procedures (col. 7, lines 29-36).
Berezin does not disclose performing unsupervised learning for generating a defect pattern clustering model using the feature of the binarized inspection data.
Cao discloses an efficient unsupervised feature selection method, based on an unsupervised optimal discriminant vector in a learning machine, that finds the important features without using class labels. It derives the optimal discriminant vector in an unsupervised manner. Then, it defines a single-feature importance measurement based on each dimension value of the unsupervised optimal discriminant vector to determine the importance of each feature. After the features with little importance are removed from the feature subset, the features remaining after feature selection are used to train a learning machine for purposes of pattern classification (defect pattern clustering model), fault diagnosis, clustering and/or novelty detection. The proposed method is able to find important features and is a reliable and efficient feature selection methodology (Abstract, par. [0008]). The method is used to classify steel plates’ faults (par. [0027]).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to use the teachings of Cao to modify the method of Berezin by performing unsupervised learning for generating a defect pattern clustering model using the feature of the binarized inspection data in order to achieve a reliable and efficient defect feature selection methodology and to find important defect features (par. [0008]).
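For illustration only, unsupervised generation of a clustering model from per-wafer feature vectors can be sketched as follows. The sketch uses plain k-means as a stand-in technique; it is not Cao's discriminant-vector method and is not the algorithm of the application. The function names, the deterministic initialization from the first k points, and the four hypothetical feature vectors are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means; no class labels are used at any step.

    Centroids start at the first k points so the result is deterministic
    (an assumption for this sketch; random initialization is more common).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-wafer feature vectors (for example, quadrat densities).
wafer_features = [
    [0.0, 0.75, 0.25, 0.0],
    [0.8, 0.1, 0.0, 0.9],
    [0.1, 0.7, 0.3, 0.0],
    [0.9, 0.0, 0.1, 0.8],
]
labels, centroids = kmeans(wafer_features, k=2)
```

With these hypothetical inputs, the first and third wafers fall into one cluster and the second and fourth into the other, which is the kind of grouping a defect pattern clustering model is meant to produce.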
As to claim 14, Berezin further discloses wherein calculating the defective chip density in each of the quadrats comprises: 
counting the number of defective chips in each of the quadrats [In step 404 of FIG. 4, a count is made of the total number of defects (N.sup.T) in each defect subpopulation (quadrat) (col. 11, lines 19-20 and col. 12, lines 26-27)]; and
calculating the defective chip density in each of the quadrats by standardizing or normalizing the number of defective chips in each of the quadrats [In Table I, for example, the count of defects of type t1 in subpopulation A is equal to 2, i.e., N.sub.At1 = 2. In step 414 the expected occurrence of the defect type (standardizing or normalizing the number of defective chips) is computed. The expected occurrence is an extrapolation from the number of defects actually found of the defect type (N.sub.t), to a number of how many should be expected for that type in the entire subpopulation. The expected occurrence (E.sub.t) is determined by dividing the number of the defect type found (N.sub.t) in the subpopulation (quadrat) by the number selected for review (N.sup.S) in the subpopulation (quadrat), and then multiplying that amount by the total number of defects (N.sup.T) in the subpopulation (i.e., E.sub.t = (N.sub.t / N.sup.S) * N.sup.T) (col. 12, lines 28-39)].
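For illustration only, the expected-occurrence computation quoted from Berezin, E.sub.t = (N.sub.t / N.sup.S) * N.sup.T, can be sketched as follows. The function and parameter names are assumptions. The count of 2 comes from the Table I example quoted above; the number selected for review (10) and the subpopulation total (50) are hypothetical values chosen only to make the arithmetic concrete.

```python
def expected_occurrence(n_found, n_reviewed, n_total):
    """Extrapolate a reviewed-sample count to the whole subpopulation.

    n_found    : defects of the type found among those reviewed (N_t)
    n_reviewed : defects selected for review in the subpopulation (N^S)
    n_total    : total defects in the subpopulation (N^T)
    """
    if n_reviewed <= 0:
        raise ValueError("n_reviewed must be positive")
    return (n_found / n_reviewed) * n_total


# n_found=2 is from the quoted Table I example; 10 and 50 are hypothetical.
e_t = expected_occurrence(n_found=2, n_reviewed=10, n_total=50)
```

Under these assumed values, 2 of 10 reviewed defects being of type t1 extrapolates to about 10 of the 50 defects in the subpopulation.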
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Berezin et al. (US Patent 5,539,752) and Cao, Su-Qun (AU 2013100982) as applied to claim 13 above, and further in view of Young, Steven, "Deep Super Learner: A deep Ensemble for Classification Problems", arXiv: 1803.02323v1 [cs.LG] 6 Mar 2018.
As to claim 15, neither Berezin nor Cao discloses wherein performing the unsupervised learning comprises performing hyper-parameter optimization to obtain an optimal hyper-parameter comprising a resolution of a first axis and a resolution of a second axis of the quadrat that minimizes a value of a loss function, wherein the loss function is an evaluation function of a degree of grouping of the defect pattern clustering model generated as a result of performing the unsupervised learning.
Young discloses that deep learning is a powerful machine learning method that extracts lower level features and feeds them forward for the next layer to identify higher level features that improve performance. However, deep neural networks (DNN) have drawbacks, which include many hyper-parameters and infinite architectures, opaqueness into results, and relatively slower convergence on smaller datasets. The paper proposes deep super learning as an approach which achieves log loss and accuracy results competitive to deep neural networks while employing traditional machine learning algorithms in a hierarchical structure (Abstract). Given the drawbacks of DNN and the poor performance of traditional machine learning algorithms (supervised or unsupervised learning) in some domains and/or prediction tasks, the paper investigates whether traditional machine learning algorithms can be used to address the drawbacks of DNN and achieve levels of performance comparable to DNN (section 1.1). In a classification (clustering similar classes) task, the hyper-parameter is tuned (optimized) by minimizing LogLoss (loss function), also known as cross entropy, shown in equation 2, which evaluates the accuracy of the estimated probability p(x, y) that instance x is in class y with respect to the actual probability f(x, y) that instance x is of class y (the loss function is an evaluation function of degree of grouping of the clustering model) (section 3.4). The dataset is images of handwritten digits. The images are 28 pixels by 28 pixels in greyscale (resolution of first axis and resolution of second axis) (section 3.3).
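For illustration only, a log loss (cross entropy) evaluation of the kind referred to above can be sketched as follows. The sketch uses the standard multi-class form, the mean negative log of the probability assigned to the true class; the exact form of Young's equation 2 is not reproduced in this action, so agreement with it is an assumption. The function name, the clipping constant `eps`, and the example labels and probabilities are hypothetical.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels     : list of integer class indices, one per instance
    predicted_probs : list of per-class probability lists, one per instance
    eps             : clipping bound that keeps log() away from zero
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability of the true class into [eps, 1 - eps].
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)


# Hypothetical two-class example (class 0 = no-defect, class 1 = defect).
labels_true = [0, 1]
confident = log_loss(labels_true, [[0.9, 0.1], [0.2, 0.8]])
uncertain = log_loss(labels_true, [[0.6, 0.4], [0.5, 0.5]])
```

A lower value indicates that more probability is placed on the true classes, so in this hypothetical example `confident` is smaller than `uncertain`; tuning a hyper-parameter by minimizing such a loss means selecting the setting that yields the smallest value.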
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to use the teachings of Young to modify the combined method of Berezin and Cao such that performing the unsupervised learning comprises performing hyper-parameter optimization to obtain an optimal hyper-parameter comprising a resolution of a first axis and a resolution of a second axis of the quadrat that minimizes a value of a loss function, wherein the loss function is an evaluation function of a degree of grouping of the defect pattern clustering model generated as a result of performing the unsupervised learning, in order to implement a deep super learning approach which achieves log loss and accuracy results competitive to deep neural networks while employing traditional machine learning algorithms in a hierarchical structure (Abstract). In the combined method of Berezin, Cao and Young, the instance x is a defect pattern, the class y is defectiveness (defect/no-defect), and the log loss function is the evaluation function of the degree of grouping of the defect pattern clustering (classification) model generated by unsupervised learning.
Allowable Subject Matter
Claims 16-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art of record does not teach “wherein the binarized inspection data further comprises measurements for each chip that is a basis for calculating the data indicating the defectiveness of each of the plurality of chips; and the feature further comprises: a distribution feature based on a polar coordinate system calculated on a basis of on a measurement distribution pattern obtained as a result of density estimation based on the polar coordinate system of a defective chip; and a degree of risk of defects calculated based on the measurements distribution pattern obtained as a result of density estimation for the measurements for each chip” recited in claim 16 and “wherein performing the unsupervised learning comprises performing the unsupervised learning for generating the defect pattern clustering model using a self-organizing map algorithm; and the method further comprises: assigning, for each cluster generated by the defect pattern clustering model, a grade based on a measurement distribution pattern obtained as a result of density estimation for measurements for each chip of inspection data for each wafer clustered into the cluster, wherein the inspection data comprises measurements of each chip having a first axis coordinate and a second axis coordinate” recited in claim 18, in combination with the other features of the claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMIR ANWAR AHMED whose telephone number is (571)272-7413. The examiner can normally be reached on a flexible schedule.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward Urban, can be reached at (571)272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAMIR A AHMED/
Primary Examiner, Art Unit 2665