Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification/Claims

The disclosure is objected to because of the following informalities:
Claim 1:
Line 8: for clarity insert “from a plurality of ML models” after “(ML) models”
Lines 11-13: for clarity delete “providing the plurality…one or more defect classes,” (since at this point of the method it does not make sense to classify the images)
Lines 19-25: the 3 limitations in these 7 lines should be further indented to indicate that only they are part of the training (see line 18); the last 4 limitations are for inspection using the ML models after they are trained and should not be part of the training 
Claim 8, lines 6-8: for clarity replace “providing the plurality…one or more defect classes” with “providing one or more Machine Learning (ML) models from a plurality of ML models”
Claim 9, line 1: “is” is extraneous
Claim 16, lines 9-11: for clarity replace “providing the plurality…one or more defect classes” with “providing one or more Machine Learning (ML) models from a plurality of ML models”
Claim 18, line 2: “comprises” is extraneous
  
Appropriate correction is required.


Claim Interpretation - 35 USC § 112(f)

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):

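(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 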
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.  

Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 

Claim limitations in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.

This application includes one or more claim limitations in claims 1-6 and 16-20 that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) include the placeholder(s) “unit” (such as “imaging unit” included in claim 1) either expressly or, in the case of dependent claims, by inheritance.

A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) limitation: Fig. 1, refs. 104, 118 and paragraphs 34, 35 and 37. 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f), it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f).

For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 7, 8, 10-13, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009).

Regarding claim 1 (and similarly claims 7, 8 and 16), Zhang discloses:
providing one or more imaging units;
providing a computing unit;
receiving a plurality of images taken from one or more dies on a semiconductor wafer under inspection by the one or more imaging units, wherein the plurality of images are captured using a plurality of imaging modalities;
[Fig. 1 and paragraphs 25 (“…in FIG. 1. The system includes one or more computer subsystems...36 and…102”), 26 (“…the specimen is a wafer”), 27 (“…imaging tool 10…direct light to specimen 14”), 34 (“The imaging tool further includes one or more detection channels…FIG. 1 includes two detection channels…detector 28 and…34… In some instances, both detection channels…detect scattered light…one or more of the detection channels may…detect another type of light from the specimen (e.g., reflected light)”), 38 (“The one or more detection channels may include…photo-multiplier tubes (PMTs), charge coupled devices (CCD), time delay integration (TDI) cameras, and any other suitable detectors known in the art. The detectors may also include non-imaging detectors or imaging detectors”).  Note that a wafer comprises a plurality of dies]
providing one or more Machine Learning (ML) models, the one or more ML models being associated with at least a computer processor, a database and a memory associated with the computing unit;
providing the plurality of images to the one or more (ML) models from a plurality of ML models, the computer processor identifying and classifying one or more defects present in the semiconductor wafer into one or more defect classes, (wherein the plurality of ML models are configured in a Directed Acyclic Graph (DAG) architecture, wherein each node in the DAG architecture represents a ML model, wherein the one or more ML models are configured as root nodes in the DAG architecture, the plurality of ML models being configured to be trained to classify the one or more defects on one or more dies in the semiconductor wafer),
[Figs. 1, 3 and paragraphs 54 (“The component(s)…100…executed by…computer subsystem 36 and/or…102, include deep learning model 104…for determining information from an image generated for a specimen by an imaging tool”), 58 (“In another embodiment, the deep learning model is a machine learning model”), 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”), 75 (“…the deep learning model described herein is a trained deep learning model”), 98 (“…Model training 328 may generate one or more trained models”).  Note that the use of DAG is taught by De Stefano; see the analysis below]

wherein the training comprises:
providing a plurality of labelled images and a plurality of reference images of the semiconductor wafer stored in the database to the one or more ML models from a plurality of ML models;
configuring each ML model from the plurality of ML models to classify the plurality of labelled images into one or more defect classes using corresponding reference image from the plurality of reference images;
[Fig. 3 and paragraphs 97 (“…data and labels 318 is separated into training data 322, validation data 324, and test data 326”), 98 (“…Training data 322 may be input to model training 328, which…generate one or more trained models, which may then be sent to model selection 330, which is performed using validation data 324…to determine which of the models is the best model…Best model 332…may be sent to imaging tool 300 for use in a production or runtime mode (post-training mode)…then be applied to additional images…generated by the imaging tool”).  Note that the collection of data and labels 318 is considered a database.  Note further that the validation data and the training data are considered labeled images and reference images, respectively]
storing the one or more defect classes;
[Fig. 3 and paragraph 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”)]
inspecting one or more dies contained on a semiconductor wafer for defects by imaging the one or more dies;
attempting to match the images of the one or more dies to any one or more of the one or more defect classes;
[Fig. 3 and paragraph 98 (“…Best model 332…may be sent to imaging tool 300 for use in a production or runtime mode (post-training mode)…then be applied to additional images…generated by the imaging tool”)]
if a match exists between the one or more dies and the one or more defect classes, classifying the one or more matching dies as defective and communicating the identity of and rejecting as defective the one or more defective dies
[Fig. 3 and paragraphs 58 (“…the deep learning model is a machine learning model”), 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”)]

Zhang does not expressly disclose the following, which are taught by De Stefano:
wherein the plurality of ML models are configured in a Directed Acyclic Graph (DAG) architecture, wherein each node in the DAG architecture represents a ML model, wherein the one or more ML models are configured as root nodes in the DAG architecture, the plurality of ML models being configured to be trained to classify the one or more defects on one or more dies in the semiconductor wafer
[Section 2, the 2nd paragraph (“…Once this conditional probability has been learned, the combiner provides the output for each unknown input sample, as the most probable class given the expert observations, by the following expression:…(1)…where C is the set of classes”); Section 3, the 2nd paragraph (“…A DAG must have at least one source and at least one sink. In a DAG structure nodes are partially ordered: a node i comes before a node j if it exists a directed path from i to j…The data structure that we have devised for encoding DAG structures, called multilist (ML), consists of two basic lists. The first one, called main list, contains all the nodes of the DAG…To each node of the main list is associated a second list called sublist, representing the out going connections among that node and the other nodes in the DAG”).  Note that Fig. 2 provides examples of the DAG structure, with each node being a classifier.  The example in Fig. 2(b) has two root nodes.  That each classifier is a machine learning model is disclosed by Zhang above.  Note further that Fig. 1 and Section 2 teach how to combine classifier results at each non-root node]
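
For illustration only, the DAG arrangement relied upon above can be sketched in Python as follows. This sketch is not taken from Zhang or De Stefano: the node names (bright, dark, combiner), the threshold rules standing in for trained ML models, and the any-defect combining rule are all hypothetical assumptions chosen to keep the example short. It shows two root nodes that receive the image directly and one non-root node that combines the outputs of its parents.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-ins for trained ML models. Each node takes the image
# (a flat list of pixel values) and the labels produced by its parent nodes.
def bright_model(image, parent_labels):
    return "defect" if max(image) > 200 else "ok"

def dark_model(image, parent_labels):
    return "defect" if min(image) < 20 else "ok"

def combiner_model(image, parent_labels):
    # Assumed combining rule: flag a defect if any parent flags one.
    return "defect" if "defect" in parent_labels else "ok"

# DAG expressed as node -> set of parent nodes. Root nodes have no parents.
parents = {
    "bright": set(),
    "dark": set(),
    "combiner": {"bright", "dark"},
}
models = {
    "bright": bright_model,
    "dark": dark_model,
    "combiner": combiner_model,
}

def root_nodes(parents):
    """Return the nodes that have no incoming edges."""
    return sorted(node for node, preds in parents.items() if not preds)

def run_dag(image, models, parents):
    """Evaluate every node after all of its parents have been evaluated."""
    outputs = {}
    for node in TopologicalSorter(parents).static_order():
        parent_labels = [outputs[p] for p in sorted(parents[node])]
        outputs[node] = models[node](image, parent_labels)
    return outputs

result = run_dag([10, 50, 120], models, parents)
```

In this sketch the final classification is read from the sink node, `result["combiner"]`; a learned combiner could be substituted for the fixed rule without changing the traversal.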

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify Zhang with the teaching of De Stefano as set forth above.  The reasons for doing so at least would have been that combining classifiers has been shown to be effective, as De Stefano indicated in the abstract and the 1st paragraph of Section I. Introduction.

Regarding claim 3 (and similarly claim 10), Zhang further discloses:
wherein each of the plurality of ML models is one of, a supervised model, a semi-supervised model and an unsupervised model
[Paragraph 56 (“Deep learning is part of a broader family of machine learning methods based on learning representations of data…One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning”).  Note that an ML model is either supervised, unsupervised or semi-supervised (to various degrees of “semi”)]

Regarding claim 4 (and similarly claim 11), Zhang further discloses:
wherein the plurality of modalities includes at least one of: X-ray imaging, Inner-Crack-Imaging (ICI), grayscale imaging, black and white imaging, and color imaging
[Fig. 1 and paragraph 38 (“The one or more detection channels may include…charge coupled devices (CCD), time delay integration (TDI) cameras, and any other suitable detectors known in the art”).  Note that a CCD camera was known to be able to capture images in color, grayscale or black-and-white and which to capture is a design choice]

Regarding claim 5 (and similarly claim 12), Zhang further discloses:
wherein the plurality of ML models are deep learning models
[Figs. 1, 3 and paragraphs 54 (“The component(s)…100…executed by…computer subsystem 36 and/or…102, include deep learning model 104…for 

Regarding claim 13, Zhang further discloses:
wherein the plurality of labelled images comprises labels related to the one or more defect classes,
[Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]
wherein the plurality of labelled images is generated using historical images of the semiconductor wafer
[Figs. 1, 3 (refs. 302, 306, 328) and paragraphs 25 (“…the system includes imaging tool 10 configured for generating images of a specimen”), 26 (“…the specimen is a wafer”), 95 (“…The deep learning model development workflow may include data collection 302 from imaging tool 300”), 96 (“The deep learning model development workflow may also include data labeling 306”), 98 (“Training data 322 may be input to model training 328”).  Note that the labelled images are generated at 306, prior to being used for training at 328, and therefore are historical images]

Regarding claim 15 (and similarly claim 20), De Stefano further discloses:
wherein post-processing includes accurately classifying the plurality of images into the one or more defect classes using the classification information from each of the plurality of ML models
[Fig. 1; Abstract (“Combining classifier methods have shown their effectiveness in a number of applications.”); Section 2, the 2nd paragraph (“…Once this conditional probability has been learned, the combiner provides the output for each unknown input sample, as the most probable class given the expert observations, by the following expression:…(1)…where C is the set of classes”).  Note further that Fig. 1 and Section 2 teach how to combine classification results from multiple classifiers such as ML models trained to perform classification]
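
As a hedged illustration of the combining step quoted above (the output is “the most probable class given the expert observations”), the following Python sketch estimates that conditional probability from a plain frequency table. This is a deliberate simplification and an assumption of the sketch: De Stefano learns a Bayesian network by evolution, which is not reproduced here. The function names (fit_combiner, combine) and the defect class names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_combiner(expert_outputs, true_classes):
    """Count how often each true class co-occurs with each tuple of
    expert (classifier) outputs seen in the training data."""
    table = defaultdict(Counter)
    for observation, true_class in zip(expert_outputs, true_classes):
        table[tuple(observation)][true_class] += 1
    return table

def combine(table, observation, default=None):
    """Return the class with the highest empirical conditional frequency
    given the experts' outputs, or `default` if the tuple was never seen."""
    counts = table.get(tuple(observation))
    if not counts:
        return default
    return counts.most_common(1)[0][0]

# Toy training data: outputs of two hypothetical classifiers per image,
# paired with the labelled (true) defect class of that image.
expert_outputs = [
    ("scratch", "scratch"),
    ("scratch", "particle"),
    ("scratch", "particle"),
    ("scratch", "particle"),
]
true_classes = ["scratch", "particle", "particle", "scratch"]

table = fit_combiner(expert_outputs, true_classes)
decision = combine(table, ("scratch", "particle"))
```

Here the two experts disagree, and the combiner resolves the disagreement using the class most often observed with that pair of outputs in the training data.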

Regarding claim 17, Zhang further discloses:
wherein the one or more imaging units include, at least one of, an Automated Optical Inspection (AOI) apparatus, an Automated X-ray Inspection (AXI) apparatus, a Joint Test Action Group (JTAG) apparatus, and an In-circuit test (ICT) apparatus
Fig. 1 and paragraphs 27 (“…the imaging tool is configured as an optical based imaging tool. In this manner, in some embodiments, the images are generated by an optical based imaging tool”), 34 (“…the imaging tool shown in FIG. 1 includes two detection channels, one formed by collector 24, element 26, and detector 28 and another formed by collector 30, element 32, and detector 34”), 40 (“Computer subsystem 36 of the imaging tool may be coupled to the detectors of the imaging tool…”), 43 (“… the imaging tool may be configured as an electron 

Claims 2, 9, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009) as applied to claims 1, 3-5, 7, 8, 10-13, 15-17 and 20 above, and further in view of Hasan et al. (US 2021/0151034).

Regarding claim 2 (and similarly claim 9), the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 1 but not expressly the following, which is taught by Hasan:
wherein the one or more ML models are provided with the plurality of images and the plurality of labelled images belonging to an imaging modality from the plurality of imaging modalities


	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hasan as set forth above.  The reasons for doing so at least would have been that training ML models (e.g., subnetworks) independently may enable the system to better learn intra-modal dynamics, as Hasan indicated in paragraph 31.

Regarding claim 14 (and similarly claim 19), the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 8 (respectively, claim 16) but not expressly the following, which is taught by Hasan:
wherein one of a late fusion technique or an early fusion technique or a hybrid fusion technique is used to combine features extracted from the plurality of modalities
[Fig. 1 and paragraphs 29 (“…The subnetworks 102, 104, and 106 may each be independently trained to determine…different modalities”), 30 (“…Features 120 

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hasan as set forth above.  The reasons for doing so at least would have been that training ML models (e.g., subnetworks) independently may enable the system to better learn intra-modal dynamics, as Hasan indicated in paragraph 31.

Claims 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009) as applied to claims 1,  above, and further in view of Hertzmann et al. (US 2017/0220903).

Regarding claim 6, the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 1 and additionally the following:
wherein the plurality of labelled images comprises labels related to the one or more defect classes,
[Zhang: Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]

	The combined invention does not expressly disclose the following, which is taught by Hertzmann:
wherein the plurality of labelled images is generated using a labelling model
[Figs. 1, 2 and paragraphs 43 (“…once trained the labeling model 112 is used to recognize these patterns and assign corresponding labels”), 44 (“…labeling module 214 employs the labeling model 112 to process an input of a subsequent image 216 to generate a labeled image 218”)]

Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hertzmann as set forth above.  The reasons for doing so at least would have 

Regarding claim 18, the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 16 and additionally the following:
wherein the computing unit receives the plurality of labelled images comprising labels related to the one or more defect classes, 
[Zhang: Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]
(wherein the labelling model) generates the plurality of labelled images using historical images of the semiconductor wafer
[Zhang: Figs. 1, 3 (refs. 302, 306, 328) and paragraphs 25 (“…the system includes imaging tool 10 configured for generating images of a specimen”), 26 (“…the specimen is a wafer”), 95 (“…The deep learning model development workflow may include data collection 302 from imaging tool 300”), 96 (“The deep learning model development workflow may also include data labeling 306”), 98 (“Training data 322 may be input to model training 328”).  Note that the labelled images are generated at 306, prior to being used for training at 328, and therefore are historical images.  Note further that the applied teaching is to use historical images to generate labelled images.  That the labelled images are generated using a labelling model is taught by Hertzmann; see the analysis below]

The combined invention does not expressly disclose the following, which is taught by Hertzmann:
(that the labelled images are received) from a labelling model,

Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hertzmann as set forth above.  The reasons for doing so at least would have been to reduce labeling errors caused by manual labeling, as Hertzmann indicated in paragraphs 3 and 4.

Conclusion and Contact Information

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Riley et al. (US 2019/0294923)—[Fig. 3 and paragraphs  78 (“…As shown in FIG. 3, training images 300 and labeled defects 302 may be input to training algorithm 304…produces trained model 306…used for defect detection and/or classification”), 87 (“… different labels may be associated with each training defect image and/or each defect in each training defect image”), 89 (“… train the machine learning model by inputting the class labels and the training images…with one or more training reference images…For example, reference images…can be inserted as the second…channel of the machine learning model. The reference images may include defect free images of the specimen. In this manner, the defect free images and the defect images may be input to the machine learning model as different data sets rather than a single training set of images.”)]
Alletto et al. (US 2020/0082197)—[Fig. 3 and paragraphs 28 (“…the machine learning model is trained…while an image with no noise is used as reference data (also referred to as correct data or label data) and an image with noise is used as conversion data (also referred to as training data)”)]
Nekarda et al. (US 2009/0181384)[Paragraph 116 (“Bayesian networks. A directed acyclic graph is used to represent a collection of variables in conjunction with their joint probability distribution, which is then used to determine the probability of class membership for a sample”)]
Badanes et al. (US 2021/0209418)—[Paragraph 63 (“…the runtime image can be from images of the specimen (e.g. a wafer, a die or parts thereof) captured during the manufacturing process, derivatives of the captured images obtained by various pre-processing stages (e.g. ...SEM images roughly centered around the defect to be classified…)”)]
Fu et al. (US 2021/0090274)—[Fig. 2 and paragraphs 95 (“…the image labelling model 236 labels visual information and depth information obtained by the camera array”), 143 (“… generate a labelled image using an image labelling model (e.g., image labelling model 236)”)]
Azvine et al. (WO 2015/044629)—[Figs. 2, 3, 6a-6e; P. 5, lines 26-27 (“Figures 6a to 6e are component diagram illustrating exemplary data structures employed and generated by the embodiments of Figures 2 to 5”)]
De Stefano et al. (“Using Bayesian Network for combining classifiers,” 14th International Conference on Image Analysis and Processing; Date of Conference: 10-14 October 2007)—[Figs. 1, 2]
Kijsirikul et al. (“Multiclass Support Vector Machines Using Adaptive Directed Acyclic Graph,” Proceedings of the 2002 International Joint Conference on Neural Networks; Date of Conference: 12-17 May 2002)—[Fig. 3 and Section IV.A (“An Adaptive DAG (ADAG) is a DAG with a reversed triangular structure. In an N-class problem, the system comprises N(N-1)/2 binary classifiers…To classify using the ADAG, starting at the top level, the binary function at each node is evaluated. The node is then exited via the outgoing edge with a message of the preferred class. In each round, the number of candidate classes is reduced by half. Based on the preferred classes from its parent nodes, the binary function of the next-level node is chosen. The reduction process continues until reaching the final node at the lowest level. The value of the decision function is the value associated with the message from the final leaf node (see Figure 3).”)]
Wang et al. (“Improving Classification Efficiency of Orthogonal Defect Classification via a Bayesian Network Approach,” International Conference on Computational Intelligence and Software Engineering; Date of Conference: 11-13 Dec. 2009)—[Figs. 1-3]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUBIN HUNG whose telephone number is (571)272-7451. The examiner can normally be reached M-F 7:30-16:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.