Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Arguments

This action is in response to the amendment filed on 7/26/2022, which has been entered.

Claims 1-20 are still pending.

Note: In preparing the next response, Applicant is advised to indicate where in the specification support can be found should the claims be amended to include new limitations. Please also indicate whether paragraph numbers refer to the specification as originally filed or to the published application.

In view of applicant’s amendment, the objections to claims 1, 8, 9, 16 and 18 have been withdrawn.

On page 8 of the response filed 7/26/2022, applicant ‘respectfully acknowledges that a “computing unit” and “imaging unit” may be interpreted as covering the corresponding structure described in the specification and equivalents thereof.’

Applicant’s arguments have been fully considered but they are not persuasive; see below.

Applicant argued:

A.	that ‘De Stefano does not disclose a “Polytree Directed Acyclic Graph” of the current claims.’  (P. 9, the penultimate paragraph)

However, De Stefano discloses two polytree DAGs: Fig. 2(a) has a single root node 5, and Fig. 2(b) has two root nodes 1 and 4. In the case of Fig. 2(a), node 5 is considered to correspond to the provided one or more ML models, while in the case of Fig. 2(b), nodes 1 and 4 are, as they are the input nodes. Therefore, the argument is not persuasive.
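For illustration only, the following minimal Python sketch shows the structural point relied on above: a polytree is a DAG whose underlying undirected graph is a tree (a node may have multiple parents), and its root nodes are those with no incoming edges, i.e., the input nodes. The node numbers and edges are hypothetical and do not reproduce De Stefano's Fig. 2(b); the function names is_polytree and root_nodes are illustrative and are not taken from any reference of record.

```python
from collections import defaultdict

def is_polytree(nodes, edges):
    """True if the directed graph is a polytree: its underlying undirected
    graph is connected and has exactly len(nodes) - 1 edges (a tree), which
    also rules out directed cycles, so the graph is a DAG."""
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    undirected = defaultdict(set)
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
    # Depth-first search over the undirected graph to test connectivity.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in undirected[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(nodes)

def root_nodes(nodes, edges):
    """Root (input) nodes are those with no incoming edge."""
    children = {v for _, v in edges}
    return sorted(n for n in nodes if n not in children)

# Hypothetical graph: node 3 has two parents (1 and 4), which a polytree allows.
NODES = [1, 2, 3, 4, 5]
EDGES = [(1, 3), (4, 3), (3, 2), (3, 5)]

print(is_polytree(NODES, EDGES))  # True
print(root_nodes(NODES, EDGES))   # [1, 4]
```

Adding a hypothetical edge (1, 5) would create the undirected cycle 1-3-5, so is_polytree would return False: the graph would still be a DAG but no longer a polytree.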

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 7, 8, 10-13, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009).

Regarding claim 1 (and similarly claims 7, 8 and 16), Zhang discloses:
providing one or more imaging units;
providing a computing unit;
receiving a plurality of images taken from one or more dies on a semiconductor wafer from the one or more imaging units, wherein the plurality of images are captured using a plurality of imaging modalities;
[Fig. 1 and paragraphs 25 (“…in FIG. 1. The system includes one or more computer subsystems...36 and…102”), 26 (“…the specimen is a wafer”), 27 (“…imaging tool 10…direct light to specimen 14”), 34 (“The imaging tool further includes one or more detection channels…FIG. 1 includes two detection channels…detector 28 and…34… In some instances, both detection channels…detect scattered light…one or more of the detection channels may…detect another type of light from the specimen (e.g., reflected light)”), 38 (“The one or more detection channels may include…photo-multiplier tubes (PMTs), charge coupled devices (CCD), time delay integration (TDI) cameras, and any other suitable detectors known in the art. The detectors may also include non-imaging detectors or imaging detectors”).  Note that a wafer comprises a plurality of dies]
providing one or more Machine Learning (ML) models from a plurality of  Machine Learning (ML) models, the one of more ML modes being associated with at least a computer processor, a database and a memory associated with the computing unit;
providing the plurality of images to the one or more (ML) models from a plurality of ML models, the computer processor identifying and classifying one or more defects present in the semiconductor wafer into one or more defect classes,
[Figs. 1, 3 and paragraphs 54 (“The component(s)…100…executed by…computer subsystem 36 and/or…102, include deep learning model 104…for determining information from an image generated for a specimen by an imaging tool”), 58 (“In another embodiment, the deep learning model is a machine learning model”), 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”), 75 (“…the deep learning model described herein is a trained deep learning model”), 98 (“…Model training 328 may generate one or more trained models”).  Note that the use of a DAG is taught by De Stefano; see the analysis below]

wherein the training comprises:
providing a plurality of labelled images and a plurality of reference images of the semiconductor wafer stored in the database to the one or more ML models from a plurality of ML models;
configuring each ML model from the plurality of ML models to classify the plurality of labelled images into one or more defect classes using corresponding reference image from the plurality of reference images;
[Fig. 3 and paragraphs 97 (“…data and labels 318 is separated into training data 322, validation data 324, and test data 326”), 98 (“…Training data 322 may be input to model training 328, which…generate one or more trained models, which may then be sent to model selection 330, which is performed using validation data 324…to determine which of the models is the best model…Best model 332…may be sent to imaging tool 300 for use in a production or runtime mode (post-training mode)…then be applied to additional images…generated by the imaging tool”).  Note that the collection of data and labels 318 is considered a database.  Note further that the validation data and the training data are considered labeled images and reference images, respectively]
storing the one or more defect classes;
[Fig. 3 and paragraph 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”)]
inspecting one or more dies contained on a semiconductor wafer for defects by imaging the one or more dies;
attempting to match the images of the one or more dies to any one or more of the one or more defect classes;
[Fig. 3 and paragraph 98 (“…Best model 332…may be sent to imaging tool 300 for use in a production or runtime mode (post-training mode)…then be applied to additional images…generated by the imaging tool”)]
if a match exists between the one or more dies and the one or more defect classes, classifying the one or more matching dies as defective and communicating the identity of and rejecting as defective the one or more defective dies
[Fig. 3 and paragraphs 58 (“…the deep learning model is a machine learning model”), 68 (“…the deep learning model may output an image classification…with a confidence associated…The image classification may have any suitable format (such as an image or defect ID, a defect description…The image classification results may be stored”)]

Zhang does not expressly disclose the following, which are taught by De Stefano:
the plurality of ML models are configured in a Polytree Directed Acyclic Graph (Polytree DAG) architecture,
each node in the Polytree DAG architecture represents a ML model, and
the one or more ML models are configured as root nodes in the Polytree DAG architecture, the plurality of ML models being configured to be trained to classify the one or more defects on one or more dies in the semiconductor wafer
[Figs. 1, 2; Abstract (“Combining classifier methods have shown their effectiveness in a number of applications.”); Section 2, the 2nd paragraph (“…Once this conditional probability has been learned, the combiner provides the output for each unknown input sample, as the most probable class given the expert observations, by the following expression:…(1)…where C is the set of classes”); Section 3, the 2nd paragraph (“…A DAG must have at least one source and at least one sink. In a DAG structure nodes are partially ordered: a node i comes before a node j if it exists a directed path from i to j…The data structure that we have devised for encoding DAG structures, called multilist (ML), consists of two basic lists. The first one, called main list, contains all the nodes of the DAG…To each node of the main list is associated a second list called sublist, representing the outgoing connections among that node and the other nodes in the DAG”).  Note that Fig. 2 provides examples of the DAG structure, with each node being a classifier.  The example in Fig. 2(b) has two root nodes.  That each classifier is a machine learning model is disclosed by Zhang above.  Note further that Fig. 1 and Section 2 teach how to combine classifier results at each non-root node]
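For illustration only, the following minimal Python sketch assumes the characterization above (each node a classifier, with results combined at each non-root node). It encodes a DAG as a multilist in the sense of the passage quoted from Section 3 (a main list of nodes, each with a sublist of outgoing connections), orders the nodes so that each node follows its parents, and at each non-root node outputs the most probable class given its parents' outputs. The class labels, node numbers, and probability table are hypothetical, and the table lookup is a simplified stand-in, not a reproduction of De Stefano's expression (1), which is elided in the quotation above.

```python
from collections import deque

def build_multilist(nodes, edges):
    """Main list: every node, in the given order. Each node's sublist holds
    its outgoing connections (the nodes it feeds)."""
    multilist = {n: [] for n in nodes}
    for parent, child in edges:
        multilist[parent].append(child)
    return multilist

def topological_order(multilist):
    """Kahn's algorithm: emit a node only after all of its parents, which
    respects the DAG's partial order; raise if a directed cycle exists."""
    indegree = {n: 0 for n in multilist}
    for children in multilist.values():
        for c in children:
            indegree[c] += 1
    queue = deque(n for n in multilist if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in multilist[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(multilist):
        raise ValueError("graph contains a directed cycle")
    return order

def run_dag(multilist, root_outputs, tables):
    """Root nodes report their own predicted class (root_outputs). Each
    non-root node looks up P(class | parents' outputs) in its hypothetical
    table and outputs the most probable class."""
    parents = {n: [] for n in multilist}
    for p, children in multilist.items():
        for c in children:
            parents[c].append(p)
    outputs = {}
    for n in topological_order(multilist):
        if not parents[n]:
            outputs[n] = root_outputs[n]
        else:
            observed = tuple(outputs[p] for p in parents[n])
            dist = tables[n][observed]
            outputs[n] = max(dist, key=dist.get)
    return outputs

# Hypothetical example: root classifiers 1 and 4 feed combiner node 3.
ml = build_multilist([1, 4, 3], [(1, 3), (4, 3)])
tables = {3: {
    ("scratch", "scratch"): {"scratch": 0.90, "particle": 0.10},
    ("scratch", "particle"): {"scratch": 0.60, "particle": 0.40},
    ("particle", "scratch"): {"scratch": 0.30, "particle": 0.70},
    ("particle", "particle"): {"scratch": 0.05, "particle": 0.95},
}}
print(topological_order(ml))  # [1, 4, 3]
print(run_dag(ml, {1: "scratch", 4: "particle"}, tables))
# {1: 'scratch', 4: 'particle', 3: 'scratch'}
```

A root node needs no table because it has no parents to combine; each non-root node needs one entry per combination of its parents' outputs, so the table grows quickly with the number of parents.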

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Zhang with the teaching of De Stefano as set forth above.  At least one reason for doing so would have been that combining classifiers has been shown to be effective, as De Stefano indicated in the Abstract and in the 1st paragraph of the Introduction.

Regarding claim 3 (and similarly claim 10), Zhang further discloses:
wherein each of the plurality of ML models is one of, a supervised model a semi-supervised model and an unsupervised model
[Paragraph 56 (“Deep learning is part of a broader family of machine learning methods based on learning representations of data…One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning”).  Note that an ML model is either supervised, unsupervised, or semi-supervised (to varying degrees of “semi”)]

Regarding claim 4 (and similarly claim 11), Zhang further discloses:
wherein the plurality of modalities includes at least one of: X- ray imaging, Inner-Crack-Imaging (ICI), grayscale imaging, black and white imaging, and color imaging
[Fig. 1 and paragraph 38 (“The one or more detection channels may include…charge coupled devices (CCD), time delay integration (TDI) cameras, and any other suitable detectors known in the art”).  Note that CCD cameras were known to be capable of capturing color, grayscale, or black-and-white images, and which type of image to capture is a matter of design choice]

Regarding claim 5 (and similarly claim 12), De Stefano further discloses:
wherein the Polytree DAG architecture are multi-level Polytree DAG architecture
[Fig. 2]

Regarding claim 13, Zhang further discloses:
wherein the plurality of labelled images comprises labels related to the one or more defect classes,
[Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]
wherein the plurality of labelled images is generated using historical images of the semiconductor wafer
[Figs. 1, 3 (refs. 302, 306, 328) and paragraphs 25 (“…the system includes imaging tool 10 configured for generating images of a specimen”), 26 (“…the specimen is a wafer”), 95 (“…The deep learning model development workflow may include data collection 302 from imaging tool 300”), 96 (“The deep learning model development workflow may also include data labeling 306”), 98 (“Training data 322 may be input to model training 328”).  Note that the labelled images are generated at 306, prior to being used for training at 328, and therefore are historical images]

Regarding claim 15 (and similarly claim 20), De Stefano further discloses:
wherein post -processing includes accurately classifying the plurality of images into the one or more defect classes using the classification information from each of the plurality of ML models
[Fig. 1; Abstract (“Combining classifier methods have shown their effectiveness in a number of applications.”); Section 2, the 2nd paragraph (“…Once this conditional probability has been learned, the combiner provides the output for each unknown input sample, as the most probable class given the expert observations, by the following expression:…(1)…where C is the set of classes”).  Note that Fig. 1 and Section 2 teach how to combine classification results from multiple classifiers, such as ML models trained to perform classification]

Regarding claim 17, Zhang further discloses:
wherein the one or more imaging units include, at least one of, an Automated Optical Inspection (AOI) apparatus, an Automated X-ray Inspection (AXI) apparatus, a Joint Test Action Group (JTAG) apparatus, and an In-circuit test (ICT) apparatus
[Fig. 1 and paragraphs 27 (“…the imaging tool is configured as an optical based imaging tool. In this manner, in some embodiments, the images are generated by an optical based imaging tool”), 34 (“…the imaging tool shown in FIG. 1 includes two detection channels, one formed by collector 24, element 26, and detector 28 and another formed by collector 30, element 32, and detector 34”), 40 (“Computer subsystem 36 of the imaging tool may be coupled to the detectors of the imaging tool…”), 43 (“…the imaging tool may be configured as an electron beam based imaging tool”), 49 (“…the imaging tool may be an ion beam based imaging tool…such as…focused ion beam (FIB) systems, helium ion microscopy (HIM) systems, and secondary ion mass spectroscopy (SIMS) systems”), 52 (“…the optical and electron beam imaging tools described herein may be configured as inspection tools”).  Note that, while not expressly disclosed, automating image acquisition was well known prior to the effective filing date of the claimed invention as a way to achieve an acquisition speed that cannot be achieved manually]

Claims 2, 9, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009) as applied to claims 1, 3-5, 7, 8, 10-13, 15-17 and 20 above, and further in view of Hasan et al. (US 2021/0151034).

Regarding claim 2 (and similarly claim 9), the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 1 but not expressly the following, which is taught by Hasan:
wherein the one or more ML models are provided with the plurality of images and the plurality of labelled images belonging to an imaging modality from the plurality of imaging modalities
[Figs. 1, 4 and paragraphs 31 (“Subnetwork 102 may be trained with respect to textual modality. Subnetwork 104…audio modality. Subnetwork 106…visual modality. Training the subnetworks 102, 104, and 106 independently may enable the system 100 a better learning of intra-modal dynamics”), 58-60.  Note that the applied teaching is to train different machine learning models (e.g., subnetworks) on data of different modalities.  That the modalities are imaging modalities is disclosed by Zhang, per the analysis of claim 1 above, especially paragraph 38]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hasan as set forth above.  At least one reason for doing so would have been that training ML models (e.g., subnetworks) independently may enable the system to better learn intra-modal dynamics, as Hasan indicated in paragraph 31.

Regarding claim 14 (and similarly claim 19), the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 8 (respectively, claim 16) but not expressly the following, which is taught by Hasan:
wherein one of a late fusion technique or an early fusion technique or a hybrid fusion technique is used to combine features extracted from the plurality of modalities
[Fig. 1 and paragraphs 29 (“…The subnetworks 102, 104, and 106 may each be independently trained to determine…different modalities”), 30 (“…Features 120 may be textual features…Features 121…audio features…Features 123…visual features”), 31 (“Subnetwork 102…trained with respect to textual modality. Subnetwork 104…audio modality. Subnetwork 106…visual modality”), 32 (“…The last hidden layer at the end of each of the subnetworks 102, 104, and 106 may be used as input to an attention block 108”), 36 (“At 130, the hidden layer of the subnetworks 102, 104, and 106 may be multiplied…by its respective weight determined by the attention module 109. The results may be concatenated together…and passed through a dense layer 110…then through a softmax layer 111. The softmax layer 111 may provide a determination of an emotion expressed in the content item 101”).  Note that elements 108-111 of Fig. 1, in combination, carry out fusion]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hasan as set forth above.  At least one reason for doing so would have been that training ML models (e.g., subnetworks) independently may enable the system to better learn intra-modal dynamics, as Hasan indicated in paragraph 31.

Claims 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928) and De Stefano et al. (“Learning Bayesian Networks by Evolution for Classifier Combination,” 10th International Conference on Document Analysis and Recognition; Date of Conference: 26-29 July 2009) as applied to claims 1, 3-5, 7, 8, 10-13, 15-17 and 20 above, and further in view of Hertzmann et al. (US 2017/0220903).

Regarding claim 6, the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 1 and additionally the following:
wherein the plurality of labelled images comprises labels related to the one or more defect classes,
[Zhang: Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]

	The combined invention does not expressly disclose the following, which is taught by Hertzmann:
wherein the plurality of labelled images is generated using a labelling model
[Figs. 1, 2 and paragraphs 43 (“…once trained the labeling model 112 is used to recognize these patterns and assign corresponding labels”), 44 (“…labeling module 214 employs the labeling model 112 to process an input of a subsequent image 216 to generate a labeled image 218”)]

Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hertzmann as set forth above.  At least one reason for doing so would have been to reduce labeling errors caused by manual labeling, as Hertzmann indicated in paragraphs 3 and 4.

Regarding claim 18, the combined invention of Zhang and De Stefano discloses all limitations of its parent claim 16 and additionally the following:
wherein the computing unit receives the plurality of labelled images comprising labels related to the one or more defect classes, 
[Zhang: Paragraph 68 (“…The image classification may have any suitable format (such as…defect ID, a defect description”).  Note that the defect ID indicates a defect class]
(wherein the labelling model) generates the plurality of labelled images using historical images of the semiconductor wafer
[Zhang: Figs. 1, 3 (refs. 302, 306, 328) and paragraphs 25 (“…the system includes imaging tool 10 configured for generating images of a specimen”), 26 (“…the specimen is a wafer”), 95 (“…The deep learning model development workflow may include data collection 302 from imaging tool 300”), 96 (“The deep learning model development workflow may also include data labeling 306”), 98 (“Training data 322 may be input to model training 328”).  Note that the labelled images are generated at 306, prior to being used for training at 328, and therefore are historical images.  Note further that the applied teaching is to use historical images to generate labelled images.  That the labelled images are generated using a labelling model is taught by Hertzmann; see the analysis below]

The combined invention does not expressly disclose the following, which is taught by Hertzmann:
(that the labelled images are received) from a labelling model,

Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Hertzmann as set forth above.  At least one reason for doing so would have been to reduce labeling errors caused by manual labeling, as Hertzmann indicated in paragraphs 3 and 4.

Conclusion and Contact Information

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Suermondt et al. (US 6,947,936)—[Col. 9, lines 38-40 (“The present invention is also applicable…where a node can have multiple parents (a poly-tree)”)]

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUBIN HUNG whose telephone number is (571)272-7451. The examiner can normally be reached M-F 7:30-16:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YUBIN HUNG/Primary Examiner, Art Unit 2662
August 13, 2022