DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Request for Continued Examination filed on 12 February, 2021 and the Applicant's Amendment and Arguments filed on 28 January, 2021.
Claims 1-8, 10-15 and 17-20 are pending in this application. Claims 9 and 16 are cancelled. 


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 28 January, 2021 has been entered.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed 

Claims 1, 4, 6, 11-15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar et al. (US Pub. 2016/0378559 A1) in view of YOKOCHI et al. (US Pub. 2015/0268177 A1) and further in view of Cadambi et al. (US Pub. 2012/0124591 A1) and Zenoni et al. (US Pub. 2017/0214927 A1).
Bhandarkar and Cadambi were cited in the previous Office Action.

As per claim 1, Bhandarkar teaches the invention substantially as claimed including A system (Bhandarkar, Fig.1) comprising: 
a plurality of worker nodes (Bhandarkar, Fig. 1, 106, Node Y-worker Node, 108 Node Z-worker node), wherein each of the worker nodes includes: at least one CPU running a worker job manager (Bhandarkar, Fig.1, 134 Node Manager; [0024] lines 4-7, The worker node manager 134 is a component of the worker node computer 106 that includes a process executing on the worker node computer (including at least one CPU)), wherein the worker job manager is configured to divide a first job into a plurality of tasks (Bhandarkar, [0025] lines 7-8, the node service 136 is deployed as a YARN auxiliary service 138 managed by the worker node manager 134; Fig. 1, 138 Aux service, 142 Local daemon (managed by worker node manager); [0027] lines 4-8, The local daemon 142 performs local spawns (as divide) 146 and 148. The local spawns 146 and 148 launch user processes 150 and 152, respectively. The user processes 150 and 152 can perform different portions of the job (as plurality of tasks of received job) allocated to the worker node computer 106); and
a master node in electronic communication with the plurality of the worker nodes (Bhandarkar, Fig. 1, 104 Master node; [0021] lines 10-14, The application master 118 is a component of the master node computer 104 that includes a process executed on the master node computer 104 that manages task scheduling and execution of the user program 112 on multiple worker node computers (as electronic communication with plurality of the worker nodes)), wherein the master node receives a user program (Bhandarkar, Fig. 4, 402, Receiving, from a client computer and by a master node manager executing on a master node computer…a user program configured to execute in an environment), and wherein the master node is configured to: 
divide the user program into at least the first job (Bhandarkar, [0023] lines 18-19, the HNP 124 divides the user program 112 into tasks (as at least the first job) that execute on worker node computers 106 and 108); and 
distribute the first job to one of the worker nodes (Bhandarkar, [0023] lines 19-22, The HNP 124 assigns the tasks to the worker node computers 106 and 108, and maps the containers of the worker node computers 106 and 108 to the respective tasks).

Bhandarkar fails to specifically teach the plurality of tasks includes defect detection and defect classification, and the received user program is an input image data about a semiconductor wafer or reticle.

However, YOKOCHI teaches wherein the plurality of tasks includes defect detection and defect classification, and the received user program is an input image data about a semiconductor wafer or reticle (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60. (As the pixel image of the surface of the wafer is received); [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60. [0055] lines 1-2, The cell portion 60c is inspected by the defect detection device 100; [0058] lines 1-2, The cell portion 60c is classified using a characteristic quantity of the inspection signal; also see Abstract, lines 1-3, defect detection method includes inspecting an inspection target, classifying the inspection target; [Examiner noted: receiving a pixel image of the wafer, and the wafer is divided into different portions for defect detection and defect classification]). 

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar with YOKOCHI because YOKOCHI’s teaching of performing defect detection and classification for the received wafer image data would have provided Bhandarkar’s system with the advantage and capability of easily determining and detecting the different defect portions of the semiconductor wafer, which improves the system efficiency.

Both Bhandarkar and YOKOCHI fail to specifically teach wherein each of the worker nodes further includes at least one GPU in electronic communication with the CPU; wherein each of the worker job managers includes a module with a deep learning model, wherein the deep learning model is configured to determine whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU in the worker node or whether to assign one of the plurality of tasks to one of the GPU instead of one of the CPU in the worker node such that completion time of the plurality of tasks is minimized; wherein the deep learning model is further configured to assign each of the plurality of tasks of the first job to one of the CPU or one of the GPU in the worker node; and wherein each of the worker job managers is further configured to prioritize the plurality of tasks in the first job ahead of tasks in a later job.

However, Cadambi teaches wherein each of the worker nodes further includes at least one GPU in electronic communication with the CPU (Cadambi, Fig. 2, 130, worker 1-3 (as worker nodes), 245 GPU; 132 Node-level dispatcher; [0017] lines 11-13, Each worker node includes at least one central processing unit (CPU) and at least one GPU, lines 15-17, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs); [Examiner noted: the Node-level dispatcher is running in the worker node by using the CPU and is dispatching the tasks to local resources (GPU), therefore, the GPU is in electronic communication with the CPU]);
wherein each of the worker job managers includes a call interception module and dispatcher, wherein the call interception module and dispatcher are configured to determine whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU in the worker node or whether to assign one of the plurality of tasks to one of the GPU instead of one of the CPU in the worker node such that completion time of the plurality of tasks is minimized (Cadambi, Fig. 2, 132, node-level dispatcher, 230 Call interception module, 235 Dispatcher; [0017] lines 15-18, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs) as directed by the cluster manager (as determining whether to assign the tasks to CPUs or GPUs based on the direction of the cluster manager); [0056] lines 5-7, tasks are dispatched to the respective CPU or GPU resource. Dispatching may be performed by dispatcher; [0005] lines 8-10, In a client-server application, an important metric is response time, or the latency per request. Latency per request can be improved by using, e.g., GPUs; also see [0018] lines 14-18, a dynamic data collection for building CPU/GPU performance models is performed for each new application to find suitable resources for that application and optimize performance by estimating performance of a task of an application on the heterogeneous resources; [0023] lines 5-8, provides scheduling of application tasks onto heterogeneous resources to allow each task to achieve its desired Quality of Service (QoS). In the present application, the QoS refers to client request response time. [Examiner noted: the latency/response time are improved, therefore the completion time of tasks is minimized]);
wherein the call interception module and dispatcher are further configured to assign each of the plurality of tasks of the first job to one of the CPU or one of the GPU in the worker node (Cadambi, Fig. 4, 430, Dispatch task to CPU or GPU; [0017] lines 15-17, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs); [0056] lines 5-7, tasks (as plurality of tasks) are dispatched to the respective CPU or GPU resource. Dispatching may be performed by dispatcher); and
wherein each of the worker job managers is further configured to prioritize the plurality of tasks in the first job ahead of tasks in a later job (Cadambi, [0048] lines 15-20, All other applications 110 with lesser priority metrics than the selected application's priority metric should also exceed in providing their respective performance threshold…the user request within the application 110 may be serviced in first in, first out order (as prioritize)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar and YOKOCHI with Cadambi because Cadambi’s teaching of determining whether to assign the tasks to CPUs or GPUs, and assigning them accordingly, would have provided Bhandarkar and YOKOCHI’s system with the advantage and capability of performing different types of tasks on either a CPU or a GPU, which improves the system resource utilization, efficiency and performance.

Although Bhandarkar, YOKOCHI and Cadambi teach that each of the worker job managers includes a call interception module and dispatcher to determine whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU and to assign each of the plurality of tasks of the first job to one of the CPU or one of the GPU in the worker node, Bhandarkar, YOKOCHI and Cadambi fail to specifically teach that the call interception module and dispatcher are a module with a deep learning model, and that the determination of whether to assign one of the plurality of tasks and the assigning of each of the plurality of tasks of the first job are performed by the deep learning model.

However, Zenoni teaches a module with a deep learning model (Zenoni, Fig. 4, 450 Workload allocation system, 453 Neural network; [0006] lines 9-12, a neural network is used to train and refine a model that maps different hardware characteristics and video processing workload characteristics to points; [0031] lines 1-2, In FIG. 4, the workload allocation system includes a neural network 453; [0040] lines 6-9, the neural network may continuously "learn" during operation, which may enable more accurate scoring and assignment of hardware systems to workloads), and 
the determination of whether to assign one of the plurality of tasks and assigning each of the plurality of tasks of the first job is performed by the deep learning model (Zenoni, [0032] lines 1-13, The neural network 453 may include model training and refinement module 454, which may correspond to hardware circuits of the workload allocation system 450…During a training period, the model training and refinement module 454 may receive training examples 451…each training example may include hardware characteristics of one or more hardware systems, workload characteristics of a video processing workload, and whether the one or more hardware systems were sufficient to execute the video processing workload; [0036] lines 4-10, the neural network 453 may receive video processing workload information 462, determine workload score for the video processing workload, and assign one or more hardware systems (which may be local and/or cloud systems) to the video processing workload based on the calculated workload score (as the neural network (as deep learning model) determining whether to assign the workload and assigning the workload)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI and Cadambi with Zenoni because Zenoni’s teaching of using the neural network learning model for determining and assigning the tasks based on the calculated workload score would have provided Bhandarkar, YOKOCHI and Cadambi’s system with the advantage and capability to improve the system resource utilization and efficiency.
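
For illustration only, the worker-node behavior addressed in claim 1 (choosing a CPU or a GPU for each task so that completion time is reduced, with the tasks of an earlier job handled ahead of those of a later job) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the class names, the per-task time estimates, and the greedy earliest-finish rule are hypothetical, and the greedy rule merely stands in for the claimed deep learning model.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu_seconds: float  # hypothetical estimated run time on the CPU
    gpu_seconds: float  # hypothetical estimated run time on the GPU


@dataclass
class WorkerNode:
    """Hypothetical worker node with one CPU and one GPU."""
    cpu_busy_until: float = 0.0
    gpu_busy_until: float = 0.0
    schedule: list = field(default_factory=list)

    def assign(self, task):
        # Greedy stand-in for the model: pick the device that would
        # finish this task earliest given its current backlog.
        cpu_finish = self.cpu_busy_until + task.cpu_seconds
        gpu_finish = self.gpu_busy_until + task.gpu_seconds
        if gpu_finish < cpu_finish:
            self.gpu_busy_until = gpu_finish
            device = "GPU"
        else:
            self.cpu_busy_until = cpu_finish
            device = "CPU"
        self.schedule.append((task.name, device))
        return device

    def run_jobs(self, jobs):
        # Jobs are handled in arrival order, so every task of an
        # earlier job is assigned before any task of a later job.
        for job in jobs:
            for task in job:
                self.assign(task)
        # Completion time is when the busier device becomes idle.
        return max(self.cpu_busy_until, self.gpu_busy_until)


first_job = [
    Task("defect_detection", cpu_seconds=4.0, gpu_seconds=1.0),
    Task("defect_classification", cpu_seconds=2.0, gpu_seconds=1.5),
]
node = WorkerNode()
completion_time = node.run_jobs([first_job])
print(node.schedule, completion_time)
```

In this sketch the second task goes to the CPU even though the GPU is nominally faster for it, because the GPU is already occupied by the first task; a learned model would be expected to capture this kind of trade-off from data rather than from a fixed rule.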

As per claim 4, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar further teaches wherein the master node is further configured to divide the user program into a second job and to distribute the second job to one of the worker nodes (Bhandarkar, [0004] lines 5-7, The distributed parallel computing system includes a master node computer and one or more worker node computers. [0023] lines 18-19, the HNP 124 divides the user program 112 into tasks (as including first and second job) that execute on worker node computers 106 and 108).
In addition, YOKOCHI teaches the user program is input image data (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60).

As per claim 6, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar teaches further comprising at least one CPU worker node in electronic communication with the master node (Bhandarkar, Fig.1, 104 Master node, 106 worker node (as CPU worker node) (the master node is electronically communicating with worker node for distributing tasks)); wherein the CPU worker node includes one or more of the CPU without any of the GPU, and wherein one of the CPU in the CPU worker node runs the worker job manager (Bhandarkar, [0054] lines 2-8, the scheduler allocates (406) containers to the user program. Each container is a portion of the computing resources available to the user program at a worker node computer of the parallel computing system. The container can include context information, e.g., a working directory, resource (CPU or memory) (as include one or more CPU without GPU); [0024] lines 4-7, The worker node manager 134 is a component of the worker node computer 106 that includes a process executing on the worker node computer (as CPU for executing the worker node manager)).

As per claim 11, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. YOKOCHI further teaches wherein the master node is in electronic communication with a processing tool, wherein the processing tool includes a semiconductor inspection tool or a semiconductor metrology tool (YOKOCHI, Fig. 1, 20 Control unit (as master node), 10 inspection unit; [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected).

As per claim 12, Bhandarkar teaches the invention substantially as claimed including A method comprising: 
receiving a user program at a master node (Bhandarkar, Fig. 4, 402, Receiving, from a client computer and by a master node manager executing on a master node computer…a user program configured to execute in an environment); 
dividing, using the master node, the user program into at least a first job (Bhandarkar, [0023] lines 18-19, the HNP 124 divides the user program 112 into tasks (as at least a first job) that execute on worker node computers 106 and 108); 
distributing, using the master node, the first job to a first worker node of a plurality of worker nodes in electronic communication with the master node (Bhandarkar, Fig. 1, 104 Master node; 106, Node Y-worker Node, 108 Node Z-worker node; [0021] lines 10-14, The application master 118 is a component of the master node computer 104 that includes a process executed on the master node computer 104 that manages task scheduling and execution of the user program 112 on multiple worker node computers (as electronic communication with plurality of the worker nodes); [0023] lines 19-22, The HNP 124 assigns the tasks to the worker node computers 106 and 108, and maps the containers of the worker node computers 106 and 108 to the respective tasks), 
wherein each of the worker nodes includes: at least one CPU running a worker job manager (Bhandarkar, Fig.1, 134 Node Manager; [0024] lines 4-7, The worker node manager 134 is a component of the worker node computer 106 that includes a process executing on the worker node computer (including at least one CPU)); and 
dividing, using the worker job manager in the first worker node, the first job into a plurality of tasks (Bhandarkar, [0025] lines 7-8, the node service 136 is deployed as a YARN auxiliary service 138 managed by the worker node manager 134; Fig. 1, 138 Aux service, 142 Local daemon (managed by worker node manager); [0027] lines 4-8, The local daemon 142 performs local spawns (as divide) 146 and 148. The local spawns 146 and 148 launch user processes 150 and 152, respectively. The user processes 150 and 152 can perform different portions of the job (as plurality of tasks of first job) allocated to the worker node computer 106); 

Bhandarkar fails to specifically teach the received user program is an input image data from a semiconductor inspection tool or a semiconductor metrology tool, wherein the input image data is about a semiconductor wafer or reticle, and wherein the plurality of tasks includes defect detection and defect classification.

However, YOKOCHI teaches the received user program is an input image data from a semiconductor inspection tool or a semiconductor metrology tool, wherein the input image data is about a semiconductor wafer or reticle and wherein the plurality of tasks includes defect detection and defect classification (YOKOCHI, Fig. 1, 10 inspection unit; [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60. (As the pixel image of the surface of the wafer is received); [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60. [0055] lines 1-2, The cell portion 60c is inspected by the defect detection device 100; [0058] lines 1-2, The cell portion 60c is classified using a characteristic quantity of the inspection signal; also see Abstract, lines 1-3, defect detection method includes inspecting an inspection target, classifying the inspection target; [Examiner noted: receiving a pixel image of the wafer, and the wafer is divided into different portions for defect detection and defect classification]).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar with YOKOCHI because YOKOCHI’s teaching of performing defect detection and classification for the received wafer image data would have provided Bhandarkar’s system with the advantage and capability of easily determining and detecting the different defect portions of the semiconductor wafer, which improves the system efficiency.

Both Bhandarkar and YOKOCHI fail to specifically teach wherein each of the worker nodes further includes at least one GPU in electronic communication with the CPU; determining, using the module in the first worker node, whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU in the worker node or whether to assign one of the plurality of tasks to one of the GPU instead of one of the CPU in the worker node such that completion time of the plurality of tasks is minimized; assigning, using the module in the first worker node, each of the plurality of tasks in the first job to one of the CPU or one of the GPU in the first worker node to minimize completion time of the plurality of tasks; and prioritizing, using the worker job manager in the first worker node, the plurality of tasks in the first job ahead of tasks in a later job.

However, Cadambi teaches wherein each of the worker nodes further includes at least one GPU in electronic communication with the CPU (Cadambi, Fig. 2, 130, worker 1-3 (as worker nodes), 245 GPU; 132 Node-level dispatcher; [0017] lines 11-13, Each worker node includes at least one central processing unit (CPU) and at least one GPU, lines 15-17, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs); [Examiner noted: the Node-level dispatcher is running in the worker node by using the CPU and is dispatching the tasks to local resources (GPU), therefore, the GPU is in electronic communication with the CPU]);
determining, using the node-level dispatcher [worker job manager] in the first worker node, whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU in the worker node or whether to assign one of the plurality of tasks to one of the GPU instead of one of the CPU in the worker node such that completion time of the plurality of tasks is minimized (Cadambi, Fig. 2, 132, node-level dispatcher, 230 Call interception module, 235 Dispatcher; [0017] lines 15-18, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs) as directed by the cluster manager (as determining whether to assign the tasks to CPUs or GPUs based on the direction of the cluster manager); [0056] lines 5-7, tasks are dispatched to the respective CPU or GPU resource. Dispatching may be performed by dispatcher; [0005] lines 8-10, In a client-server application, an important metric is response time, or the latency per request. Latency per request can be improved by using, e.g., GPUs; also see [0018] lines 14-18, a dynamic data collection for building CPU/GPU performance models is performed for each new application to find suitable resources for that application and optimize performance by estimating performance of a task of an application on the heterogeneous resources; [0023] lines 5-8, provides scheduling of application tasks onto heterogeneous resources to allow each task to achieve its desired Quality of Service (QoS). In the present application, the QoS refers to client request response time. [Examiner noted: the latency/response time are improved, therefore the completion time of tasks is minimized]);
assigning, using the node-level dispatcher [worker job manager] in the first worker node, each of the plurality of tasks in the first job to one of the CPU or one of the GPU in the first worker node to minimize completion time of the plurality of tasks (Cadambi, Fig. 4, 430, Dispatch task to CPU or GPU; [0017] lines 15-17, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs) as directed by the cluster manager; [0056] lines 5-7, tasks are dispatched to the respective CPU or GPU resource. Dispatching may be performed by dispatcher; [0005] lines 8-10, In a client-server application, an important metric is response time, or the latency per request. Latency per request can be improved by using, e.g., GPUs; also see [0018] lines 14-18, a dynamic data collection for building CPU/GPU performance models is performed for each new application to find suitable resources for that application and optimize performance by estimating performance of a task of an application on the heterogeneous resources; [0023] lines 5-8, provides scheduling of application tasks onto heterogeneous resources to allow each task to achieve its desired Quality of Service (QoS). In the present application, the QoS refers to client request response time); and
prioritizing, using the worker job manager in the first worker node, the plurality of tasks in the first job ahead of tasks in a later job (Cadambi, [0048] lines 15-20, All other applications 110 with lesser priority metrics than the selected application's priority metric should also exceed in providing their respective performance threshold…the user request within the application 110 may be serviced in first in, first out order).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar and YOKOCHI with Cadambi because Cadambi’s teaching of determining whether to assign the tasks to CPUs or GPUs, assigning them accordingly, and prioritizing the tasks would have provided Bhandarkar and YOKOCHI’s system with the advantage and capability of performing different types of tasks on either a CPU or a GPU, which improves the system resource utilization, efficiency and performance.

Although Bhandarkar, YOKOCHI and Cadambi teach the worker job managers (node-level dispatcher) determining whether to assign one of the plurality of tasks to one of the CPU instead of one of the GPU and assigning each of the plurality of tasks in the first job to one of the CPU or one of the GPU in the first worker node, Bhandarkar, YOKOCHI and Cadambi fail to specifically teach that the worker job managers include a module with a deep learning model, and that the determination of whether to assign one of the plurality of tasks and the assigning of each of the plurality of tasks in the first job are performed by the deep learning model.

However, Zenoni teaches a module with a deep learning model (Zenoni, Fig. 4, 450 Workload allocation system, 453 Neural network; [0006] lines 9-12, a neural network is used to train and refine a model that maps different hardware characteristics and video processing workload characteristics to points; [0031] lines 1-2, In FIG. 4, the workload allocation system includes a neural network 453; [0040] lines 6-9, the neural network may continuously "learn" during operation, which may enable more accurate scoring and assignment of hardware systems to workloads), and 
the determination of whether to assign one of the plurality of tasks and assigning each of the plurality of tasks in the first job is performed by the deep learning model (Zenoni, [0032] lines 1-13, The neural network 453 may include model training and refinement module 454, which may correspond to hardware circuits of the workload allocation system 450…During a training period, the model training and refinement module 454 may receive training examples 451…each training example may include hardware characteristics of one or more hardware systems, workload characteristics of a video processing workload, and whether the one or more hardware systems were sufficient to execute the video processing workload; [0036] lines 4-10, the neural network 453 may receive video processing workload information 462, determine workload score for the video processing workload, and assign one or more hardware systems (which may be local and/or cloud systems) to the video processing workload based on the calculated workload score (as the neural network (as deep learning model) determining whether to assign the workload and assigning the workload)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI and Cadambi with Zenoni because Zenoni’s teaching of using the neural network learning model for determining and assigning the tasks based on the calculated workload score would have provided Bhandarkar, YOKOCHI and Cadambi’s system with the advantage and capability to improve the system resource utilization and efficiency.

As per claim 13, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Bhandarkar further teaches dividing, using the master node, the user program into a second job (Bhandarkar, [0004] lines 5-7, The distributed parallel computing system includes a master node computer and one or more worker node computers. [0023] lines 18-19, the HNP 124 divides the user program 112 into tasks (as including second job) that execute on worker node computers 106 and 108); 
distributing, using the master node, the second job to a second worker node of the plurality of the worker nodes (Bhandarkar, Fig. 1, 108 Node Z-worker node (as second worker node), [0023] lines 19-22, The HNP 124 (within the master node) assigns the tasks (including second job) to the worker node computers 106 and 108, and maps the containers of the worker node computers 106 and 108 to the respective tasks (as assigning second job to second worker node respectively));
dividing, using the worker job manager in the second worker node, the second job into a plurality of tasks (Bhandarkar, [0025] lines 7-8, the node service 136 is deployed as a YARN auxiliary service 138 managed by the worker node manager 134; Fig. 1, 138 Aux service, 142 Local daemon (managed by worker node manager); [0027] lines 4-8, The local daemon 142 performs local spawns (as divide) 146 and 148. The local spawns 146 and 148 launch user processes 150 and 152, respectively. The user processes 150 and 152 can perform different portions of the job (as plurality of tasks of received job) allocated to the worker node computer 106; [0028] lines 1-4, Each local daemon, including the local daemon 142 executing on the worker node computer 106 and a local daemon executing on the worker node computer 108 (not shown) [Examiner noted: the node Z (second worker node) also includes all the components of node Y and performs the same functions as node Y]). 
In addition, YOKOCHI teaches the user program is input image data (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60), and Cadambi teaches assigning, using the node-level dispatcher [worker job manager] in the second worker node, each of the plurality of tasks in the second job to one of the CPU or one of the GPU in the second worker node to minimize completion time of the plurality of tasks in the second job (Cadambi, Fig. 4, 430, Dispatch task to CPU or GPU [0017] lines 15-17, Each worker node includes a node-level dispatcher that intercepts and dispatches the scheduled tasks to local resources (e.g., CPUs and/or GPUs); [0056] lines 5-7, tasks are dispatched to the respective CPU or GPU resource. Dispatching may be performed by dispatcher; [0005] lines 8-10, In a client-server application, an important metric is response time, or the latency per request. Latency per request can be improved by using, e.g., GPUs; also see [0018] lines 14-18, a dynamic data collection for building CPU/GPU performance models is performed for each new application to find suitable resources for that application and optimize performance by estimating performance of a task of an application on the heterogeneous resources; [0023] lines 5-8, provides scheduling of application tasks onto heterogeneous resources to allow each task to achieve its desired Quality of Service (QoS). In the present application, the QoS refers to client request response time). Further, Zenoni teaches when assigning each of the plurality of tasks, it is using the module with a deep learning model (Zenoni, Fig. 4, 450 Workload allocation system, 453 Neural network; [0006] lines 9-12, a neural network is used to train and refine a model that maps different hardware characteristics and video processing workload characteristics to points; [0031] lines 1-2, In FIG. 4, the workload allocation system includes a neural network 453; [0032] lines 1-13, The neural network 453 may include model training and refinement module 454, which may correspond to hardware circuits of the workload allocation system 450…During a training period, the model training and refinement module 454 may receive training examples 451…each training example may include hardware characteristics of one or more hardware systems, workload characteristics of a video processing workload, and whether the one or more hardware systems were sufficient to execute the video processing workload; [0036] lines 4-10, the neural network 453 may receive video processing workload information 462, determine workload score for the video processing workload, and assign one or more hardware systems (which may be local and/or cloud systems) to the video processing workload based on the calculated workload score (as the neural network (as deep learning model) determining whether to assign the workload and assigning the workload); [0040] lines 6-9, the neural network may continuously "learn" during operation, which may enable more accurate scoring and assignment of hardware systems to workloads).

As per claim 14, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Zenoni further teaches wherein the method further comprises retraining the deep learning model (Zenoni, Fig. 4, 450 Workload allocation system, 453 Neural network; [0006] lines 9-12, a neural network is used to train and refine a model that maps different hardware characteristics and video processing workload characteristics to points; [0031] lines 1-2, In FIG. 4, the workload allocation system includes a neural network 453; [0040] lines 6-9, the neural network may continuously "learn" during operation, which may enable more accurate scoring and assignment of hardware systems to workloads).

As per claim 15, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Cadambi further teaches wherein the worker job managers operate under a first in first out job queue (Cadambi, [0048] lines 15-20, All other applications 110 with lesser priority metrics than the selected application's priority metric should also exceed in providing their respective performance threshold…the user request within the application 110 may be serviced in first in, first out order).

As per claim 19, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Bhandarkar further teaches wherein the first job is distributed to the first worker node in parallel and in real-time with other jobs from the user program distributed to the plurality of worker nodes (Bhandarkar, [0023] lines 19-22, The HNP 124 assigns the tasks to the worker node computers 106 and 108, and maps the containers of the worker node computers 106 and 108 to the respective tasks; [0029] lines 3-5, the user program 112 is ready to execute in parallel on each worker node computer; [0033] lines 1-3, At runtime, the requirements are expressed as resource allocation constraints enforced through negotiation between a scheduler 120 and a resource manager 114). In addition, YOKOCHI teaches the user program is input image data (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60).
	

Claims 2 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 1 above, and further in view of Zhao et al. (US Pub. 2017/0083364 A1).
Zhao was cited in the previous Office Action.

As per claim 2, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach wherein there are more of the CPU than the GPU in one of the worker nodes.

However, Zhao teaches wherein there are more of the CPU than the GPU in one of the worker nodes (Zhao, Fig. 2A, 101 (as worker node), 102 CPU A, 112, CPU B, 122 GPU (as more CPU than GPU)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Zhao because Zhao’s teaching of a worker node having more CPUs than GPUs would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to allow the system to easily assign and manage the tasks based on the different task queues managed by the CPUs, which improves the system efficiency and performance.

As per claim 5, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach further comprising another CPU in one of the worker nodes in electronic communication with the CPU running the worker job manager.

However, Zhao teaches further comprising another CPU in one of the worker nodes in electronic communication with the CPU running the worker job manager (Zhao, Fig. 2A, 101 (as worker node), 102 CPU A, 112, CPU B (as another CPU); [0062] lines 1-11, Each of the processing units 102, 112, 122, 132 may utilize one or more queues (or task queues) for temporarily storing and organizing tasks (and/or data associated with tasks) to be executed by the processing units 102, 112, 122, 132. For example, the first CPU 102 may retrieve tasks and/or task data from task queues 166, 168, 176 for local execution by the first CPU 102 and may place tasks and/or task data in queues 170, 172, 174 for execution by other devices. The second CPU 112 may retrieve tasks and/or task data from queues 174, 178, 180 for local execution by the second CPU 112 [Examiner noted: the first CPU (as running a worker job manager) communicates with the second CPU, allowing the second CPU to retrieve the tasks from the task queue (in which the first CPU placed them) for execution]).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Zhao because Zhao’s teaching of using another CPU as a task manager (as worker job manager) would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to allow the system to easily assign and manage the tasks based on the different task queues managed by the CPUs, which improves the system efficiency and performance.

Claims 3 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 1 above, and further in view of Chakraborty et al. (US Pub. 2013/0093776 A1).
Chakraborty was cited in the previous Office Action.

As per claim 3, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach wherein there are more of the GPU than the CPU in one of the worker nodes.

However, Chakraborty teaches wherein there are more of the GPU than the CPU in one of the worker nodes (Chakraborty, Fig. 9, remote server computer (as worker node), 710 compute server (has CPU, see Fig. 1, processing unit 102), 720A-N, graphics server (as GPUs); [0066] lines 1-3, compute server 710 may execute some or all of the components described with respect to computer; [0067] lines 1-4, The graphics server 720 may be configured to provide resources for graphics operations, such as rendering, capturing, and compressing operations. The graphics server may also be configured with a plurality of GPU resources).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Chakraborty because Chakraborty’s teaching of providing a plurality of GPU resources in each of the graphics servers (as worker node) would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to reduce the image/graphic application processing time, which improves the system performance.

As per claim 7, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar teaches at least one worker node in electronic communication with the master node (Bhandarkar, Fig. 1, 104 Master node; [0021] lines 10-14, The application master 118 is a component of the master node computer 104 that includes a process executed on the master node computer 104 that manages task scheduling and execution of the user program 112 on multiple worker node computers (as electronic communication with the worker node)).

Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach the worker node is a GPU worker node, and wherein the GPU worker node includes one or more of the GPU without any of the CPU other than to run the worker job manager.

However, Chakraborty teaches the worker node is a GPU worker node, and wherein the GPU worker node includes one or more of the GPU without any of the CPU other than to run the worker job manager (Chakraborty, Fig. 3, 320A-N remote server computer (as GPU worker node); Fig. 9, 710 compute server (has CPU, see Fig. 1, processing unit 102), 720A-N, graphics server (as GPUs); [0066] lines 1-3, compute server 710 may execute some or all of the components described with respect to computer; [0067] lines 1-4, The graphics server 720 may be configured to provide resources for graphics operations, such as rendering, capturing, and compressing operations. The graphics server may also be configured with a plurality of GPU resources; [0068] lines 1-16, The compute server 710 may run one or more applications 712. In one aspect the application may be associated with a graphics device driver (as worker job manager)…As one example, the graphics device driver 714…may be able to send first data to the graphics server manager 740, the first data indicative of a request for GPU resources. The graphics server manager 740 may send second data to the graphics device driver 714, the application 712, and/or the compute server 710, the second data indicating routing for GPU instructions from the graphics server 720; see Fig. 8, compute server 710 has graphics device driver 714, which is communicating with graphics server 720).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Chakraborty because Chakraborty’s teaching of providing a plurality of GPU resources in each of the graphics servers (as worker node) would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to reduce the image/graphic application processing time, which improves the system performance.


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 1 above, and further in view of Sjodin et al. (US Pub. 2014/0337823 A1).
Sjodin was cited in the previous Office Action.

As per claim 8, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach further comprising an interface layer configured to communicate with an IMC client using an application programming interface.

However, Sjodin teaches an interface layer configured to communicate with an IMC client using an application programming interface (Sjodin, [0003] lines 2-9, This communication is often performed between a server and a client device. In order to assist in communication between a client device and a server, an Application Programming Interface (API) is often used to provide an interface layer at the client).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Sjodin because Sjodin’s teaching of using an interface layer for communicating with a client by using an application programming interface would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to assist the communication between the server and client, which improves the system performance (see Sjodin, [0003] lines 3-4, In order to assist in communication).


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 1 above, and further in view of Levine et al. (US Pub. 2017/0334066 A1).
Levine was cited in the previous Office Action.

As per claim 10, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 1 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach further comprising a neural network in electronic communication with the GPU to execute the deep learning model.

However, Levine teaches a neural network in electronic communication with the GPU to execute the deep learning model (Levine, [0002] lines 2-3, deep machine learning; [0080] lines 6-9, a processor (e.g., a GPU) of training engine 120 (as deep learning model) and/or other computer system operating over the neural network (e.g., over neural network 125)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Levine because Levine’s teaching of using a neural network for communicating with a GPU would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to predict the performance of GPU tasks, which improves the system performance (see Levine, [0002], the portions regarding prediction).


Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 12 above, and further in view of JIN (US Pub. 2016/0260241 A1).
JIN was cited in the previous Office Action.

As per claim 17, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach wherein the input image data is distributed to the GPUs in equal batches.
	
However, JIN teaches wherein the input image data is distributed to the GPUs in equal batches (JIN, Fig. 5, first image tile 0 and second image tile 4 are distributed to 122 graphics processor, first image tile 1 and second image tile 6 are distributed to 124 graphics processor (as equally distributed); [0054] lines 1-8, The scheduler 110 may divide at least one draw command into batches having a predetermined unit, and respectively assign the batches to the graphics processors 120. For example, the scheduler 110 may divide one hundred draw commands for one hundred primitives into batches, each having twenty draw command for twenty primitives, and assign the one hundred draw commands as five batches to each of the graphics processors 120).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with JIN because JIN’s teaching of assigning the image data to the GPUs in equal batches would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to allow the system to equally distribute the tasks, which provides load balancing and system efficiency.

As per claim 18, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. YOKOCHI further teaches wherein the input image data is from multiple wafer locations (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected; [0039] lines 1-3, the region of the wafer 60 is divided into an outer peripheral portion 61 and an inner peripheral portion; [0040] lines 1-3, the outer peripheral portion 61 and the inner peripheral portion 62 of the wafer 60 are divided into eight equal fan-shaped portions; [0054] lines 3-4, The inspection target is a cell portion 60c within the wafer 60). 

Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach wherein the input image data is processed in a same batch.

However, JIN teaches wherein the input image data is processed in a same batch (JIN, Fig. 5, first image tile 0 and second image tile 4 are distributed to 122 graphics processor (as processed in a same batch); [0007] lines 6-10, determining a tile rendering order for each tile of the first image and the second image based on a result of the determining of the tile of the second image having a highest similarity to the tile of the first image; [0054] lines 9-17, when the scheduler 110 assigns batches to each of the graphics processors 120, the scheduler 110 may assign one batch to one graphics process 120 twice for each of the first and second images. In other words, when assigning a first batch of the batches to a first graphics processor of the graphics processors 120, the scheduler 110 may assign the first batch to the first graphics processor for the first image and the first batch to the first graphics processor for the second image).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with JIN because JIN’s teaching of assigning the image data to a GPU in the same batch based on the image data having the highest similarity would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to reduce the processing time, since the similar image data tasks are grouped together and performed in the same batch, which improves the system performance and efficiency.


Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bhandarkar, YOKOCHI, Cadambi and Zenoni, as applied to claim 12 above, and further in view of Salter (US Pub. 2013/0028506 A1).
Salter was cited in the previous Office Action.

As per claim 20, Bhandarkar, YOKOCHI, Cadambi and Zenoni teach the invention according to claim 12 above. Bhandarkar teaches wherein the first job is distributed to the first worker node (Bhandarkar, [0023] lines 19-22, The HNP 124 assigns the tasks to the worker node computers 106 and 108, and maps the containers of the worker node computers 106 and 108 to the respective tasks). YOKOCHI teaches the first job as the input image data (YOKOCHI, [0022] lines 1-4, The defect detection device 100 acquires a pixel image of the surface of a wafer 60 mounted, for example, on a stage or the like. For example, the inspection unit 10 acquires a pixel image of the surface of the wafer 60; [0035] lines 3-4, the defect detection device 100 and the inspection target is inspected).

Bhandarkar, YOKOCHI, Cadambi and Zenoni fail to specifically teach the first job as the input image data is acquired in memory.

However, Salter teaches the first job as the input image data is acquired in memory (Salter, [0022] lines 25-31, each of the plurality of inspection system 108 may transmit inspection data to an inspection data database 116 of the memory 114 of the control system 101 via a data connection. In this regard, the inspection data may be maintained in the memory 114 and retrieved at a later time by processor 102, allowing the system 100 to perform the various steps of the present invention at any time following inspection of the wafers 105).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bhandarkar, YOKOCHI, Cadambi and Zenoni with Salter because Salter’s teaching of acquiring the input image data which is maintained in the memory would have provided Bhandarkar, YOKOCHI, Cadambi and Zenoni’s system with the advantage and capability to easily retrieve the wafer image data for future processing, which improves the system efficiency.

Response to Arguments
Applicant’s arguments with respect to claims 1-8, 10-15 and 17-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954.  The examiner can normally be reached on M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195
/Z.X./Examiner, Art Unit 2195