DETAILED ACTION
Claims 1-20 are currently presented for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on January 11, 2021 has been entered and considered by the examiner. By the amendment, claims 1, 3-11 and 13-20 are amended.
In light of the amendments made, the 101 rejection of the claims is withdrawn.

Response to Arguments
5. Following Applicant's arguments, the prior art rejection of the claims is maintained with a more detailed explanation.

Applicant’s arguments on pages 14-15

Examiner rejected claims 1-3, 6, 8, 11-13, 16 and 18 under 35 U.S.C. § 102(a)(2) as being anticipated by Zhu. Zhu explicitly discloses in par. [0079] that “[the] prediction input data may be in different forms depending on different functions of neural network models” (see Zhu, par. [0079]). Applicant respectfully points out that different neural networks having different inputs merely indicate the expected set of inputs for a particular type of neural network. Exemplary neural networks and their corresponding inputs are not the same as the Zhu system determining a plurality of AI models representative of the applications that may run on a piece of hardware, since no such determination operation is executed by the Zhu system.




Examiner response

Examiner cites the new reference Chandra et al. (PUB NO: US 2019/0266015 A1) for the amended limitations.
In the similar field of invention, Chandra teaches determining a plurality of artificial intelligence (AI) models that are representative of applications that run on a piece of hardware, wherein the piece of hardware is a hardware accelerator for processing AI-related operations; (see para 32-33 and see fig 1- FIG. 1 is a system diagram of a framework 100 that supports DNN workloads on an edge device in accordance with respective examples. The framework 100 includes one or more cameras 102A, 102B, 102C that provide image data, e.g., streams, to DNN models 112A, 112B, and 112C. The DNN models 112A-112C may then be ran on various streams to create DNN workloads. Each DNN model 112A-112C has a given architecture and framework 114. After the DNN model has been downloaded to a local device available to the edge device, the framework 100 determines the DNN model's resource requirements using a profiler 110. The resource requirements may include how much how much time the DNN model takes to run on a number of CPU cores or GPU cores under different utilizations.)
 
Regarding Applicant's arguments on pages 15-16
Assuming arguendo that “the im2col+GEMM solution” of Zhu supposedly “determines the workload of the respective layer of the AI model” (see the Office Action, page 10), Applicant respectfully notes that the Zhu system does not determine supposed workload for a respective layer of a respective neural network cited in par. [0079] of Zhu. Applicant, therefore, respectfully submits that Zhu does not disclose “determining computational workloads of the plurality of AI models based on layer information associated with a respective layer of a respective AI model in the plurality of AI models.”
Furthermore, Examiner has not shown that “the im2col+GEMM solution” of Zhu is applied to “a respective layer of a respective AI model in the plurality of AI models” of Zhu. Hence, Applicant 
Furthermore, im2col+GEMM solution is a well-known technique in the development of a convolution layer of an AI model. In particular, GEMM-based algorithms rely on im2col or im2row memory transformations to convert the convolution problem into a GEMM problem. A GEMM (or dense general matrix multiply) problem is merely a form of mathematical problem that can be efficiently executed. Hence, Examiner’s assertion of “the im2col+GEMM solution” of Zhu as the computational workload of the instant application has not complied with MPEP 2111, which states “[the] broadest reasonable interpretation does not mean the broadest possible interpretation. Rather, the meaning given to a claim term must be consistent with the ordinary and customary meaning of the term.” Since any rejection violating MPEP 2111 is improper, Applicant respectfully submits that Examiner’s rejection using Zhu is improper and should be withdrawn.

On pages 11-12 of the Office Action, Examiner alleged that the supposed connected target network layer can be interpreted as the set of workload clusters of the instant application. Applicant respectfully points out that the workload clusters of the instant applications indicate clusters of the determined workloads (see the instant application, pars. [0034]-[0038]). By alleging the layers of a neural network as the workload clusters, Applicant respectfully submits that Examiner has (i) violated MPEP 2111, and (ii) failed to show clusters of supposed im2col+GEMM solutions of Zhu. In contrast, the instant application teaches “clustering the determined computational workloads into a set of workload clusters.” Here, the system of the instant application clusters the determined workload, which is not the same as a layer of an AI model, into a set of clusters.

For the same reason, the target neural network model of Zhu is merely a neural network that does not consider the computational workloads of a plurality of other neural networks. Therefore, Applicant 

Examiner response
Examiner cites another reference, FU (PUB NO: US 20190325276 A1), for this amended limitation.
FU teaches determining computational workloads of the plurality of AI models based on layer information associated with a respective layer of a respective AI model in the plurality of AI models; (See para 39- In operation 210, the neural network training system can collect device data from a plurality of devices communicatively coupled in a network and divide the computational workload of a deep learning neural network comprised within, for image recognition. In embodiments, the plurality of devices can comprise an IoT environment (e.g., IoT environment 100 of FIG. 1). Computational workload can include, but is not limited to, the amount of processing required for an input image to be processed through convolutional layers (e.g., forward pass derivative, backpropagation, dropout layers, etc.) and sub-sampling layers (e.g., pooling, max-pooling) and/or fully connected layers and Gaussian connections. The computational workload can be divided by components of the neural network training systems (e.g., image recognition model) and can further separate into two separate training layers comprising a first feature abstraction layer with one or more sub-layer iteration sequences between one or more convolutional layers and one or more sub-sampling layers and a second classification layer with one or more sub-layer iteration sequences between one or more fully connected layers and one or more Gaussian connection.)


FU also teaches clustering the determined computational workloads into a set of workload clusters, wherein a respective workload cluster indicates computational workloads associated with one or more layers of the plurality of AI models; (see para 32- Image recognition model 106 can further included stacker 110. Stacker 110 can be used to load pre-trained neural network layers into respective IoT devices (e.g., IoT mobile device 120, IoT remote device 126, and IoT device 132) and dynamically form a neural network using previous identified pre-trained neural network layers. See also para 47-58- In embodiments, dataset 302a can be processed through a first application layer (top layer) in neural network layer 304. Depending on the class of image data comprised in subsequent datasets, dataset 302b and dataset 302c can pass through feature abstraction of neural network layer 304. Dataset 302a can comprise image data including generic objects and image classes and can be retrieved from one or more data sources. For purposes of specialized image analysis, subsequent dataset 302b and dataset 302c can be processed through the neural network stacking framework 300). Neural network layer 306b is stacked with neural network layer 304 via stacker 110 of FIG. 1. Stacking neural network layer 306b creates a first stacked neural network layer 308b1. Examiner considers the set of workload clusters as elements 306b1, 306bn, 306c1 and 306cn as shown in figure 4, formed by stacking 306b and 306c with network layer 304 with the dataset 304b and 304c. The neural network training system divides the computational workload of a deep learning neural network for each layer. Each dataset is associated with the workload (or image data) for one or more layers of the plurality of neural network models.




Claim Rejections - 35 USC § 103
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
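A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.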

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

7. Claims 1-3, 6, 8-9, 11-13, 16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over ZHU et al. (PUB NO: US 2020/0050939 A1), hereinafter ZHU, in view of FU et al. (PUB NO: US 20190325276 A1), hereinafter FU, and further in view of Chandra et al. (PUB NO: US 2019/0266015 A1), hereinafter Chandra.

Regarding claims 1 and 11
ZHU teaches a computer-implemented method (para 162: method or algorithm steps may be implemented by hardware, a software module executed by a processor, or a combination thereof), the method comprising:

generating, based on the set of workload clusters, a synthetic AI model configured to generate, for the piece of hardware, a workload that represents statistical properties of the determined workload.(See para 49 Moreover, the connecting structure of the target network layers of the target neural network model corresponds to the connecting structure of the network layers of the initial neural network model. See also para 74 Based on a load model method of the Layer class, the target operation parameter of each network layer recorded in the model parameter file is loaded in the target network layer corresponding to each network layer to obtain the target neural network model, thus implementing deployment of the initial neural network model trained by using the Tensorflow learning framework into the terminal device. see also 121-the target neural network model deployed in the GPU of the terminal device perform calculation on the input data, and determine a classification or recognition result of the input data. See para 89 and 94 - for statistical properties of the workload based on optimized convolution layer for this application.)
                                                

Examiner note: Examiner considers the target neural network model as the synthetic AI model formed from the connected target network layers (i.e., the set of workload clusters). The target network model corresponds to the connecting structure of the network layers of the initial neural network model (i.e., the set of AI models). Hence, the target neural network model generates the workload on the target neural network layer based on the respective layer of the initial neural network model. Examiner considers the workload for each layer to be different according to para 80. The workload here is determining the size of the input image of the convolution layer, which is observed as the statistical properties of the workload.

ZHU does not explicitly teach
determining computational workloads of the plurality of AI models based on layer information associated with a respective layer of a respective AI model in the plurality of AI models;
clustering the determined computational workloads into a set of workload clusters, where a respective workload cluster indicates computational workload associated with one or more layers of the plurality of AI models;

In the related field of invention, FU (PUB NO: US 20190325276 A1) teaches determining computational workloads of the plurality of AI models based on layer information associated with a respective layer of a respective AI model in the plurality of AI models; (See para 39- In operation 210, the neural network training system can collect device data from a plurality of devices communicatively coupled in a network and divide the computational workload of a deep learning neural network comprised within, for image recognition. In embodiments, the plurality of devices can comprise an IoT environment (e.g., IoT environment 100 of FIG. 1). Computational workload can include, but is not limited to, the amount of processing required for an input image to be processed through convolutional layers (e.g., forward pass derivative, backpropagation, dropout layers, etc.) and sub-sampling layers (e.g., pooling, max-pooling) and/or fully connected layers and Gaussian connections. The computational workload can be divided by components of the neural network training systems (e.g., image recognition model) and can further separate into two separate training layers comprising a first feature abstraction layer with one or more sub-layer iteration sequences between one or more convolutional layers and one or more sub-sampling layers and a second classification layer with one or more sub-layer iteration sequences between one or more fully connected layers and one or more Gaussian connection.)
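For illustration only, the notion of a per-layer computational workload derived from layer information can be sketched as follows. This is a minimal sketch under stated assumptions, not FU's disclosed method: the function names `conv_output_size` and `conv_macs` are hypothetical, and the workload is approximated by the count of multiply-accumulate operations of a convolution layer.

```python
def conv_output_size(size, k, stride, pad):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - k) // stride + 1


def conv_macs(in_h, in_w, in_channels, num_filters, k, stride=1, pad=0):
    """Approximate workload of one convolution layer as a MAC count.

    Hypothetical helper: every output element of every filter needs
    k * k * in_channels multiply-accumulate operations.
    """
    out_h = conv_output_size(in_h, k, stride, pad)
    out_w = conv_output_size(in_w, k, stride, pad)
    return out_h * out_w * num_filters * k * k * in_channels


# Example: a 3x3 convolution with 64 filters on a 224x224 RGB input.
print(conv_macs(224, 224, 3, 64, 3, stride=1, pad=1))
```

The sketch shows only that filter count, filter size, stride and padding are sufficient to estimate a layer's processing amount; other layer types would need their own estimates.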

FU also teaches clustering the determined computational workloads into a set of workload clusters, wherein a respective workload cluster indicates computational workloads associated with one or more layers of the plurality of AI models;(see para 32- Image recognition model 106 can further included stacker 110. Stacker 110 can be used to load pre-trained neural network layers into respective IoT devices (e.g., IoT mobile device 120, IoT remote device 126, and IoT device 132) and dynamically form a neural network using previous identified pre-trained neural network layers. See also para 47-58- In embodiments, dataset 302a can be processed through a first application layer (top layer) in neural network layer 304. Depending on the class of image data comprised in subsequent datasets, dataset 302b and dataset 302c can pass through feature abstraction of neural network layer 304. Dataset 302a can comprise image data including generic objects and image classes and can be retrieved from one or more data sources. For purposes of specialized image analysis, subsequent dataset 302b and dataset 302c can be processed through the neural network stacking framework 300). Neural network layer 306b is stacked with neural network layer 304 via stacker 110 of FIG. 1. Stacking neural network layer 306b creates a first stacked neural network layer 308b1. 

Examiner note: Examiner considers the set of workload clusters as elements 306b1, 306bn, 306c1 and 306cn as shown in figure 4, formed by stacking 306b and 306c with network layer 304 with the dataset 304b and 304c. The neural network training system divides the computational workload of a deep learning neural network for each layer. Each dataset is associated with the workload (or image data) for one or more layers of the plurality of neural network models.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of neural network model deployment as disclosed by ZHU to include determining computational workloads of the plurality of AI models based on layer information associated with a respective layer of a respective AI model in the plurality of AI models, and clustering the determined computational workloads into a set of workload clusters, as taught by FU, for reducing neural network computation complexity in the Internet of Things (IoT). [¶ 001]

The combination of ZHU and FU does not teach determining a plurality of artificial intelligence (AI) models that are representative of applications that run on a piece of hardware, wherein the piece of hardware is a hardware accelerator for processing AI-related operations.

In the similar field of invention, Chandra teaches determining a plurality of artificial intelligence (AI) models that are representative of applications that run on a piece of hardware, wherein the piece of hardware is a hardware accelerator for processing AI-related operations; (see para 32-33 and see fig 1- FIG. 1 is a system diagram of a framework 100 that supports DNN workloads on an edge device in accordance with respective examples. The framework 100 includes one or more cameras 102A, 102B, 102C that provide image data, e.g., streams, to DNN models 112A, 112B, and 112C. The DNN models 112A-112C may then be ran on various streams to create DNN workloads. Each DNN model 112A-112C has a given architecture and framework 114. After the DNN model has been downloaded to a local device available to the edge device, the framework 100 determines the DNN model's resource requirements using a profiler 110. The resource requirements may include how much how much time the DNN model takes to run on a number of CPU cores or GPU cores under different utilizations.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of neural network model deployment as disclosed by ZHU and FU to include determining a plurality of artificial intelligence (AI) models that are representative of applications that run on a piece of hardware, wherein the piece of hardware is a hardware accelerator for processing AI-related operations as taught by Chandra in the system of ZHU and FU for scheduling

Regarding claims 2 and 12
ZHU further teaches obtaining the layer information using a collection technique, wherein the collection technique includes one or more of: graphics processing unit (GPU) application programming interface (API) calls, TensorFlow calls, Caffe2, and MXNet. (See para 74 Based on a load model method of the Layer class, the target operation parameter of each network layer recorded in the model parameter file is loaded in the target network layer corresponding to each network layer to obtain the target neural network model, thus implementing deployment of the initial neural network model trained by using the Tensorflow learning framework into the terminal device. See also para 94 After the optimized implementation of the convolution layer is performed in the embodiment of this application, based on an IOS system, the implementation speed of the convolution layer of the embodiment of this application is compared with implementation speeds in conventional solutions such as Caffe2, Tensorflow, and ncnn)



Regarding claims 3 and 13
Zhu further teaches generating a set of computational layers such that a respective computational layer corresponds to a workload cluster in the set of workload clusters; (See para 39 The network structure of the initial neural network model generally includes a plurality of interconnected network layers of multiple types. For example, if the initial neural network model is of a CNN network form, an optional network structure of the CNN may include: a normalization layer (BatchNorm), a convolution layer (Convolution), a deconvolution layer (Deconvolution), an activation layer (e.g., rectified linear unit (ReLU)), an additive layer (e.g., Eltwise layer), an activation layer with parameters (e.g., parametric ReLu 


Examiner note: Examiner considers the plurality of interconnected layers to form the set of computational layers, where each layer corresponds to a respective function (or respective workload) in the set of various functions (workload cluster). Examiner considers the function here to be image recognition. For example, a scaling layer is known for resizing the image, a splicing layer is known for concatenation, and the like.

combining the set of computational layers to form the synthetic AI model. (See Para 74 The target network layers are connected by using the Net class, so that the connecting structure of the target network layers corresponds to the connecting structure of the network layers of the initial neural network model. Based on a load model method of the Layer class, the target operation parameter of each network layer recorded in the model parameter file is loaded in the target network layer corresponding to each network layer to obtain the target neural network model)

Examiner note: Examiner considers the target network layers as the set of computational layers, which are connected to form the target neural network model (or synthetic AI model).



Regarding claims 6 and 16
ZHU further teaches including a rectified linear unit (ReLU) layer and a normalization layer to a respective computational layer of the set of computational layers in the synthetic AI model, wherein the computational layer is a convolution layer. (See para 39 -The network structure of the initial neural network model generally includes a plurality of interconnected network layers of multiple types. For example, if the initial neural network model is of a CNN network form, an optional network structure of the CNN may include: a normalization layer (BatchNorm), a convolution layer (Convolution), a deconvolution layer (Deconvolution), an activation layer (e.g., rectified linear unit (ReLU)), an additive layer (e.g., Eltwise layer), an activation layer with parameters (e.g., parametric ReLu (PReLU)), a downsampling layer (Pooling), a scaling layer (Resize), a depthwise separable convolution layer (Depthwise Convolution), and a splicing layer (Concat), and the like. Obviously, the example of the CNN network structure herein is optional, and settings of a specific network structure may further be adjusted according to actual needs) see para 87 conventional im2col is used for computing of the convolution layer of the CNN model in the terminal device [corresponds to the convolutional layer is a computational layer] see para 100 Meanwhile, the embodiment of this application may support abundant layers, and network layers are customized based on the layer class, thus achieving high expansion performance for the network layers of the neural network model. Further, the embodiment of this application can optimize the implementation of the convolution layer based on GPU parallel operating, thus improving the speed of forward prediction of a CNN model.
See also para 115 a Net class application module 600, configured to add a network layer to the target neural network model by using a preset network layer adding method of the Net class, where the added network layer is inherited from the Layer class)

Examiner note: Examiner considers the optimization of the convolution layer to be the addition of a ReLU layer and a normalization layer to the convolution layer by using a preset network layer adding method of the Net class, where the convolution layer is a computational layer.




Regarding claims 8 and 18
ZHU further teaches wherein the layer information includes number of filters, filter size, stride information of a respective layer, and padding information associated with the layer of the AI model. (See para 87-91- In the conventional implementation of the convolution layer, an efficient method is the im2col+GEMM solution. First, im2col (an algorithm that converts original image data into a matrix) is used to convert the feature maps and filters into a matrix, and then generalized matrix multiplication (GEMM) is invoked to obtain an inner product of the two matrices, so that the convolution operation is transformed into matrix multiplication. In a case of more filters (that is, more channels of feature maps) and a larger filter size, this method achieves higher efficiency.)
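For illustration only, the im2col+GEMM approach described in the cited passage can be sketched as follows. This is a minimal single-channel sketch under stated assumptions, not ZHU's implementation: the function names `im2col`, `gemm` and `conv2d_im2col` are hypothetical, and plain Python lists stand in for image and filter data.

```python
def im2col(image, k, stride=1, pad=0):
    """Unfold every k x k patch of a 2-D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    # Zero-pad the borders of the image.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            rows.append([padded[r0 + di][c0 + dj]
                         for di in range(k) for dj in range(k)])
    return rows, out_h, out_w


def gemm(a, b):
    """Plain matrix product of a (m x n) and b (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_im2col(image, filters, stride=1, pad=0):
    """Convolution expressed as one matrix multiplication."""
    k = len(filters[0])
    patches, out_h, out_w = im2col(image, k, stride, pad)
    # Each filter is flattened into one column of the weight matrix.
    weights = [[f[di][dj] for f in filters]
               for di in range(k) for dj in range(k)]
    return gemm(patches, weights), out_h, out_w
```

Each row of the result holds the responses of all filters at one output position, which shows how the number of filters, the filter size, the stride and the padding together determine the size of the matrix multiplication.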

Regarding claims 9 and 19
ZHU and FU do not teach wherein a respective computational workload of a workload cluster in the set of workload clusters incorporates an execution frequency of an AI model associated with the workload.
However, Chandra further teaches wherein a respective computational workload of a workload cluster in the set of workload clusters incorporates an execution frequency of an AI model associated with the workload.( See para 42 - FIG. 2 is a system diagram of a profiler for DNN models in accordance with respective examples. Runtime parameters 200 that used by the profiler include a sampling rate 202, a batch size 204, a precision 206, and a CPU core utilization 208. The batch size 204 is an indication of how many frames/images are processed at one time by a DNN workload. For example, a DNN workload may process 16, 32, 64, 100, etc. frames at once. The profiler 110 uses these inputs 200 and predicts the run time 220 and memory usage 222 of a DNN workload based on the DNN model. The profiler may use a machine learning algorithm, such as a linear regression model, to create a model 210 for a DNN model. Once learned, the performance model 210 may be used to determine an estimate 

Examiner note: Examiner considers that the machine learning algorithm, such as a linear regression model, determines the respective workload for the DNN model and incorporates the estimated run time of the DNN model associated with the workload.
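For illustration only, a linear regression performance model of the kind referenced above can be sketched as follows. This is a minimal sketch under stated assumptions, not Chandra's profiler: the function names `fit_line` and `predict` are hypothetical, a single runtime parameter (batch size) is used, and the sample measurements are made-up values chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the run time for a runtime parameter value x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical measurements: batch size -> run time in milliseconds.
batch_sizes = [16, 32, 64]
run_times_ms = [21.0, 37.0, 69.0]
model = fit_line(batch_sizes, run_times_ms)
print(predict(model, 100))
```

Once fitted, such a model gives a run time estimate for a batch size that was never measured directly.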


8. Claims 4, 10, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over ZHU et al. (PUB NO: US 2020/0050939 A1), hereinafter ZHU, in view of FU et al. (PUB NO: US 20190325276 A1), hereinafter FU, further in view of Chandra et al. (PUB NO: US 2019/0266015 A1), hereinafter Chandra, and further in view of Bonebakker et al. (PAT NO: US 7401012 B1), hereinafter Bonebakker.


Regarding claims 4 and 14
ZHU teaches determining, for the computational layer, an input size that corresponds to the representative workload, wherein the input size used in the computational layer of the synthetic AI model generates the representative workload. (See para 79 and 89, in a neural network model having an image recognition function, the prediction input data may be an image feature of an image.)



Examiner note: Examiner considers the representative workload to be the image input size in the convolution layer, i.e., the input image size used in the convolution layer of the neural network model (or synthetic AI model).

However, ZHU, FU and Chandra do not clearly teach determining a representative workload indicative of the workload cluster.

In the related field of invention, Bonebakker teaches determining a representative workload indicative of the workload cluster. (col 2 line 26-28 -the system uses the distance metric to cluster a set of workloads, and then identifies one or more representative workloads for each cluster.) 
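For illustration only, clustering workloads with a distance metric and identifying a representative workload per cluster can be sketched as follows. This is a minimal sketch under stated assumptions, not Bonebakker's disclosed method: the function names are hypothetical, Euclidean distance is an assumed metric, a simple greedy threshold rule is an assumed clustering strategy, and the medoid is an assumed choice of representative.

```python
import math


def distance(a, b):
    """Euclidean distance between two workload feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_workloads(workloads, threshold):
    """Greedy clustering: join the first cluster whose seed is close enough."""
    clusters = []
    for w in workloads:
        for cluster in clusters:
            if distance(w, cluster[0]) <= threshold:
                cluster.append(w)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([w])
    return clusters


def representative(cluster):
    """Medoid: the member with the smallest total distance to the others."""
    return min(cluster, key=lambda m: sum(distance(m, o) for o in cluster))


# Hypothetical two-feature workload vectors.
workloads = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
clusters = cluster_workloads(workloads, threshold=1.0)
print([representative(c) for c in clusters])
```

The representative is always an actual member of its cluster, so it corresponds to a workload that was really observed rather than an averaged value.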

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of neural network model deployment as disclosed by ZHU, FU and Chandra to include determining a representative workload indicative of the workload cluster, as taught by Bonebakker, for characterizing computer system workloads, which are important factors in designing a computer system. [Col 1 line 64-65]



Regarding claims 10 and 20
ZHU, FU and Chandra do not explicitly teach evaluating, for benchmarking the piece of hardware, performance of the piece of hardware by executing the synthetic AI model on the piece of hardware.

In the related field of invention, Bonebakker further teaches evaluating, for benchmarking the piece of hardware, performance of the piece of hardware by executing the synthetic AI model on the piece

Examiner note: Examiner considers the processor as the piece of hardware, whose analysis is done by executing the performance model (or synthetic AI model) to evaluate the performance.


9. Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over ZHU et al. (PUB NO: US 2020/0050939 A1), hereinafter ZHU, in view of FU et al. (PUB NO: US 20190325276 A1), hereinafter FU, further in view of Chandra et al. (PUB NO: US 2019/0266015 A1), hereinafter Chandra, further in view of Bonebakker et al. (PAT NO: US 7401012 B1), hereinafter Bonebakker, and still further in view of JIANG (PUB NO: US 2020/0005151 A1), hereinafter JIANG.

Regarding claims 5 and 15
ZHU teaches determining a set of input sizes corresponding to layers of the plurality of AI models; (See para 79 and 89, in a neural network model having an image recognition function, the prediction input data may be an image feature of an image. See also para 100-the embodiment of this application may support abundant layers, and network layers are customized based on the layer class, thus achieving high expansion performance for the network layers of the neural network model)



Examiner note: Examiner considers para 89 as determining the image input size in the convolution layer of the neural network model (or AI model). Under the broadest reasonable interpretation, the abundant layers of the neural network model form the set of input sizes.

However, ZHU, FU, Chandra and Bonebakker do not teach
forming a set of input groups of the set of input sizes;
determining a representative input size for a respective input group in the set of input groups; and
adjusting the representative input size for the representative workload to determine the input size.

In the related field of invention, JIANG teaches
forming a set of input groups of the set of input sizes; (para 10 and fig 2a-resizing a convolutional layer input of an artificial neural network with at least two different scales to obtain multiple groups of intermediate features maps also see para 34 The input layer is first resized 210 with S different scales, such as 3 different scales. This results in S groups of intermediate feature maps 202 with different resolutions, for example 32 * 32, 24 * 24, and 16 * 16).  

determining a representative input size for a respective input group in the set of input groups; (see para 34 and fig 2a- These obtained intermediate feature maps 204 (or group) are first resized 230 to the same size of the input layer 200)
Examiner note: Examiner considers the obtained intermediate feature maps 204 as the respective group, which are resized to form the representative input size. The representative input size is the same size as the input layer 200, which is formed by resizing.
and
adjusting the representative input size for the representative workload to determine the input size. (See page 146 -148 and fig 8 -padding may also be applied upon or after output of the convolutional layer or upon input to the subsequent convolutional layer to match the size of the pad corresponding to the input. The CNN processing apparatus may also generate the output 804 corresponding to a size of an input, of a subsequent layer, to which padding is applied. When the size of the padding-applied input to the subsequent layer is W*H*D, the CNN processing apparatus performs the current convolution operation between the kernel set 803 and the input 802 having a size of W*H*C to generate the output 804 in a size of W*H*D.)
Examiner note: The examiner considers W*H*D to be the representative input size. The adjustment is made by padding for the convolutional layer in order to determine the input size of W*H*C, and that input generates the output 804 in a size of W*H*D.
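
For illustration only, the multi-scale resizing described in the passage of JIANG quoted above (para 34) can be sketched as follows. The nearest-neighbour sampling, the function names, and the synthetic 32 * 32 input are assumptions chosen for this example and are not taken from JIANG; the convolution that JIANG performs between the two resizing steps is omitted.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D list-of-lists image using nearest-neighbour sampling (assumed method)."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def multi_scale_groups(image, sizes):
    """Produce one resized copy of the input per requested (height, width) size."""
    return [resize_nearest(image, h, w) for h, w in sizes]


def restore_to_input_size(groups, input_h, input_w):
    """Resize every group back to the size of the original input layer."""
    return [resize_nearest(group, input_h, input_w) for group in groups]


# Hypothetical 32 * 32 input layer filled with distinct values.
input_layer = [[r * 32 + c for c in range(32)] for r in range(32)]

# The three resolutions named in the quoted passage.
scales = [(32, 32), (24, 24), (16, 16)]

groups = multi_scale_groups(input_layer, scales)
restored = restore_to_input_size(groups, 32, 32)

group_shapes = [(len(g), len(g[0])) for g in groups]
restored_shapes = [(len(g), len(g[0])) for g in restored]
```

In this sketch, `group_shapes` holds the resolution of each group after the first resizing, and `restored_shapes` shows that every group is returned to the size of the input layer.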

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of neural network model deployment as disclosed by ZHU, FU and Chandra to include forming a set of input groups of the set of input sizes; determining a representative input size for a respective input group in the set of input groups; and adjusting the representative input size for the representative workload to determine the input size, as taught by JIANG, in the system of ZHU, FU, Chandra and Bonebakker for resizing a convolutional layer input of an artificial neural network with at least two different scales to obtain multiple groups of intermediate feature maps, convolving the intermediate feature maps with a filter, resizing the convolution results to the size of the layer input, and concatenating the resized convolution results to form an output of the 


10. Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over ZHU et al. (PUB NO: US 2020/0050939 A1), hereinafter ZHU, in view of FU et al. (PUB NO: US 2019/0325276 A1), hereinafter FU, and further in view of Chandra et al. (PUB NO: US 2019/0266015 A1), hereinafter Chandra, still further in view of NG et al. (PUB NO: US 2019/0355366 A1), hereinafter NG.


Regarding claims 7 and 17
ZHU, FU and Chandra do not teach that forming the synthetic AI model further comprises including a fully connected layer and a softmax layer to the synthetic AI model.
In the related field of invention, NG teaches that forming the synthetic AI model further comprises including a fully connected layer and a softmax layer to the synthetic AI model. (See para 49: A time-delay neural network is a multilayer artificial neural network which is able to classify patterns with shift-invariance and/or model context at each layer of the network. The first five layers of the network 200 operate at frame-level and are fully-connected. Each layer receives the temporally-spliced input from the previous layer. Thus the length of the text window increases with the depth of the network. The sixth layer is a statistical pooling layer, which converts the frame-level information output by its former layer to sentence-level in terms of mean and variance. Two more fully connected layers are then added before a final log-softmax activation layer.)
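
For illustration only, the final stage described in the quoted passage, a fully connected layer followed by a log-softmax activation, can be sketched as below. The weights, bias, input vector, and function names are hypothetical values chosen for the example; they are not taken from NG, and the earlier frame-level and pooling layers of the network are omitted.

```python
import math


def fully_connected(x, weights, bias):
    """Compute one output per weight row: dot(row, x) + bias."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax: logits minus log(sum(exp(logits)))."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


# Hypothetical 2-dimensional input and a 3-class fully connected layer.
x = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]

logits = fully_connected(x, weights, bias)
log_probs = log_softmax(logits)

# Index of the most likely class under the log-probabilities.
predicted_class = max(range(len(log_probs)), key=lambda i: log_probs[i])
```

Exponentiating the entries of `log_probs` gives class probabilities that sum to one, which is the property that makes a softmax-type layer suitable as the last layer of a classifier.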

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of neural network model deployment as disclosed by ZHU, FU and Chandra so that forming the synthetic AI model further comprises including a fully connected layer and a softmax layer to the synthetic AI model, as taught by NG, in the system of ZHU, FU and Chandra for performing speaker recognition using an artificial neural network. [¶ 010]


Conclusion

THIS ACTION IS MADE FINAL.  
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
11. All claims 1-20 are rejected.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 10554738 B1  Ren
Discussing the apparatus that uses the machine learning model to predict workload values of the apparatus and other compute devices.
US 20190156178 A1   Thronton et al.
Discussing data models described as "deep learning" models as a means to detect and classify objects within complex data sets such as image data.
US 20040167765 A1   Abu EL Ata
Discussing generating dynamic representations of the business solution through predictive modeling and providing automated calibration of a predictive model against predefined performance benchmarks.
12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to PURSOTTAM GIRI, whose telephone number is (469) 295-9101. The examiner can normally be reached from 7:30 AM to 5:30 PM, Monday to Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PURSOTTAM GIRI/Examiner, Art Unit 2128                   

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128