DETAILED ACTION
Response to Amendment
Claims 21-24, 26-31, 33-37 and 39-43 are pending. Claims 21-24, 26-31, 33-37 and 39-40 are amended. Claims 25, 32 and 38 are canceled. Claims 41-43 are new.
Response to Arguments
Applicant’s arguments, see page 8, filed 26 January, 2022, with respect to the nonstatutory double patenting rejections of claims 21-40, along with the terminal disclaimer filed 18 January, 2022, have been fully considered and are persuasive.  The nonstatutory double patenting rejections of claims 21-24, 26-31, 33-37 and 39-40 have been withdrawn. 
Applicant’s arguments, see pages 8-10, filed 26 January, 2022, with respect to the 35 USC 103 rejections of claims 21-24, 26-31, 33-37 and 39-40, along with the accompanying amendments received on the same date, and along with the examiner amendments below, have been fully considered and are persuasive.  The 35 USC 103 rejections of claims 21-24, 26-31, 33-37 and 39-40 have been withdrawn. 
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Howard Hamilton (Reg.#71,224) on 31 January, 2022.

Claim 21 is changed from:

To: 
21.	(Currently Amended) A server device comprising: a storage device to store a graphics execution environment, the graphics execution environment including a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to: generate output via a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame; extract a feature learned by the first DNN model based on the generated output; generate training data for a  associated with respective layers of the second DNN model, the respective layers including a fully connected layer.

Claim 26 is changed from:
26.	(Currently Amended) The server device as in claim 21, wherein the library of machine learning primitives include primitives to perform tensor convolution, at least one activation function, and a pooling operation.
To:
26.	(Currently Amended) The server device as in claim 21, wherein the library of machine learning primitives includes primitives to perform tensor convolution, at least one activation function, and a pooling operation.

Claim 27 is changed from:
27.	(Currently Amended) The server device as in claim 21, wherein the the one or more primitives to implement the linear algebra subprograms include primitives to perform matrix operations.
To:


Claim 28 is changed from:
28.	(Currently Amended) A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating output via a first deep neural network (DNN) model via a deep learning framework accelerated via the one or more processors, wherein the first DNN model is a pre-trained DNN model for computer vision that enables context-independent classification of an object within an input video frame and the one or more processors include a general-purpose graphics processor; extracting, via the deep learning framework, a feature learned by the first DNN model; and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature, the second DNN model a context-dependent extension of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework, the one or more primitives to implement linear algebra subprograms.
To:
28.	(Currently Amended) A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating output via a first deep neural network (DNN) model via a  associated with respective layers of the second DNN model, the respective layers including a fully connected layer.

Claim 34 is changed from:
34.	(Currently Amended) The non-transitory machine-readable medium as in claim 28, wherein the the one or more primitives to implement the linear algebra subprograms include primitives to perform matrix operations.
To:
34.	(Currently Amended) The non-transitory machine-readable medium as in claim 28, wherein the one or more primitives to implement the linear algebra subprograms include primitives to perform matrix operations.


Claim 35 is changed from:
35.	(Currently Amended) A data processing system on a server device, the data processing system included within a graphics execution environment stored on a server device, the data processing system comprising instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of a computing device configured to host the graphics execution environment, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations comprising:
generating output via a first deep neural network (DNN) model via a deep learning framework accelerated via the one or more general-purpose graphics processors, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame and the one or more processors include a general-purpose graphics processor; via the deep learning framework, detecting an output associated with the first DNN model to extract a feature learned by the first DNN model; generating training data based on the output associated with the first DNN; and training, via the deep learning framework, a second DNN model for computer vision using the training data to enable the second DNN model to learn the extracted feature, the second DNN model a context-dependent extension of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework, the one or more primitives to implement linear algebra subprograms.
To:
 associated with respective layers of the second DNN model, the respective layers including a fully connected layer.

Claim 40 is changed from:

To:
40.	(Currently Amended) The data processing system as in claim 35, wherein the one or more primitives to implement the linear algebra subprograms include primitives to perform matrix operations.

Allowable Subject Matter
Claims 21-24, 26-31, 33-37 and 39-43 are allowed.
The following is an examiner’s statement of reasons for allowance: The arguments along with amendments received on 26 January, 2022 and amendments authorized by the attorney on 31 January, 2022 were persuasive.

In particular, the examiner notes that the reference “Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems”, 2019 [does not predate], states:

[Image: media_image1.png]

Upon looking at reference 9, for instance, Chetlur, Sharan, et al., “cuDNN: Efficient primitives for deep learning,” published in 2014 and therefore predating this application, the library does not have an implementation for the FC layer. Because the current application describes an embodiment in which the second DNN has, for instance, an FC layer (see, for instance, paragraph 242 of the publication of the application), the second DNN is interpreted as having the BLAS associated with these layers, and the limitation therefore would not be taught by prior art such as the Chetlur et al. reference.
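For illustration only, the relationship relied on above between a fully connected (FC) layer and BLAS-style linear algebra can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `gemm` and `fc_forward`, the layer sizes, and the numeric values are hypothetical and are not taken from the application or from any cited reference. It shows only that an FC layer forward pass reduces to a general matrix-matrix multiplication (GEMM) plus a bias term, which is the kind of operation a BLAS library provides.

```python
# Hypothetical illustration: an FC layer forward pass expressed as a GEMM
# (general matrix-matrix multiply) followed by a bias addition.
# Pure Python for clarity; a BLAS library would normally supply the GEMM.

def gemm(a, b):
    """Return the product of a (m x k) and b (k x n) as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def fc_forward(inputs, weights, bias):
    """Compute inputs (batch x in) times weights (in x out), plus bias."""
    product = gemm(inputs, weights)
    return [[value + bias[j] for j, value in enumerate(row)]
            for row in product]

# Assumed example data: two samples with two features each,
# mapped to three output units.
batch = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, 0.0, -1.0], [0.5, 1.0, 2.0]]
bias = [0.0, 1.0, 0.5]

outputs = fc_forward(batch, weights, bias)
print(outputs)
```

In this sketch the whole layer is a single matrix product, so the per-layer cost is dominated by the GEMM call.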

The other references cited below either describe primitives to implement linear algebra subprograms associated with respective layers of the second DNN model but do not predate the application, or do not sufficiently teach this limitation, or both:

“High Performance Convolutional Neural Networks for Document Processing”, 2006: Efficiently computing large matrix-vector and matrix-matrix products is a well studied problem. Basic Linear Algebra Subroutines (BLAS) contains subprograms for basic operations on vectors 

“MIOpen: An Open Source Library For Deep Learning Primitives” 2019 [does not predate]: Developing hardware-optimized libraries for most critical and time-sensitive operations is a well-known practice. For linear algebra such libraries are known as BLAS (Basic Linear Algebra Subsystem) and have different implementations for different systems [8], [16]–[18]. In similar spirit different deep learning libraries have been written, to make it easier for client applications to implement different deep learning primitives. Alex Krischevsky’s cuda-convnet is one of the initial libraries to implement convolutions and inspired many others [19], [20]. Chetlur et al. developed cuDNN, a deep neural network library for nVIDIA GPUs [21]. MIOpen falls in this category since it provides a C programming language based API for deep learning primitives. While these libraries aim to accelerate deep learning primitives on GPUs, research also been conducted to improve the performance of inference only loads on different CPUs such as MKL-DNN [20]

“GPU Simulator of Multilayer Neural Network Based on Multi-Valued Neurons”, 2016: To mitigate these problems, several resources exist which provide convenient GPU utilities. The CUBLAS library [33] re-implements the BLAS linear algebra routines for NVIDIA’s CUDA cards, and the PyCUDA & PyOpenCL [34] modules give high and low-level control of GPUs for the Python interpreted programming language.

“CUDNN: EFFICIENT PRIMITIVES FOR DEEP LEARNING”, presentation, August 2016: Approaches to Speedup CNNs:
• Lower the convolutions into a matrix multiplication
• Using BLAS (basic linear algebra subroutines)
• Using GPUs (graphic processing units)

US 20200081744 A1 [does not predate]: The computation associated with a linear layer in a neural network is a multiplication between two matrices, also called General Matrix to Matrix Multiplication (GEMM) in Basic Linear Algebra Subprograms (BLAS) terminology. While existing computer systems, for example, heterogeneous systems, can implement GEMM, traditional BLAS implementation on a heterogeneous system are not optimized for the GEMM shapes and sizes used in inferencing with neural networks

US 20190114541 A1 [does not predate]: The model parser 110 analyzes a network structure NSI of the DNN including a plurality of layers. For example, the network structure NSI may include specifications and requirements of the DNN (e.g., latency or power), etc. For example, the specifications and requirements of the DNN may include, e.g., layer topology (e.g., depth or branch), network compression scheme (e.g., pruning), types of computing operation for each layer (e.g., BLAS, CONV, pooling or RELU), data property (e.g., format, security, size, types of input source/channel, physical location or virtual location), memory layout for operands of input, kernel/filter and output (e.g., padding, stride or data dimensional property), data compression scheme (e.g., quantization, Lempel Ziv (LZ) or Huffman), etc.

US 20170024849 A1: The HybNet approach partitions the network into a GPU part (beginning convolutional and pooling layers) and a CPU part (ending fully connected layers). The HybNet moves parts of the neural network to the CPU so that the limited GPU global 

US 20160342888 A1: Additionally, GPUs—and in particular GPGPUs—rely heavily on on-chip resources including register file and shared memory to improve data locality and preserve off-chip memory bandwidth. Efficient utilization of such on-chip resources is a non-trivial problem for high-level application developers. In an attempt to avoid low-level code optimization difficulty, Caffe employs a structure wherein the convolutional layer is on top of Nvidia cuBLAS. Unfortunately, such a “BLAS-based” approach cannot be applied to many other layers.
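
For illustration only, one approach quoted in the references above, lowering a convolution into a matrix multiplication, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names (`im2col`, `matmul`, `conv2d_direct`, `conv2d_lowered`), the stride of 1 without padding, and the sample data are hypothetical and are not taken from any cited reference. As is common in neural network libraries, the "convolution" here is computed without flipping the kernel (cross-correlation).

```python
# Hypothetical illustration: a 2D convolution (stride 1, no padding, kernel
# not flipped) lowered to a single matrix product via patch unrolling.

def im2col(image, kh, kw):
    """Unroll every kh x kw patch of the image into one row."""
    rows, cols = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(rows - kh + 1)
            for c in range(cols - kw + 1)]

def matmul(a, b):
    """Plain matrix product; a BLAS GEMM would normally do this step."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def conv2d_direct(image, kernel):
    """Reference implementation using nested loops over output positions."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]

def conv2d_lowered(image, kernel):
    """Same result, computed as (patch matrix) x (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)
    flat_kernel = [[v] for row in kernel for v in row]  # (kh*kw) x 1 column
    flat_out = [row[0] for row in matmul(patches, flat_kernel)]
    out_w = len(image[0]) - kw + 1
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]

# Assumed example data.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]

lowered = conv2d_lowered(image, kernel)
direct = conv2d_direct(image, kernel)
print(lowered)
```

The two functions produce the same output; the lowered form trades extra memory for the unrolled patch matrix in exchange for a single matrix product.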

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661