DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claim 1 (and, by dependency, claims 2-9) is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11354542. Although the claims at issue are not identical, they are not patentably distinct from each other because claim 1 of the current application is a broader and reworded/reordered version of claim 1 of US 11354542 B2.
US 11354542 B2
1. A server device comprising: a storage device to store a graphics execution environment, the graphics execution environment including a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to: generate output via a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame; extract a feature learned by the first DNN model based on the generated output; generate training data for a second DNN model based on the extracted feature; and train a second DNN model based on the extracted feature, the second DNN model a context-dependent extension of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and to train the second DNN model includes to train the second DNN model via one or more primitives provided by the deep learning framework, the one or more primitives to implement linear algebra subprograms associated with respective layers of the second DNN model, the respective layers including a fully connected layer.
Current Application
1. A data processing system on a computing device, the data processing system comprising: one or more storage devices comprising a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations comprising: extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model via the framework, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame; and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including context-dependent data, the second DNN model an update of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.


Claim 10 (and, by dependency, claims 11-15) is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11354542. Although the claims at issue are not identical, they are not patentably distinct from each other because claim 10 of the current application is a broader and reworded version of claim 1 of US 11354542 B2.

US 11354542 B2
1. A server device comprising: a storage device to store a graphics execution environment, the graphics execution environment including a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to: generate output via a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame; extract a feature learned by the first DNN model based on the generated output; generate training data for a second DNN model based on the extracted feature; and train a second DNN model based on the extracted feature, the second DNN model a context-dependent extension of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and to train the second DNN model includes to train the second DNN model via one or more primitives provided by the deep learning framework, the one or more primitives to implement linear algebra subprograms associated with respective layers of the second DNN model, the respective layers including a fully connected layer.
Current Application
10. A method comprising: extracting, by one or more general-purpose graphics processors of a data processing system, a feature learned by a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame, instructions executed by the general-purpose processor to extract the feature are provided via a deep learning framework, the deep learning framework is provided by a graphics execution environment on a server computing device, and the deep learning framework provides instructions to accelerate deep learning operations; and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including context-dependent data, the second DNN model an update of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.



Claim 16 (and, by dependency, claims 17-20) is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 7 of U.S. Patent No. 11354542. Although the claims at issue are not identical, they are not patentably distinct from each other because claim 16 of the current application is a broader and reworded version of claim 7 of US 11354542 B2.
US 11354542 B2
7. A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating output via a first deep neural network (DNN) model via a deep learning framework accelerated via the one or more processors, wherein the first DNN model is a pre-trained DNN model for computer vision that enables context-independent classification of an object within an input video frame and the one or more processors include a general-purpose graphics processor; extracting, via the deep learning framework, a feature learned by the first DNN model; and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature, the second DNN model a context-dependent extension of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework, the one or more primitives to implement linear algebra subprograms associated with respective layers of the second DNN model, the respective layers including a fully connected layer.
Current Application
16. A non-transitory machine-readable medium storing instructions which, when executed by one or more processors including one or more general-purpose graphics processors, cause the one or more processors to perform operations comprising: extracting, by the one or more general-purpose graphics processors, a feature learned by a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame, the instructions executed by the general-purpose processor to extract the feature are provided via a deep learning framework, the deep learning framework is provided by a graphics execution environment on a server computing device, and the deep learning framework provides instructions to accelerate deep learning operations; and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including context-dependent data, the second DNN model an update of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.







Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-10, 12-14, 16, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) in view of He et al. (US 20160379112 A1), Yang et al. (US 20160342888 A1), and Zou et al. (US 20170300767 A1).

Regarding claims 1, 10 and 16, Matsuda et al. disclose a data processing system on a computing device, a method, and a non-transitory machine-readable medium storing instructions which, when executed by one or more processors including one or more general-purpose graphics processors, cause the one or more processors to perform operations, including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (acceleration of DNN learning for a specific application, [0001]), comprising: extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model via the framework, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame (image recognition, [0001], [0032], In this respect, for image recognition, if there is any category that can clearly distinguish objects, learning of DNNs for image recognition can efficiently be done category by category in place of the languages of the examples above, using the present invention, [0084], training a first DNN formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, [0011], the computer storing a category-independent sub-network used commonly for the plurality of categories, [0014], separating the first sub-network from other sub-networks and storing it as a category-independent sub-network in a storage medium, [0017], connecting it to the output stage of independent sub-network 230, user already has an independent sub-network, [0047] [indicates context independent], fixing independent sub-network 120, the DNN consisting of independent sub-network 120, [0048]); and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including context-dependent data, the second DNN model an update of the first DNN model (training a second DNN formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing learning of the first and second DNNs, [0011], a category-dependent sub-network used for a specific category, computer training the sub-network used for a specific category using training data belonging to the specific category while fixing parameters of the category-independent sub-network, [0014], a deep neural network training device, training a first deep neural network formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, and training a second deep neural network formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing training of the first and second deep neural networks, [0018], dependent sub-network, [0044], obtaining dependent sub-network 124 for English, [0047], DNN, not-yet-learned dependent sub-network of a new language (for example, Chinese) (dependent sub-network for Chinese) 234 is connected to the output side of independent sub-network 120, [0048], user, by obtaining dependent sub-network 124 for English and connecting it to the output stage of independent sub-network 230, [0047], [0086] [the category-dependent network (= second DNN) is connected to the output of the category-independent network (first DNN), thereby making it an extension/refinement of the first model]). Matsuda et al. partly disclose the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework (various programming tool kits or program library installed in computer 340 [0065]).
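For illustration only, the arrangement mapped above (a fixed category-independent sub-network whose output feeds a separately trained category-dependent sub-network) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Matsuda et al. or from the application: the weights, the data, and the names `SHARED_WEIGHTS`, `extract_features`, `train_dependent_head`, and `CATEGORY_DATA` are hypothetical, and a single logistic output layer stands in for the dependent sub-network.

```python
import math

# Hypothetical frozen, category-independent layer (stand-in for a pre-trained
# first sub-network). Its weights are never updated by the training loop below.
SHARED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Apply the frozen shared layer followed by a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in SHARED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_dependent_head(samples, epochs=500, lr=0.5):
    """Train only the category-dependent output layer on extracted features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in samples:
            feats = extract_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            err = pred - label  # gradient of cross-entropy loss w.r.t. the logit
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias


def predict(head, x):
    """Score an input with the frozen shared layer plus the trained head."""
    weights, bias = head
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


# Hypothetical category-specific training data: (input, label) pairs.
CATEGORY_DATA = [([2.0, 0.0], 1), ([3.0, 1.0], 1),
                 ([0.0, 2.0], 0), ([1.0, 3.0], 0)]

head = train_dependent_head(CATEGORY_DATA)
```

In this sketch, only `weights` and `bias` of the dependent layer change during training, which corresponds to the cited passage on training the category-specific sub-network "while fixing parameters of the category-independent sub-network" ([0014]).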

Matsuda et al. do not explicitly disclose one or more storage devices comprising a graphics execution environment, the deep learning framework to cause the one or more general-purpose graphics processors to perform operations, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.

He et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (parallelize the training of the DNNs across multiple processing units, e.g., cores of a multi-core processor or multiple general-purpose graphics processing units (GPGPUs), [0016], Processing unit(s) 112 can be or include one or more single-core processors, multi-core processors, CPUs, GPUs, GPGPUs, or hardware logic components configured, e.g., via specialized programming from modules or APIs, to perform functions described herein, processing units 112 in computing device 102(3) can be a combination of one or more GPGPUs and one or more FPGAs, [0032]), the deep learning framework to cause the one or more general-purpose graphics processors to perform operations comprising: extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model via the framework, wherein the first DNN model is a pre-trained DNN model for computer vision to enable context-independent classification of an object within an input video frame (DNNs may be context-dependent DNNs or context-independent DNNs, [0016], camera(s) to provide device optimized functions such as speech recognition, image recognition and search, and speech synthesis, [0017], personal video recorders, [0020], first neural network, [0092]); and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including context-dependent data, the second DNN model an update of the first DNN model, wherein the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework (DNNs may be context-dependent DNNs or context-independent DNNs, [0016], determine the estimated value of the target feature based at least in part on an output of the hidden layer 406(3) of the second neural network 402(3) and to adjust the determined estimated value of the target feature based at least in part on an output of the hidden layer 406(2) of the first neural network 402(2), [0092]).

As a user can be a type of context, an interpretation partly supported by, for instance, paragraph 73 of He et al., the context-independent and context-dependent networks are interpreted as the user-independent and user-dependent networks.

Matsuda et al. and He et al. are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]). The combination of He et al. with Matsuda et al. will enable the use of context-independent and context-dependent DNNs. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the context-independent and context-dependent DNNs of He et al. with the invention of Matsuda et al. as this was known at the time of filing, the combination would have predictable results, and as He et al. indicate, “Various DNN training and operation techniques described herein can permit more efficiently analyzing data from disparate data sources.  Various examples can provide more effective ongoing training of neural networks, e.g., based on sensor readings, providing improved accuracy with reduced computational power compared to repeatedly retraining the neural networks.  Various examples operate multiple neural networks, permitting the operation of those neural networks to be carried out in parallel.  This parallel operation can permit operating the neural network with reduced computational load and memory requirements compared to operating a monolithic neural network” ([0148]), indicating the accuracy and efficiency advantage to having the context dependent and independent DNNs of He incorporated into the DNN configuration of Matsuda.

Matsuda et al. and He et al. do not explicitly disclose providing a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device, the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.

Yang et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (“Contributing to the effective application of CNNs are large and powerful model(s) constructed from large-scale data set(s) and high performance computing platforms including general purpose graphics processing units (GPGPUs) providing teraflop computational capabilities”, [0004], “One category these applications is deep learning, wherein a convolutional neural network (CNN) is oftentimes employed”, [0035], “In order to accelerate the CNN learning process, many-core architectures including GPUs have been employed in state-of-art CNN frameworks… Additionally, Nvidia has recently released a library--cuDNN--to accelerate a set of core CNN layers on GPUs”, [0037], Deep Learning on GPGPUs, [0059], “GPGPUs employ many-core architectures to achieve the high throughput.  Each GPU contains multiple next generation streaming multiprocessors (SMXs) on Nvidia latest architecture, and each SMX has multiple sets of streaming processors (SPs).  Each set of SPs execute in SIMD model.  These threads sharing a same instruction and running on a set of SPs are called a warp.  In Nvidia architecture, a warp contains 32 threads.  Due the limited size of hardware cache on GPGPUs, threads with a warp need to access consecutive off-chip memory to achieve the high bandwidth.  Such a requirement is also called coalesced memory access”, [0060], “We now discuss our experimental methodology before our characterizations and optimizations for deep learning applications.  Since deep learning frameworks--i.e., Caffe, cuda-convnet and cuDNN have been commonly used and specifically optimized for GPGPUs, we describe memory efficiency on three major layers including the convolutional layer, the pooling layer and the softmax layer while employing these frameworks/library”, [0068]).

Yang et al. also largely teach the deep learning framework is to provide a library of machine learning primitives, the machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework (“The graphics multiprocessor 234 has an execution pipeline including but not limited to an instruction cache 252, an instruction unit 254, an address mapping unit 256, a register file 258, one or more general purpose graphics processing unit (GPGPU) cores 262, and one or more load/store units 266.  The GPGPU cores 262 and load/store units 266 are coupled with cache memory 272 and shared memory 270 via a memory and cache interconnect 268”, [0070], “register file 258 provides temporary storage for operands connected to the data paths of the functional units (e.g., GPGPU cores 262, load/store units 266) of the graphics multiprocessor 324”, [0072], “multiple sets of graphics or compute execution units (e.g., GPGPU core 336A-336B, GPGPU core 337A-337B, GPGPU core 338A-338B),” [0076], “Hardware acceleration for the machine learning application 802 can be enabled via a machine learning framework 804.  The machine learning framework 804 can provide a library of machine learning primitives.  Machine learning primitives are basic operations that are commonly performed by machine learning algorithms.  Without the machine learning framework 804, developers of machine learning algorithms would be required to create and optimize the main computational logic associated with the machine learning algorithm, then re-optimize the computational logic as new parallel processors are developed.  Instead, the machine learning application can be configured to perform the necessary computations using the primitives provided by the machine learning framework 804.  
Exemplary primitives include tensor convolutions, activation functions, and pooling, which are computational operations that are performed while training a convolutional neural network (CNN).  The machine learning framework 804 can also provide primitives to implement basic linear algebra subprograms performed by many machine-learning algorithms, such as matrix and vector operations”, [0173], GPGPU Machine Learning Acceleration, [0174], GPGPU 900 can be configured to train neural networks, [0180], each of the multiple GPGPUs 1006A-D can be an instance of the GPGPU 900 of FIG. 9, [0181]).
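For illustration only, the three exemplary primitives named in the quoted passage (convolution, an activation function, and pooling) can be sketched in plain Python as below. This is a minimal, unaccelerated sketch under stated assumptions and does not reproduce any cited reference's or library's implementation: the function names `conv2d`, `relu`, and `max_pool2d` are hypothetical, inputs are single-channel 2D lists, and the convolution is computed as a cross-correlation without padding, as is conventional in CNN libraries.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Hypothetical 3x3 input and 2x2 kernel chained through the three primitives.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
pooled = max_pool2d(relu(conv2d(image, kernel)))
```

A framework of the kind described would expose such operations as pre-optimized library calls so that application code composes them rather than reimplementing the underlying computation.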

Matsuda et al. and He et al. and Yang et al. are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]; Yang et al., [0059]-[0066]). The combination of Yang et al. with Matsuda et al. and He et al. will enable the use of computer vision and acceleration of learning via GPGPUs. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the acceleration of Yang et al. with the invention of Matsuda et al. and He et al. as this was known at the time of filing, the combination would have predictable results, and as Yang et al. indicate GPGPUs employ many-core architectures to achieve the high throughput ([0060]) and deep learning frameworks have been commonly used and specifically optimized for GPGPUs ([0068]), indicating the computational advantages to using the GPGPUs of Yang et al. in the invention of Matsuda and He.

Matsuda et al. and He et al. and Yang et al. do not explicitly disclose training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework.

Zou teaches training the second DNN model includes training the second DNN model via one or more primitives provided by the deep learning framework (The primitives can be predefined in a database (e.g., a road feature database, a road marking database, etc.), and thus, primitive detection engine 214 may refer to the database to make the determination, [0040], To detect road scene primitives, the primitive detection engine 214 uses machine learning and/or classic computer vision techniques.  More specifically, the primitive detection engine 214 can incorporate and utilize rule-based decision making and artificial intelligent (AI) reasoning to accomplish the various operations described herein.  The phrase "machine learning" broadly describes a function of electronic systems that learn from data.  A machine learning system, engine, or module can include a trainable machine learning algorithm that can be trained, such as in an external cloud environment, to learn functional relationships between inputs and outputs that are currently unknown, and the resulting model can be used by the primitive detection engine 214 to detect primitives, [0041], FIG. 3 depicts neural network 300 used for detecting primitives, according to aspects of the present disclosure.  By way of a general overview of embodiments of the present disclosure, neural network 300 is configured to receive a single image 312 as an input and decompose that single image into multiple aspects for parallel processing by processing cores 202a-202n.  
The original "fisheye" image 312 is decomposed by a decomposition section 302 into multiple top-down and vertical views that differentiate one or more primitive occurrence(s), which are then fed into separately trained deep neural nets for simultaneous multi-primitive detection and classification in parallel, [0044], feature extraction section 304a outputs feature map 414 to the feature classification section 306a to classify the primitives using feature maps 414 based on primitives stored in an operatively connected database of primitives, [0046]).

Matsuda et al., He et al., Yang et al., and Zou are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]; Yang et al., [0059]-[0066]; Zou, [0044]). The combination of Zou with Matsuda et al., He et al., and Yang et al. will enable the use of training with primitives. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the training of Zou with the invention of Matsuda et al., He et al., and Yang et al. as this was known at the time of filing, the combination would have predictable results, and as Zou indicates the benefit, “These aspects of the disclosure constitute technical features that yield the technical effect of reducing overall computational load, power consumption, hardware costs, and time” ([0025]), thus increasing the efficiency of the invention of Matsuda et al., He et al., and Yang et al.

Regarding claims 3, 14 and 19, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system, method, and CRM as in claims 1, 10 and 16. Matsuda et al. and Yang et al. further indicate the library of machine learning primitives includes primitives to perform tensor convolution, at least one activation function, and a pooling operation (Matsuda et al., various programming tool kits or a program library installed in computer 340, [0065]; Yang et al., “The graphics multiprocessor 234 has an execution pipeline including but not limited to an instruction cache 252, an instruction unit 254, an address mapping unit 256, a register file 258, one or more general purpose graphics processing unit (GPGPU) cores 262, and one or more load/store units 266.  The GPGPU cores 262 and load/store units 266 are coupled with cache memory 272 and shared memory 270 via a memory and cache interconnect 268”, [0070], “register file 258 provides temporary storage for operands connected to the data paths of the functional units (e.g., GPGPU cores 262, load/store units 266) of the graphics multiprocessor 324”, [0072], “multiple sets of graphics or compute execution units (e.g., GPGPU core 336A-336B, GPGPU core 337A-337B, GPGPU core 338A-338B),” [0076], “Hardware acceleration for the machine learning application 802 can be enabled via a machine learning framework 804.  The machine learning framework 804 can provide a library of machine learning primitives.  Machine learning primitives are basic operations that are commonly performed by machine learning algorithms.  Without the machine learning framework 804, developers of machine learning algorithms would be required to create and optimize the main computational logic associated with the machine learning algorithm, then re-optimize the computational logic as new parallel processors are developed.  Instead, the machine learning application can be configured to perform the necessary computations using the primitives provided by the machine learning framework 804.  Exemplary primitives include tensor convolutions, activation functions, and pooling, which are computational operations that are performed while training a convolutional neural network (CNN).  The machine learning framework 804 can also provide primitives to implement basic linear algebra subprograms performed by many machine-learning algorithms, such as matrix and vector operations”, [0173], GPGPU Machine Learning Acceleration, [0174], GPGPU 900 can be configured to train neural networks, [0180], each of the multiple GPGPUs 1006A-D can be an instance of the GPGPU 900 of FIG. 9, [0181]).

Regarding claim 5, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system as in claim 1. Yang et al. further indicate the graphics execution environment is a virtualized environment (“The graphics accelerator module 446 may be dedicated to a single application executed on the processor 407 or may be shared between multiple applications.  In one embodiment, a virtualized graphics execution environment is presented in which the resources of the graphics processing engines 431-432, N are shared with multiple applications or virtual machines (VMs).  The resources may be subdivided into "slices" which are allocated to different VMs and/or applications based on the processing requirements and priorities associated with the VMs and/or applications”, [0091], “Embodiments of the invention include an infrastructure for setting up the process state and sending a WD 484 to a graphics acceleration module 446 to start a job in a virtualized environment”, [0102]).

Regarding claim 6, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system as in claim 5. Yang et al. further teach the one or more general-purpose graphics processors are configurable into partitions and the graphics execution environment is to execute as a virtualized environment by one or more partitions of the general-purpose graphics processors (“The graphics accelerator module 446 may be dedicated to a single application executed on the processor 407 or may be shared between multiple applications.  In one embodiment, a virtualized graphics execution environment is presented in which the resources of the graphics processing engines 431-432, N are shared with multiple applications or virtual machines (VMs).  The resources may be subdivided into "slices" which are allocated to different VMs and/or applications based on the processing requirements and priorities associated with the VMs and/or applications”, [0091]).

Regarding claim 7, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system as in claim 1. Yang et al. further teach a network interface to enable communication with an external system, the external system including one or more general-purpose graphics processors; and wherein training the second DNN model for computer vision via the deep learning framework includes interfacing with an instance of the deep learning framework on the external system and training the second DNN model via the one or more general-purpose graphics processors of the external system (trained deep neural network to implement machine intelligence, [0172], create a multi-GPU cluster to improve training speed for particularly deep neural networks, [0175], “Memory controller hub 1616 also couples with an optional external graphics processor 1612, which may communicate with the one or more graphics processors 1608 in processors 1602 to perform graphics and media operations”, [0224], “FIG. 18 is a block diagram of a graphics processor 1800, which may be a discrete graphics processing unit, or may be a graphics processor integrated with a plurality of processing cores.  In some embodiments, the graphics processor communicates via a memory mapped I/O interface to registers on the graphics processor and with commands placed into the processor memory.  In some embodiments, graphics processor 1800 includes a memory interface 1814 to access memory.  Memory interface 1814 can be an interface to local memory, one or more internal caches, one or more shared external caches, and/or to system memory”, [0234]).

Regarding claims 8 and 13, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system, method, and CRM as in claims 7 and 12. He et al. and Yang et al. further teach training the second DNN model includes training the second DNN model to perform computer vision operations for autonomous navigation (He et al., satellite-based navigation system devices, [0020]; Yang et al., “The machine learning application 802 can implement any type of machine intelligence including but not limited to image recognition, mapping and localization, autonomous navigation, speech synthesis, medical imaging, or language translation”, [0172], Machine learning can be applied to solve a variety of technological problems, including but not limited to computer vision, autonomous driving and navigation, speech recognition, and language processing, [0209], Parallel processor accelerated machine learning has autonomous driving applications including lane and road sign recognition, obstacle avoidance, navigation, and driving control, [0210], At least a portion of the navigation and driving logic can be implemented in software executing on the multi-core processor, [0216]).

Regarding claim 9, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system as in claim 1. Yang et al. further indicate the one or more general-purpose graphics processors include multiple general-purpose graphics processors, the multiple general-purpose graphics processors interconnected via peer-to-peer links between the multiple general-purpose graphics processors (“The GPGPUs 1006A-D can interconnect via a set of high-speed point to point GPU to GPU links 1016.  The high-speed GPU to GPU links can connect to each of the GPGPUs 1006A-D via a dedicated GPU link, such as the GPU link 910 as in FIG. 9.  The P2P GPU links 1016 enable direct communication between each of the GPGPUs 1006A-D without requiring communication over the host interface bus to which the processor 1002 is connected.  With GPU-to-GPU traffic directed to the P2P GPU links, the host interface bus remains available for system memory access or to communicate with other instances of the multi-GPU computing system 1000, for example, via one or more network devices.  While in the illustrated embodiment the GPGPUs 1006A-D connect to the processor 1002 via the host interface switch 1004, in one embodiment the processor 1002 includes direct support for the P2P GPU links 1016 and can connect directly to the GPGPUs 1006A-D”, [0181]).

Regarding claims 12 and 18, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system, method, and CRM as in claims 10 and 16. Matsuda et al. and Yang et al. further indicate detecting an output associated with the first DNN model; generating training data based on the output associated with the first DNN; and training the second DNN model based on the training data (Matsuda et al., training a second DNN formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing learning of the first and second DNNs, [0011], a category-dependent sub-network used for a specific category, computer training the sub-network used for a specific category using training data belonging to the specific category while fixing parameters of the category-independent sub-network, [0014], a deep neural network training device, training a first deep neural network formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, and training a second deep neural network formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing training of the first and second deep neural networks, [0018], obtaining dependent sub-network 124 for English, [0047], DNN, not-yet-learned dependent sub-network of a new language (for example, Chinese) (dependent sub-network for Chinese) 234 is connected to the output side of independent sub-network 120, [0048]; Yang et al., “Deep neural networks used in deep learning typically include a front-end network to perform feature recognition coupled to a back-end network which represents a mathematical model that can perform operations (e.g., object classification, speech recognition, etc.) based on the feature representation provided to the model… The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output.  The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network”, [0188]).

Claim(s) 2, 11, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Yang et al. (US 20160342888 A1) and Zou et al. (US 20170300767 A1) as applied to claims 1, 10 and 16 above, further in view of Langford et al. (US 20170308789 A1).

Regarding claims 2, 11, and 17, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system, method, and CRM as in claims 1, 10, and 16. Matsuda et al. and He et al. and Yang et al. and Zou do not disclose training the second DNN model includes training the second DNN model separately from the first DNN model.

Langford et al. teach that training the second DNN model includes training the second DNN model separately from the first DNN model (DNN training can be performed by multiple nodes in a parallel manner to reduce the time required for training. Throughout this disclosure, the term “node” refers to a device or portion of a device configured as part of such a parallel DNN training arrangement. In at least one example, training engine 202 executes on each of a plurality of computing devices 210, and each computing device 210 has exactly one single-core processing unit, [0035], In some examples using context dependent DNNs, DNN 204 may include a total of eight layers (N=8). In various examples, the DNN 204 may be context-dependent DNNs or context-independent DNNs, [0036], The data analysis purposes may include using trained context-independent DNNs for activities such as image recognition, handwriting recognition, computer vision, video tracking, or so forth, [0045], GPU-type processor, [0050]).

Matsuda et al. and He et al. and Yang et al. and Zou and Langford et al. are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]; Yang et al., [0059]-[0066]; Zou, [0044]; Langford et al., [0013]). The combination of Langford et al. with Matsuda et al. and He et al. and Yang et al. and Zou will enable the use of separate training. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the training of Langford et al. with the invention of Matsuda et al. and He et al. and Yang et al. and Zou, as this was known at the time of filing, the combination would have predictable results, and Langford et al. indicate “Such concurrent computations by the processing units 212 or other examples of nodes may result in a pipelining of computations that train the DNN 204, and, accordingly, to a reduction of computation time due to the resulting parallelism of computation. Concurrent computation and communication by the processing units 212 or other examples of nodes may result in reduced delay time waiting for data to arrive at a node and, accordingly, to a reduction of overall computation time” ([0037]), indicating an increase in computational efficiency when used in the invention of Matsuda et al. and He et al. and Yang et al. and Zou.

Claim(s) 4, 15 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Yang et al. (US 20160342888 A1) and Zou et al. (US 20170300767 A1) as applied to claims 1, 10 and 16 above, further in view of Zhang et al. (US 10884761 B2).

Regarding claims 4, 15, and 20, Matsuda et al. and He et al. and Yang et al. and Zou disclose the data processing system, method, and CRM as in claims 3, 14, and 19. Matsuda et al. and He et al. and Yang et al. and Zou do not disclose that the library of machine learning primitives includes primitives to implement basic linear algebra subprograms.

Zhang et al. teach that a library of machine learning primitives includes primitives to implement basic linear algebra subprograms (performing a neural network evaluation, col. 11, lines 10-15, For example, for a particular function that processes complex linear algebra functions as well as sizes of matrices in the function call, a projected number of matrix operations, etc., the selection module 204 may select a function and associated processor that minimizes power consumption for that particular function call, col. 11, lines 25-35, “Any number of available functions may substitute for the target function. As depicted, some libraries may include a Math Kernel Library (“MKL”) that may include core math functions, sparse solvers, FFTs, vector math, etc. In one embodiment, functions from the MKL library may run on one or more of the CPUs 112. In another embodiment, an MKL library may be included for execution on a different processor, such as an accelerator 124, which may include an Intel® Xeon Phi™ coprocessor (depicted in FIG. 4 as “Phi”). In another embodiment, the selection module 204 may select a function from a graphical processor library, such as the NVIDA® CUDA® Basic Linear Algebra Subroutines (“cuBLAS”) library or similar library, for execution on the GPU 116”, col. 13, lines 10-30, library for the FPGA 120, such as the FPGA Basic Linear Algebra Subroutines (“fBLAS”) library, col. 13, lines 25-35).

Matsuda et al. and He et al. and Yang et al. and Zou and Zhang et al. are in the same art of neural networks (Matsuda et al., [0001]; He et al., [0016]; Yang et al., [0059]-[0066]; Zou, [0044]; Zhang et al., col. 11, lines 10-15). The combination of Zhang et al. with Matsuda et al. and He et al. and Yang et al. and Zou will enable the use of a BLAS. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the BLAS of Zhang et al. with the invention of Matsuda et al. and He et al. and Yang et al. and Zou, as this was known at the time of filing, the combination would have predictable results, and Zhang et al. indicate that selecting such a function minimizes power consumption (col. 11, lines 25-35) and can increase computing speed (col. 24, lines 35-55), indicating an increase in computational efficiency when used in the invention of Matsuda et al. and He et al. and Yang et al. and Zou.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH, can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661