DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-39 are pending under this Office action.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


The claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim 20 is directed to an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The rationale for this determination is explained below:
Claim 20, and its dependent claims 21-27, are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim 20 is directed to an abstract idea, reciting a machine-readable medium having stored thereon a set of instructions which, if performed by one or more processors, cause the one or more processors to at least: perform a data transform comprising a combination of two or more data transforms, wherein the two or more data transforms are to be combined based, at least in part, on input and output data sizes of the two or more data transforms.
The following analysis of the facts of this application follows the rationale set forth in the "Federal Register Notice: 2019 Revised Patent Subject Matter Eligibility Guidance" (OG Notices: January 7, 2019, available from the US PTO website at https://www.govinfo.gov/content/pkg/FR-2019-01-07/pdf/2018-28282.pdf).
The Guidance states:
Limitations that were found not to be enough to qualify as “significantly more” when recited in a claim with a judicial exception include (p. 6):
• An additional element merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
• an additional element adds insignificant extra-solution activity to the judicial exception;
• an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.
In the instant case, at least one embodiment of the claimed invention is merely a machine-readable medium having stored thereon a set of instructions which, if performed by one or more processors, cause the one or more processors to at least: perform a data transform comprising a combination of two or more data transforms, wherein the two or more data transforms are to be combined based, at least in part, on input and output data sizes of the two or more data transforms.
Claim 20, and its dependent claims 21-27, are rejected under 35 U.S.C. 101 as not falling within one of the four statutory categories of invention because the claimed invention is directed to a computer program per se. See MPEP 2106(I). A claim directed toward a non-transitory computer-readable medium having the program encoded thereon establishes a sufficient functional relationship between the program and a computer so as to remove it from the realm of “program per se.” See MPEP 2111.05(III). Hence, adding the limitation “non-transitory” before “machine-readable medium” in claim 20 would resolve this issue.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-39 are rejected under 35 U.S.C. 103 as being unpatentable over Matveev et al. (US 20190138902 A1) in view of Sarel et al. (US 20180293777 A1).
Regarding claim 1, Matveev teaches that a processor (See Matveev: Fig. 2, and [0053], “FIG. 2 shows a high-level block diagram of an exemplary computing device which may be used with embodiments of the present invention. Computing device 100 may include a controller or processor 105 that may be or include, for example, one or more central processing unit processor(s) (CPU), one or more Graphics Processing Unit(s) (GPU or GPGPU), a chip or any suitable computing or computational device, an operating system 115, a memory 120, a storage 130, input devices 135 and output devices 140”) comprising: 
one or more circuits to perform a data transform comprising a combination of two or more data transforms (See Matveev: Fig. 4, and [0059], “FIG. 4 is a block diagram illustrating a process for implementing a transformation on a computer, according to embodiments of the invention. Upon receipt of a first plurality of input arrays 410 (e.g., a batch of inputs to a CNN), the first plurality of input arrays can be converted 420 into a different format, for example, to produce a second plurality of input arrays. The converted first plurality of input arrays, in other words, the second plurality of input arrays can be written to memory and can be used as the input to the CNN. The second plurality of input arrays can be transformed 430. A first plurality of convolution arrays (e.g., kernels of a convolutional layer of the CNN) can be converted 440 into a different format, for example, to produce a second plurality of convolution arrays. The converted first plurality of input arrays, in other words, the second plurality of convolution arrays can be written to memory. An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays. The aggregate matrix multiply operation 450 can allow for a kernel to operate on multiple inputs arrays at the same time, due to, for example the conversion of the first plurality of input arrays and the first plurality of convolution array”), 
wherein the two or more data transforms are to be combined (See Matveev: Fig. 4, and [0059], “An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays”) based, at least in part, on input and output data sizes of the two or more data transforms (See Matveev: Figs. 1A-B, and [0049], “As is known in the art, the input can be represented as a matrix having elements defined by dimensions and channels. For a two-dimensional input array, row (R), column (C), and channel (CH) can be defined. For example, for a 10 by 10 image made up of red, green and blue the input array can be said to have a size of [10, 10, 3]. As is also known in the art, the convolutional layer can convolve the input with one or more filters. Each filter can be represented as a vector, matrix and/or array. For example, a convolutional filter array can be defined by row (R), column (C) and channel (CH). The output of a convolutional layer can have size that dependent on the number convolutional filter arrays (e.g., kernels). For example, assume an input having a size [R, C, 10] and assume there is 1 convolutional filter array of size [R, C, 10]. The output of the convolution in this example is an output array of size [R, C, 1]. Assume the same example, except there are 5 convolutional filter arrays of size [R, C, 10], then the output of the convolution in this example is an output array of size [R, C, 5]”).
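For illustration only, the size relationship described in the cited passage of Matveev [0049] can be expressed as a short Python sketch that derives a convolution's output shape from its input shape and the number of filters. The sketch assumes, as the passage's [R, C, ...] examples imply, that the spatial dimensions are preserved and that each filter spans the full input channel depth; the function name conv_output_shape is hypothetical and does not appear in Matveev.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (rows, columns, channels)


def conv_output_shape(input_shape: Shape, filter_shape: Shape, num_filters: int) -> Shape:
    """Output shape of a convolution following the example in Matveev [0049].

    Assumption: spatial size is preserved (as in the passage's [R, C, ...]
    examples) and each filter's channel depth equals the input channel depth.
    """
    rows, cols, in_channels = input_shape
    _, _, filter_channels = filter_shape
    if filter_channels != in_channels:
        raise ValueError("filter channel depth must match input channel depth")
    # One output channel is produced per convolutional filter (kernel).
    return (rows, cols, num_filters)


# The two examples from the passage, taking R = C = 10.
print(conv_output_shape((10, 10, 10), (10, 10, 10), 1))  # (10, 10, 1)
print(conv_output_shape((10, 10, 10), (10, 10, 10), 5))  # (10, 10, 5)
```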
However, Matveev does not explicitly disclose that the two or more data transforms are to be combined.
Sarel, however, teaches that the two or more data transforms are to be combined (See Sarel: Fig. 12, and [0165], “FIG. 12 illustrates an exemplary recurrent neural network 1200. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1200 can be described has having an input layer 1202 that receives an input vector, hidden layers 1204 to implement a recurrent function, a feedback mechanism 1205 to enable a ‘memory’ of previous states, and an output layer 1206 to output a result. The RNN 1200 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1205. For a given time step, the state of the hidden layers 1204 is defined by the previous state and the input at the current time step. An initial input (x_1) at a first time step can be processed by the hidden layer 1204. A second input (x_2) can be processed by the hidden layer 1204 using state information that is determined during the processing of the initial input (x_1). A given state can be computed as s_t=ƒ(Ux_t+Ws_(t-1)), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (tanh) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1204 can vary depending on the specific implementation details of the RNN 1200”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Matveev so that the two or more data transforms are combined, as taught by Sarel, in order to provide a general-purpose graphics processing apparatus that can perform parallel processing suited for training and deploying neural networks for machine learning (See Sarel: Fig. 8, and [0142], “Hardware acceleration for the machine learning application 802 can be enabled via a machine learning framework 804. The machine learning framework 804 can provide a library of machine learning primitives. Machine learning primitives are basic operations that are commonly performed by machine learning algorithms. Without the machine learning framework 804, developers of machine learning algorithms would be required to create and optimize the main computational logic associated with the machine learning algorithm, then re-optimize the computational logic as new parallel processors are developed. Instead, the machine learning application can be configured to perform the necessary computations using the primitives provided by the machine learning framework 804. Exemplary primitives include tensor convolutions, activation functions, and pooling, which are computational operations that are performed while training a convolutional neural network (CNN). The machine learning framework 804 can also provide primitives to implement basic linear algebra subprograms performed by many machine-learning algorithms, such as matrix and vector operations”). Matveev teaches a method and system that may perform an aggregate matrix multiply in neural networks by relocating inputs and relocating convolution filters, while Sarel teaches a system and method that may dynamically select a convolutional implementation for neural networks and provides a GPGPU with parallel processing features for training and deploying neural networks for machine learning.
Therefore, it would have been obvious to one of ordinary skill in the art to modify Matveev by Sarel to combine the network's data transform functions dynamically and to provide parallel processing capability. The rationale for modifying Matveev by Sarel is “use of known technique to improve similar devices (methods, or products) in the same way” (see MPEP 2143(I)(C)).
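For illustration only, the state recurrence in the passage of Sarel [0165] quoted above, s_t=ƒ(Ux_t+Ws_(t-1)), can be sketched in Python. The sketch assumes tanh as the nonlinearity ƒ (one of the options the passage names), an explicitly supplied initial state s0, and parameter matrices U and W stored as nested lists; the helper names matvec and rnn_states are hypothetical and are not taken from Sarel.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_states(xs: List[Vector], U: Matrix, W: Matrix, s0: Vector) -> List[Vector]:
    """Compute s_t = f(U x_t + W s_(t-1)) for every time step, with f = tanh."""
    states: List[Vector] = []
    s_prev = s0
    for x_t in xs:
        # Combine the current input with the previous state.
        pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, s_prev))]
        s_t = [math.tanh(z) for z in pre]  # nonlinearity f
        states.append(s_t)
        s_prev = s_t  # feedback: this state influences the next time step
    return states
```

The loop makes the feedback mechanism explicit: each state depends on the current input and on the state computed at the previous time step.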
Regarding claim 2, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev teaches that the processor of claim 1, wherein the data transform is performed by one or more parallel processing units, and the combination of two or more data transforms is based, at least in part, on a profile of resource requirements for each of the two or more data transforms (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 3, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev teaches that the processor of claim 1, wherein the combination of two or more data transforms results in a sequence of instructions implementing operations on data to train one or more neural networks (See Matveev: Figs. 1-2, and [0055], “Executable code 125 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 125 may be executed by controller 105 possibly under control of operating system 115. For example, executable code 125 may when executed cause NN training, coordination of NN training tasks, NN execution or inference, etc. according to embodiments of the present invention. In some embodiments, more than one computing device 100 or components of device 100 may be used for multiple functions described herein. For the various modules and functions described herein, one or more computing devices 100 or components of computing device 100 may be used. Devices that include components similar or different to those included in computing device 100 may be used, and may be connected to a network and used as a system. One or more processor(s) 105 may be configured to carry out embodiments of the present invention by for example executing software or code. Storage 130 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and/or fixed storage unit. Data such as instructions, code, NN model data, parameters, etc. may be stored in a storage 130 and may be loaded from storage 130 into a memory 120 where it may be processed by controller 105. In some embodiments, some of the components shown in FIG. 2 may be omitted”).
Regarding claim 4, Matveev and Sarel teach all the features with respect to claim 3 as outlined above. Further, Matveev teaches that the processor of claim 3, wherein the sequence of instructions implement operations to be performed by one or more parallel processing units (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 5, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev teaches that the processor of claim 1, wherein the two or more data transforms are to be combined, based at least in part, on memory requirements of each of the two or more data transforms and memory availability of one or more parallel processing units (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
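To make the memory-based combination recited in claim 5 concrete, the following is a hypothetical Python sketch that is not drawn from Matveev, Sarel, or the application: consecutive transforms are greedily grouped while their summed memory requirement fits an assumed memory budget of a parallel processing unit. The cost model (input size plus output size per transform), the Transform class, the transform names, and the function group_transforms are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transform:
    name: str
    input_bytes: int
    output_bytes: int

    @property
    def memory_bytes(self) -> int:
        # Hypothetical cost model: a transform needs room for its input and output.
        return self.input_bytes + self.output_bytes


def group_transforms(transforms: List[Transform], available_bytes: int) -> List[List[Transform]]:
    """Greedily combine consecutive transforms while their summed memory fits.

    A transform that alone exceeds the budget is placed in a group by itself.
    """
    groups: List[List[Transform]] = []
    current: List[Transform] = []
    used = 0
    for t in transforms:
        # Start a new group when adding this transform would exceed the budget.
        if current and used + t.memory_bytes > available_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(t)
        used += t.memory_bytes
    if current:
        groups.append(current)
    return groups


pipeline = [
    Transform("crop", 100, 50),
    Transform("resize", 50, 50),
    Transform("normalize", 50, 50),
    Transform("flip", 50, 50),
]
print([[t.name for t in g] for g in group_transforms(pipeline, 260)])
# [['crop', 'resize'], ['normalize', 'flip']]
```

A greedy pass over consecutive transforms is chosen here only for simplicity; it preserves the original order of the transforms and is not presented as the claimed technique.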
Regarding claim 6, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev teaches that the processor of claim 1, wherein the two or more data transforms are to be combined, based at least in part, on compute time requirements of each of the two or more data transforms (See Matveev: Fig. 4, and [0060], “Allowing the kernel to operate on multiple input arrays at the same time can reduce the time it takes a batch of inputs to be processed through the convolutional layer of the CNN. The convolution results can be inverse transformed 460 to produce the output 470. In this manner, the process in FIG. 4 can result in a convolution of the batch of inputs of FIG. 3 with the convolution arrays of FIG. 3, without performing the shuffle/reshuffle or the point-wise matrix multiply (or the matrix multiply) that can be performed in the prior art, also reducing the time it takes to process a batch of inputs through the convolution layer of the CNN”).
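For illustration only, the time saving described in Matveev [0060] rests on letting one kernel operate on several inputs at once. The simplified Python sketch below uses dense matrix-vector products rather than convolutions, and does not reproduce Matveev's input and kernel conversions (420, 440). It shows that a single aggregate product over a batch of inputs laid out as matrix columns gives the same results as separate per-input products. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def per_input(kernel: Matrix, inputs: List[List[float]]) -> List[List[float]]:
    """Apply the kernel to each input vector separately (one product per input)."""
    return [[row[0] for row in matmul(kernel, [[v] for v in x])] for x in inputs]


def aggregated(kernel: Matrix, inputs: List[List[float]]) -> List[List[float]]:
    """Apply the kernel to the whole batch with a single product.

    Inputs are laid out as the columns of one matrix, so one matmul covers
    every input; the result columns are then split back into per-input outputs.
    """
    batch = [list(col) for col in zip(*inputs)]  # column j holds input j
    out = matmul(kernel, batch)
    return [list(col) for col in zip(*out)]


kernel = [[1.0, 2.0], [0.0, 1.0]]
inputs = [[1.0, 1.0], [2.0, 3.0], [0.0, -1.0]]
print(aggregated(kernel, inputs) == per_input(kernel, inputs))  # True
```

An optimized implementation would hand the aggregate product to a tuned matrix-multiply routine; the sketch only demonstrates that batching does not change the results.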
Regarding claim 7, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev teaches that the processor of claim 1, wherein the two or more data transforms are to be combined based, at least in part, on available memory resources of a computing system (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 8, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Sarel teaches that the processor of claim 1, wherein the two or more data transforms are pre and post transforms, and prepare 3-dimensional image data for use in training a neural network (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”). 
Regarding claim 9, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev and Sarel teach that a system (See Matveev: Fig. 2, and [0053], “FIG. 2 shows a high-level block diagram of an exemplary computing device which may be used with embodiments of the present invention. Computing device 100 may include a controller or processor 105 that may be or include, for example, one or more central processing unit processor(s) (CPU), one or more Graphics Processing Unit(s) (GPU or GPGPU), a chip or any suitable computing or computational device, an operating system 115, a memory 120, a storage 130, input devices 135 and output devices 140”), comprising:
one or more processors to perform a first set of two or more data transforms and a second set of two or more data transforms (See Matveev: Fig. 4, and [0059], “FIG. 4 is a block diagram illustrating a process for implementing a transformation on a computer, according to embodiments of the invention. Upon receipt of a first plurality of input arrays 410 (e.g., a batch of inputs to a CNN), the first plurality of input arrays can be converted 420 into a different format, for example, to produce a second plurality of input arrays. The converted first plurality of input arrays, in other words, the second plurality of input arrays can be written to memory and can be used as the input to the CNN. The second plurality of input arrays can be transformed 430. A first plurality of convolution arrays (e.g., kernels of a convolutional layer of the CNN) can be converted 440 into a different format, for example, to produce a second plurality of convolution arrays. The converted first plurality of input arrays, in other words, the second plurality of convolution arrays can be written to memory. An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays. The aggregate matrix multiply operation 450 can allow for a kernel to operate on multiple inputs arrays at the same time, due to, for example the conversion of the first plurality of input arrays and the first plurality of convolution array”), 
wherein the second set of two or more data transforms (See Matveev: Fig. 4, and [0059], “An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays”) are to be combined (See Sarel: Fig. 12, and [0165], “FIG. 12 illustrates an exemplary recurrent neural network 1200. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1200 can be described has having an input layer 1202 that receives an input vector, hidden layers 1204 to implement a recurrent function, a feedback mechanism 1205 to enable a ‘memory’ of previous states, and an output layer 1206 to output a result. The RNN 1200 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1205. For a given time step, the state of the hidden layers 1204 is defined by the previous state and the input at the current time step. An initial input (x_1) at a first time step can be processed by the hidden layer 1204. A second input (x_2) can be processed by the hidden layer 1204 using state information that is determined during the processing of the initial input (x_1). A given state can be computed as s_t=ƒ(Ux_t+Ws_(t-1)), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (tanh) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1204 can vary depending on the specific implementation details of the RNN 1200”) from individual data transforms from the first set of two or more data transforms based, at least in part, on input and output data sizes of the individual data transforms (See Matveev: Figs. 1A-B, and [0049], “As is known in the art, the input can be represented as a matrix having elements defined by dimensions and channels. For a two-dimensional input array, row (R), column (C), and channel (CH) can be defined. For example, for a 10 by 10 image made up of red, green and blue the input array can be said to have a size of [10, 10, 3]. As is also known in the art, the convolutional layer can convolve the input with one or more filters. Each filter can be represented as a vector, matrix and/or array. For example, a convolutional filter array can be defined by row (R), column (C) and channel (CH). The output of a convolutional layer can have size that dependent on the number convolutional filter arrays (e.g., kernels). For example, assume an input having a size [R, C, 10] and assume there is 1 convolutional filter array of size [R, C, 10]. The output of the convolution in this example is an output array of size [R, C, 1]. Assume the same example, except there are 5 convolutional filter arrays of size [R, C, 10], then the output of the convolution in this example is an output array of size [R, C, 5]”).
Regarding claim 10, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches that the system of claim 9, wherein the second set is performed by one or more parallel processing units, and the combination of individual data transforms from the first set is based, at least in part, on resource requirements for each of the two or more data transforms (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 11, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Sarel teaches that the system of claim 9, wherein the second set performs a sequence of operations on three dimensional (3D) image data (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”).
Regarding claim 12, Matveev and Sarel teach all the features with respect to claim 11 as outlined above. Further, Matveev teaches that the system of claim 11, wherein the second set is accelerated by one or more parallel processing units (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 13, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches that the system of claim 9, wherein the individual data transforms from the first set are to be combined, based at least in part, on memory requirements of each of the individual data transforms and memory availability of one or more parallel processing units (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 14, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches that the system of claim 9, wherein the individual data transforms from the first set are to be combined such that a time requirement for applying the first set is reduced (See Matveev: Fig. 4, and [0060], “Allowing the kernel to operate on multiple input arrays at the same time can reduce the time it takes a batch of inputs to be processed through the convolutional layer of the CNN. The convolution results can be inverse transformed 460 to produce the output 470. In this manner, the process in FIG. 4 can result in a convolution of the batch of inputs of FIG. 3 with the convolution arrays of FIG. 3, without performing the shuffle/reshuffle or the point-wise matrix multiply (or the matrix multiply) that can be performed in the prior art, also reducing the time it takes to process a batch of inputs through the convolution layer of the CNN”).
Regarding claim 15, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches that the system of claim 9, wherein the individual data transforms are to be combined based on available memory resources of a computing system implementing one or more neural networks (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 16, Matveev and Sarel teach all the features with respect to claim 15 as outlined above. Further, Matveev teaches that the system of claim 15, wherein the one or more neural networks are trained using data transformed by the second set (See Matveev: Figs. 1-2, and [0055], “Executable code 125 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 125 may be executed by controller 105 possibly under control of operating system 115. For example, executable code 125 may when executed cause NN training, coordination of NN training tasks, NN execution or inference, etc. according to embodiments of the present invention. In some embodiments, more than one computing device 100 or components of device 100 may be used for multiple functions described herein. For the various modules and functions described herein, one or more computing devices 100 or components of computing device 100 may be used. Devices that include components similar or different to those included in computing device 100 may be used, and may be connected to a network and used as a system. One or more processor(s) 105 may be configured to carry out embodiments of the present invention by for example executing software or code. Storage 130 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and/or fixed storage unit. Data such as instructions, code, NN model data, parameters, etc. may be stored in a storage 130 and may be loaded from storage 130 into a memory 120 where it may be processed by controller 105. In some embodiments, some of the components shown in FIG. 2 may be omitted”).
Regarding claim 17, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches that the system of claim 9, wherein the first set of two or more data transforms and the second set of two or more data transforms contain pre and post transforms (See Matveev: Figs. 1-2, and [0105], “As is apparent to one of ordinary skill in the art, values as described herein for the number of arrays in the plurality of first input arrays, input channel depth, number of rows and columns in the input, number of arrays in the plurality of first convolution arrays, the filter channel depth, and the number or rows and columns in the convolution array, are values selected for their ease of use in explanation, and that typically in CNNs these values are much larger. For example, a first plurality of input arrays can include a 128 two-dimensional images, where each image is 1024×1024 pixels”).
Regarding claim 18, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Sarel teaches the system of claim 9, wherein the first set of two or more data transforms and the second set of two or more data transforms prepare three dimensional (3D) image data for use in training one or more neural networks (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”).
Regarding claim 19, Matveev and Sarel teach all the features with respect to claim 9 as outlined above. Further, Matveev teaches the system of claim 9, wherein:
the second set of two or more data transforms are performed (See Matveev: Figs. 1-2, and [0018], “In one aspect, the invention involves a method for an improved convolution neural network (CNN). The method involves receiving a first plurality of input arrays that are stored in a first computer memory, wherein each element in each input array in the first plurality of input arrays is referenced at least as input batch number, input row, input column and an input channel depth, and wherein the first plurality of input arrays is stored in the first computer memory continuously along the input channel depth. The method also involves writing a second plurality of input arrays into the first computer memory such that the second plurality of input arrays is the first plurality of input arrays stored in the first computer memory continuously along the input batch number. The method also involves receiving a first plurality of convolution arrays that are stored in a second computer memory, wherein each element in each array of the first plurality of convolution arrays is referenced by convolution filter number, row filter, a column filter, and filter channel depth. The method also involves determining a second plurality of convolution arrays such that each array in the second plurality of convolution arrays comprises a subset of elements from at least two arrays of the plurality of first convolution arrays. The method also involves performing an aggregate matrix multiply between the second plurality of input arrays and the second plurality of convolution arrays to produce one or more output arrays that are the result of the convolution of the first plurality of input arrays with the first plurality convolution arrays”);
a third set of data transforms are performed (See Matveev: Fig. 8, and [0073], “The first plurality of convolution arrays can be ordered based on an input channel group number (IN-CH-G). The IN-CH-G can specify how many input channels of each array of the first plurality of convolutional arrays to store continuously in memory. For example, assume a number of input channels is 8, a number of arrays in the first plurality of convolution arrays is 4, and IN-CH-G of 4. In this example, the first array (new K0) of the second plurality of convolution arrays is the first four channels (0,1,2,3) of the first array of the first plurality of convolution array (e.g., original K0), the first four channels (0,1,2,3) of the second array of the first plurality of convolution array (e.g., original K1), the first four channels (0,1,2,3) of the third array of the first plurality of convolution array (e.g., original K2), and the first four channels (0,1,2,3) of the fourth array of the first plurality of convolution array (e.g., original K3). In this example, the second array (new K1) of the second plurality of convolution arrays is the second four channels (4,5,6,7) of the first array of the first plurality of convolution array (e.g., original K0), the second four channels (4,5,6,7) of the second array of the first plurality of convolution array (e.g., original K1), the second four channels (4,5,6,7) of the third array of the first plurality of convolution array (e.g., original K2), and the second four channels (4,5,6,7) of the fourth array of the first plurality of convolution array (e.g., original K3)”), and
the third set of data transforms is comprised of individual data transforms from the first set of two or more data transforms that were not selected to be in the second set of two or more data transforms (See Matveev: Fig. 8, and [0085], “As described above, in the example of FIG. 8, FCH is 8, the number of arrays in the first plurality of convolution arrays is K=4, IN-CH-G is 4, and OUT-CH-G is 2. The vector table 830 shows the second plurality of convolution arrays are organized as follows: the first array (new K0) includes the first four channels (0,1,2,3) of the first array and the second array (original K0 and original K1) of the first plurality of convolution arrays, the second array (new K1) includes the second four channels (4,5,6,7) of the first array and the second array (original K0 and original K1) of the first plurality of convolution arrays, the third array (new K2) includes the first four channels (0,1,2,3) of the third array and the fourth array (original K2 and original K3) of the first plurality of convolution arrays and the fourth array (new K3) includes the second four channels (4,5,6,7) of the third array and the fourth array (original K2 and original K3) of the first plurality of convolution arrays”).
Regarding claim 20, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev and Sarel teach a machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors (See Matveev: Fig. 2, and [0053], “FIG. 2 shows a high-level block diagram of an exemplary computing device which may be used with embodiments of the present invention. Computing device 100 may include a controller or processor 105 that may be or include, for example, one or more central processing unit processor(s) (CPU), one or more Graphics Processing Unit(s) (GPU or GPGPU), a chip or any suitable computing or computational device, an operating system 115, a memory 120, a storage 130, input devices 135 and output devices 140”) to at least:
perform a data transform comprising a combination of two or more data transforms (See Matveev: Fig. 4, and [0059], “FIG. 4 is a block diagram illustrating a process for implementing a transformation on a computer, according to embodiments of the invention. Upon receipt of a first plurality of input arrays 410 (e.g., a batch of inputs to a CNN), the first plurality of input arrays can be converted 420 into a different format, for example, to produce a second plurality of input arrays. The converted first plurality of input arrays, in other words, the second plurality of input arrays can be written to memory and can be used as the input to the CNN. The second plurality of input arrays can be transformed 430. A first plurality of convolution arrays (e.g., kernels of a convolutional layer of the CNN) can be converted 440 into a different format, for example, to produce a second plurality of convolution arrays. The converted first plurality of input arrays, in other words, the second plurality of convolution arrays can be written to memory. An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays. The aggregate matrix multiply operation 450 can allow for a kernel to operate on multiple inputs arrays at the same time, due to, for example the conversion of the first plurality of input arrays and the first plurality of convolution array”), 
wherein the two or more data transforms are to be combined (See Sarel: Fig. 12, and [0165], “FIG. 12 illustrates an exemplary recurrent neural network 1200. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1200 can be described has having an input layer 1202 that receives an input vector, hidden layers 1204 to implement a recurrent function, a feedback mechanism 1205 to enable a ‘memory’ of previous states, and an output layer 1206 to output a result. The RNN 1200 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1205. For a given time step, the state of the hidden layers 1204 is defined by the previous state and the input at the current time step. An initial input (x₁) at a first time step can be processed by the hidden layer 1204. A second input (x₂) can be processed by the hidden layer 1204 using state information that is determined during the processing of the initial input (x₁). A given state can be computed as sₜ=ƒ(Uxₜ+Wsₜ₋₁), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (Tanh) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1204 can vary depending on the specific implementation details of the RNN 1200”) based, at least in part, on input and output data sizes of the two or more data transforms (See Matveev: Figs. 1A-B, and [0049], “As is known in the art, the input can be represented as a matrix having elements defined by dimensions and channels. For a two-dimensional input array, row (R), column (C), and channel (CH) can be defined. For example, for a 10 by 10 image made up of red, green and blue the input array can be said to have a size of [10, 10, 3]. As is also known in the art, the convolutional layer can convolve the input with one or more filters. Each filter can be represented as a vector, matrix and/or array. For example, a convolutional filter array can be defined by row (R), column (C) and channel (CH). The output of a convolutional layer can have size that dependent on the number convolutional filter arrays (e.g., kernels). For example, assume an input having a size [R, C, 10] and assume there is 1 convolutional filter array of size [R, C, 10]. The output of the convolution in this example is an output array of size [R, C, 1]. Assume the same example, except there are 5 convolutional filter arrays of size [R, C, 10], then the output of the convolution in this example is an output array of size [R, C, 5]”).
Regarding claim 21, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Matveev teaches the machine-readable medium of claim 20, wherein the data transform comprising a combination of two or more data transforms is performed by one or more parallel processing units (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 22, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Matveev teaches the machine-readable medium of claim 20, wherein the instructions, when performed, further cause the one or more processors to combine two or more data transforms based, at least in part, on a profile of resource requirements for each of the two or more data transforms (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 23, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Matveev teaches the machine-readable medium of claim 20, wherein the instructions, when performed, further cause the one or more processors to perform a sequence of data transformation operations on data used to train one or more neural networks, where the sequence of data transformation operations are specified by the combination of two or more data transforms (See Matveev: Figs. 1-2, and [0055], “Executable code 125 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 125 may be executed by controller 105 possibly under control of operating system 115. For example, executable code 125 may when executed cause NN training, coordination of NN training tasks, NN execution or inference, etc. according to embodiments of the present invention. In some embodiments, more than one computing device 100 or components of device 100 may be used for multiple functions described herein. For the various modules and functions described herein, one or more computing devices 100 or components of computing device 100 may be used. Devices that include components similar or different to those included in computing device 100 may be used, and may be connected to a network and used as a system. One or more processor(s) 105 may be configured to carry out embodiments of the present invention by for example executing software or code. Storage 130 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and/or fixed storage unit. Data such as instructions, code, NN model data, parameters, etc. may be stored in a storage 130 and may be loaded from storage 130 into a memory 120 where it may be processed by controller 105. In some embodiments, some of the components shown in FIG. 2 may be omitted”).
Regarding claim 24, Matveev and Sarel teach all the features with respect to claim 23 as outlined above. Further, Matveev teaches the machine-readable medium of claim 23, wherein the sequence of data transformation operations are accelerated by one or more graphics processing units (See Matveev: Figs. 1-2, and [0047], “Embodiments of the invention include systems and methods that may reduce the amount machine operations during performance of convolutions in CNNs. Embodiments of the invention can allow for CNNs to be realizably implemented on a CPU. Further, while CPU based machines are discussed, GPUs or other types of processors may be used”).
Regarding claim 25, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Matveev teaches the machine-readable medium of claim 20, wherein the two or more data transforms are to be combined, based at least in part, on memory requirements of each of the two or more data transforms and memory availability of one or more parallel processing units (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 26, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Matveev teaches the machine-readable medium of claim 20, wherein the two or more data transforms are to be combined such that a required computing time to perform each of the two or more data transforms is reduced (See Matveev: Fig. 4, and [0060], “Allowing the kernel to operate on multiple input arrays at the same time can reduce the time it takes a batch of inputs to be processed through the convolutional layer of the CNN. The convolution results can be inverse transformed 460 to produce the output 470. In this manner, the process in FIG. 4 can result in a convolution of the batch of inputs of FIG. 3 with the convolution arrays of FIG. 3, without performing the shuffle/reshuffle or the point-wise matrix multiply (or the matrix multiply) that can be performed in the prior art, also reducing the time it takes to process a batch of inputs through the convolution layer of the CNN”).
Regarding claim 27, Matveev and Sarel teach all the features with respect to claim 20 as outlined above. Further, Sarel teaches the machine-readable medium of claim 20, wherein the two or more data transforms are pre and post transforms, and prepare three dimensional (3D) image data for use in training a neural network (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”).
Regarding claim 28, Matveev and Sarel teach all the features with respect to claim 1 as outlined above. Further, Matveev and Sarel teach a method (See Matveev: Fig. 2, and [0053], “FIG. 2 shows a high-level block diagram of an exemplary computing device which may be used with embodiments of the present invention. Computing device 100 may include a controller or processor 105 that may be or include, for example, one or more central processing unit processor(s) (CPU), one or more Graphics Processing Unit(s) (GPU or GPGPU), a chip or any suitable computing or computational device, an operating system 115, a memory 120, a storage 130, input devices 135 and output devices 140”), comprising:
performing a first set of two or more data transforms using one or more parallel processing units (See Matveev: Fig. 4, and [0059], “FIG. 4 is a block diagram illustrating a process for implementing a transformation on a computer, according to embodiments of the invention. Upon receipt of a first plurality of input arrays 410 (e.g., a batch of inputs to a CNN), the first plurality of input arrays can be converted 420 into a different format, for example, to produce a second plurality of input arrays. The converted first plurality of input arrays, in other words, the second plurality of input arrays can be written to memory and can be used as the input to the CNN. The second plurality of input arrays can be transformed 430. A first plurality of convolution arrays (e.g., kernels of a convolutional layer of the CNN) can be converted 440 into a different format, for example, to produce a second plurality of convolution arrays. The converted first plurality of input arrays, in other words, the second plurality of convolution arrays can be written to memory. An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays. The aggregate matrix multiply operation 450 can allow for a kernel to operate on multiple inputs arrays at the same time, due to, for example the conversion of the first plurality of input arrays and the first plurality of convolution array”), wherein the first set of two or more data transforms are based, at least in part, on individual data transforms from a second set of two or more data transforms (See Matveev: Fig. 4, and [0059], “FIG. 4 is a block diagram illustrating a process for implementing a transformation on a computer, according to embodiments of the invention. Upon receipt of a first plurality of input arrays 410 (e.g., a batch of inputs to a CNN), the first plurality of input arrays can be converted 420 into a different format, for example, to produce a second plurality of input arrays. The converted first plurality of input arrays, in other words, the second plurality of input arrays can be written to memory and can be used as the input to the CNN. The second plurality of input arrays can be transformed 430. A first plurality of convolution arrays (e.g., kernels of a convolutional layer of the CNN) can be converted 440 into a different format, for example, to produce a second plurality of convolution arrays. The converted first plurality of input arrays, in other words, the second plurality of convolution arrays can be written to memory. An aggregate matrix multiply operation 450 can be performed to convolve the second plurality of input arrays and the second plurality of convolution arrays. The aggregate matrix multiply operation 450 can allow for a kernel to operate on multiple inputs arrays at the same time, due to, for example the conversion of the first plurality of input arrays and the first plurality of convolution array”); and
selecting individual data transforms for the first set of two or more data transforms from the second set of two or more data transforms based, at least in part, on input and output data sizes of the individual data transforms (See Matveev: Figs. 1A-B, and [0049], “As is known in the art, the input can be represented as a matrix having elements defined by dimensions and channels. For a two-dimensional input array, row (R), column (C), and channel (CH) can be defined. For example, for a 10 by 10 image made up of red, green and blue the input array can be said to have a size of [10, 10, 3]. As is also known in the art, the convolutional layer can convolve the input with one or more filters. Each filter can be represented as a vector, matrix and/or array. For example, a convolutional filter array can be defined by row (R), column (C) and channel (CH). The output of a convolutional layer can have size that dependent on the number convolutional filter arrays (e.g., kernels). For example, assume an input having a size [R, C, 10] and assume there is 1 convolutional filter array of size [R, C, 10]. The output of the convolution in this example is an output array of size [R, C, 1]. Assume the same example, except there are 5 convolutional filter arrays of size [R, C, 10], then the output of the convolution in this example is an output array of size [R, C, 5]”).
Regarding claim 29, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the second set of two or more data transforms is performed by one or more parallel processing units (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 30, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the combination of individual data transforms from the first set is based, at least in part, on resource requirements for each of the two or more data transforms in the first set (See Matveev: Figs. 1-2, and [0019], “In some embodiments, a shuffle operation is not performed on the first plurality of input arrays within the CNN prior to performing the aggregate matrix multiply. In some embodiments, a reshuffle operation is not performed on an output of the aggregate matrix multiply. In some embodiments, performing the aggregate matrix multiply further comprises processing at least two elements of at two of the first plurality input arrays in parallel”).
Regarding claim 31, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Sarel teaches the method of claim 28, wherein the second set of two or more data transforms performs a sequence of operations on three dimensional (3D) image data (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”).
Regarding claim 32, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the first set of two or more data transforms are performed and a third set of data transforms are performed (See Matveev: Fig. 8, and [0073], “The first plurality of convolution arrays can be ordered based on an input channel group number (IN-CH-G). The IN-CH-G can specify how many input channels of each array of the first plurality of convolutional arrays to store continuously in memory. For example, assume a number of input channels is 8, a number of arrays in the first plurality of convolution arrays is 4, and IN-CH-G of 4. In this example, the first array (new K0) of the second plurality of convolution arrays is the first four channels (0,1,2,3) of the first array of the first plurality of convolution array (e.g., original K0), the first four channels (0,1,2,3) of the second array of the first plurality of convolution array (e.g., original K1), the first four channels (0,1,2,3) of the third array of the first plurality of convolution array (e.g., original K2), and the first four channels (0,1,2,3) of the fourth array of the first plurality of convolution array (e.g., original K3). In this example, the second array (new K1) of the second plurality of convolution arrays is the second four channels (4,5,6,7) of the first array of the first plurality of convolution array (e.g., original K0), the second four channels (4,5,6,7) of the second array of the first plurality of convolution array (e.g., original K1), the second four channels (4,5,6,7) of the third array of the first plurality of convolution array (e.g., original K2), and the second four channels (4,5,6,7) of the fourth array of the first plurality of convolution array (e.g., original K3)”), and
the third set of data transforms is comprised of individual data transforms from the second set of two or more data transforms that were not selected to be in the first set of two or more data transforms (See Matveev: Fig. 8, and [0085], “As described above, in the example of FIG. 8, FCH is 8, the number of arrays in the first plurality of convolution arrays is K=4, IN-CH-G is 4, and OUT-CH-G is 2. The vector table 830 shows the second plurality of convolution arrays are organized as follows: the first array (new K0) includes the first four channels (0,1,2,3) of the first array and the second array (original K0 and original K1) of the first plurality of convolution arrays, the second array (new K1) includes the second four channels (4,5,6,7) of the first array and the second array (original K0 and original K1) of the first plurality of convolution arrays, the third array (new K2) includes the first four channels (0,1,2,3) of the third array and the fourth array (original K2 and original K3) of the first plurality of convolution arrays and the fourth array (new K3) includes the second four channels (4,5,6,7) of the third array and the fourth array (original K2 and original K3) of the first plurality of convolution arrays”).
Regarding claim 33, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the individual data transforms from the second set of two or more data transforms are selected based on memory requirements of each of the individual data transforms and memory availability of one or more parallel processing units (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 34, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the individual data transforms from the second set of two or more data transforms are selected based on compute time requirements of each of the individual data transforms (See Matveev: Fig. 4, and [0060], “Allowing the kernel to operate on multiple input arrays at the same time can reduce the time it takes a batch of inputs to be processed through the convolutional layer of the CNN. The convolution results can be inverse transformed 460 to produce the output 470. In this manner, the process in FIG. 4 can result in a convolution of the batch of inputs of FIG. 3 with the convolution arrays of FIG. 3, without performing the shuffle/reshuffle or the point-wise matrix multiply (or the matrix multiply) that can be performed in the prior art, also reducing the time it takes to process a batch of inputs through the convolution layer of the CNN”).
Regarding claim 35, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the individual data transforms from the second set of two or more data transforms are selected based on available memory resources of a computing system implementing one or more neural networks (See Matveev: Figs. 1-2, and [0020], “In some embodiments, the subset of elements is based on a memory size of the computer. In some embodiments, a computer processing unit executing the method of claim 1 does not incur additional cache misses beyond what is generated by executing the method of claim 1. In some embodiments, the first plurality of convolutional arrays includes at least one sparse array”).
Regarding claim 36, Matveev and Sarel teach all the features with respect to claim 35 as outlined above. Further, Sarel teaches the method of claim 35, wherein the one or more neural networks are trained using data transformed by the second set of two or more data transforms (See Sarel: Fig. 8, and [0141], “FIG. 8 is a generalized diagram of a machine learning software stack 800. A machine learning application 802 can be configured to train a neural network using a training dataset or to use a trained deep neural network to implement machine intelligence. The machine learning application 802 can include training and inference functionality for a neural network and/or specialized software that can be used to train a neural network before deployment. The machine learning application 802 can implement any type of machine intelligence including but not limited to image recognition, mapping and localization, autonomous navigation, speech synthesis, medical imaging, or language translation”).
Regarding claim 37, Matveev and Sarel teach all the features with respect to claim 35 as outlined above. Further, Matveev teaches the method of claim 35, wherein the one or more neural networks are used to perform inferencing on data transformed by the second set of two or more data transforms (See Matveev: Figs. 1-2, and [0055], “Executable code 125 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 125 may be executed by controller 105 possibly under control of operating system 115. For example, executable code 125 may when executed cause NN training, coordination of NN training tasks, NN execution or inference, etc. according to embodiments of the present invention. In some embodiments, more than one computing device 100 or components of device 100 may be used for multiple functions described herein. For the various modules and functions described herein, one or more computing devices 100 or components of computing device 100 may be used. Devices that include components similar or different to those included in computing device 100 may be used, and may be connected to a network and used as a system. One or more processor(s) 105 may be configured to carry out embodiments of the present invention by for example executing software or code. Storage 130 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and/or fixed storage unit. Data such as instructions, code, NN model data, parameters, etc. may be stored in a storage 130 and may be loaded from storage 130 into a memory 120 where it may be processed by controller 105. In some embodiments, some of the components shown in FIG. 2 may be omitted”).
Regarding claim 38, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Sarel teaches the method of claim 28, further comprising preparing three dimensional (3D) image data for use in training one or more neural networks using the first set of two or more data transforms and the second set of two or more data transforms (See Sarel: Fig. 18, and [0206], “In some embodiments, GPE 1810 includes a 3D pipeline 1812 for performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.). The 3D pipeline 1812 includes programmable and fixed function elements that perform various tasks within the element and/or spawn execution threads to a 3D/Media sub-system 1815. While 3D pipeline 1812 can be used to perform media operations, an embodiment of GPE 1810 also includes a media pipeline 1816 that is specifically used to perform media operations, such as video post-processing and image enhancement”).
Regarding claim 39, Matveev and Sarel teach all the features with respect to claim 28 as outlined above. Further, Matveev teaches the method of claim 28, wherein the first set of two or more data transforms are performed on a batch of input data (See Matveev: Fig. 4, and [0060], “Allowing the kernel to operate on multiple input arrays at the same time can reduce the time it takes a batch of inputs to be processed through the convolutional layer of the CNN. The convolution results can be inverse transformed 460 to produce the output 470. In this manner, the process in FIG. 4 can result in a convolution of the batch of inputs of FIG. 3 with the convolution arrays of FIG. 3, without performing the shuffle/reshuffle or the point-wise matrix multiply (or the matrix multiply) that can be performed in the prior art, also reducing the time it takes to process a batch of inputs through the convolution layer of the CNN”).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GORDON G LIU/Primary Examiner, Art Unit 2612