DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on February 16, 2021 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the claims and remarks filed on 1/12/2021 and entered via the request for continued examination (RCE) filed on 2/16/2021. Claims 1-20 are pending and have been examined. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 8/20/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS has been considered by the examiner.

Response to Amendment
The amendment filed on 1/12/2021 has been entered via the RCE filed on 2/16/2021. Claims 1, 6, 8, 13, 15 and 20 were amended, and no claims were cancelled or added in the amendment.

Response to Arguments
Applicant's arguments filed 1/12/2021 with respect to the rejections of claims 1-20 under 35 U.S.C. 103 have been fully considered but are moot because the arguments do not apply to the combination of references used in the current rejections. Applicant’s amendments have necessitated the claim objections and rejections under 35 U.S.C. 112(b) and 103 discussed below.
With reference to amended claim 1, applicant states “As amended, independent claim 1 is directed to: … introduce a library to a neural network application, the library comprising machine learning primitives, wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” (applicant’s remarks, page 8). 
With continued reference to claim 1, applicant asserts “The cited references, alone or in combination, neither anticipate and/or disclose (nor even suggest) an arrangement in which a graphics processor is to: introduce a library to a neural network application to determine, the library comprising machine learning primitives, wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application; and determining, by utilizing the library, an optimal point at which to apply frequency scaling …” (Id.).
Accordingly, applicant appears to argue that the claim limitations added to independent claims 1, 8 and 15, i.e., “the library comprising machine learning primitives, wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application”, are not taught in the portions of the Roblek and Dettmers references cited to reject these claims in the previous Office action. The examiner respectfully disagrees in view of newly-cited reference Tokui et al. (U.S. Patent Application Pub. No. 2018/0349772 A1, hereinafter “Tokui”), and points applicant to the below discussion of Roblek, Dettmers, Tokui and Lambert.
As a preliminary matter, regarding the “library comprising machine learning primitives” limitation added to the independent claims, paragraph 198 of applicant’s specification states “Machine learning primitives are basic operations that are commonly performed by machine learning algorithms.” Therefore, a “library comprising machine learning primitives”, under the broadest reasonable interpretation (BRI), is a library that includes operations, such as programs or functions, that can be performed by machine learning algorithms.
Regarding the limitation “the library comprising machine learning primitives, wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” added to independent claims 1, 8 and 15, the examiner points to paragraphs 67-68 of Tokui, which disclose a “method for using calculation libraries such as Caffe … and Theano (http://deeplearning.net/software/theano/). … According to these libraries, by using a dedicated Mini programming language to describe the loss function as a combination of prepared primitives” [i.e., libraries include machine learning primitives]. With continued reference to the above-noted limitation, the examiner also points to paragraph 68 of Tokui, which discloses that “According to these libraries, by using a dedicated Mini programming language … as a combination of prepared primitives, it is possible to automatically obtain a gradient function of the loss function, too. This is because a gradient of each primitive is defined, and therefore a gradient of the entire combination can be also obtained by automatic differentiation. … by using this Mini programming language, the neural network can perform learning by the gradient method by using a gradient function” [i.e., the primitives are used to analyze a pattern in a distributed gradient method/function/synchronization implemented/performed by the neural network application/function].
With reference to the “analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” limitation added to claims 1, 8 and 15, the examiner points to FIG. 4 and page 63, right col., paragraph 3 of Lambert, as discussed in the rejection of claim 1 below.
As detailed below, Roblek in view of Dettmers, Tokui and Lambert teaches all the remaining limitations of amended independent claims 1, 8 and 15.
Regarding the dependent claims, applicant generally asserts “The remaining dependent claims depend ultimately from one of claims 1, 8, or 15 and are allowable at least by virtue of the dependency on claims 1, 8, or 15 for the claim elements recited separately therein.” (applicant’s remarks, page 10). The examiner respectfully disagrees and points to the combination of Roblek, Dettmers, Tokui and Lambert applied to dependent claims 2, 9 and 16, and to sub-combinations of Roblek, Dettmers, Tokui and Lambert further in view of other references previously applied to the remaining dependent claims.
As discussed in detail below, Roblek in view of Dettmers, Tokui and Lambert teaches the limitations of dependent claims 2, 9 and 16. As further discussed in detail below, various sub-combinations of Roblek, Dettmers, Tokui and Lambert further in view of other references teach the limitations of the remaining dependent claims.

Claim Objections
Claims 3 and 13 are objected to because of the following informalities: 
Claim 3 recites “wherein the graphics processor is further to introduce sparse matrix representation” which is grammatically incorrect. The examiner suggests that one way to address this objection would be to amend this claim to recite “wherein the graphics processor is further operable to introduce a sparse matrix representation”. Appropriate correction is required.
Amended claim 13 recites “wherein performing the local error propagation further comprising facilitating weight synchronization” (see, e.g., lines 3-4 of claim 13). This recitation is grammatically incorrect. The examiner suggests that one way to address this objection would be to amend this claim to recite “wherein performing the local error propagation further comprises facilitating weight synchronization”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Amended independent claims 1, 8 and 15 each recite “wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” (see, e.g., lines 6-8 of claim 1). These recitations are grammatically incorrect, are missing words, and are unclear. In particular, it is unclear whether the recited “machine learning primitives” or “the library” are configured “to analyze a skew pattern” or whether the primitives or library are executable or otherwise usable to perform analysis of the recited “skew pattern”. Also, it is unclear whether the recited “skew pattern” is observed in a specific, single “gradient synchronization implemented by the neural network application”, a general gradient synchronization, or a plurality of gradient synchronizations. For examination purposes, the examiner is interpreting the recitation “wherein the machine learning primitives of the library to analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” as “wherein the machine learning primitives of the library are usable to analyze a skew pattern observed in a distributed gradient synchronization implemented by the neural network application”. Appropriate correction is required.
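Additionally, amended independent claims 1, 8 and 15 each recite “the distributed gradient synchronization implemented via a tree structure” (see, the last two lines of each of these claims). There is insufficient antecedent basis for this limitation in these claims. Applicant previously introduced “gradient synchronization implemented by the neural network application”. For examination purposes, the examiner is interpreting “the distributed gradient synchronization implemented via a tree structure” as either the previously-introduced “gradient synchronization implemented by the neural network application” or another gradient synchronization that has been “implemented via a tree structure”. Appropriate correction is required.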
Also, claims 2-7, 9-14 and 16-20, which depend from claims 1, 8 and 15, respectively, are rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claims 1, 8 and 15.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
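A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.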

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 8, 9, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al. (U.S. Patent Application Pub. No. 2017/0330586 A1, hereinafter “Roblek”) in view of non-patent literature Dettmers, Tim ("8-bit approximations for parallelism in deep learning." arXiv preprint arXiv:1511.04561 (2015). pp. 1-14, hereinafter “Dettmers”) and Tokui et al. (U.S. Patent Application Pub. No. 2018/0349772 A1, hereinafter “Tokui”), and further in view of non-patent literature Lambert et al. ("Adaptive Frequency Neural Networks for Dynamic Pulse and Metre Perception." ISMIR. Schloss Dagstuhl LZI, 2016, hereinafter “Lambert”). Tokui was filed on April 27, 2018 as a national stage application of PCT application no. PCT/JP2016/004027, filed September 2, 2016, and the PCT filing date is before the effective filing date of the present application, April 28, 2017. Therefore, Tokui constitutes prior art under 35 U.S.C. 102(a)(2).
With respect to claim 1, Roblek discloses the invention as claimed including an apparatus … to detect one or more sets of data from one or more sources over one or more networks (see, e.g., paragraphs 44, 110 and 112, “The logarithmic scale convolutional neural network system 100 is a machine learning system that receives system inputs and generates system outputs from the system inputs. The neural network system 100 can be configured to receive frequency domain features of an audio sample 102 and to generate a respective output 112 based on the input” [i.e., the inputs of audio samples are sets of data from a source of the audio sample], “and input from the user can be received in any form, including acoustic, speech, or tactile input” [i.e., the form of input from a user or client can be a form of speech, acoustic, or audio input], “The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network” [i.e., the communication network receives audio samples from clients through/over the network]); 
introduce a library to a neural network application, the library … and determine, by utilizing the library, an optimal point at which to apply frequency scaling without degrading performance of the neural network application (see, e.g., paragraph 102, “The trained parameters define an optimal convolutional mapping of frequency domain features to multi-scaled frequency domain features (step 706). The other neural network layers in the cascaded convolutional neural network system, e.g., the output layer, are able to select and use appropriate features from a concatenated convolutional neural network stage output, enabling the neural network system to tailor and optimize the convolutional mapping of frequency domain features to multi-scaled frequency domain features to the given task” [i.e., the trained parameters form a library which is introduced to the convolutional neural network/neural network application to define or determine the optimal logarithmic convolutional mapping/optimal point for applying the frequency domain features to multi-scaled frequency domain features (frequency scaling)]; see, e.g., paragraph 28, “However, important information may be lost during the mapping process and a hardcoded fixed-scale mapping may not provide an optimal mapping of frequency domain features for a given task. Therefore, the accuracy and performance of an audio classification system receiving the mapped frequency domain features may be reduced” [i.e., when an optimal mapping is not provided, the performance of an audio classification system is reduced; as such, the optimal mapping inherently provides a non-reduced or non-degraded/without degrading performance of the classification system]).
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose a graphics processor to: … determine … the optimal point determined through the distributed gradient synchronization implemented via a tree structure.
In the same field, analogous art Dettmers teaches a graphics processor to: … the optimal point determined through the distributed gradient synchronization implemented via a tree structure (as indicated above, “the distributed gradient synchronization implemented via a tree structure” has been interpreted as either the previously-introduced “gradient synchronization implemented by the neural network application” or another gradient synchronization that has been “implemented via a tree structure”) (see, e.g., pages 2 and 7-8, sec. 2.1 and 4.1, “In data parallelism, the model is kept constant for all GPUs while each GPU is fed with a different mini-batch. After each pass the gradients are exchanged, i.e. synchronized with each GPU” [i.e., a GPU/graphics processor used for a distributed gradient synchronization], “In our work we show that we can use 8-bit gradients for the parameter updates without degrading performance. However, dynamic fixed point data types can also be used for end-to-end training and as such a combination of both methods might yield optimal performance … Although our 8-bit data type with dynamic binary tree achieves better approximation, it cannot be used in fixed point computation and thus remains useful solely as an intermediate approximate representation” [i.e., the 8-bit gradients are used for updates which keep the most optimal performance (optimal point) wherein the 8-bit gradients are within a binary tree (tree structure)]).
Roblek and Dettmers are analogous art because they are both directed to gradient synchronizations within a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek to incorporate the teachings of Dettmers in order to modify the apparatus of a convolutional neural network system 
	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).
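For illustration only, the general idea of exchanging reduced-precision gradients among workers, as described in the Dettmers passages quoted above, can be sketched as follows. This sketch is an assumption of this discussion and is not taken from Dettmers: it substitutes plain uniform (linear) 8-bit quantization for Dettmers' dynamic-tree 8-bit data type, and the names quantize_8bit, dequantize_8bit and exchange_gradients are hypothetical.

```python
def quantize_8bit(grad):
    """Encode floats as signed 8-bit codes sharing one scale.

    Uniform quantization is used here as a stand-in; it is NOT the
    dynamic-tree 8-bit data type described in Dettmers.
    """
    max_abs = max((abs(g) for g in grad), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(g / scale) for g in grad]  # each code lies in [-127, 127]
    return codes, scale


def dequantize_8bit(codes, scale):
    """Recover approximate floats from 8-bit codes."""
    return [c * scale for c in codes]


def exchange_gradients(worker_grads):
    """Average per-worker gradients after an 8-bit encode/decode round trip."""
    decoded = [dequantize_8bit(*quantize_8bit(g)) for g in worker_grads]
    n = len(decoded)
    return [sum(column) / n for column in zip(*decoded)]
```

Under these assumptions, each worker would send only the 8-bit codes and one scale value, and every replica would apply the same averaged gradient.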
Although Roblek in view of Dettmers substantially teaches the claimed invention, Roblek in view of Dettmers is not relied on to teach the library comprising machine learning primitives, wherein the machine learning primitives of the library to analyze a … pattern observed in distributed gradient synchronization implemented by the neural network application.
In the same field, analogous art Tokui teaches the library comprising machine learning primitives (paragraph 198 of applicant’s specification states “Machine learning primitives are basic operations that are commonly performed by machine learning algorithms.” Therefore, a “library comprising machine learning primitives”, under the broadest reasonable interpretation (BRI), is a library that includes operations, such as programs or functions, that can be performed by machine learning algorithms) (see, e.g., paragraphs 67-68, “method for using calculation libraries such as Caffe … and Theano (http://deeplearning.net/software/theano/). … According to these libraries, by using a dedicated Mini programming language to describe the loss function as a combination of prepared primitives” [i.e., libraries include machine learning primitives]), wherein the machine learning primitives of the library to analyze a … pattern observed in distributed gradient synchronization implemented by the neural network application (as indicated above, “wherein the machine learning primitives of the library to analyze a … pattern observed in distributed gradient synchronization implemented by the neural network application” has been interpreted as “wherein the machine learning primitives of the library are usable to analyze a … pattern observed in a distributed gradient synchronization implemented by the neural network application”) (see, e.g., paragraph 68, “According to these libraries, by using a dedicated Mini programming language … as a combination of prepared primitives, it is possible to automatically obtain a gradient function of the loss function, too. This is because a gradient of each primitive is defined, and therefore a gradient of the entire combination can be also obtained by automatic differentiation. 
… by using this Mini programming language, the neural network can perform learning by the gradient method by using a gradient function” [i.e., the primitives are used to analyze a pattern in a distributed gradient method/function/synchronization implemented/performed by the neural network application/function]).
Roblek, Dettmers and Tokui are analogous art because they are directed to gradient synchronizations in a neural network and performing machine learning “by the gradient method by using a gradient function” within a neural network (See, e.g., Tokui, paragraph 68).
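For illustration only, the statement in paragraph 68 of Tokui that “a gradient of each primitive is defined, and therefore a gradient of the entire combination can be also obtained by automatic differentiation” can be sketched as follows. The sketch is a minimal reverse-mode example written for this discussion; it is not code from Tokui, Caffe or Theano, and the names Var, add, mul and backward are hypothetical.

```python
class Var:
    """A value plus links to the values it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0


def add(a, b):
    # Primitive: d(a + b)/da = 1 and d(a + b)/db = 1.
    return Var(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    # Primitive: d(a * b)/da = b and d(a * b)/db = a.
    return Var(a.value * b.value, ((a, b.value), (b, a.value)))


def backward(out):
    """Accumulate d(out)/d(v) into v.grad for every v feeding into out."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(out)
    out.grad = 1.0
    for v in reversed(order):  # chain rule, from the output back to the inputs
        for parent, local in v.parents:
            parent.grad += v.grad * local


# Squared-error loss (w*x + b - y)**2 built only from the two primitives.
w, x, b, neg_y = Var(2.0), Var(3.0), Var(1.0), Var(-5.0)
err = add(add(mul(w, x), b), neg_y)
loss = mul(err, err)
backward(loss)
```

Here the loss is described only as a combination of add and mul, and its gradient with respect to w and b follows from the derivatives defined for those two primitives.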
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek in view of Dettmers to incorporate the teachings of Tokui in order to provide “a dedicated Mini programming 
Although Roblek in view of Dettmers and Tokui substantially teaches the claimed invention, Roblek in view of Dettmers and Tokui is not relied on to teach analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application. 
In the same field, analogous art Lambert teaches analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application (as indicated above, “analyze a skew pattern observed in distributed gradient synchronization implemented by the neural network application” has been interpreted as “analyze a skew pattern observed in a distributed gradient synchronization implemented by the neural network application”) (see, e.g., FIG. 4 and page 63, right col., paragraph 3, “We have introduced this rule to ensure the AFNN retains a spread of frequencies (and thus metrical structure) across the gradient. The force is relative to natural frequency, and can be scaled through the ϵh parameter. By balancing the adaptive (ϵf) and elastic (ϵh) parameters, the oscillator frequency is able to entrain to a greater range of frequencies, whilst also returning to its natural frequency (ω0) when the stimulus is removed. Figure 4 shows the frequencies adapting over time in the AFNN under sinusoidal input” [i.e., the gradients as part of the Adaptive Frequency Neural Network/AFNN correspond to the skew characteristics associated 
Roblek, Dettmers, Tokui and Lambert are analogous art because they are directed to gradient synchronizations within a neural network and performing machine learning “by the gradient method by using a gradient function” within a neural network (See, e.g., Tokui, paragraph 68).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek in view of Dettmers and Tokui to incorporate the balancing to the natural frequency by the adaptive frequency neural network of Lambert.
	Doing so would have “significantly improve[d] responses of AFNNs compared to GFNNs to stimuli with both steady and varying pulse frequencies”, which “leads us to believe that AFNNs could replace the linear filtering methods commonly used in beat tracking and tempo estimation systems, and lead to more accurate methods”, as suggested by Lambert (See, e.g., Lambert, page 60, Abstract).

With respect to independent claim 8, claim 8 is substantially similar to claim 1 and therefore is rejected on the same ground as claim 1, discussed above. In particular, claim 8 is a method claim that corresponds to the apparatus of claim 1. 
	In addition, Roblek further discloses a method (see, e.g., paragraph 42, “This specification describes methods for learning variable size convolutions on a linear spectrogram”).

With respect to independent claim 15, this claim is substantially similar to claim 1 and therefore is rejected on the same ground as claim 1, discussed above. Claim 15 is a machine-readable medium claim that corresponds to the apparatus of claim 1. 
	In addition, Roblek further discloses at least one non-transitory machine-readable medium comprising instructions that when executed by a local computing device, cause the local computing device to perform operations (see, e.g., paragraph 103, “The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them”).

Regarding claim 2, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the apparatus of claim 1.
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose wherein the optimal point is determined through gradient synchronization using the tree structure such that local weight vectors start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure.
In the same field, analogous art Dettmers teaches wherein the optimal point is determined through gradient synchronization using the tree structure such that local weight vectors start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure (see, e.g., pages 4 and 7, sec. 3.1 and 4.1, and page 14, paragraph 1, “In order to decrease this error, we can use the bits of the mantissa to represent a binary tree with interval (0.1, 1) which is bisected according to the route taken through the tree; the children thus represent the start and end points for intervals in a bisection method. With this method we can cover a broader range of numbers with the mantissa and can thus reduce the average relative error” [i.e., the mantissa includes bits that represent a binary tree (tree structure) with an interval to take route through the tree (including leaves and roots communicated throughout the binary tree)], “Dynamic fixed point data types are data types which use all their bits for the mantissa and have a dynamic exponent which is kept for collection of numbers (matrix, vector) and is adjusted during run-time … In our work we show that
Roblek and Dettmers are analogous art because they are directed to gradient synchronizations within a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek to incorporate the teachings of Dettmers in order to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek in view of Dettmers to incorporate the gradient synchronization while the binary tree is traversed through by routes wherein bits are given weights by exponents (vectors) of Dettmers. 
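	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).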
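For illustration only, the claim 2 language in which local weight vectors “start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure” can be sketched as a pairwise tree reduction. The sketch is a hypothetical example written for this discussion; it does not reproduce applicant's disclosure or any cited reference, and the names tree_reduce and synchronize are assumptions.

```python
def tree_reduce(vectors):
    """Sum equal-length vectors pairwise, level by level, up to a single root."""
    if not vectors:
        raise ValueError("at least one leaf vector is required")
    level = [list(v) for v in vectors]  # leaves of the tree
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Two children combine into one parent node.
                parents.append([a + b for a, b in zip(level[i], level[i + 1])])
            else:
                # An unpaired node is passed up unchanged.
                parents.append(level[i])
        level = parents
    return level[0]  # value held at the root


def synchronize(vectors):
    """Return the element-wise average computed from the root's sum."""
    total = tree_reduce(vectors)
    return [t / len(vectors) for t in total]
```

With n leaves, this arrangement reaches the root in roughly log2(n) levels, and the averaged vector would then be sent back to every node.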
Although Roblek in view of Dettmers and Tokui substantially teaches the claimed invention, Roblek in view of Dettmers and Tokui is not relied on to teach wherein the library accounts for skew characteristics associated with the gradient synchronization to decide a core frequency.
	In the same field, analogous art Lambert teaches wherein the library accounts for skew characteristics associated with the gradient synchronization to decide a core frequency (see, e.g., page 63, right col., Par. 3, “We have introduced this rule to ensure the AFNN retains a spread of frequencies (and thus metrical structure) across the gradient. The force is relative to natural frequency, and can be scaled through the ϵh parameter. By balancing the adaptive (ϵf) and elastic (ϵh) parameters, the oscillator frequency is able to entrain to a greater range of frequencies, whilst also returning to its natural frequency (ω0) when the stimulus is removed. Figure 4 shows the frequencies adapting over time in the AFNN under sinusoidal input” [i.e., the gradients as part of the Adaptive Frequency Neural network corresponds to the skew characteristics associated with the gradient synchronization, which balances to the natural frequency (core frequency)]).
Roblek, Dettmers, Tokui and Lambert are analogous art because they are directed to gradient synchronizations within a neural network and performing machine learning “by the gradient method by using a gradient function” within a neural network (See, e.g., Tokui, paragraph 68).

	Doing so would “significantly improve responses of AFNNs compared to GFNNs to stimuli with both steady and varying pulse frequencies” and “leads us to believe that AFNNs could replace the linear filtering methods commonly used in beat tracking and tempo estimation systems, and lead to more accurate methods” (See, e.g., Lambert, page 60, Abstract).

Regarding claim 9, this claim is substantially similar to claim 2 and therefore is rejected on the same ground as claim 2, discussed above. In particular, claim 9 is a method claim that corresponds to the apparatus of claim 2. 

Regarding claim 16, this claim is substantially similar to claim 2 and therefore is rejected on the same ground as claim 2, discussed above. In particular, claim 16 is a machine-readable medium claim that corresponds to the apparatus of claim 2.

Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui and Lambert as applied to claims 1, 8 and 15 above, and further in view of Chen et al. (US 2016/0293167, hereinafter “Chen”).

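Regarding claim 3, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the apparatus of claim 1.
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose wherein the graphics processor is further to introduce sparse matrix representation.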
In the same field, analogous art Dettmers teaches wherein the graphics processor is further to introduce sparse matrix representation (see, e.g., page 2 sec. 2.1, “Scaling limitations: Current GPU implementations are optimized for larger matrices, hence data parallelism does not scale indefinitely due to slow matrix operations (especially matrix multiplication) for small mini-batch sizes (< 128 per GPU)” [i.e., the GPU implementations are optimized for larger matrices or sparse matrices]).
Roblek and Dettmers are analogous art because they are directed to gradient synchronizations within a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek to incorporate the teachings of Dettmers in order to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek to incorporate the GPU implementations optimized for larger matrices of Dettmers. 
	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).
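Although Roblek in view of Dettmers, Tokui and Lambert substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui and Lambert is not relied on to teach wherein the … sparse matrix representation for weights to overlap communication and computation across multiple nodes associated with neural network application to reduce communication costs.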
In the same field, analogous art Chen teaches wherein the … sparse matrix representation for weights to overlap communication and computation across multiple nodes associated with neural network application to reduce communication costs (see, e.g., Fig. 3 and paragraph 68, “Here k denotes the number of nodes of the rest of the hidden layers in the network. Note by comparing (2) and (3) that the variables flcn and n offer finer control over the number of parameters in the network. The first two hidden layers are influenced by flcn while remaining hidden layers have k2 weights. One interpretation of local connections is that they enforce patch-based sparse matrices when training; given the sparse filters in the first fully-connected hidden layer, e.g., as illustrated in FIG. 3, local connections are a natural fit” [i.e., the sparse matrices in each layer of the neural network depicted in Fig. 3 corresponds to the sparse matrix representation including non-zero weights which are patch-based connections of the multiple nodes (each box of a layer) of the convolution neural network]; see, e.g., paragraph 64, “This is important because parallel SIMD operations may be heavily relied upon in implementations of the techniques described herein to efficiently compute neural nets using small dense matrices rather than large, and sparse matrices. In some examples, LCN and CNN layers may be leveraged to take advantage of the sparse and local nature of the DNN to constrain the model size while improving 
Roblek, Dettmers, Tokui, Lambert and Chen are analogous art because they are directed to matrix representations of nodes or parameters of convolutional neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek in view of Dettmers, Tokui and Lambert to incorporate the overlapping and product of parameters of the convolutional neural network through sparse matrices of Chen. 
	Doing so would enable Roblek in view of Dettmers, Tokui and Lambert to “reduce the total model footprint, for example, to 30% of the original size compared to a baseline fully-connected DNN, generally with reduced latency and minimal impact in 

Regarding claim 10, claim 10 is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3, discussed above. In particular, claim 10 is a method claim that corresponds to the apparatus of claim 3. 

Regarding claim 17, this claim is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3. In particular, claim 17 is a machine-readable medium claim that corresponds to the apparatus of claim 3. 

Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui and Lambert as applied to claims 1, 8 and 15 above, and further in view of Britt et al. (U.S. Patent Application Pub. No. 2008/0092181, hereinafter “Britt”).
	
Regarding claim 4, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the apparatus of claim 1.
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose wherein the graphics processor is further to …
In the same field, analogous art Dettmers teaches wherein the graphics processor is further to (see, e.g., page 2 sec. 2.1, “Scaling limitations: Current GPU implementations are optimized for larger matrices, hence data parallelism does not 
Roblek and Dettmers are analogous art because they are directed to gradient synchronizations within a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek to incorporate the teachings of Dettmers in order to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek to incorporate the gradient synchronization for an optimal performance through the updating of parameters within a GPU of Dettmers. 
	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).

Although Roblek in view of Dettmers, Tokui and Lambert substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui and Lambert is not relied on to teach wherein the … processor is further to automatically analyze failed execution of programs including or relevant to the neural network application to obtain insights on one or more faults of hardware performance encounters.
In the same field, analogous art Britt teaches wherein the … processor (see, e.g., paragraph 36, “In one embodiment, the apparatus comprises: a processor; a storage device in data communication with the processor”) is further to automatically analyze failed execution of programs including or relevant to the neural network application to obtain insights on one or more faults of hardware performance encounters (see, e.g., paragraph 136, “In one embodiment, the storage devices 204 each comprise a redundant array (e.g., RAID) device, and when coupled with the fault tolerance, self monitoring, self-healing, and automatic communication channel fail-over (in the event of a hardware or software failure or loss of channel) of the illustrated architecture 200, provide a highly redundant and reliable configuration” [i.e., the redundant array device corresponds to the debugging logic which is coupled with the automatic communication channel fail-over which detects failed executions (failure or loss of channel) within the hardware for performance]; see, e.g., paragraph 98, “As used herein, the term "speech recognition" refers to any methodology or technique by which human or other speech can be interpreted and converted to an electronic or data format or signals related thereto … Phoneme/word recognition, if used, may be based on HMM (hidden Markov modeling), although other processes such as, without limitation, DTW (Dynamic Time Warping) or NNs (Neural Networks) may be used” [i.e., the program associated with the neural network application is the speech recognition system]).
Roblek, Dettmers, Tokui, Lambert and Britt are analogous art because they are directed to the analysis of speech (see, e.g., Tokui, paragraphs 119 and 131) and audio using neural networks.
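It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of Roblek in view of Dettmers, Tokui and Lambert to incorporate the detection of failure of programs within a hardware of Britt. 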
	Doing so would enable Roblek in view of Dettmers, Tokui and Lambert to “provide a highly redundant and reliable configuration”, as suggested by Britt (See, e.g., Britt, paragraph 136).

Regarding claim 11, this claim is substantially similar to claim 4 and therefore is rejected on the same ground as claim 4, discussed above. In particular, claim 11 is a method claim that corresponds to the apparatus of claim 4. 

Regarding claim 18, claim 18 is substantially similar to claim 4 and therefore is rejected on the same ground as claim 4, discussed above. In particular, claim 18 is a machine-readable medium claim that corresponds to the apparatus of claim 4. 

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui, Lambert and Britt as applied to claims 4, 11, and 18 above, and further in view of non-patent literature Wong et al. (Wong, W. Eric, et al. "Effective software fault localization using an RBF neural network." IEEE Transactions on Reliability 61.1 (2011): 149-169, hereinafter “Wong”).

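Regarding claim 5, as discussed above, Roblek in view of Dettmers, Tokui, Lambert and Britt teaches the apparatus of claim 4.
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose wherein the graphics processor is further to provide one or more of successful execution information.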
In the same field, analogous art Dettmers teaches wherein the graphics processor is further to provide one or more of successful execution information (see, e.g., page 2 sec. 2, “To understand the properties of a successful parallel deep learning algorithm, it is necessary to understand how the communication between GPUs works and what the bottlenecks for both model and data parallelized deep learning architectures are” [i.e., the successful parallel deep learning algorithms correspond to the successful execution information done by the GPU (graphics processor)]).
Roblek and Dettmers are analogous art because they are both directed to gradient synchronizations within a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Roblek to incorporate the teachings of Dettmers in order to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek to incorporate the gradient synchronization for an optimal performance through the updating of parameters within a GPU of Dettmers. 
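	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).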
Although Roblek in view of Dettmers, Tokui, Lambert and Britt substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui, Lambert and Britt is not relied on to teach wherein the … processor is further to provide one or more of successful execution information obtained from successful execution of programs and failed execution information obtained from failed execution of programs to a trained network model to seek out one or more of the hardware performance counters that are regarded as faulty or outside a range of approval.
In the same field, analogous art Wong teaches wherein the … processor is further to provide one or more of successful execution information obtained from successful execution of programs and failed execution information obtained from failed execution of programs to a trained network model (see, e.g., page 150, left col., paragraph 2, “A typical RBF neural network has a three-layer feed-forward structure that can be trained to learn an input-output relationship based on a data set. In this paper, the input is the statement coverage of a test case which indicates how the program is executed by the test case, and the output is the result (success or failure) of the corresponding program execution. Once the network has been trained, the coverage of a virtual test case with only one statement covered is used as an input to compute the suspiciousness of the corresponding statement in terms of its likelihood of containing bugs” [i.e., the RBF neural network is used to learn the input-output relationship]) to seek out one or more of the hardware performance counters that are regarded as faulty or outside a range of approval (see, e.g., page 157, right col. last paragraph, “For a fair comparison, we compute the effectiveness of both techniques (RBF, and Crosstab) using the same data. Note that statistics such as fault revealing behavior and statement coverage of each test can vary under different compilers, operating systems, and hardware platforms” [i.e., the RBF neural network is used to seek faults within a system or hardware platform]).
Roblek, Dettmers, Tokui, Lambert, Britt, and Wong are analogous art because they are directed to the optimization or yielding of a best performance of a neural network.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature with failure detection of programs of Roblek in view of Dettmers, Tokui and Lambert and further in view of Britt to incorporate the output of success or failure of the execution of programs used as input to the trained RBF neural network of Wong. 
	Doing so would enable Roblek in view of Dettmers, Tokui and Lambert and further in view of Britt to be “more effective at locating bugs, in that a relatively smaller amount of code needs to be examined to find bugs, compared to other state of the art contemporary techniques”, as suggested by Wong (See, e.g., Wong, page 149, Introduction, paragraph 1).

Regarding claim 12, claim 12 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5, discussed above. In particular, claim 12 is a method claim that corresponds to the apparatus of claim 5. 

Regarding claim 19, claim 19 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5, discussed above. In particular, claim 19 is a machine-readable medium claim that corresponds to the apparatus of claim 5. 

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui and Lambert as applied to claims 1 and 8 above, in view of Sugiura et al. (U.S. Patent Application Pub. No. 2019/0095757, hereinafter “Sugiura”) and further in view of non-patent literature Anwar et al. ("Fixed point optimization of deep convolutional neural networks for object recognition." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, hereinafter “Anwar”). Sugiura was filed on October 31, 2018 as a national stage application of PCT application no. PCT/JP2017/044295 filed December 11, 2017, and this date is before the effective filing date of the present application, April 28, 2017. Therefore, Sugiura constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claim 6, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the apparatus of claim 1.
Although Roblek substantially discloses the claimed invention, Roblek is not relied on to explicitly disclose wherein the graphics processor is further to perform local error propagation.
In the same field, analogous art Dettmers teaches wherein the graphics processor is further to perform local error propagation (see, e.g., page 6 sec. 3.4, “Since we only had one GPU available for the following experiments, we simulated training on a large GPU cluster by only using the pure 8-bit approximation gradient component by training on a single GPU – so no 32-bit gradients or activations where used. On MNIST, we found that the best test error of all four approximation techniques static tree, dynamic tree, linear quantization, and mantissa did not differ significantly from the test error of 32-bit training for both data parallelism F(4, 4) = 0.71, p = 0.59, and model parallelism F(4, 4) = 0.54, p = 0.71 (F-test assumptions were satisfied); also the 99% confidence intervals did overlap for all techniques” [i.e., the GPU corresponds to the graphics processor that is used to test error of the approximation techniques (local error propagation of the binary tree)]).
Roblek and Dettmers are analogous art because they are directed to gradient synchronizations within a neural network.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature using matrix representation for parameters of the neural network of Roblek to incorporate the error testing of the bits in the binary tree of Dettmers. 
	Doing so would enable Roblek to use “8-bit approximation [that] is able to circumnavigate problems with large batch sizes for GPU clusters and thus improves convergence rates in convolutional networks”, as suggested by Dettmers (See, e.g., Dettmers, page 2, 3rd bullet point).
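Although Roblek in view of Dettmers, Tokui and Lambert substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui and Lambert is not relied on to teach the apparatus further … to perform local error propagation by computing high precision and low precision for local weights and compute local errors at each of the multiple nodes, wherein performing local error propagation further comprises facilitating weight synchronization across the multiple nodes to track the local errors for accuracy and reduced communication.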
In the same field, analogous art Sugiura teaches the apparatus further … to perform local error propagation by … compute local errors at each of the multiple nodes (see, e.g., paragraph 49, “In the learning processing of deep learning, a weight value of the coupling (synaptic coupling) between nodes configuring the neural network is updated using a known algorithm (for example, in a reverse error propagation method, adjust and update the weight value so as to reduce the error from the correct at the output layer, or the like). An aggregate of the weight values between the nodes on which the learning process is completed is called a "learned model". By applying the learned model to a neural network having the same configuration as the neural network used in the learning process (setting as the weight value of inter -node coupling), it is possible to output correct data with a constant precision as output data (recognition result) when inputting unknown input data, i.e., new input data not used in learning processing, into the neural network” [i.e., the weight value between nodes corresponds to the local weights being updated using error propagation to reduce error at the output layer of the neural network which provides constant precision with inputting unknown data]), wherein performing local error propagation further comprises facilitating weight synchronization across the multiple nodes to track the local errors for accuracy and reduced communication (see, e.g., paragraph 49, “In the learning processing of deep learning, a weight value of the coupling (synaptic coupling) between nodes configuring the neural network is updated using a known algorithm (for example, in a reverse error propagation method, adjust and update the weight value so as to reduce the error from the correct at the output layer, or the like). An aggregate of the weight values between the nodes on which the learning process is completed is called a "learned model". By applying the learned model to a neural network having the same configuration as the neural network used in the learning process (setting as the weight value of inter -node coupling), it is possible to output correct data with a constant precision as output data (recognition result) when inputting unknown input data, i.e., new input data not used in learning processing, into the neural network” [i.e., the aggregating of weight values corresponds to the weight synchronization across the nodes of the neural network to reduce the error (reduced communication) by tracking and outputting correct data (accuracy)]).
Roblek, Dettmers, Tokui, Lambert and Sugiura are analogous art because they are directed to using techniques of back propagation on a neural network.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature with failure detection of programs of Roblek in view of Dettmers, Tokui and Lambert to incorporate the error propagation at multiple nodes and aggregating the weights of each node to reduce errors and output correct data of Sugiura. 

Although Roblek in view of Dettmers, Tokui, Lambert and Sugiura substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui, Lambert and Sugiura is not relied on to teach to perform local error propagation by computing high precision and low precision for local weights.
	In the same field, analogous art Anwar teaches to perform local error propagation by computing high precision and low precision for local weights (see, e.g., page 1133, sec. 3, “During training we keep parameters in both high and low precision. We set aside 5000 training samples for validation…We start with a high precision pre trained network and obtain a quantized network using L2 error minimization. Then the inputs are fed forward via the network with the low precision weights…The output error is back propagated via low precision weights. The computed change in weights is added to the high precision weights. Thus we obtain new high precision weights. This process is iterated for several mini-batches and epochs. During training the selection of mini-batch size is important. Generally CNN employs the stochastic gradient descent (SGD) algorithm, where conventionally the minibatch size is one and weights are updated after each sample” [i.e., the output error of the CNN is back propagated by the use of low precision weights wherein the computed change in weights is added to high precision weights]).

It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature with failure detection of programs and error propagation of Roblek in view of Dettmers, Tokui, Lambert and further in view of Sugiura to incorporate the error propagation of computing high and low precision weights of Anwar. 
	Doing so would enable Roblek in view of Dettmers, Tokui, Lambert and further in view of Sugiura to induce “sparsity in the network which reduces the effective number of network parameters and improves generalization” and “reduces the required memory storage by a factor of 1/10 and achieves better classification results than the high precision networks”, as suggested by Anwar (See, e.g., Anwar, Abstract).

Regarding claim 13, claim 13 is substantially similar to claim 6 and therefore is rejected on the same ground as claim 6, discussed above. In particular, claim 13 is a method claim that corresponds to the apparatus of claim 6. 

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui and Lambert as applied to claims 1 and 8 above, and further in view of Ray et al. (U.S. Patent Application Pub. No. 2018/0293102, hereinafter “Ray”).

Regarding claim 7, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the apparatus of claim 1.
Although Roblek in view of Dettmers, Tokui and Lambert substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui and Lambert is not relied on to teach wherein the apparatus comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
In the same field, analogous art Ray teaches wherein the apparatus comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment (see, e.g., paragraph 144, “Computing device 600 may further include (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electro-mechanical agent or machine, etc.” [i.e., the computing device corresponds to the autonomous machine]), wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package (see, e.g., paragraph 353, “Example 7 includes the subject matter of Examples 1-6, 
Roblek, Dettmers, Tokui, Lambert and Ray are analogous art because they are directed to the analysis of speech and audio using neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the apparatus of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature of Roblek in view of Dettmers, Tokui and Lambert to incorporate the autonomous machine including a graphics processor coupled with a semiconductor of Ray. 
	Doing so would allow manufacturers of the systems of Roblek in view of Dettmers, Tokui and Lambert to “maximize the amount of parallel processing in the graphics pipeline” and “attempt to execute program instructions synchronously together as often as possible to increase processing efficiency” wherein the “efficiency provided by parallel machine learning algorithm implementations allows the use of high capacity networks and enables those networks to be trained on larger datasets”, as suggested by Ray (See, e.g., Ray, paragraph 4).

Regarding claim 14, this claim is substantially similar to claim 7 and therefore is rejected on the same ground as claim 7, discussed above. In particular, claim 14 is a method claim that corresponds to the apparatus of claim 7. 

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Roblek in view of Dettmers, Tokui and Lambert as applied to claim 15 above, in view of Sugiura in view of Anwar, and further in view of Ray.

Regarding claim 20, as discussed above, Roblek in view of Dettmers, Tokui and Lambert teaches the machine-readable medium of claim 15.
Although Roblek in view of Dettmers, Tokui and Lambert substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui and Lambert is not relied on to teach wherein the operations further comprise performing local error propagation by computing high precision and low precision for local weights and compute local errors at each of the multiple nodes, wherein performing the local error propagation further comprises facilitating weight synchronization across the multiple nodes to track the local errors for accuracy and reduced communication, wherein the computing device comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
In the same field, analogous art Sugiura teaches wherein the operations further comprise performing local error propagation by … compute local errors at each of the multiple nodes (see, e.g., paragraph 49, “In the learning processing of deep learning, a weight value of the coupling (synaptic coupling) between nodes configuring the neural network is updated using a known algorithm (for example, in a reverse error propagation method, adjust and update the weight value so as to reduce the error from the correct at the output layer, or the like). An aggregate of the weight values between the nodes on which the learning process is completed is called a "learned model". By applying the learned model to a neural network having the same configuration as the neural network used in the learning process (setting as the weight value of inter -node coupling), it is possible to output correct data with a constant precision as output data (recognition result) when inputting unknown input data, i.e., new input data not used in learning processing, into the neural network” [i.e., the weight value between nodes corresponds to the local weights being updated using error propagation to reduce error at the output layer of the neural network which provides constant precision with inputting unknown data]), wherein performing the local error propagation further comprises facilitating weight synchronization across the multiple nodes to track the local errors for accuracy and reduced communication (see, e.g., paragraph 49, “In the learning processing of deep learning, a weight value of the coupling (synaptic coupling) between nodes configuring the neural network is updated using a known algorithm (for example, in a reverse error propagation method, adjust and update the weight value so as to reduce the error from the correct at the output layer, or the like). An aggregate of the weight values between the nodes on which the learning process is completed is called a "learned model". By applying the learned model to a neural network having the same configuration as the neural network used in the learning process (setting as the weight value of inter -node coupling), it is possible to output correct data with a constant precision as output data (recognition result) when inputting unknown input data, i.e., new input data not used in learning 
Roblek, Dettmers, Tokui, Lambert, and Sugiura are analogous art because they are each directed to using techniques of back propagation on a neural network.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the machine-readable medium of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature with failure detection of programs of Roblek in view of Dettmers, Tokui and Lambert to incorporate the error propagation at multiple nodes and aggregating the weights of each node to reduce errors and output correct data of Sugiura. 
	Doing so would enable Roblek in view of Dettmers, Tokui and Lambert to “adjust and update the weight value so as to reduce the error from the correct at the output layer” (See, e.g., Sugiura, paragraph 49).
	Although Roblek in view of Dettmers, Tokui, Lambert and Sugiura substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui, Lambert and Sugiura is not relied on to teach perform local error propagation by computing high precision and low precision for local weights … wherein the computing device comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
In the same field, analogous art Anwar teaches perform local error propagation by computing high precision and low precision for local weights (see, e.g., page 1133, sec. 3, “During training we keep parameters in both high and low precision. We set aside 5000 training samples for validation…We start with a high precision pre trained network and obtain a quantized network using L2 error minimization. Then the inputs are fed forward via the network with the low precision weights …The output error is back propagated via low precision weights. The computed change in weights is added to the high precision weights. Thus we obtain new high precision weights. This process is iterated for several mini-batches and epochs. During training the selection of mini-batch size is important. Generally CNN employs the stochastic gradient descent (SGD) algorithm, where conventionally the minibatch size is one and weights are updated after each sample” [i.e., the output error of the CNN is back propagated by the use of low precision weights wherein the computed change in weights is added to high precision weights]).
Roblek, Dettmers, Tokui, Lambert, Sugiura and Anwar are analogous art because they are each directed to using techniques of back propagation on a neural network.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the machine-readable medium of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature with failure detection of programs and error propagation of Roblek in view of Dettmers, Tokui and Lambert, and further in view of Sugiura to incorporate the error propagation of computing high and low precision weights of Anwar. 

Although Roblek in view of Dettmers, Tokui, Lambert, Sugiura and Anwar substantially teaches the claimed invention, Roblek in view of Dettmers, Tokui, Lambert, Sugiura and Anwar is not relied on to teach wherein the computing device comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
In the same field, analogous art Ray teaches wherein the computing device comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment (see, e.g., paragraph 144, “Computing device 600 may further include (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electro-mechanical agent or machine, etc.” [i.e., the computing device corresponds to the autonomous machine]), wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package (see, e.g., paragraph 353, “Example 7 includes the subject matter of Examples 1-6, wherein the graphics 
Roblek, Dettmers, Tokui, Lambert, Sugiura, Anwar, and Ray are analogous art because they are each directed to the analysis of speech (see, e.g., Tokui, paragraphs 119 and 131) and audio using neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the machine-readable medium of a convolutional neural network system which uses an optimal mapping for a multi-scaled frequency domain feature of Roblek in view of Dettmers, Tokui, Lambert, Sugiura and further in view of Anwar to incorporate the autonomous machine including a graphics processor coupled with a semiconductor of Ray. 
	Doing so would allow manufacturers of systems such as those of Roblek in view of Dettmers, Tokui, Lambert, Sugiura and further in view of Anwar to “maximize the amount of parallel processing in the graphics pipeline” and “attempt to execute program instructions synchronously together as often as possible to increase processing efficiency” wherein the “efficiency provided by parallel machine learning algorithm implementations allows the use of high capacity networks and enables those networks to be trained on larger datasets”, as suggested by Ray (See, e.g., Ray, paragraph 4).

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure. The examiner requests, in response to this office action, support be shown for language added to any original claims on line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the reference cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business 




/R.K.B./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125