DETAILED ACTION
This action is in response to the claims filed 06/22/2022 for application 16/204,549. Claims 1, 3-8 and 15 have been amended. Claims 1, 3-8, 10-15, and 17-21 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/22/2022 has been entered.
Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities: "multidmentional" should read "multidimensional". Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Peemen et al. ("Memory-Centric Accelerator Design for Convolutional Neural Networks", hereinafter "Peemen") in view of Luo et al. ("Canny Edge Detection on NVIDIA CUDA", hereinafter "Luo").

Regarding claim 1, Peemen discloses At least one non-transitory machine-readable medium comprising instructions which, when executed by a computing device, cause the computing device to perform operations (“A challenging problem for the design of efficient accelerators is the limited amount of external memory bandwidth. We show that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workload. The efficiency of the on-chip memories is maximized by our scheduler that uses tiling to optimize for data locality” [Abstract; use of memory and computers is implicit]) comprising:
generating a multifunction perceptron architecture having a plurality of neurons to perform one or more neuron functions in a machine learning environment, wherein the plurality of neurons includes one or more of splitter neurons, mixer neurons, and counter neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features. In contrast, a CNN combines the two steps in a single trainable model. We outline the model with an example of a speed sign recognition application for a car driver support system [6]. The architecture of the model is depicted in Figure 2. First, the network performs feature extraction by a cascade of trainable convolution and subsample operations. In these layers simple features such as edges are extracted, which are combined in the next layers to detect more complex features such as corners or crossings. Secondly, the features, represented as feature maps, are classified by feed forward neural network layers. The final outputs describe whether there is a sign and to which category it belongs.” [pg. 14, A. Algorithm, ¶1; CNN would be equivalent to a multifunction perceptron architecture. Under BRI, the claim recites “one or more of” so the examiner is interpreting features being combined in the next layer to be equivalent to “mixer neurons”.]), wherein the plurality of neurons include heterogenous neurons (“Due to the flexibility of the reuse buffers subsampling factors are directly supported. If (1) contains a subsample factor S > 1 the parallel feature map neurons are not direct neighbors. As a result a different pattern must be send to the PEs, e.g. S=2, [x00 x20 x40 x60], [x10 x30 x50 x70], etc.” [pg. 16, § 4) Column Based storage with subsampling; Implies neurons within the architecture are heterogenous.]), wherein the plurality of neurons represent a plurality of nodes associated with software threads or hardware threads, wherein the software threads are facilitated by one or more processors including one or more homogenous processor or heterogenous processors (“Due to the small memory bandwidth of the MicroBlaze core, the accelerator of the previous section does not scale beyond 20 MACC PEs.” [pg. 19, § B Memory bandwidth limitations; note: Under BRI, the claim recites “one or more processors”; thus the examiner only needs to cite one processor, and the examiner is therefore interpreting the MicroBlaze processor to correspond to the one or more processors. The accelerator would imply multiple different processors, which would correspond to “heterogenous processors”. Note: The claim recites software threads or hardware threads; under BRI, the examiner does not need to cite both. Software threads are equivalent to the operating system executing the software program (see pg. 13, right col, top para), while hardware threads can be interpreted as the physical CPU core executing operations as cited below.]), and wherein the hardware threads are associated with the one or more processors through sequential circuitry (“Each PE sequentially computes a neuron value in a feature map.” [pg. 15, CNN Accelerator Template, ¶2; note: PE represents processing element; sequential circuitry is shown in Fig. 10 (bottom circuit).]).
Although Peemen suggests software/hardware threads, the reference fails to explicitly teach wherein the plurality of neurons further include extractor neurons to support edge detection of edges associated with multidimensional channels to facilitate computation of image gradients in one or more pixel positions of a multidimensional output.
Luo also teaches software/hardware threads (“Under CUDA the GPU is a compute device that is a highly multithreaded coprocessor. A thread block is a batch of threads that executes on a multiprocessor that have access to its local memory. They perform their computations and become idle when they reach a synchronization point, waiting for other threads in the block to reach that point. Each thread is identified by its thread ID (one, two or three indices). The choice of 1,2 or 3D index layout is used to map the different pieces of data to the thread. The programmer writes data parallel code, which executes the same instructions on different data, though some customization of each thread is possible based on different behaviors depending on the value of the thread indices.” [pg. 2, ¶3])
	Luo teaches wherein the plurality of neurons further include extractor neurons to support edge detection of edges (“In this paper we focus on a GPU implementation of the Canny edge detector. This algorithm has remained a standard in edge finding techniques over the years. Applications of edge detection include their use as features in vision algorithms, their use to improve the appearance of displayed objects, in image coding and others too numerous to discuss” [pg. 2, § 1.2, ¶1; computer vision algorithms imply the use of neurons.]) associated with multidimensional channels (“The image data must be in a linear 24 bit per pixel format. Every “pixel” must be constrained to 3 channels of 8bit width each. The suggested color space is RGB since the Canny algorithm is best suited for grayscale images converted from RGB space. The image width and height must be of a multiple of 16 pixels to fit global memory access alignment properties” [pg. 3, 3.1, ¶1]) to facilitate computation of image gradients in one or more pixel positions of a multidimensional output (“To smooth the image, the Canny edge detector uses Gaussian convolution. Next, the image is convolved with a 2D first derivative operator to determine regions of sharp changes in intensities. The gradient magnitude and direction at each pixel are calculated in this step. Note that the maxima and minima of the first derivative gradient are the same as the zero-crossings of the second directional derivative. Only the maxima crossings are of interest because these pixels represent the areas of the sharpest intensity changes in the image” [pg. 3, left col, top para; See further: pgs. 4-5, § Gradient Computations. Computing gradient magnitude and direction data implies a multidimensional output.]).
Peemen and Luo are both in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Peemen’s teachings by implementing the Canny edge detection algorithm as taught by Luo. One would have been motivated to make this modification in order to use edge detection as a pre-processing step for a computer vision algorithm. [Abstract, Luo]

Regarding claim 8, Peemen discloses A method comprising:
generating a multifunction perceptron architecture having a plurality of neurons to perform one or more neuron functions in a machine learning environment, wherein the plurality of neurons includes one or more of splitter neurons, mixer neurons, and counter neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features. In contrast, a CNN combines the two steps in a single trainable model. We outline the model with an example of a speed sign recognition application for a car driver support system [6]. The architecture of the model is depicted in Figure 2. First, the network performs feature extraction by a cascade of trainable convolution and subsample operations. In these layers simple features such as edges are extracted, which are combined in the next layers to detect more complex features such as corners or crossings. Secondly, the features, represented as feature maps, are classified by feed forward neural network layers. The final outputs describe whether there is a sign and to which category it belongs.” [pg. 14, A. Algorithm, ¶1; CNN would be equivalent to a multifunction perceptron architecture. Under BRI, the claim recites “one or more of” so the examiner is interpreting features being combined in the next layer to be equivalent to “mixer neurons”.]), wherein the plurality of neurons include heterogenous neurons (“Due to the flexibility of the reuse buffers subsampling factors are directly supported. If (1) contains a subsample factor S > 1 the parallel feature map neurons are not direct neighbors. As a result a different pattern must be send to the PEs, e.g. S=2, [x00 x20 x40 x60], [x10 x30 x50 x70], etc.” [pg. 16, § 4) Column Based storage with subsampling; Implies neurons within the architecture are heterogenous.]), wherein the plurality of neurons represent a plurality of nodes associated with software threads or hardware threads, wherein the software threads are facilitated by one or more processors including one or more homogenous processor or heterogenous processors (“Due to the small memory bandwidth of the MicroBlaze core, the accelerator of the previous section does not scale beyond 20 MACC PEs.” [pg. 19, § B Memory bandwidth limitations; note: Under BRI, the claim recites “one or more processors”; thus the examiner only needs to cite one processor, and the examiner is therefore interpreting the MicroBlaze processor to correspond to the one or more processors. The accelerator would imply multiple different processors, which would correspond to “heterogenous processors”. Note: The claim recites software threads or hardware threads; under BRI, the examiner does not need to cite both. Software threads are equivalent to the operating system executing the software program (see pg. 13, right col, top para), while hardware threads can be interpreted as the physical CPU core executing operations as cited below.]), and wherein the hardware threads are associated with the one or more processors through sequential circuitry (“Each PE sequentially computes a neuron value in a feature map.” [pg. 15, CNN Accelerator Template, ¶2; note: PE represents processing element; sequential circuitry is shown in Fig. 10 (bottom circuit).]).
Although Peemen suggests software/hardware threads, the reference fails to explicitly teach wherein the plurality of neurons further include extractor neurons to support edge detection of edges associated with multidimensional channels to facilitate computation of image gradients in one or more pixel positions of a multidimensional output.
Luo also teaches software/hardware threads (“Under CUDA the GPU is a compute device that is a highly multithreaded coprocessor. A thread block is a batch of threads that executes on a multiprocessor that have access to its local memory. They perform their computations and become idle when they reach a synchronization point, waiting for other threads in the block to reach that point. Each thread is identified by its thread ID (one, two or three indices). The choice of 1,2 or 3D index layout is used to map the different pieces of data to the thread. The programmer writes data parallel code, which executes the same instructions on different data, though some customization of each thread is possible based on different behaviors depending on the value of the thread indices.” [pg. 2, ¶3])
	Luo teaches wherein the plurality of neurons further include extractor neurons to support edge detection of edges (“In this paper we focus on a GPU implementation of the Canny edge detector. This algorithm has remained a standard in edge finding techniques over the years. Applications of edge detection include their use as features in vision algorithms, their use to improve the appearance of displayed objects, in image coding and others too numerous to discuss” [pg. 2, § 1.2, ¶1; computer vision algorithms imply the use of neurons.]) associated with multidimensional channels (“The image data must be in a linear 24 bit per pixel format. Every “pixel” must be constrained to 3 channels of 8bit width each. The suggested color space is RGB since the Canny algorithm is best suited for grayscale images converted from RGB space. The image width and height must be of a multiple of 16 pixels to fit global memory access alignment properties” [pg. 3, 3.1, ¶1]) to facilitate computation of image gradients in one or more pixel positions of a multidimensional output (“To smooth the image, the Canny edge detector uses Gaussian convolution. Next, the image is convolved with a 2D first derivative operator to determine regions of sharp changes in intensities. The gradient magnitude and direction at each pixel are calculated in this step. Note that the maxima and minima of the first derivative gradient are the same as the zero-crossings of the second directional derivative. Only the maxima crossings are of interest because these pixels represent the areas of the sharpest intensity changes in the image” [pg. 3, left col, top para; See further: pgs. 4-5, § Gradient Computations. Computing gradient magnitude and direction data implies a multidimensional output.]).
Peemen and Luo are both in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Peemen’s teachings by implementing the Canny edge detection algorithm as taught by Luo. One would have been motivated to make this modification in order to use edge detection as a pre-processing step for a computer vision algorithm. [Abstract, Luo]

Regarding claim 15, Peemen discloses An apparatus comprising:
one or more processors to (“For control purposes the accelerator is connected to a MicroBlaze host processor.” [pg. 15, § CNN Accelerator Template, ¶1]):
generate a multifunction perceptron architecture having a plurality of neurons to perform one or more neuron functions in a machine learning environment, wherein the plurality of neurons includes one or more of splitter neurons, mixer neurons, and counter neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features. In contrast, a CNN combines the two steps in a single trainable model. We outline the model with an example of a speed sign recognition application for a car driver support system [6]. The architecture of the model is depicted in Figure 2. First, the network performs feature extraction by a cascade of trainable convolution and subsample operations. In these layers simple features such as edges are extracted, which are combined in the next layers to detect more complex features such as corners or crossings. Secondly, the features, represented as feature maps, are classified by feed forward neural network layers. The final outputs describe whether there is a sign and to which category it belongs.” [pg. 14, A. Algorithm, ¶1; CNN would be equivalent to a multifunction perceptron architecture. Under BRI, the claim recites “one or more of” so the examiner is interpreting features being combined in the next layer to be equivalent to “mixer neurons”.]), wherein the plurality of neurons include heterogenous neurons (“Due to the flexibility of the reuse buffers subsampling factors are directly supported. If (1) contains a subsample factor S > 1 the parallel feature map neurons are not direct neighbors. As a result a different pattern must be send to the PEs, e.g. S=2, [x00 x20 x40 x60], [x10 x30 x50 x70], etc.” [pg. 16, § 4) Column Based storage with subsampling; Implies neurons within the architecture are heterogenous.]), wherein the plurality of neurons represent a plurality of nodes associated with software threads or hardware threads, wherein the software threads are facilitated by one or more processors including one or more homogenous processor or heterogenous processors (“Due to the small memory bandwidth of the MicroBlaze core, the accelerator of the previous section does not scale beyond 20 MACC PEs.” [pg. 19, § B Memory bandwidth limitations; note: Under BRI, the claim recites “one or more processors”; thus the examiner only needs to cite one processor, and the examiner is therefore interpreting the MicroBlaze processor to correspond to the one or more processors. The accelerator would imply multiple different processors, which would correspond to “heterogenous processors”. Note: The claim recites software threads or hardware threads; under BRI, the examiner does not need to cite both. Software threads are equivalent to the operating system executing the software program (see pg. 13, right col, top para), while hardware threads can be interpreted as the physical CPU core executing operations as cited below.]), and wherein the hardware threads are associated with the one or more processors through sequential circuitry (“Each PE sequentially computes a neuron value in a feature map.” [pg. 15, CNN Accelerator Template, ¶2; note: PE represents processing element; sequential circuitry is shown in Fig. 10 (bottom circuit).]).
Although Peemen suggests software/hardware threads, the reference fails to explicitly teach wherein the plurality of neurons further include extractor neurons to support edge detection of edges associated with multidimensional channels to facilitate computation of image gradients in one or more pixel positions of a multidimensional output.
Luo also teaches software/hardware threads (“Under CUDA the GPU is a compute device that is a highly multithreaded coprocessor. A thread block is a batch of threads that executes on a multiprocessor that have access to its local memory. They perform their computations and become idle when they reach a synchronization point, waiting for other threads in the block to reach that point. Each thread is identified by its thread ID (one, two or three indices). The choice of 1,2 or 3D index layout is used to map the different pieces of data to the thread. The programmer writes data parallel code, which executes the same instructions on different data, though some customization of each thread is possible based on different behaviors depending on the value of the thread indices.” [pg. 2, ¶3])
	Luo teaches wherein the plurality of neurons further include extractor neurons to support edge detection of edges (“In this paper we focus on a GPU implementation of the Canny edge detector. This algorithm has remained a standard in edge finding techniques over the years. Applications of edge detection include their use as features in vision algorithms, their use to improve the appearance of displayed objects, in image coding and others too numerous to discuss” [pg. 2, § 1.2, ¶1; computer vision algorithms imply the use of neurons.]) associated with multidimensional channels (“The image data must be in a linear 24 bit per pixel format. Every “pixel” must be constrained to 3 channels of 8bit width each. The suggested color space is RGB since the Canny algorithm is best suited for grayscale images converted from RGB space. The image width and height must be of a multiple of 16 pixels to fit global memory access alignment properties” [pg. 3, 3.1, ¶1]) to facilitate computation of image gradients in one or more pixel positions of a multidimensional output (“To smooth the image, the Canny edge detector uses Gaussian convolution. Next, the image is convolved with a 2D first derivative operator to determine regions of sharp changes in intensities. The gradient magnitude and direction at each pixel are calculated in this step. Note that the maxima and minima of the first derivative gradient are the same as the zero-crossings of the second directional derivative. Only the maxima crossings are of interest because these pixels represent the areas of the sharpest intensity changes in the image” [pg. 3, left col, top para; See further: pgs. 4-5, § Gradient Computations. Computing gradient magnitude and direction data implies a multidimensional output.]).
Peemen and Luo are both in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Peemen’s teachings by implementing the Canny edge detection algorithm as taught by Luo. One would have been motivated to make this modification in order to use edge detection as a pre-processing step for a computer vision algorithm. [Abstract, Luo]

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Peemen in view of Luo, further in view of Chattopadhyay et al. ("Counting Everyday Objects in Everyday Scenes", hereinafter "Chattopadhyay"), and further in view of Ansotegui et al. ("Model-Based Genetic Algorithms for Algorithm Configuration", hereinafter "Ansotegui").

Regarding claim 3, Peemen/Luo discloses The non-transitory machine-readable medium of claim 1, where Peemen further teaches wherein the operations further comprise: 
detecting sensor data through the one or more sensors including one or more of a camera, a microphone, a touch sensor, a capacitor, a radio component, a radar component, a scanner, and an accelerometer (“An embedded camera platform equipped with a CNN accelerator has the performance advantages of dedicated acceleration without sacrificing flexibility for multiple vision tasks.” [pg. 13, § Introduction, ¶2; camera would be obtaining the images (i.e. sensor data) for the training of the CNN.]); and performing the one or more neuron functions on the sensor data as part of a training process using one or more of the plurality of neurons, wherein the plurality of neurons further includes one or more of selector neurons, extractor neurons, and transformer neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 13, § A. Algorithm, ¶1; feature extractor layer of CNN would imply “extractor neurons”. See Fig. 2 for performing one or more neuron functions on the sensor data (i.e. input image)]).
However, Peemen/Luo fails to explicitly teach wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data.
Chattopadhyay teaches wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data (“A toy example explaining the motivation for three categories of counting approaches explored in this paper. The task is to count the number of stars and circles. In detect, the idea is to detect instances of a category, and then report the total number of instances detected as the count.” [pg. 2, Figure 2; See further “Other recent work studies the problem of salient object subitizing (SOS). This is the task of counting the number of salient objects in the image (independent of the category). In contrast, we are interested in counting the number of instances of objects per category.” [pg. 3, top left col, ¶2; note: Under the BRI of the claim, the examiner does not need to provide a citation for “counter neurons”, however for purposes of compact prosecution the examiner has provided a citation corresponding to the limitation involving the counter neurons.]]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]
However, the combination of Peemen/Luo/Chattopadhyay fails to explicitly teach and paths that describe a flow of information between the heterogenous neurons and competing algorithms.
Ansotegui teaches and paths that describe a flow of information between the heterogenous neurons and competing algorithms (“GGA represents parameterizations using an AND-OR tree in order to capture the dependencies between parameters. A node in the tree is a parameter or an AND node that binds multiple parameters together. Several sample trees are shown in Figure 1. In these particular trees, the root node is a categorical parameter that activates the left branch when it takes the value 0, and the right branch when it takes the value 1 (think of a parameter that chooses which subroutine is used, with parameters in the left subtree affecting only subroutine ’0’ and parameters only affecting subroutine ’1’ in the right tree)” [pg. 734, Genome representation; Ansotegui discloses sample trees in Fig. 1 which would include heterogenous neurons.]).
Peemen, Luo, Chattopadhyay, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo/Chattopadhyay to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]
Regarding claim 4, Peemen/Luo/Chattopadhyay/Ansotegui discloses The non-transitory machine-readable medium of claim 3, where Peemen further teaches wherein the operations further comprise:
facilitating connectivity within the plurality of neurons through interconnects or buses (“The memory subsystem facilitates the flexibility and increases communication bandwidth by exploiting data reuse in memory access patterns. This is implemented by a series of dual ported FPGA Block RAMs (BRAMs). In Figure 4 the communication is depicted in more detail.” [pg. 15, § A. Flexible memory subsystem; See Fig. 4]); and mapping, physically and logically, the plurality of neurons with the one or more processors through the interconnects or buses (“A high-level verification and evaluation of the methodology by FPGA mapping of a speed traffic-sign recognition application.” [pg. 14 top bullet, left col; FPGA mapping is equivalent to logically mapping. See further Fig. 10 for “physically mapping”.]).


Regarding claim 10, Peemen/Luo discloses The method of claim 8, where Peemen further teaches wherein the operations further comprise: 
detecting sensor data through the one or more sensors including one or more of a camera, a microphone, a touch sensor, a capacitor, a radio component, a radar component, a scanner, and an accelerometer (“An embedded camera platform equipped with a CNN accelerator has the performance advantages of dedicated acceleration without sacrificing flexibility for multiple vision tasks.” [pg. 13, § Introduction, ¶2; camera would be obtaining the images (i.e. sensor data) for the training of the CNN.]); and performing the one or more neuron functions on the sensor data as part of a training process using one or more of the plurality of neurons, wherein the plurality of neurons further includes one or more of selector neurons, extractor neurons, and transformer neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 13, § A. Algorithm, ¶1; feature extractor layer of CNN would imply “extractor neurons”. See Fig. 2 for performing one or more neuron functions on the sensor data (i.e. input image)]).
However Peemen/Luo fails to explicitly teach wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data 
Chattopadhyay teaches wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data (“A toy example explaining the motivation for three categories of counting approaches explored in this paper. The task is to count the number of stars and circles. In detect, the idea is to detect instances of a category, and then report the total number of instances detected as the count.” [pg. 2, Figure 2; See further “Other recent work studies the problem of salient object subitizing (SOS). This is the task of counting the number of salient objects in the image (independent of the category). In contrast, we are interested in counting the number of instances of objects per category.” [pg. 3, top left col, ¶2; note: Under the BRI of the claim, the examiner does not need to provide a citation for “counter neurons”, however for purposes of compact prosecution the examiner has provided a citation corresponding to the limitation involving the counter neurons.]]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]
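For illustration only, the "detect" counting approach quoted from Chattopadhyay above (detect instances of a category, then report the number detected as the count) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function name `count_by_detection`, the `(category, confidence)` input format, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import Counter


def count_by_detection(detections, score_threshold=0.5):
    """Count detected instances per category.

    `detections` is a hypothetical detector output: an iterable of
    (category, confidence) pairs. Detections below the (assumed)
    confidence threshold are ignored.
    """
    return Counter(
        label for label, score in detections if score >= score_threshold
    )


# Toy input mirroring the "stars and circles" example in the quotation.
detections = [("star", 0.9), ("circle", 0.8), ("star", 0.7), ("circle", 0.3)]
counts = count_by_detection(detections)
print(dict(counts))
```

The per-category tally is what distinguishes this from category-independent counting of salient objects, which the quoted passage contrasts it with.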
However, the combination of Peemen, Luo, and Chattopadhyay fails to explicitly teach and paths that describe a flow of information between the heterogenous neurons and competing algorithms.
Ansotegui teaches and paths that describe a flow of information between the heterogenous neurons and competing algorithms (“GGA represents parameterizations using an AND-OR tree in order to capture the dependencies between parameters. A node in the tree is a parameter or an AND node that binds multiple parameters together. Several sample trees are shown in Figure 1. In these particular trees, the root node is a categorical parameter that activates the left branch when it takes the value 0, and the right branch when it takes the value 1 (think of a parameter that chooses which subroutine is used, with parameters in the left subtree affecting only subroutine ’0’ and parameters only affecting subroutine ’1’ in the right tree)” [pg. 734, Genome representation; Ansotegui discloses sample trees in Fig. 1 which would include heterogenous neurons.]).
Peemen, Luo, Chattopadhyay, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo/Chattopadhyay to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]

Regarding claim 11, Peemen/Luo/Chattopadhyay/Ansotegui discloses The method of claim 10, where Peemen further teaches further comprising: 
facilitating connectivity within the plurality of neurons through interconnects or buses (“The memory subsystem facilitates the flexibility and increases communication bandwidth by exploiting data reuse in memory access patterns. This is implemented by a series of dual ported FPGA Block RAMs (BRAMs). In Figure 4 the communication is depicted in more detail.” [pg. 15, § A. Flexible memory subsystem; See Fig. 4 “
[Image: media_image1.png, Peemen Fig. 4]
”]); and mapping, physically and logically, the plurality of neurons with the one or more processors through the interconnects or buses (“A high-level verification and evaluation of the methodology by FPGA mapping of a speed traffic-sign recognition application.” [pg. 14 top bullet, left col; FPGA mapping is equivalent to logically mapping, See further Fig. 10 for “physically mapping”:
[Image: media_image2.png, Peemen Fig. 10]
]).

Regarding claim 17, Peemen/Luo discloses The apparatus of claim 15, where Peemen teaches wherein the one or more processors are further to:
detect sensor data through the one or more sensors including one or more of a camera, a microphone, a touch sensor, a capacitor, a radio component, a radar component, a scanner, and an accelerometer (“An embedded camera platform equipped with a CNN accelerator has the performance advantages of dedicated acceleration without sacrificing flexibility for multiple vision tasks.” [pg. 13, § Introduction, ¶2; camera would be obtaining the images (i.e. sensor data) for the training of the CNN.]); and perform the one or more neuron functions on the sensor data as part of a training process using one or more of the plurality of neurons, wherein the plurality of neurons further includes one or more of selector neurons, extractor neurons, and transformer neurons (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 13, § A. Algorithm, ¶1; feature extractor layer of CNN would imply “extractor neurons”. See Fig. 2 for performing one or more neuron functions on the sensor data (i.e. input image)]).
However, Peemen/Luo fails to explicitly teach wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data.
Chattopadhyay teaches wherein the counter neurons to facilitate discovery of salient features that characterize classes by counting a number of extracted features appearing in training data (“A toy example explaining the motivation for three categories of counting approaches explored in this paper. The task is to count the number of stars and circles. In detect, the idea is to detect instances of a category, and then report the total number of instances detected as the count.” [pg. 2, Figure 2; See further “Other recent work studies the problem of salient object subitizing (SOS). This is the task of counting the number of salient objects in the image (independent of the category). In contrast, we are interested in counting the number of instances of objects per category.” [pg. 3, top left col, ¶2; note: Under the BRI of the claim, the examiner does not need to provide a citation for “counter neurons”, however for purposes of compact prosecution the examiner has provided a citation corresponding to the limitation involving the counter neurons.]]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]
However, the combination of Peemen, Luo, and Chattopadhyay fails to explicitly teach and paths that describe a flow of information between the heterogenous neurons and competing algorithms.
Ansotegui teaches and paths that describe a flow of information between the heterogenous neurons and competing algorithms (“GGA represents parameterizations using an AND-OR tree in order to capture the dependencies between parameters. A node in the tree is a parameter or an AND node that binds multiple parameters together. Several sample trees are shown in Figure 1. In these particular trees, the root node is a categorical parameter that activates the left branch when it takes the value 0, and the right branch when it takes the value 1 (think of a parameter that chooses which subroutine is used, with parameters in the left subtree affecting only subroutine ’0’ and parameters only affecting subroutine ’1’ in the right tree)” [pg. 734, Genome representation; Ansotegui discloses sample trees in Fig. 1 which would include heterogenous neurons.]).
Peemen, Luo, Chattopadhyay, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo/Chattopadhyay to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]

Regarding claim 18, Peemen/Luo/Chattopadhyay/Ansotegui discloses The apparatus of claim 17, where Peemen further teaches wherein the one or more processors are further to:
facilitate connectivity within the plurality of neurons through interconnects or buses (“The memory subsystem facilitates the flexibility and increases communication bandwidth by exploiting data reuse in memory access patterns. This is implemented by a series of dual ported FPGA Block RAMs (BRAMs). In Figure 4 the communication is depicted in more detail.” [pg. 15, § A. Flexible memory subsystem; See Fig. 4 “
[Image: media_image1.png, Peemen Fig. 4]
”]); and map, physically and logically, the plurality of neurons with the one or more processors through the interconnects or buses (“A high-level verification and evaluation of the methodology by FPGA mapping of a speed traffic-sign recognition application.” [pg. 14 top bullet, left col; FPGA mapping is equivalent to logically mapping, See further Fig. 10 for “physically mapping”:
[Image: media_image2.png, Peemen Fig. 10]
]).

Claims 5, 6, 12, 13, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Peemen in view of Luo and further in view of Chattopadhyay.

Regarding claim 5, Peemen/Luo teaches The non-transitory machine-readable medium of claim 1, where Peemen further teaches wherein the operations further comprise: performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 14, A. Algorithm, ¶1; this is equivalent to supervised learning.]) to optimize parameters used by the plurality of neurons to perform the neuron functions (“The huge amount of data reuse in the algorithm has practical implications that should be considered before implementation. For example, the ordering of operations and selection of parallelism are key parameters that influence data transfer and computational resource usage… Each level in the algorithm has a different data transfer pattern and contains an amount of data reuse. In addition, different layers or network configurations change the properties of the 4 levels. For example, in figure 3 the first layers have much parallelism at the kernel level. Layer 4, on the other hand, has no kernel level parallelism because the kernel size is 1x1. As a result, an accelerator must be flexible to be efficient for each CNN layer.” [pg. 15, B. Practical implications, ¶1-2; Peemen discloses changing the parameters to influence data transfer and computation resource usage which would be equivalent to optimizing the parameters to perform “neuron functions”.]).
However, Peemen/Luo fails to explicitly teach performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions.
Chattopadhyay teaches performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Regarding claim 6, Peemen/Luo/Chattopadhyay teaches The non-transitory machine-readable medium of claim 5, where Peemen further teaches wherein the operations further comprise: 
extracting, based on training, a learned algorithm (“A CNN allows sufficient parallelism to execute hundreds of operations in parallel. The current bottleneck in available platforms for efficient utilization of parallelism is data transfer. Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses.” [pg. 13, § Introduction, ¶3; a trained CNN would be equivalent to a learned algorithm.]); 
optimizing the learned algorithm (“Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses. Without on-chip buffers all accesses are to external memory, requiring huge memory bandwidth and consuming a lot of energy. The number of external accesses can be reduced by on-chip memory that exploits data reuse. Varying on-chip memory size is in essence trading chip area versus memory bandwidth. E.g. 4 MB on-chip memory can reduce the external accesses to 5.4 million.” [pg. 13, § Introduction, ¶3; reducing external accesses would correspond to optimizing the learned algorithm.]); and 
building a separate implementation of the learned algorithm (“To quantify the performance of the improved schedules, we compare with the original schedule. The differences are measured by mapping with the accelerator template to a Xilinx ML-605 Virtex 6 FPGA board. The system clock frequency of presented results is 150 MHz.” [pg. 18, A. Accelerator performance, ¶1; mapping the CNN to the FPGA board would be equivalent to building a separate implementation.]).
However, Peemen/Luo fails to explicitly teach training the multifunction perceptron architecture.
Chattopadhyay teaches training the multifunction perceptron architecture (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Regarding claim 12, Peemen/Luo teaches The method of claim 8, where Peemen further teaches further comprising: performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 14, A. Algorithm, ¶1; this is equivalent to supervised learning.]) to optimize parameters used by the plurality of neurons to perform the neuron functions (“The huge amount of data reuse in the algorithm has practical implications that should be considered before implementation. For example, the ordering of operations and selection of parallelism are key parameters that influence data transfer and computational resource usage… Each level in the algorithm has a different data transfer pattern and contains an amount of data reuse. In addition, different layers or network configurations change the properties of the 4 levels. For example, in figure 3 the first layers have much parallelism at the kernel level. Layer 4, on the other hand, has no kernel level parallelism because the kernel size is 1x1. As a result, an accelerator must be flexible to be efficient for each CNN layer.” [pg. 15, B. Practical implications, ¶1-2; Peemen discloses changing the parameters to influence data transfer and computation resource usage which would be equivalent to optimizing the parameters to perform “neuron functions”.]).
However, Peemen/Luo fails to explicitly teach performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions.
Chattopadhyay teaches performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Regarding claim 13, Peemen/Luo/Chattopadhyay teaches The method of claim 12, where Peemen further teaches further comprising: 
extracting, based on training, a learned algorithm (“A CNN allows sufficient parallelism to execute hundreds of operations in parallel. The current bottleneck in available platforms for efficient utilization of parallelism is data transfer. Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses.” [pg. 13, § Introduction, ¶3; a trained CNN would be equivalent to a learned algorithm.]); 
optimizing the learned algorithm (“Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses. Without on-chip buffers all accesses are to external memory, requiring huge memory bandwidth and consuming a lot of energy. The number of external accesses can be reduced by on-chip memory that exploits data reuse. Varying on-chip memory size is in essence trading chip area versus memory bandwidth. E.g. 4 MB on-chip memory can reduce the external accesses to 5.4 million.” [pg. 13, § Introduction, ¶3; reducing external accesses would correspond to optimizing the learned algorithm.]); and 
building a separate implementation of the learned algorithm (“To quantify the performance of the improved schedules, we compare with the original schedule. The differences are measured by mapping with the accelerator template to a Xilinx ML-605 Virtex 6 FPGA board. The system clock frequency of presented results is 150 MHz.” [pg. 18, A. Accelerator performance, ¶1; mapping the CNN to the FPGA board would be equivalent to building a separate implementation.]).
However, Peemen/Luo fails to explicitly teach training the multifunction perceptron architecture.
Chattopadhyay teaches training the multifunction perceptron architecture (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Regarding claim 19, Peemen/Luo teaches The apparatus of claim 15, where Peemen further teaches wherein the one or more processors are further to: performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture (“Traditional vision applications use feature extraction and classification as two distinct steps. The feature extractor largely influences the classification accuracy. Because this extractor is often developed manually, the system performance depends on the ability of the designer to come up with a set of rules to extract an appropriate set of features.” [pg. 14, A. Algorithm, ¶1; this is equivalent to supervised learning.]) to optimize parameters used by the plurality of neurons to perform the neuron functions (“The huge amount of data reuse in the algorithm has practical implications that should be considered before implementation. For example, the ordering of operations and selection of parallelism are key parameters that influence data transfer and computational resource usage… Each level in the algorithm has a different data transfer pattern and contains an amount of data reuse. In addition, different layers or network configurations change the properties of the 4 levels. For example, in figure 3 the first layers have much parallelism at the kernel level. Layer 4, on the other hand, has no kernel level parallelism because the kernel size is 1x1. As a result, an accelerator must be flexible to be efficient for each CNN layer.” [pg. 15, B. Practical implications, ¶1-2; Peemen discloses changing the parameters to influence data transfer and computation resource usage which would be equivalent to optimizing the parameters to perform “neuron functions”.]).
However, Peemen/Luo fails to explicitly teach performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions.
Chattopadhyay teaches performing one or more of supervised learning and unsupervised learning within the multifunction perceptron architecture to optimize parameters used by the plurality of neurons to perform the neuron functions (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Regarding claim 20, Peemen/Luo/Chattopadhyay teaches The apparatus of claim 19, where Peemen further teaches wherein the one or more processors are further to: 
extracting, based on training, a learned algorithm (“A CNN allows sufficient parallelism to execute hundreds of operations in parallel. The current bottleneck in available platforms for efficient utilization of parallelism is data transfer. Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses.” [pg. 13, § Introduction, ¶3; a trained CNN would be equivalent to a learned algorithm.]); 
optimizing the learned algorithm (“Evaluating a trained CNN on 720p video involves a large number of convolutions, which in a single layer can require 3.4 billion memory accesses. Without on-chip buffers all accesses are to external memory, requiring huge memory bandwidth and consuming a lot of energy. The number of external accesses can be reduced by on-chip memory that exploits data reuse. Varying on-chip memory size is in essence trading chip area versus memory bandwidth. E.g. 4 MB on-chip memory can reduce the external accesses to 5.4 million.” [pg. 13, § Introduction, ¶3; reducing external accesses would correspond to optimizing the learned algorithm.]); and 
building a separate implementation of the learned algorithm (“To quantify the performance of the improved schedules, we compare with the original schedule. The differences are measured by mapping with the accelerator template to a Xilinx ML-605 Virtex 6 FPGA board. The system clock frequency of presented results is 150 MHz.” [pg. 18, A. Accelerator performance, ¶1; mapping the CNN to the FPGA board would be equivalent to building a separate implementation.]).
However, Peemen/Luo fails to explicitly teach training the multifunction perceptron architecture.
Chattopadhyay teaches training the multifunction perceptron architecture (“Our glance approach repurposes a generic CNN architecture for counting by training a multi-layered perceptron (MLP) with a L2 loss to regress to image level counts from deep representations extracted from the CNN. The MLP has batch normalization and Rectified Linear Unit (ReLU) activations between hidden layers. The models were trained with a learning rate of 10^-3 and weight decay set to 0.95. We experiment with choices of a single hidden layer, and two hidden layers for the MLP, as well as the sizes of the hidden units.” [pg. 3, § 3.2 Glancing, ¶1]).
Peemen, Luo, and Chattopadhyay are all in the same field of endeavor of training computer vision algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny Edge detection for computing on GPUs. Chattopadhyay discloses training a CNN to detect and count the number of instances of different object classes in an image. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Peemen’s/Luo’s teachings to detect and count extracted features as taught by Chattopadhyay. One would have been motivated to make this modification in order to improve object detection. [Abstract, Chattopadhyay]

Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Peemen in view of Luo and further in view of Ansotegui.

Regarding claim 7, Peemen/Luo teaches The non-transitory machine-readable medium of claim 1, where Peemen teaches wherein the operations further comprise wherein the one or more processors comprising one or more of a graphics processor, an application processor, (“On a standard laptop the optimization procedure for a series of memory sizes” [pg. 17, § 3) Scheduling Design Space Exploration; GPUs and an application processor would be inherent with a standard laptop.]) and another processor (“For control purposes the accelerator is connected to a MicroBlaze host processor” [pg. 15, § CNN Accelerator Template, ¶1; this would correspond to another processor]), wherein the one or more processors are co-located on a common semiconductor package (Peemen discloses using a standard laptop on pg. 17; being located on a common semiconductor package is inherent).
However, Peemen/Luo fails to explicitly teach facilitating competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm.
Ansotegui teaches facilitating competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm (“The crossover operator in GGA takes two genomes, one from the competitive and one from the non-competitive population. The competitive genome is a tournament winner, while the other genome is drawn uniformly at random from the non-competitive population. The crossover process is shown in Figure 1, and it can be viewed as a hybrid of the traditional multi-point and uniform crossovers that considers parameter dependencies. GGA performs mutation simply by selecting random parameter settings and replacing them with values drawn from either a Gaussian (for discrete and continuous parameters) or a uniform (for categorical parameters) distribution.” [pg. 734, § Recombination and Mutation]).
Peemen, Luo, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]
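For illustration only, the tournament, crossover, and mutation steps quoted from Ansotegui above can be sketched as follows. This is a simplified sketch under stated assumptions and not Ansotegui's GGA implementation: the names `random_genome`, `mutate`, `crossover`, and `evolve`, the parameter specification format, the mutation rate, the tournament size, and the Gaussian width are all hypothetical; plain uniform crossover is substituted for the dependency-aware crossover of the reference; and offspring are divided between the two populations by position. Lower scores are assumed to be better.

```python
import random


def random_genome(spec, rng):
    """Draw one parameter setting per entry of the (hypothetical) spec."""
    genome = {}
    for name, (kind, domain) in spec.items():
        if kind == "categorical":
            genome[name] = rng.choice(domain)
        elif kind == "discrete":
            genome[name] = rng.randint(domain[0], domain[1])
        else:  # continuous
            genome[name] = rng.uniform(domain[0], domain[1])
    return genome


def mutate(genome, spec, rng, rate=0.1):
    """Resample randomly chosen parameters: Gaussian for numeric, uniform for categorical."""
    child = dict(genome)
    for name, (kind, domain) in spec.items():
        if rng.random() >= rate:
            continue
        if kind == "categorical":
            child[name] = rng.choice(domain)
        else:
            low, high = domain
            value = rng.gauss(child[name], 0.1 * (high - low))
            value = min(high, max(low, value))  # clamp to the allowed range
            child[name] = round(value) if kind == "discrete" else value
    return child


def crossover(winner, other, rng):
    """Uniform crossover (a simplification of the crossover in the reference)."""
    return {name: (winner[name] if rng.random() < 0.5 else other[name]) for name in winner}


def evolve(spec, score, generations=30, pop_size=10, seed=0):
    """Return the best genome found and the best score after each generation."""
    rng = random.Random(seed)
    competitive = [random_genome(spec, rng) for _ in range(pop_size)]
    noncompetitive = [random_genome(spec, rng) for _ in range(pop_size)]
    best = min(competitive, key=score)
    history = [score(best)]
    for _ in range(generations):
        children = []
        for _ in range(2 * pop_size - 1):
            # Tournament among competitive genomes; mate drawn uniformly at random.
            entrants = rng.sample(competitive, min(3, len(competitive)))
            winner = min(entrants, key=score)
            mate = rng.choice(noncompetitive)
            children.append(mutate(crossover(winner, mate, rng), spec, rng))
        # Keep the current best so the best score never gets worse.
        competitive = children[:pop_size - 1] + [best]
        noncompetitive = children[pop_size - 1:]
        best = min(competitive, key=score)
        history.append(score(best))
    return best, history
```

Here each genome is one candidate configuration of an algorithm, and the genome returned by `evolve` plays the role of the winning algorithm in this sketch.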

Regarding claim 14, Peemen/Luo teaches The method of claim 8, where Peemen further teaches further comprising wherein the one or more processors comprising one or more of a graphics processor, an application processor, (“On a standard laptop the optimization procedure for a series of memory sizes” [pg. 17, § 3) Scheduling Design Space Exploration]; a GPU and an application processor would be inherent in a standard laptop) and another processor (“For control purposes the accelerator is connected to a MicroBlaze host processor” [pg. 15, § CNN Accelerator Template, ¶1]; this would correspond to another processor), wherein the one or more processors are co-located on a common semiconductor package (Peemen discloses using a standard laptop on pg. 17; being located on a common semiconductor package is inherent).
However, Peemen/Luo fails to explicitly teach facilitating competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm.
Ansotegui teaches facilitating competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm (“The crossover operator in GGA takes two genomes, one from the competitive and one from the non-competitive population. The competitive genome is a tournament winner, while the other genome is drawn uniformly at random from the non-competitive population. The crossover process is shown in Figure 1, and it can be viewed as a hybrid of the traditional multi-point and uniform crossovers that considers parameter dependencies. GGA performs mutation simply by selecting random parameter settings and replacing them with values drawn from either a Gaussian (for discrete and continuous parameters) or a uniform (for categorical parameters) distribution.” [pg. 734, § Recombination and Mutation]).
Peemen, Luo, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]
Regarding claim 21, Peemen/Luo teaches The apparatus of claim 15, and further teaches wherein the one or more processors comprising one or more of a graphics processor, an application processor, (“On a standard laptop the optimization procedure for a series of memory sizes” [pg. 17, § 3) Scheduling Design Space Exploration]; a GPU and an application processor would be inherent in a standard laptop) and another processor (“For control purposes the accelerator is connected to a MicroBlaze host processor” [pg. 15, § CNN Accelerator Template, ¶1]; this would correspond to another processor), wherein the one or more processors are co-located on a common semiconductor package (Peemen discloses using a standard laptop on pg. 17; being located on a common semiconductor package is inherent).
However, Peemen/Luo fails to explicitly teach where the one or more processors are further to facilitate competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm.
Ansotegui teaches where the one or more processors are further to facilitate competition between multiple algorithms at the multifunction perception architecture, wherein the multiple algorithms are allowed to mutate to improve scores until an algorithm from the multiple algorithms emerges as a winning algorithm (“The crossover operator in GGA takes two genomes, one from the competitive and one from the non-competitive population. The competitive genome is a tournament winner, while the other genome is drawn uniformly at random from the non-competitive population. The crossover process is shown in Figure 1, and it can be viewed as a hybrid of the traditional multi-point and uniform crossovers that considers parameter dependencies. GGA performs mutation simply by selecting random parameter settings and replacing them with values drawn from either a Gaussian (for discrete and continuous parameters) or a uniform (for categorical parameters) distribution.” [pg. 734, § Recombination and Mutation]).
Peemen, Luo, and Ansotegui are all in the same field of endeavor of training and optimizing machine learning algorithms. Peemen discloses training a CNN and implementing the algorithm on an FPGA. Luo discloses Canny edge detection for computing on GPUs. Ansotegui discloses a tournament-selection-based method for choosing the best algorithm in a population. It would have been obvious to a person of ordinary skill in the art to modify the teachings of Peemen/Luo to implement an algorithm selection method in order to select the best algorithm as taught by Ansotegui. One would have been motivated to make this modification in order to select the most optimized algorithm to perform certain machine learning tasks. [§ 1 Gender-based Genetic Algorithm Configuration, ¶1, Ansotegui]

Response to Arguments
Applicant's arguments filed 06/22/2022 have been fully considered but they are not persuasive.



Regarding the 35 U.S.C. §102/103 Rejections:
Applicant’s arguments regarding the following limitation: “wherein the plurality of neurons represent a plurality of nodes associated with software threads or hardware threads, wherein the software threads are facilitated by one or more processors including one or more homogenous processors or one or more heterogenous processors, and wherein the hardware threads are associated with the one or more processors through sequential circuitry” have been considered but are not persuasive. As noted above, the BRI of the claim only requires the teaching of one processor. The examiner is interpreting the MicroBlaze processor to correspond to the “one or more processors”. Additionally, the claim recites software or hardware threads in the alternative; thus, under BRI, the examiner is interpreting the operating system executing the software program disclosed by Peemen to be equivalent to software threads. The examiner has also provided additional evidence noting that hardware threads can be interpreted as the physical CPU core executing the operations. Therefore, applicant’s arguments are not persuasive.

Applicant’s arguments regarding the newly amended limitations, specifically that the prior art of record fails to teach or suggest “wherein the plurality of neurons further include extractor neurons to support edge detection of edges associated with multidimensional channels to facilitate computation of image gradients in one or more pixel positions of a multidimensional output”, have been considered but are moot because the newly amended limitations are now taught by the newly presented art of Luo. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chen et al. (“Facial expression recognition based on edge detection”) discloses edge detection with multichannel RGB and Canny operators.
Xin et al. (“An improved Canny edge detection algorithm for color image”) discloses an improved Canny edge detection algorithm for color images.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./ Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122