DETAILED ACTION
Claims 1-26 have been presented for examination. Claims 1-26 are rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed with respect to the rejection of claim 1 under 35 U.S.C. 103 have been fully considered but they are not persuasive. 
Applicant argues that Meloni in combination with Graf fails to teach the following limitations of claim 1 as amended:
a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs are configured to perform non-convolutional data processing operations associated with processing of a DCNN kernel in parallel with execution of convolutional operations associated with processing of the DCNN kernel by the plurality of convolution accelerators of the configurable accelerator framework to execute the DCNN kernel.
Examiner disagrees with Applicant’s assertion. First, Meloni explicitly states that “the activation of the accelerator and of the data transfers from/to external memories are managed by two processing cores that are coupled to the HWCE through a shared data scratchpad.” These data transfers are an example of non-convolutional data processing operations associated with convolutional operations, since the cores communicate with the HWCE that performs the convolutions. Convolutional operations naturally involve convolutional kernels (see Meloni Section I). Additionally, these non-convolutional operations occur in parallel with the convolutional operations performed by the HWCE (see Meloni Section VI and Fig. 7: computation and communication tasks completely overlap).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 6, 7, 8, 18, 21, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni et al. (“A High-Efficiency Runtime Reconfigurable IP for CNN Acceleration on a Mid-Range All-Programmable SoC”) in view of Graf et al. (“A Massively Parallel Digital Learning Processor”).
As per claim 1, Meloni teaches:
a system on chip (SoC) that implements a deep convolutional neural network architecture (Section II: invention implements CNN using Zynq Z-7045 SoC architecture; Section VII: implementations include DCNNs), the SoC comprising:
a system bus (Fig. 2: AXI-based interconnect);
a plurality of addressable memory arrays coupled to the system bus (Fig. 2: TCDMs [Tightly Coupled Data Memory] connected to interconnect);
at least one applications processor core coupled to the system bus (Fig. 2: ARM-based Processing system);
a configurable accelerator framework coupled to the system bus, wherein the configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system that includes a plurality of convolution accelerators configured to perform convolutional operations (Section V: HWCE [Hardware Convolution Engine] acts as a special purpose coprocessor to receive input image features for CNN processing [see Abstract]; HWCE is composed of SoP modules that implement the actual convolution [convolution accelerators]); and
a plurality of digital signal processors (DSPs) coupled to the system bus (Fig. 2: two general processing cores are coupled to the interconnect),
wherein the plurality of DSPs are configured to perform non-convolutional data processing operations associated with processing of a DCNN kernel in parallel with execution of convolutional operations associated with processing of the DCNN kernel by the plurality of convolution accelerators of the configurable accelerator framework to execute the DCNN kernel (Section I and Fig. 2: two general processing cores are coupled to the HWCE to perform data transfers and related scheduling [non-convolutional data processing operations]; Section VI and Fig. 7: computation and communication tasks have a complete overlap and thus are done in parallel with each other).
Meloni does not teach a plurality of digital signal processors.
However, Graf does teach a plurality of digital signal processors (Section 3: 128 VPEs [hardware DSPs] are divided into 4 blocks of 32; Fig. 1: the VPEs are connected by a PCI bus to a host CPU).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing cores of Meloni with the DSPs of Graf to increase performance on machine learning operations beyond what is possible with simple multi-core CPUs (Graf Section 2.3).

As per claim 2, the rejection of claim 1 is incorporated.
Graf additionally teaches wherein the plurality of digital signal processors include a plurality of DSP clusters coupled to the system bus, and wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs (Section 3: 128 VPEs [hardware DSPs] are divided into 4 blocks of 32, each group controlled by one sequencer with a vector instruction set).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing cores of Meloni with the DSPs of Graf to increase performance on machine learning operations beyond what is possible with simple multi-core CPUs (Graf Section 2.3).

As per claim 6, the rejection of claim 1 is incorporated.
Meloni additionally teaches wherein the configurable accelerator framework includes:
a reconfigurable stream switch coupled to the system bus (Fig. 1: crossbar connected to the interconnect; crossbar switches are reconfigurable in the sense that a connection can be established at any intersection by setting a switch at runtime [see Cheung “A Cross Bar Switch”]);
wherein the plurality of convolution accelerators are coupled to the stream switch (Fig. 3: HWCE is made up of multiple SoP modules that do the actual convolutions).

As per claim 7, the rejection of claim 6 is incorporated.
Meloni additionally teaches wherein the stream switch is configurable at run-time and reconfigurable during execution of the at least one DCNN (Fig. 1: crossbar connected to the interconnect; crossbar switches are reconfigurable in the sense that a connection can be established at any intersection by setting a switch at runtime [see Cheung]).

As per claim 8, the rejection of claim 6 is incorporated.
Meloni additionally teaches wherein each of the plurality of convolution accelerators is configurable at run-time and reconfigurable during execution of the DCNN kernel (Section I: “The proposed accelerator can be re-configured at run-time to be used to execute convolutions with different filter sizes applied with different strides”).

As per claim 18, Meloni teaches a system on a chip (SoC) (Section II: invention implements CNN using Zynq Z-7045 SoC architecture; Section VII: implementations include DCNNs), comprising:
a communication bus (Fig. 2: AXI-based interconnect);
a memory coupled to the communication bus (Fig. 2: TCDMs [Tightly Coupled Data Memory] connected to interconnect);
a configurable accelerator framework coupled to the communication bus (Fig. 1: interconnect is coupled to the HWCE), the configurable accelerator framework having a reconfigurable stream switch coupled to the communication bus and a plurality of convolution accelerators coupled to the stream switch (Fig. 1: HWCE is connected to a crossbar switch coupled to the interconnect; Fig. 3: HWCE is made up of multiple SoP modules [convolution accelerators]), the plurality of convolution accelerators arranged to perform convolution operations on image data during execution of a deep convolutional neural network (DCNN) (Fig. 3: SoP modules perform convolution operations); and
a plurality of digital signal processors (DSPs) coupled to the communication bus, wherein the plurality of DSPs perform non-convolution operations for at least one DCNN and coordinate functionality with the plurality of convolution accelerators of the configurable accelerator framework to execute the at least one DCNN (Section I: two processing cores aid the HWCE [and thus the SoPs] in executing CNN operations by managing data transfers to and from external memory [non-convolution operations]; Fig. 2: the cores are coupled to the interconnect),
wherein the plurality of DSPs are configured to perform non-convolutional data processing operations for a DCNN kernel in parallel with execution of convolutional operations for the DCNN kernel by the plurality of convolution accelerators (Section I and Fig. 2: two general processing cores are coupled to the HWCE to perform data transfers and related scheduling [non-convolutional data processing operations]; Section VI and Fig. 7: computation and communication tasks have a complete overlap and thus are done in parallel with each other).
Meloni does not teach a plurality of digital signal processors.
However, Graf does teach a plurality of digital signal processors (Section 3: 128 VPEs [hardware DSPs] are divided into 4 blocks of 32; Fig. 1: the VPEs are connected by a PCI bus to a host CPU).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni with the DSPs of Graf to increase performance on machine learning operations beyond what is possible with simple multi-core CPUs (Graf Section 2.3).

As per claim 21, the rejection of claim 18 is incorporated.
Meloni additionally teaches:
wherein the configurable accelerator framework includes a plurality of control registers to store control information to control execution of the DCNN, and the reconfigurable stream switch, in operation, provides DSPs of the plurality of DSPs and convolutional accelerators of the plurality of convolutional accelerators with access to the plurality of control registers (Section II: memory-mapped control registers support different kernel sizes and strides in convolution; Fig. 2: crossbar helps provide access to the control registers).

As per claim 24, the rejection of claim 1 is incorporated.
Meloni additionally teaches:
wherein the memory arrays have a hierarchical structure (Section IV: HWCE accesses the memory banks in a statically defined order, meaning the memory banks have a hierarchy to their structure).


Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Graf and further in view of Moctar et al. (“ROUTING ALGORITHMS FOR FPGAS WITH SPARSE INTRA-CLUSTER ROUTING CROSSBARS”).
As per claim 19, the rejection of claim 18 is incorporated.
Graf additionally teaches wherein the plurality of DSPs are in clusters, each cluster having at least two separate DSPs (Section 3: 128 VPEs [hardware DSPs] are divided into 4 blocks of 32, each group controlled by one sequencer with a vector instruction set).
Meloni-Graf does not teach each cluster including a cluster communication bus coupled to the communication bus.
However, Moctar does teach each cluster including a cluster communication bus coupled to the communication bus (Fig. 1: each CLB contains intra-cluster crossbars).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf with the crossbar switch of Moctar in order to provide increased routability in the clusters (Moctar Abstract).


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Graf, and further in view of Choudhary et al. (“NETRA: A Hierarchical and Partitionable Architecture for Computer Vision Systems”).
As per claim 3, the rejection of claim 2 is incorporated.
Meloni-Graf does not teach a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the system bus.
However, Choudhary does teach a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the system bus (Fig. 2: global crossbar switch for clusters of processing elements; switch is connected to a bus).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf with the stream switch coordination of Choudhary in order to provide processors with selective broadcast capability (Choudhary pg. 1093).


Claims 4, 5, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Graf, and further in view of Moctar, Kommanaboyina et al. (U.S. Patent No. 9436637B2), and Stenstrom et al. (“Reducing Contention in Shared-Memory Multiprocessors”).
As per claim 4, the rejection of claim 2 is incorporated.
Meloni-Graf does not teach wherein each DSP cluster includes:
a DSP cluster crossbar switch;
a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
However, Moctar does teach wherein each DSP cluster includes:
a DSP cluster crossbar switch (Fig. 1: each CLB contains intra-cluster crossbars).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf with the crossbar switch of Moctar in order to provide increased routability in the clusters (Moctar Abstract).
Meloni-Graf-Moctar does not teach a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
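However, Kommanaboyina does teach a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch (Fig. 1: two crossbar switches connected to another crossbar switch in one system).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar with the switch configuration of Kommanaboyina in order to provide increased routability within the clusters.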
Meloni-Graf-Moctar-Kommanaboyina does not teach a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
However, Stenstrom does teach a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch (Fig. 5: cache, processor, and memory each connected to an interconnection network; pg. 27 specifies the network can be a crossbar switch).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar-Kommanaboyina with the switch configuration of Stenstrom in order to allow multiple processors to share memory data with each other (Stenstrom pg. 26).

As per claim 5, the rejection of claim 2 is incorporated.
Meloni additionally teaches a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the system bus (Fig. 2: TCDM is connected to a crossbar switch; DMA is connected between TCDM and interconnect).
Meloni-Graf does not teach a DSP cluster crossbar switch;
a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch;
a shared DSP cluster memory coupled to the DSP cluster crossbar switch.
However, Moctar does teach a first local DSP crossbar switch (Fig. 1: each CLB contains intra-cluster crossbars).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf with the crossbar switch of Moctar in order to provide increased routability in the clusters (Moctar Abstract).
Meloni-Graf-Moctar does not teach a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch;
a shared DSP cluster memory coupled to the DSP cluster crossbar switch.
However, Kommanaboyina does teach a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch (Fig. 1: two crossbar switches connected to another crossbar switch in one system).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar with the switch configuration of Kommanaboyina in order to provide increased routability within the clusters. 
Meloni-Graf-Moctar-Kommanaboyina does not teach a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch.
However, Stenstrom does teach a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch (Fig. 5: cache, processor, and memory each connected to an interconnection network; pg. 27 specifies the network can be a crossbar switch).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar-Kommanaboyina with the switch configuration of Stenstrom in order to allow multiple processors to share memory data with each other (Stenstrom pg. 26).

As per claim 20, the rejection of claim 19 is incorporated.
Meloni additionally teaches a shared cluster memory coupled to the cluster communication bus; and a direct memory access coupled between the shared cluster memory and the communication bus (Fig. 2: TCDM is connected to a crossbar switch; DMA is connected between TCDM and interconnect).
Meloni does not teach a global DSP communication bus coupled to the communication bus and coupled to the cluster communication bus of each cluster;
wherein each cluster further includes:
a first local DSP communication bus and a second local DSP communication bus, each of the first and second local DSP communication buses coupled to the cluster communication bus;
a first DSP coupled to the first local DSP communication bus;
a second DSP coupled to the second local DSP communication bus.
However, Moctar does teach a first local DSP communication bus (Fig. 1: each CLB contains intra-cluster crossbars).

Moctar does not teach a second local DSP communication bus, each of the first and second local DSP communication buses coupled to the cluster communication bus;
a first DSP coupled to the first local DSP communication bus;
a second DSP coupled to the second local DSP communication bus.
However, Kommanaboyina does teach a second local DSP communication bus, each of the first and second local DSP communication buses coupled to the cluster communication bus (Fig. 1: two crossbar switches connected to another crossbar switch in one system).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar with the switch configuration of Kommanaboyina in order to provide increased routability within the clusters.
Kommanaboyina does not teach a first DSP coupled to the first local DSP communication bus;
a second DSP coupled to the second local DSP communication bus.
However, Stenstrom does teach a first DSP coupled to the first local DSP communication bus;
a second DSP coupled to the second local DSP communication bus (Fig. 5: cache, processor, and memory each connected to an interconnection network; pg. 27 specifies the network can be a crossbar switch).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf-Moctar-Kommanaboyina with the switch configuration of Stenstrom in order to allow multiple processors to share memory data with each other (Stenstrom pg. 26).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Graf, and further in view of Jagannathan et al. (“Optimizing Convolutional Neural Network on DSP”).
As per claim 9, the rejection of claim 6 is incorporated.
Meloni-Graf does not teach wherein the non-convolution data processing operations of the DCNN kernel performed by the plurality of DSPs include one or more operations of a group that includes pooling, nonlinear activation, and cross-channel response normalization.
However, Jagannathan does teach wherein the non-convolution data processing operations of the DCNN kernel performed by the plurality of DSPs include one or more operations of a group that includes pooling, nonlinear activation, and cross-channel response normalization (pg. 372: DSP can be used to perform max pooling layer).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Graf with the max pooling functionality of Jagannathan, as Jagannathan teaches that DSPs can be used to efficiently process pooling layers for CNN processing (see Abstract).

Claims 10, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du et al. (“ShiDianNao: Shifting Vision Processing Closer to the Sensor”) and further in view of Graf.
As per claim 10, Meloni teaches:
a system on chip (SoC) that implements a deep convolutional neural network architecture (Section II: invention implements CNN using Zynq Z-7045 SoC architecture; Section VII: implementations include DCNNs), the SoC including:
an SoC bus (Fig. 2: AXI-based interconnect);
an on-chip memory coupled to the SoC bus (Fig. 2: TCDMs [Tightly Coupled Data Memory] connected to interconnect);
a configurable accelerator framework coupled to the SoC bus, the configurable accelerator framework including a reconfigurable dataflow accelerator fabric that receives the captured images from the imaging sensor for deep convolutional neural network (DCNN) processing by at least one convolution accelerator (Section V: HWCE [Hardware Convolution Engine] acts as a special purpose coprocessor to receive input image features for CNN processing [see Abstract]; Fig. 3: HWCE is configured like a fabric; Section I: “The proposed accelerator can be re-configured at run-time to be used to execute convolutions with different filter sizes applied with different strides”); and
a plurality of digital signal processors (DSPs) clusters coupled to the SoC bus, the plurality of DSP clusters configured to perform non-convolution operations for the DCNN and arranged to coordinate functionality with the at least one convolution accelerator of the configurable accelerator framework to execute the DCNN (Section I: two processing cores aid the HWCE [and thus the SoPs] in executing CNN operations by managing data transfers to and from external memory [non-convolution operations]; Fig. 2: the cores are coupled to the interconnect),
wherein the plurality of DSP clusters are configured to perform the non-convolutional data processing operations for a DCNN kernel in parallel with execution of convolutional operations associated for the DCNN kernel by the at least one convolutional accelerator (Section I and Fig. 2: two general processing cores are coupled to the HWCE to perform data transfers and related scheduling [non-convolutional data processing operations]; Section VI and Fig. 7: computation and communication tasks have a complete overlap and thus are done in parallel with each other).
Meloni does not teach: 
A mobile computing device comprising:
an imaging sensor that captures images;
a plurality of digital signal processors (DSPs) clusters.
However, Du does teach:
A mobile computing device comprising (Section 12: “Thanks to its high performance, its low power consumption, as well as its small area, ShiDianNao particularly suits visual applications at mobile ends and wearable devices”):
an imaging sensor that captures images (Fig. 1: image sensors feeding image data to an image processor as a neural network accelerator).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni with the mobile imaging sensor of Du in order to perform computer vision algorithms on mobile devices.
Meloni-Du does not teach:
a plurality of digital signal processors (DSPs) clusters.
However, Graf does teach:
a plurality of digital signal processors (DSPs) clusters (Section 3: 128 VPEs [hardware DSPs] are divided into 4 blocks of 32; Fig. 1: the VPEs are connected by a PCI bus to a host CPU).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du with the DSPs of Graf to increase performance on machine learning operations beyond what is possible with simple multi-core CPUs (Graf Section 2.3).

As per claim 15, the rejection of claim 10 is incorporated.
Meloni-Du-Graf additionally teaches wherein the reconfigurable dataflow accelerator fabric is configurable at run-time and reconfigurable during execution of the DCNN (Meloni Section I: “The proposed accelerator can be re-configured at run-time to be used to execute convolutions with different filter sizes applied with different strides”).

As per claim 16, the rejection of claim 10 is incorporated.
Meloni-Du-Graf additionally teaches wherein the at least one convolution accelerator is configurable at run-time and reconfigurable during execution of the DCNN (Meloni Section I: “The proposed accelerator can be re-configured at run-time to be used to execute convolutions with different filter sizes applied with different strides”).


Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du and Graf, and further in view of Choudhary.
As per claim 11, the rejection of claim 10 is incorporated.
Meloni-Du-Graf does not teach wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs arranged for communication with each other and for communication with the SoC bus via a DSP cluster crossbar switch.
However, Choudhary does teach wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs arranged for communication with each other and for communication with the SoC bus via a DSP cluster crossbar switch (Fig. 2: clusters of processing elements are able to communicate through a crossbar switch connected to a synchronization bus).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf with the cluster arrangement of Choudhary in order to allow the clusters to communicate with each other through point-to-point communications (Choudhary pg. 1094).

As per claim 12, the rejection of claim 10 is incorporated.
Meloni-Du-Graf does not teach:
a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the SoC bus.
However, Choudhary does teach a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the SoC bus (Fig. 2: clusters of processing elements are able to communicate through a crossbar switch connected to a synchronization bus).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf with the cluster arrangement of Choudhary in order to allow the clusters to communicate with each other through point-to-point communications (Choudhary pg. 1094).

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du and Graf, and further in view of Moctar, Kommanaboyina, and Stenstrom.
As per claim 13, the rejection of claim 10 is incorporated.
Meloni-Du-Graf does not teach wherein each DSP cluster includes:
a DSP cluster crossbar switch;
a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
However, Moctar does teach wherein each DSP cluster includes:
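a DSP cluster crossbar switch (Fig. 1: each CLB contains intra-cluster crossbars).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf with the crossbar switch of Moctar in order to provide increased routability in the clusters (Moctar Abstract).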
Meloni-Du-Graf-Moctar does not teach a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
However, Kommanaboyina does teach a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch (Fig. 1: two crossbar switches connected to another crossbar switch in one system).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf-Moctar with the switch configuration of Kommanaboyina in order to provide increased routability within the clusters. 
Meloni-Du-Graf-Moctar-Kommanaboyina does not teach a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.
However, Stenstrom does teach a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and
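a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch (Fig. 5: cache, processor, and memory each connected to an interconnection network; pg. 27 specifies the network can be a crossbar switch).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf-Moctar-Kommanaboyina with the switch configuration of Stenstrom in order to allow multiple processors to share memory data with each other (Stenstrom pg. 26).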

As per claim 14, the rejection of claim 10 is incorporated.
Meloni additionally teaches a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the system bus (Fig. 2: TCDM is connected to a crossbar switch; DMA is connected between TCDM and interconnect).
Meloni-Du-Graf does not teach a DSP cluster crossbar switch;
a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch;
a shared DSP cluster memory coupled to the DSP cluster crossbar switch.
However, Moctar does teach a first local DSP crossbar switch (Fig. 1: each CLB contains intra-cluster crossbars).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf with the crossbar switch of Moctar in order to provide increased routability in the clusters (Moctar Abstract).
Moctar does not teach a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch;
a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch;
a shared DSP cluster memory coupled to the DSP cluster crossbar switch.
However, Kommanaboyina does teach a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch (Fig. 1: two crossbar switches connected to another crossbar switch in one system).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf-Moctar with the switch configuration of Kommanaboyina in order to provide increased routability within the clusters. 
Meloni-Du-Graf-Moctar-Kommanaboyina does not teach a first DSP coupled to the first local DSP crossbar switch;
a second DSP coupled to the second local DSP crossbar switch.
However, Stenstrom does teach a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch (Fig. 5: cache, processor, and memory each connected to an interconnection network; pg. 27 specifies the network can be a crossbar switch).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf-Moctar-Kommanaboyina with the switch configuration of Stenstrom in order to allow multiple processors to share memory data with each other (Stenstrom pg. 26).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du and Graf, and further in view of Jagannathan.
As per claim 17, the rejection of claim 10 is incorporated.
Meloni-Du-Graf does not teach wherein the non-convolution operations of the at least one DCNN performed by the plurality of DSPs include one or more operations of a group that includes pooling, nonlinear activation, and cross-channel response normalization.
However, Jagannathan does teach wherein the non-convolution operations of the at least one DCNN performed by the plurality of DSPs include one or more operations of a group that includes pooling, nonlinear activation, and cross-channel response normalization (pg. 372: a DSP can be used to perform a max pooling layer).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du-Graf with the max pooling functionality of Jagannathan, as Jagannathan teaches that DSPs can be used to efficiently process pooling layers for CNN processing (see Abstract).

Claims 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du, and further in view of Asghar (US 5768613 A).
As per claim 22, the rejection of claim 1 is incorporated.
While the previously cited art does not teach the claim's limitations, Asghar does teach:
wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using interrupts (col. 7, lines 55-65: the MAC engine can be synchronized with a CPU using interrupts; thus, similarly, the general processing cores can be synchronized with the HWCE).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du with the synchronization of Asghar, so as to increase system responsiveness with parallel operations (see Asghar col. 23, lines 55-60).


As per claim 23:
While the previously cited art does not teach the claim's limitations, Asghar does teach:
wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using mailboxes (col. 7, lines 55-65: communication can be synchronized using a mailbox handshaking mechanism; thus, similarly, the general processing cores can be synchronized with the HWCE).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du with the synchronization of Asghar, so as to increase system responsiveness with parallel operations (see Asghar col. 23, lines 55-60).

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Graf, and further in view of Easwaran (US 20150212955 A1).
As per claim 25, the rejection of claim 6 is incorporated.
While the previously cited art does not teach the claim's limitations, Easwaran does teach:
wherein the configurable accelerator framework includes a plurality of control registers and the stream switch, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs (0019 and Fig. 1: crossbar switch can contain control registers to control interrupts with outside processing cores; thus, the control registers can be used with the general processing cores in Meloni).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du with the registers of Easwaran, so as to allow interrupts for parallel processing (see Easwaran 0019).

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Meloni in view of Du and Graf, and further in view of Easwaran (US 20150212955 A1).
As per claim 26, the rejection of claim 10 is incorporated.
While the previously cited art does not teach the claim's limitations, Easwaran does teach:
wherein the configurable accelerator framework includes a plurality of control registers and the stream switch, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs (0019 and Fig. 1: crossbar switch can contain control registers to control interrupts with outside processing cores; thus, the control registers can be used with the general processing cores in Meloni).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Meloni-Du with the registers of Easwaran, so as to allow interrupts for parallel processing (see Easwaran 0019).




	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
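A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.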
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HANSOL DOH whose telephone number is (571)272-1293.  The examiner can normally be reached on M - F 7:30 - 4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126