DETAILED ACTION
This office action is in response to Application No. 16221295 filed on 03/29/2018. Claims 1-20 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.

Response to Arguments
2.	Applicant’s arguments are moot in view of the new grounds of rejection.  The examiner is withdrawing the rejections in the previous office action dated 04/08/2022 because Applicant’s amendments necessitated the new grounds of rejection presented in this office action. Accordingly, this action is made final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


3.	Claims 1, 3, 4, 7, 11, 13, 14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US20180218518 filed on 02/01/2017) in view of Brothers et al (US20160358069)

	Regarding claim 1, Yan teaches a deep learning accelerator (DLA 200 [0026] Figure 2A; A deep learning accelerator (DLA) architecture described herein, exploits weight and/or activation sparsity to reduce energy consumption [0017]), comprising: 
	a plurality of processing elements (PEs) grouped into PE groups to perform convolutional neural network (CNN) computations (processing element (PE) array 240 [0026] Figure 2A; The DLA 200 may be configured to implement convolutional Neural Network (CNNs) algorithms [0033])
	a dispatcher to dispatch input data in an input activation and non-zero weights (In one embodiment, the sequence controller 235 (as dispatcher) broadcasts a set of weights to each PE within the PE array 240 and sequences through sets of input activations before broadcasting another set of weights [0030]; The sequence controller 235 streams single bit values and the associated multi-bit values for the weights to the PE 250 [0039])
	in a plurality of filters of multi-dimensional weights to the PE groups (FIG. 4A illustrates input activations and weights for processing by the PE 250 shown in FIG. 2C, in accordance with one embodiment. The input activations and weights are organized as matrices, with the input activations in a H×W matrix and the weights in an S×R matrix [0046]; Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations [0047]; The weights 410 for K filters are organized as K three-dimensional S×R×C matrices [0048]) 
	while skipping zero weights (In one embodiment, the zero gating control unit 270 prevents the input activation registers 262 and weight registers 260 from updating the input activation and weight values output to the input registers 275 when either the input activation or the weight equals zero [0042])
	according to a control mask (compaction engine 215 (as control mask) transmits single bit signal [0020]; In one embodiment, the single bit signal controls an enable for a location where the associated weight would be stored [0039] and in one embodiment, the single bit signal controls an enable for a location where the associated input activation would be stored [0040]); and  
	a buffer memory to store the control mask (A compaction engine 215 (as control mask) within the memory interface 205 (as buffer memory) [0027], Figure 2a; The compaction engine 215 may be configured to convert the input data 305 to a compact data format 300. ... A zero bitmask 310 (in the compaction engine) is generated indicating positions of non-zero values and zeros in the input data 305. As shown in FIG. 3, the positions of non-zero values are indicated by bits set TRUE (e.g., logic one) and the positions of zeros are indicated by bits set FALSE (e.g., logic zero). In another embodiment, the positions of zeroes are indicated by bits set TRUE and the positions of non-zero values are indicated by bits set FALSE [0044])
	wherein the PE group applies a corresponding filter on the input activation to produce output data of a corresponding output channel in the output (In one embodiment, the PE array 240 is configured to perform convolution operations on the weights and input activations [0031]; The accumulator 245 within the DLA 200 accumulates the results generated by the PE array 240 to complete the convolution operation by generating output activations [0032]; Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations [0047])
	Yan does not explicitly teach wherein the control mask is shared by the filters and specifies positions of the zero weights, where the positions are the same for each filter
	Brothers teaches wherein the control mask is shared by the filters and specifies positions of the zero weights, where the positions are the same for each filter (Alignment mask generator 530 is configured to determine whether the entire 4×4 region, at each alignment, is zero based upon the component mask read from input data FIFO 526, ... Through control signal 558, alignment mask generator 530 informs weight application controller 532 whether the input data is zero for all alignments for a given weight via control signal 558. [0088]; The right-hand side of FIG. 7 illustrates exemplary output from weight decompressor 514 [0099]; Weight decompressor 514 is further configured to store the 16-bit weight mask indicating the x, y positions ... For example, the 16-bit mask indicates those weights for the 4×4 region to be processed that have weights with a zero value (e.g., using a zero in the corresponding bit position of the mask) [0083]. The Examiner notes that the zero-valued weights are shared in the last column of the two filters on the right-hand side of FIG. 7)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Yan to incorporate the teachings of Brothers for the benefit of saving power and improving performance in a neural network (Brothers [0026])
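	For illustration only, the zero bitmask format quoted above from Yan [0044] can be sketched in Python. This is a minimal sketch under stated assumptions and is not a characterization of the actual hardware of Yan: the function names `compact` and `expand` and the sample values are hypothetical, and the sketch uses the embodiment in which TRUE marks a non-zero position.

```python
# Illustrative sketch only. The names compact/expand and the sample data are
# hypothetical and do not appear in Yan or Brothers.

def compact(values):
    """Split a flat list of values into a zero bitmask and the non-zero values.

    The bitmask holds True at each non-zero position and False at each zero
    position, following the first embodiment quoted from Yan [0044].
    """
    bitmask = [v != 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmask, nonzeros


def expand(bitmask, nonzeros):
    """Rebuild the original list from the bitmask and the non-zero values."""
    remaining = iter(nonzeros)
    return [next(remaining) if bit else 0 for bit in bitmask]


# A 2x4 block of input data, flattened row by row (values are made up).
values = [0, 3, 0, 0, 9, 1, 0, 4]
bitmask, nonzeros = compact(values)
restored = expand(bitmask, nonzeros)
```

	In this sketch only the four non-zero values and an eight-entry bitmask need to be stored or transmitted, and `expand` recovers the original block exactly.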

	Regarding claim 3, Modified Yan teaches the deep learning accelerator of claim 1, Yan teaches wherein the control mask specifies the positions of the zero weights by identifying a given height coordinate and a given width coordinate of the multi-dimensional weights as being zero values. (The weights and activations are organized as matrices, such as the 2×4 matrix of the input data 305 ([0044], Figures 3a and 3b) and the input activations and weights are organized as matrices, with the input activations in a H×W matrix [0046] and the height coordinate and width coordinate having zero values (input data 305 [0044], Figures 3a and 3b))

	Regarding claim 4, Modified Yan teaches the deep learning accelerator of claim 1, Yan teaches wherein the control mask specifies the positions of the zero weights by identifying one or more of a given channel, a given height coordinate and a given width coordinate of the multi-dimensional weights as being zero values. (where the compacted data sequence comprises at least one single bit signal indicating that at least one multi-bit value equals zero … The compacted data sequence may represent input activation values [0021] for input activation planes, which are referred to as input channels [0047], which means an input channel has a zero value; the input activations in a H×W matrix [0046] and the height coordinate and width coordinate having zero values (input data 305 [0044], Figures 3a and 3b); The input activations 405 are organized as a three-dimensional H×W×C matrix. The rectangular volumes in the input activations and weight matrices illustrate corresponding elements of the input activations and weights that are multiplied and accumulated [0048])

	Regarding claim 7, Modified Yan teaches the deep learning accelerator of claim 1, Yan teaches wherein the dispatcher is further operative to skip dispatching multiply-and-accumulate (MAC) operations that use the zero weights (Because multiplication by zero just results in a zero, the zero gating control unit 270 switches the output of the multiplexer 285 between the output of the multiplier 280 and zero when at least one of the operands equals zero [0041])
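	As a software analogy only, the zero gating behavior quoted above from Yan [0041] and [0042] can be sketched as follows. The function names `gated_mac` and `dot_with_gating` and the sample operands are hypothetical, and the sketch assumes that a gated operation simply leaves the accumulator unchanged, which gives the same result as adding a product forced to zero.

```python
# Illustrative sketch only. gated_mac and dot_with_gating are hypothetical
# names; this models the gating in software and is not Yan's circuit.

def gated_mac(acc, activation, weight):
    """Return (new_acc, multiplied).

    When either operand is zero the multiply is skipped and the accumulator
    is returned unchanged; multiplied reports whether a multiply was done.
    """
    if activation == 0 or weight == 0:
        return acc, False
    return acc + activation * weight, True


def dot_with_gating(activations, weights):
    """Accumulate a dot product, counting how many multiplies were performed."""
    acc = 0
    multiplies = 0
    for a, w in zip(activations, weights):
        acc, did_multiply = gated_mac(acc, a, w)
        multiplies += did_multiply
    return acc, multiplies


# Made-up operands: two of the four pairs contain a zero.
result, multiplies = dot_with_gating([3, 0, 2, 5], [4, 7, 0, 1])
```

	With these operands the sketch performs two multiplies instead of four and still produces the same sum as an ungated dot product.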

	Regarding claim 11, Yan teaches a method for accelerating deep learning operations, (a method 100 for zero gating in a deep learning accelerator, in accordance with one embodiment. Although method 100 is described in the context of a processing element within a DLA [0018], Fig. 1A) comprising:
	dispatching input data in an input activation and non-zero weights (In one embodiment, the sequence controller 235 (as dispatcher) broadcasts a set of weights to each PE within the PE array 240 and sequences through sets of input activations before broadcasting another set of weights [0030]; The sequence controller 235 streams single bit values and the associated multi-bit values for the weights to the PE 250 [0039])
	in a plurality of filters of multi-dimensional weights to a plurality of processing element (PE) groups (FIG. 4A illustrates input activations and weights for processing by the PE 250 shown in FIG. 2C, in accordance with one embodiment. The input activations and weights are organized as matrices, with the input activations in a H×W matrix and the weights in an S×R matrix [0046]; Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations [0047]; The weights 410 for K filters are organized as K three-dimensional S×R×C matrices [0048]) 
	while skipping zero weights (In one embodiment, the zero gating control unit 270 prevents the input activation registers 262 and weight registers 260 from updating the input activation and weight values output to the input registers 275 when either the input activation or the weight equals zero [0042])
	according to a control mask, (compaction engine 215 (as control mask) transmits single bit signal [0020]; In one embodiment, the single bit signal controls an enable for a location where the associated weight would be stored [0039] and in one embodiment, the single bit signal controls an enable for a location where the associated input activation would be stored [0040]); 
	 performing, by each of the PE groups, convolutional neural network (CNN) computations (processing element (PE) array 240 [0026] Fig. 2A; The DLA 200 may be configured to implement convolutional Neural Network (CNNs) algorithms [0033])
	by applying a corresponding filter on the input activations; (FIG. 4B illustrates input activations and filter weights for a single CNN layer, in accordance with one embodiment. The input activations 405 are organized as a three-dimensional H×W×C matrix. The weights 410 for K filters are organized as K three-dimensional S×R×C matrices [0048]; and Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations [0047]) and
	generating, by each of the PE groups, output data of a corresponding output channel in an output activation (In one embodiment, the PE array 240 is configured to perform convolution operations on the weights and input activations [0031]; The accumulator 245 within the DLA 200 accumulates the results generated by the PE array 240 to complete the convolution operation by generating output activations [0032]; Multiple filters (K) can be applied to the same body of input activations to produce K output channels of output activations [0047])
	Yan does not explicitly teach wherein the control mask is shared by the filters and specifies positions of the zero weights, where the positions are the same for each filter
	Brothers teaches wherein the control mask is shared by the filters and specifies positions of the zero weights, where the positions are the same for each filter (Alignment mask generator 530 is configured to determine whether the entire 4×4 region, at each alignment, is zero based upon the component mask read from input data FIFO 526, ... Through control signal 558, alignment mask generator 530 informs weight application controller 532 whether the input data is zero for all alignments for a given weight via control signal 558. [0088]; The right-hand side of FIG. 7 illustrates exemplary output from weight decompressor 514 [0099]; Weight decompressor 514 is further configured to store the 16-bit weight mask indicating the x, y positions ... For example, the 16-bit mask indicates those weights for the 4×4 region to be processed that have weights with a zero value (e.g., using a zero in the corresponding bit position of the mask) [0083]. The Examiner notes that the zero-valued weights are shared in the last column of the two filters on the right-hand side of FIG. 7)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Yan to incorporate the teachings of Brothers for the benefit of saving power and improving performance in a neural network (Brothers [0026])
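	To make the shared control mask limitation concrete, the following is a minimal Python sketch of dispatching under one mask that is common to every filter. It is a sketch under stated assumptions and does not reproduce the implementation of Yan or Brothers: the names `dispatch_with_shared_mask` and `convolve_window`, the flattened weight layout, and the sample data are all hypothetical, and each output accumulator stands in for one PE group.

```python
# Illustrative sketch only. All names and the flat data layout are
# hypothetical and are not taken from Yan or Brothers.

def dispatch_with_shared_mask(filters, control_mask):
    """Yield (position, weights) only for positions the shared mask keeps.

    filters: list of K flat weight lists of equal length.
    control_mask: one 0/1 flag per weight position; 0 marks a position that
    holds a zero weight in every filter, so it is never dispatched.
    """
    for pos, keep in enumerate(control_mask):
        if keep:
            yield pos, [f[pos] for f in filters]


def convolve_window(window, filters, control_mask):
    """Apply K filters to one flattened input window, one accumulator per filter."""
    outputs = [0] * len(filters)  # one output channel per filter
    dispatched = 0                # number of weight positions sent out
    for pos, weights in dispatch_with_shared_mask(filters, control_mask):
        dispatched += 1
        for k, w in enumerate(weights):
            outputs[k] += w * window[pos]
    return outputs, dispatched


# Two made-up filters whose zero weights sit at the same positions (1 and 3).
window = [1, 2, 3, 4]
filters = [[5, 0, 7, 0], [1, 0, 2, 0]]
control_mask = [1, 0, 1, 0]
outputs, dispatched = convolve_window(window, filters, control_mask)
```

	Because the zero positions coincide across the filters, a single mask lookup per position decides the dispatch for every filter at once, and only two of the four positions are sent to the accumulators.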

	Regarding claim 13, Modified Yan teaches the method of claim 11, Yan teaches wherein the control mask specifies the positions of the zero weights by identifying a given height coordinate and a given width coordinate of the multi-dimensional weights as being zero values. (The weights and activations are organized as matrices, such as the 2×4 matrix of the input data 305 ([0044], Figures 3a and 3b) and the input activations and weights are organized as matrices, with the input activations in a H×W matrix [0046] and the height coordinate and width coordinate having zero values (input data 305 [0044], Figures 3a and 3b))

	Regarding claim 14, Modified Yan teaches the method of claim 11, Yan teaches wherein the control mask specifies the positions of the zero weights by identifying one or more of a given channel, a given height coordinate and a given width coordinate of the multi-dimensional weights as being zero values (where the compacted data sequence comprises at least one single bit signal indicating that at least one multi-bit value equals zero … The compacted data sequence may represent input activation values [0021] for input activation planes, which are referred to as input channels [0047], which means an input channel has a zero value; the input activations in a H×W matrix [0046] and the height coordinate and width coordinate having zero values (input data 305 [0044], Figures 3a and 3b); The input activations 405 are organized as a three-dimensional H×W×C matrix. The rectangular volumes in the input activations and weight matrices illustrate corresponding elements of the input activations and weights that are multiplied and accumulated [0048])

	Regarding claim 17, Modified Yan teaches the method of claim 11, Yan teaches wherein the dispatcher is further operative to skip dispatching multiply-and-accumulate (MAC) operations that use the zero weights (Because multiplication by zero just results in a zero, the zero gating control unit 270 switches the output of the multiplexer 285 between the output of the multiplier 280 and zero when at least one of the operands equals zero [0041])

4.	Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US20180218518 filed on 02/01/2017) in view of Brothers et al. (US20160358069) and further in view of Judd et al. (US20190205740 filed 06/14/2017)

	Regarding claim 2, Modified Yan teaches the deep learning accelerator of claim 1, Brothers teaches the zero-valued weights of the control mask are shared by filters (Alignment mask generator 530 is configured to determine whether the entire 4×4 region, at each alignment, is zero based upon the component mask read from input data FIFO 526, ... Through control signal 558, alignment mask generator 530 informs weight application controller 532 whether the input data is zero for all alignments for a given weight via control signal 558. [0088]; The right-hand side of FIG. 7 illustrates exemplary output from weight decompressor 514 [0099]; Weight decompressor 514 is further configured to store the 16-bit weight mask indicating the x, y positions ... For example, the 16-bit mask indicates those weights for the 4×4 region to be processed that have weights with a zero value (e.g., using a zero in the corresponding bit position of the mask) [0083]. The Examiner notes that the zero-valued weights are shared in the last column of the two filters on the right-hand side of FIG. 7)
	Modified Yan does not explicitly teach wherein the number of PE groups, which is the same as the number of filters sharing the control mask, is adjustable
	Judd teaches wherein the number of PE groups, which is the same as the number of filters sharing the control mask, is adjustable (The number of neuron lanes and filters per unit are design time parameters that could be changed [0060]; The number of activations and filters per unit are design parameters which can be adjusted accordingly. It will be assumed that both are 16 for this further embodiment of the present invention which skips ineffectual weights [0131])
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Judd for the benefit of avoiding storing or communicating the zero activations (Judd [0120])

	Regarding claim 12, Modified Yan teaches the method of claim 11, Brothers teaches the zero-valued weights of the control mask are shared by filters (Alignment mask generator 530 is configured to determine whether the entire 4×4 region, at each alignment, is zero based upon the component mask read from input data FIFO 526, ... Through control signal 558, alignment mask generator 530 informs weight application controller 532 whether the input data is zero for all alignments for a given weight via control signal 558. [0088]; The right-hand side of FIG. 7 illustrates exemplary output from weight decompressor 514 [0099]; Weight decompressor 514 is further configured to store the 16-bit weight mask indicating the x, y positions ... For example, the 16-bit mask indicates those weights for the 4×4 region to be processed that have weights with a zero value (e.g., using a zero in the corresponding bit position of the mask) [0083]. The Examiner notes that the zero-valued weights are shared in the last column of the two filters on the right-hand side of FIG. 7)
	Modified Yan does not explicitly teach wherein the number of PE groups, which is the same as the number of filters sharing the control mask, is adjustable
	Judd teaches wherein the number of PE groups, which is the same as the number of filters sharing the control mask, is adjustable (The number of neuron lanes and filters per unit are design time parameters that could be changed [0060]; The number of activations and filters per unit are design parameters which can be adjusted accordingly. It will be assumed that both are 16 for this further embodiment of the present invention which skips ineffectual weights [0131])
	The same motivation to combine as for dependent claim 2 applies here.


5.	Claims 5, 6, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US20180218518 filed on 02/01/2017) in view of Brothers et al. (US20160358069) and further in view of Dally et al. (US20180046916)

	Regarding claim 5, Modified Yan teaches the deep learning accelerator of claim 1, Yan teaches wherein each PE group includes multiple processing elements, (processing element (PE) array 240 [0026] Figure 2a).
	Modified Yan does not explicitly teach perform the CNN computations in parallel on different portions of the input activation. 
	Dally teaches performing the CNN computations in parallel on different portions of the input activation (A CNN's dataflow defines how the loops are ordered, partitioned, and parallelized [0051]; Parallelism within a PE 210 is accomplished by processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325. F×I products are generated each processing cycle by each PE 210 in the SCNN accelerator 200 [0066])
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Dally for the benefit of a sparse CNN (SCNN) accelerator that exploits weight and/or activation sparsity to reduce energy consumption and improve processing throughput (Dally, [0030])

	Regarding claim 6, Modified Yan teaches the deep learning accelerator of claim 1, Modified Yan does not explicitly teach wherein the number of PE groups is less than or equal to the number of output channels in the output activation.
	Dally teaches wherein the number of PE groups is less than or equal to the number of output channels in the output activation (Fig. 2A, the first PE 210 has more than one output activation (OA); that is, the first PE 210 outputs (OA) to the next PE 210 and to the PE 210 below it. Therefore, the number of PE groups is less than the number of output activations (OA))
	The same motivation to combine as for dependent claim 5 applies here.

	Regarding claim 15, Modified Yan teaches the method of claim 11, Yan teaches multiple processing elements in each PE group (processing element (PE) array 240 [0026] Figure 2a)
	Modified Yan does not explicitly teach further comprising: performing the CNN computations in parallel on different portions of the input activation 
	Dally teaches further comprising: performing the CNN computations in parallel on different portions of the input activation by multiple processing elements in each PE group (A CNN's dataflow defines how the loops are ordered, partitioned, and parallelized [0051]; Parallelism within a PE 210 is accomplished by processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325. F×I products are generated each processing cycle by each PE 210 in the SCNN accelerator 200 [0066])
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Dally for the benefit of a sparse CNN (SCNN) accelerator that exploits weight and/or activation sparsity to reduce energy consumption and improve processing throughput (Dally, [0030])

	Regarding claim 16, Modified Yan teaches the method of claim 11, Modified Yan does not explicitly teach wherein the number of PE groups is less than or equal to the number of output channels in the output activation.
	Dally teaches wherein the number of PE groups is less than or equal to the number of output channels in the output activation (Fig. 2A, the first PE 210 has more than one output activation (OA); that is, the first PE 210 outputs (OA) to the next PE 210 and to the PE 210 below it. Therefore, the number of PE groups is less than the number of output activations (OA))
	The same motivation to combine as for dependent claim 15 applies here.
	
6.	Claims 8, 9, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US20180218518 filed on 02/01/2017) in view of Brothers et al. (US20160358069) and further in view of Aydonat et al. (US20170103299)

	Regarding claim 8, Modified Yan teaches the deep learning accelerator of claim 1, Modified Yan does not explicitly teach wherein the processing elements are further operative to perform fully-connected (FC) neural network computations, the deep learning accelerator further comprising: a buffer loader operative to read FC input data from a memory, and to selectively read FC weights from the memory according to values of the FC input data.
	Aydonat teaches wherein the processing elements are further operative to perform fully-connected (FC) neural network computations, (When implementing a fully-connected layer using one or more of the processing elements, coefficient data is treated as non-repeated data and is stored in on-chip buffers 951-954 [0069])
	the deep learning accelerator (The CNN accelerator also includes a plurality of processing elements … that implement a fully connected layer [0010]) further comprising: 
	a buffer loader operative to read FC input data from a memory, (Feature data is treated as repeated data. The input features are read from external memory into the cache 1010 [0069]) and
	to selectively read FC weights from the memory according to values of the FC input data. (The coefficient data (as weights) is treated as non-repeated data since different sets of coefficient data are used to compute different output features of each image. Sets of coefficient data are read once from external memory, stored on on-chip buffers, and streamed into processing elements … Since the same coefficient data is used for different images, each processing element receives the same coefficient data every cycle to apply to different feature data that belong to different images and to compute different output features of different image [0069]) 
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Aydonat for the benefit of implementing a fully connected layer in response to the change in the data flow (Aydonat, Abstract)

	Regarding claim 9, Modified Yan teaches the deep learning accelerator of claim 8, Yan teaches the first subset corresponding to a nonzero FC input channel and the second subset corresponding to a zero FC input channel (At step 160, non-zero values are extracted from the compacted data sequence. In one embodiment, the zero and non-zero values encoded in the compacted data sequence represent input activations. In one embodiment, the zero and non-zero values encoded in the compacted data sequence represent weights [0024])
	Aydonat teaches wherein the buffer loader is operative to: read a first subset of the FC weights from the memory without reading a second subset of the FC weights from the memory, (When implementing a fully-connected layer using one or more of the processing elements, coefficient data (as weights) is treated as non-repeated data and is stored in on-chip buffers 951-954 [0069]; The coefficient data is treated as non-repeated data since different sets of coefficient data are used to compute different output features of each image. Sets of coefficient data are read once from external memory [0069])
	The same motivation to combine as for dependent claim 8 applies here.

	Regarding claim 18, Modified Yan teaches the method of claim 11, Modified Yan does not explicitly teach wherein the processing elements are further operative to perform fully-connected (FC) neural network computations, the method further comprising: reading FC input data from a memory, and selectively reading FC weights from the memory according to values of the FC input data.
	Aydonat teaches wherein the processing elements are further operative to perform fully-connected (FC) neural network computations, the method further comprising: (When implementing a fully-connected layer using one or more of the processing elements, coefficient data is treated as non-repeated data and is stored in on-chip buffers 951-954 [0069]) 
	reading FC input data from a memory, (Feature data is treated as repeated data. The input features are read from external memory into the cache 1010 [0069]) and
	selectively reading FC weights from the memory according to values of the FC input data. (The coefficient data (as weights) is treated as non-repeated data since different sets of coefficient data are used to compute different output features of each image. Sets of coefficient data are read once from external memory, stored on on-chip buffers, and streamed into processing elements … Since the same coefficient data is used for different images, each processing element receives the same coefficient data every cycle to apply to different feature data that belong to different images and to compute different output features of different image [0069]) 
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Aydonat for the benefit of implementing a fully connected layer in response to the change in the data flow (Aydonat, Abstract)

	Regarding claim 19, Modified Yan teaches the method of claim 18, Yan teaches the first subset corresponding to a nonzero FC input channel and the second subset corresponding to a zero FC input channel (At step 160, non-zero values are extracted from the compacted data sequence. In one embodiment, the zero and non-zero values encoded in the compacted data sequence represent input activations. In one embodiment, the zero and non-zero values encoded in the compacted data sequence represent weights [0024])
	Aydonat teaches further comprising: reading a first subset of the FC weights from the memory without reading a second subset of the FC weights from the memory, (The coefficient data is treated as non-repeated data since different sets of coefficient data are used to compute different output features of each image. Sets of coefficient data are read once from external memory [0069])
	The same motivation to combine as for dependent claim 18 applies here.

7.	Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US20180218518 filed on 02/01/2017) in view of Brothers et al. (US20160358069), in view of Aydonat et al. (US20170103299), and further in view of Seo et al. (US20190164538 filed on 07/27/2017)

	Regarding claim 10, Modified Yan teaches the deep learning accelerator of claim 8, Yan teaches wherein the dispatcher is further operative to: (In one embodiment, the sequence controller 235 (as dispatcher) broadcasts a set of weights to each PE within the PE array 240 and sequences through sets of input activations before broadcasting another set of weights [0030]; The sequence controller 235 streams single bit values and the associated multi-bit values for the weights to the PE 250 [0039])
	the zero FC weights to the processing elements for FC neural network computations (In one embodiment, a compacted data sequence for input to a PE is received by the DLA, where the compacted data sequence comprises at least one single bit signal indicating that at least one multi-bit value equals zero, and the single bit signal is transmitted to the PE in lieu of the multi-bit value [0021])
	 Modified Yan does not explicitly teach identify zero FC weights in the first subset; and dispatch nonzero FC weights in the first subset to the processing elements, without dispatching the second subset of the FC weights.
	Seo teaches identify zero FC weights in the first subset; (Cij is a binary matrix indicating each non-zero weight and each zero weight in a fully connected weight matrix, such as the fully connected weight matrix 18 in the DNN 10 [0055]) and 
	dispatch nonzero FC weights in the first subset to the processing elements, without dispatching the second subset of the FC weights (During the DNN training for the DNN 10 and the DNN 20, only the non-zero weight identified by Cij will be updated. In this regard, Cij ensures that only the weights present in the network (e.g., Cij=1) are updated [0055])
	It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the accelerator of Modified Yan to incorporate the teachings of Seo for the benefit of providing the DNN in an efficient hardware implementation without sacrificing accuracy of the DNN application (Seo, Abstract)
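	For illustration only, the fully-connected limitations addressed in claims 8-10 (reading FC weights selectively according to the FC input values, then dispatching only nonzero weights) can be sketched in Python. This is a sketch of the limitations as worded in this action, not of any cited reference: the name `fc_layer`, the row-per-input-channel weight layout, and the sample data are hypothetical.

```python
# Illustrative sketch only. fc_layer and the weight layout are hypothetical
# and are not taken from Yan, Brothers, Aydonat, or Seo.

def fc_layer(inputs, weight_rows):
    """Compute FC outputs while skipping work tied to zeros.

    inputs: list of N FC input values.
    weight_rows: weight_rows[i] is the list of M weights for input channel i.
    Returns (outputs, rows_read, weights_dispatched).
    """
    outputs = [0] * len(weight_rows[0])
    rows_read = 0
    weights_dispatched = 0
    for i, x in enumerate(inputs):
        if x == 0:
            # Second subset: weights for a zero input channel are never read.
            continue
        row = weight_rows[i]  # first subset: read for a nonzero input channel
        rows_read += 1
        for j, w in enumerate(row):
            if w == 0:
                continue  # zero weight within the first subset: not dispatched
            outputs[j] += x * w
            weights_dispatched += 1
    return outputs, rows_read, weights_dispatched


# Made-up data: the second input channel is zero, and two weights are zero.
inputs = [2, 0, 3]
weight_rows = [[1, 0], [4, 5], [0, 2]]
outputs, rows_read, weights_dispatched = fc_layer(inputs, weight_rows)
```

	In this sketch two of the three weight rows are read and two of the six weights are dispatched, and the outputs match those of a dense FC computation over the same data.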

	Regarding claim 20, Modified Yan teaches the method of claim 18. Yan further teaches [operating without dispatching] the zero FC weights to the processing elements for FC neural network computations (In one embodiment, a compacted data sequence for input to a PE is received by the DLA, where the compacted data sequence comprises at least one single bit signal indicating that at least one multi-bit value equals zero, and the single bit signal is transmitted to the PE in lieu of the multi-bit value [0021]).
	Modified Yan does not explicitly teach further comprising: identifying zero FC weights in the first subset; and dispatching nonzero FC weights in the first subset to the processing elements without dispatching the second subset of the FC weights and the zero FC weights to the processing elements for FC neural network computations.
	Seo teaches further comprising: identifying zero FC weights in the first subset (Cij is a binary matrix indicating each non-zero weight and each zero weight in a fully connected weight matrix, such as the fully connected weight matrix 18 in the DNN 10 [0055]); and
	dispatching nonzero FC weights in the first subset to the processing elements without dispatching the second subset of the FC weights (During the DNN training for the DNN 10 and the DNN 20, only the non-zero weight identified by Cij will be updated. In this regard, Cij ensures that only the weights present in the network (e.g., Cij=1) are updated [0055]).
	The same motivation to combine as set forth for dependent claim 10 applies here.


Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/M.G./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121