DETAILED ACTION
1.	This office action is in response to the Application No. 16816453 filed on 03/12/2020. Claims 1-21 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



3.	Claims 1-8, 11 and 14-21 are rejected under 35 U.S.C. 103 as being unpatentable over Duong et al. (US11210586 filed 06/28/2019) in view of Chen et al. (US20210182077 filed 09/13/2018).

	Regarding claim 1, Duong teaches a neural network circuit (FIG. 52 is an example of an architecture 5200 of an electronic device that includes the neural network integrated circuit of some embodiments, col 84, lines 18-20)
	 for decoding weights of a neural network, the neural network circuit comprising:
a weight memory configured to store encoded weights for the neural network, (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55. Examiner notes that the filter slice buffers are the weight memory)
	the encoded weights including an index weight word; (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44. Examiner notes that the weight map is the index weight word) and
	a decompression logic circuit configured to: retrieve the encoded weights from the weight memory; (encoded weight data is read from memory and provided in blocks of data (as determined by the memory read controller, including the cache as shown in FIG. 22) to the filter slice buffer controller 3100, col 60, lines 26-29, Fig. 31)
	decode the encoded weights using the index weight word (In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 40-44)
	to obtain a sequence of one or more non-pruned weight words (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44) and  
	the non-pruned weight words including non-zero-value weight words; (The filter slice buffers of some embodiments store a full set of weight data (e.g., a non-zero map bit, a positive/negative bit, and multiplexer select bits) for each of the input multiplexers, irrespective of whether that input multiplexer receives an input with a non-zero corresponding weight or not (this still reduces the amount of weight data, from data for every weight in a filter to data for only approximately one-fourth of the weights in the filter). As such, decoding circuitry is used to expand the limited stored weight data into a full set of weight data for each input multiplexer, col 10, lines 22-31) and
	provide the sequence of the non-pruned weight words to a plurality of input-weight multipliers (The process 1800 then computes (at 1815) partial dot products in the cores. As described above, the activation values loaded into the activation window buffers in each of the active cores are multiplied by their corresponding weight values loaded into the filter slice buffers of these cores. In some embodiments, the size of the partial dot products is reduced using the wiring structure shown in FIGS. 16 and 17, and with ternary weight values of {0, 1, −1}, the multiplication is handled by the ternary MAC circuits shown in this figure, col 43, lines 3-12)
	Duong does not explicitly teach one or more pruned weight words, the pruned weight words including zero-value weight words, and provide the sequence of the pruned weight words to a plurality of input-weight multipliers.
	Chen teaches one or more pruned weight words, the pruned weight words including zero-value weight words, and provide the sequence of the pruned weight words to a plurality of input-weight multipliers (selecting M weights from the weights of the neural network through a sliding window, where M is an integer greater than 1 [0669]; and when the M weights satisfy a preset condition, setting all or part of the M weights to zero to obtain the pruned weights [0670]; The first part (the multiplier) multiplies input data 1 and input data 2 to obtain an output result [3615]; The multiplier 7 is configured to calculate k*x. [3891]; the domain conversion component 10 is a domain conversion component, and includes three inputs x, i, and j [3900]; The slope array storage unit and the intercept array storage unit store the straight slope (i.e., K) [3904])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Duong to incorporate the teachings of Chen for the benefit of performing a coarse-grained pruning operation on the weights of the neural network to obtain the pruned weights [0613] so that the subsequent storage and access to values and the subsequent operation amount may be reduced, which may improve operating efficiency and reduce power consumption. (Chen [2937])
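	For illustration only, the decoding limitation mapped above (expanding stored non-zero weight data into a full sequence using a non-zero weight map) can be modeled in software. The sketch below is not taken from Duong, Chen, or the application. The function name decode_weights, the least-significant-bit-first ordering, and the convention that a 1 bit marks a non-pruned position are assumptions made for the example.

```python
def decode_weights(index_word, non_pruned, num_weights):
    """Expand encoded weights into a full weight sequence.

    index_word  -- integer bitmap; bit i set means position i holds a
                   non-pruned weight (assumed polarity, LSB first)
    non_pruned  -- stored non-zero weight words, in sequence order
    num_weights -- length of the decoded sequence
    """
    decoded = []
    stored = iter(non_pruned)
    for position in range(num_weights):
        if (index_word >> position) & 1:
            decoded.append(next(stored))  # non-pruned: take next stored word
        else:
            decoded.append(0)             # pruned: fill in a zero-value word
    return decoded


def dot_product(inputs, weights):
    """Model of the input-weight multipliers followed by accumulation."""
    return sum(x * w for x, w in zip(inputs, weights))


weights = decode_weights(0b0110, [5, -3], 4)
result = dot_product([1, 2, 3, 4], weights)
```

	In this model only two weight words are stored for a four-weight sequence; the zero-value words are regenerated at decode time from the index word alone.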
	
	Regarding claim 2, Modified Duong teaches the neural network circuit of claim 1, Duong teaches wherein the index weight word includes a plurality of bits, each bit having a first bit value or a second bit value, the first bit value indicating a pruned weight word in the sequence, the second bit value indicating a non-pruned weight word in the sequence (in FIG. 16, the non-zero weight map 3010 includes 36 bits. For a chip fabric with partial dot product computation circuits as shown in FIG. 17 (i.e., having redundant input multiplexers to better ensure that all of the inputs with non-zero weights are mapped to different input multiplexers), the non-zero weight map includes 40 bits. Each bit of the non-zero weight map 3010 specifies whether the corresponding input multiplexer receives an input with a non-zero corresponding weight value (with the bit set to 1 to indicate this case, and with the bit set to 0 to indicate that none of the inputs received by the multiplexer have non-zero corresponding weight values), col 59, lines 30-41)
	 
	Regarding claim 3, Modified Duong teaches the neural network circuit of claim 1, Duong teaches wherein the decompression logic circuit is configured to decode the encoded weights within a single clock cycle. (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55; As shown, the filter slice buffers include a filter slice buffer decoder 3405, a set of primary filter slice buffers 3410, a set of secondary filter slice buffers 3415, and an output multiplexer 3420 that selects between outputting the primary or secondary filter slice buffers for a given clock cycle, col 63, lines 56-61)

	Regarding claim 4, Modified Duong teaches the neural network circuit of claim 1, Duong teaches wherein the decompression logic circuit includes a weight word decoder, the weight word decoder including: (The filter slice buffer controller 3100 includes a weight decoder circuit that decodes the weight data (i.e., expands the weight data to include weight information (indicating whether the weight is positive/negative/zero)), col 60, lines 37-41; This weight decoder circuitry 3200 assumes that each filter slice buffer holds weight data for 40 inputs (i.e., the adder trees include 40 input multiplexers), ... The weight decoder circuitry 3200 receives the weight data stored in the weight memory and decodes this data to be formatted for storage in one of the filter slice buffers (the filter slice identifier is also sent to the filter slice buffers so that the decoded data is routed to the correct filter slice buffer), col 60-61, lines 61-66 and 1-6)
	a plurality of shifters; (two programmable shift registers 2315 and 2320, Fig 23, col 49, lines 40-41; In some embodiments, the truncator 1235 receives (as output from the right bit shifter 1230) more bits than it outputs, col 35, lines 10-12; The bit shifters 4135 in the post-processing units are configured to shift this incoming second dot product left by 4 bits in this second clock cycle, col 72, lines 20-22)
	a plurality of digital logic gates coupled to the plurality of shifters; and (The AND gate 1305 enables this first dot product input, while the AND gate 1310 gates the second dot product to 0. However, in other situations, the adder 1315, left-shift operator 1320, and adder 1325 enable the dot product calculation for a neural network node to be completed and provided to the other post-processing operations. In addition, the left shift circuit 1320 can also be used to align a dot product to ensure that the binary point is in the correct location for the input value, col 36, lines 5-13)
	an index word decoder coupled to the plurality of shifters and the plurality of digital logic gates, (a mask decoder 2310, two programmable shift registers 2315 and 2320, a multiplexer 2325, and an AND gate 2330, col 49, lines 40-43) 
	the index word decoder configured to control the plurality of shifters and the plurality of digital logic gates based on the index weight word. (However, the register block also receives an enable signal (from the mask decoder 2310), and when this signal is 0 the clock signal is gated. Thus, the enable bit to the mask decoder 2310 can be used to prevent the entire set of register blocks from shifting their activation values (e.g., while waiting for a memory read operation to complete). The mask signal from the mask decoder 2310 can also be used to prevent some register blocks from shifting their output when the rest of the programmable shift register executes a shift operation, so that unused activation register blocks will not consume the power required to change their values, col 50, lines 57-67)

	Regarding claim 5, Modified Duong teaches the neural network circuit of claim 4, Duong teaches wherein the plurality of shifters are connected in parallel to the index word decoder (FIG. 39 conceptually illustrates a process 3900 of some embodiments for computing a dot product (or a partial dot product). This process 3900 is performed by an IC of some embodiments (e.g., by a partial dot product computation circuit, by a set of such circuits that compute a complete dot product, etc.) that uses ternary MACs such as that shown in FIG. 36, which pass, for each input value, (i) the value zero if the corresponding weight value is zero, col 67, lines 26-33; The process 3900 then determines what value to pass to the dot product computation for each input value. As shown, the process selects (at 3910) an input value. It should be understood that the process 3900 is a conceptual illustration of the process performed by the IC. While shown as an iterative process that selects each input value in process, the actual operation of some embodiments performs these operations in parallel (e.g., in the same clock cycle) across numerous ternary MACs (which each handle two inputs), col 67, lines 45-53)

	Regarding claim 6, Modified Duong teaches the neural network circuit of claim 4, Duong teaches wherein each of the plurality of shifters is configured to receive the non-pruned weight words (some embodiments enable dot products to be double the size of the standard quantized output (e.g., 8-bit rather than 4-bit) by using dot products from multiple cycles and bit-shifting the first set of input data, col 4, lines 29-32; Similarly, each primary weight value buffer can hold the specified number of weight values (which is the number of inputs to which the input values are reduced), col 11, lines 45-48; Furthermore, for larger input and output values (e.g., 8-bit input and output values), in which the dot product input processing circuit 1205 left shifts the dot product of the most significant bits of the inputs (e.g., by 4 bits), the bias factor has to add a larger amount for the negative weights. For the 8-bit case (in which the dot product of the weights with the most significant nibble of the inputs is shifted by 4 bits), the bias factor adds 17 for each negative weight, col 34, lines 24-28) and
	a control signal from the index word decoder that controls a shift operation applied to the non-pruned weight words. (The mask signal from the mask decoder 2310 can also be used to prevent some register blocks from shifting their output when the rest of the programmable shift register executes a shift operation, so that unused activation register blocks will not consume the power required to change their values, col 50, lines 63-67; some embodiments enable dot products to be double the size of the standard quantized output (e.g., 8-bit rather than 4-bit) by using dot products from multiple cycles and bit-shifting the first set of input data, col 4, lines 29-32;  weight data specifies which of its inputs to select (i.e., the input value corresponding to the non-zero weight value), col 6, lines 8-11)

	Regarding claim 7, Modified Duong teaches the neural network circuit of claim 4, Duong teaches wherein each of the digital logic gates is configured to receive a control signal from the index word decoder that controls a Boolean operation executed by a respective digital logic gate (However, the register block also receives an enable signal (from the mask decoder 2310), and when this signal is 0 the clock signal is gated, col 50, lines 57-59)

	
	Regarding claim 8, Modified Duong teaches the neural network circuit of claim 4, Duong teaches wherein the plurality of digital logic gates include a first digital logic gate coupled to an output of a first shifter of the plurality of shifters, and a second digital logic gate coupled to an output of a second shifter of the plurality of shifters. (The AND gate 1305 enables this first dot product input, while the AND gate 1310 gates the second dot product to 0. However, in other situations, the adder 1315, left-shift operator 1320, and adder 1325 enable the dot product calculation for a neural network node to be completed and provided to the other post-processing operations, col 36, lines 5-11)
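	For illustration only, the arrangement recited in claims 4-8 (an index word decoder controlling parallel shifters whose outputs feed digital logic gates) can be modeled behaviorally. The sketch below is a hypothetical software model and does not reproduce any circuit of Duong, Chen, or the application. The 4-bit word width, the packing of non-pruned words into a single integer, the function names, and the use of unsigned words are all assumptions.

```python
WORD_BITS = 4  # assumed width of one weight word


def index_word_decoder(index_word, num_lanes):
    """Derive a (shift amount, enable) control pair for each output lane.

    The shift amount for a lane is the number of non-pruned words that
    precede it, scaled by the word width; the enable bit is the lane's
    own bit of the index word (1 = non-pruned, assumed polarity).
    """
    controls = []
    preceding = 0
    for lane in range(num_lanes):
        enable = (index_word >> lane) & 1
        controls.append((preceding * WORD_BITS, enable))
        preceding += enable
    return controls


def decode_with_shifters(index_word, packed_words, num_lanes):
    """Model of one shifter and one AND gate per lane, evaluated together."""
    word_mask = (1 << WORD_BITS) - 1
    outputs = []
    for shift, enable in index_word_decoder(index_word, num_lanes):
        shifted = (packed_words >> shift) & word_mask  # shifter output
        gate = word_mask if enable else 0              # enable replicated per bit
        outputs.append(shifted & gate)                 # AND gate output
    return outputs


# Two stored words, 0x7 then 0x2, packed least significant word first.
lanes = decode_with_shifters(0b1010, 0x27, 4)
```

	Each lane depends only on the index word and the packed data, so in hardware all lanes could in principle be evaluated in the same cycle; the loop here is only a sequential stand-in for that parallel structure.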

	Regarding claim 11, Duong teaches a device comprising: a neural network configured to receive a set of inputs and generate a set of outputs, (FIG. 52 is an example of an architecture 5200 of an electronic device that includes the neural network integrated circuit of some embodiments, col 84, lines 18-20; FIG. 1 illustrates an example of a multi-layer machine-trained network of some embodiments. This figure illustrates a feed-forward neural network 100 that has multiple layers of processing nodes 102 (also called neurons). In all but the first (input) and last (output) layer, each node 102 receives two or more outputs of nodes from earlier processing node layers and provides its output to one or more nodes in subsequent layers. The output of the node (or nodes) in the last layer represents the output of the network 100, col 17, lines 33-41)  
	a weight memory configured to store encoded weights for the neural network; (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55. Examiner notes that the filter slice buffers are the weight memory) and
	an accelerator configured to execute the neural network, the accelerator including: (FIG. 4 conceptually illustrates the neural network computation fabric 400 (also referred to as the chip fabric) of some embodiments. Col 21, lines 66-67)
	a decompression logic circuit configured to retrieve the encoded weights from the weight memory, (encoded weight data is read from memory and provided in blocks of data (as determined by the memory read controller, including the cache as shown in FIG. 22) to the filter slice buffer controller 3100, col 60, lines 26-29, Fig. 31) and
decode the encoded weights (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55)
to obtain a sequence of one or more pruned weight words and one or more non-pruned weight words (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44)
the non-pruned weight words including non-zero-value weight words; a plurality of input-weight multipliers configured to receive the sequence of the pruned weight words and the non-pruned weight words (The process 1800 then computes (at 1815) partial dot products in the cores. As described above, the activation values loaded into the activation window buffers in each of the active cores are multiplied by their corresponding weight values loaded into the filter slice buffers of these cores. In some embodiments, the size of the partial dot products is reduced using the wiring structure shown in FIGS. 16 and 17, and with ternary weight values of {0, 1, −1}, the multiplication is handled by the ternary MAC circuits shown in this figure, col 43, lines 3-12) and
	Duong does not explicitly teach the set of inputs including speech data, the set of outputs including one or more potential speech commands that correspond to the speech data; the pruned weight words including zero-value weight words, a plurality of input-weight multipliers configured to receive the sequence of the pruned weight words.
Chen teaches the set of inputs including speech data, the set of outputs including one or more potential speech commands that correspond to the speech data; (algorithm models to perform speech recognition on the input speech information so as to output target information after obtaining a recognition result. The present disclosure does not restrict an output form of the target information. For instance, the target information may be output as text [2186]; In speech recognition, an audio can be split into a plurality of segments and placed into different processing units for processing to accelerate feature extraction or even implement feature extraction in real time [4185])
the pruned weight words including zero-value weight words, and a plurality of input-weight multipliers configured to receive the sequence of the pruned weight words (selecting M weights from the weights of the neural network through a sliding window, where M is an integer greater than 1 [0669]; and when the M weights satisfy a preset condition, setting all or part of the M weights to zero to obtain the pruned weights [0670]; The multiplier 7 is configured to calculate k*x. [3891]; the domain conversion component 10 is a domain conversion component, and includes three inputs x, i, and j [3900]; The slope array storage unit and the intercept array storage unit store the straight slope (i.e., K) [3904])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Duong to incorporate the teachings of Chen for the benefit of performing a coarse-grained pruning operation on the weights of the neural network to obtain the pruned weights [0613] so that the subsequent storage and access to values and the subsequent operation amount may be reduced, which may improve operating efficiency and reduce power consumption. (Chen [2937])

	Regarding claim 14, Modified Duong teaches the device of claim 11, Duong teaches wherein the encoded weights include an index weight word (In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 40-44) and
	one or more non-pruned weight words, (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44) 
	the decompression logic circuit including a weight word decoder, the weight word decoder including: (The filter slice buffer controller 3100 includes a weight decoder circuit that decodes the weight data (i.e., expands the weight data to include weight information (indicating whether the weight is positive/negative/zero)), col 60, lines 37-41; This weight decoder circuitry 3200 assumes that each filter slice buffer holds weight data for 40 inputs (i.e., the adder trees include 40 input multiplexers), ... The weight decoder circuitry 3200 receives the weight data stored in the weight memory and decodes this data to be formatted for storage in one of the filter slice buffers (the filter slice identifier is also sent to the filter slice buffers so that the decoded data is routed to the correct filter slice buffer), col 60-61, lines 61-66 and 1-6)
	a plurality of shifters; (two programmable shift registers 2315 and 2320, Fig 23, col 49, lines 40-41; In some embodiments, the truncator 1235 receives (as output from the right bit shifter 1230) more bits than it outputs, col 35, lines 10-12; The bit shifters 4135 in the post-processing units are configured to shift this incoming second dot product left by 4 bits in this second clock cycle, col 72, lines 20-22)
	a plurality of AND gates coupled to the plurality of shifters; (The AND gate 1305 enables this first dot product input, while the AND gate 1310 gates the second dot product to 0. However, in other situations, the adder 1315, left-shift operator 1320, and adder 1325 enable the dot product calculation for a neural network node to be completed and provided to the other post-processing operations. In addition, the left shift circuit 1320 can also be used to align a dot product to ensure that the binary point is in the correct location for the input value, col 36, lines 5-13) and
	an index word decoder coupled to the plurality of shifters and the plurality of AND gates, (a mask decoder 2310, two programmable shift registers 2315 and 2320, a multiplexer 2325, and an AND gate 2330, col 49, lines 40-43) 
	 the index word decoder configured to control the plurality of shifters and the plurality of AND gates based on bit values of individual bits of the index weight word. (However, the register block also receives an enable signal (from the mask decoder 2310), and when this signal is 0 the clock signal is gated. Thus, the enable bit to the mask decoder 2310 can be used to prevent the entire set of register blocks from shifting their activation values (e.g., while waiting for a memory read operation to complete). The mask signal from the mask decoder 2310 can also be used to prevent some register blocks from shifting their output when the rest of the programmable shift register executes a shift operation, so that unused activation register blocks will not consume the power required to change their values, col 50, lines 57-67; The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 36-44)

	Regarding claim 15, Modified Duong teaches the device of claim 14, Duong teaches wherein each of the plurality of shifters is configured to receive the non-pruned weight words (some embodiments enable dot products to be double the size of the standard quantized output (e.g., 8-bit rather than 4-bit) by using dot products from multiple cycles and bit-shifting the first set of input data, col 4, lines 29-32; Similarly, each primary weight value buffer can hold the specified number of weight values (which is the number of inputs to which the input values are reduced), col 11, lines 45-48; Furthermore, for larger input and output values (e.g., 8-bit input and output values), in which the dot product input processing circuit 1205 left shifts the dot product of the most significant bits of the inputs (e.g., by 4 bits), the bias factor has to add a larger amount for the negative weights. For the 8-bit case (in which the dot product of the weights with the most significant nibble of the inputs is shifted by 4 bits), the bias factor adds 17 for each negative weight, col 34, lines 24-28) and
	a first control signal from the index word decoder, each shifter being configured to execute a shift operation on the non-pruned weight words according to the first control signal. (The mask signal from the mask decoder 2310 can also be used to prevent some register blocks from shifting their output when the rest of the programmable shift register executes a shift operation, so that unused activation register blocks will not consume the power required to change their values, col 50, lines 63-67; some embodiments enable dot products to be double the size of the standard quantized output (e.g., 8-bit rather than 4-bit) by using dot products from multiple cycles and bit-shifting the first set of input data, col 4, lines 29-32;  weight data specifies which of its inputs to select (i.e., the input value corresponding to the non-zero weight value), col 6, lines 8-11)

	Regarding claim 16, Modified Duong teaches the device of claim 15, Duong teaches wherein each of the AND gates is configured to receive an output of a respective shifter and a second control signal from the index word decoder, each AND gate being configured to execute an AND operation on the output of a respective shifter and the second control signal. (The AND gate 1305 enables this first dot product input, while the AND gate 1310 gates the second dot product to 0. However, in other situations, the adder 1315, left-shift operator 1320, and adder 1325 enable the dot product calculation for a neural network node to be completed and provided to the other post-processing operations. In addition, the left shift circuit 1320 can also be used to align a dot product to ensure that the binary point is in the correct location for the input value, col 36, lines 5-12)

	Regarding claim 17, Modified Duong teaches the device of claim 11, Duong teaches wherein the encoded weights include an index weight word followed by the non-pruned weight words, (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44. Examiner notes that the weight map is the index weight word)
	the encoded weights not including weight values for the pruned weight words, (FIG. 30 illustrates the format of the encoded weight data 3000 for a single filter slice in some embodiments. As shown in this example, some embodiments divide the weight data for a filter slice into three sections: a slice identifier 3005, a non-zero weight map 3010, and additional weight data 3015 for each of the non-zero weights, col 58 and 59, lines 66-67 and 1-4)
	the index weight word including a plurality of bits, each bit having a first bit value or a second bit value, the first bit value indicating a pruned weight word in the sequence, the second bit value indicating a non-pruned weight word in the sequence (in FIG. 16, the non-zero weight map 3010 includes 36 bits. For a chip fabric with partial dot product computation circuits as shown in FIG. 17 (i.e., having redundant input multiplexers to better ensure that all of the inputs with non-zero weights are mapped to different input multiplexers), the non-zero weight map includes 40 bits. Each bit of the non-zero weight map 3010 specifies whether the corresponding input multiplexer receives an input with a non-zero corresponding weight value (with the bit set to 1 to indicate this case, and with the bit set to 0 to indicate that none of the inputs received by the multiplexer have non-zero corresponding weight values), col 59, lines 30-41)
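	For illustration only, the encoded layout discussed for claim 17 (an index weight word followed by the non-pruned weight words, with no values stored for pruned words) can be sketched from the encoding side. The sketch below is hypothetical and is not the format of Duong's FIG. 30 (it omits the slice identifier, for instance). The function name encode_weights, the list representation, and the bit polarity (1 = non-pruned, least significant bit first) are assumptions.

```python
def encode_weights(weights):
    """Encode a weight sequence as [index_word, non-pruned words...].

    A zero-value weight is treated as pruned: it contributes a 0 bit to
    the index word and no entry to the stored words.
    """
    index_word = 0
    non_pruned = []
    for position, weight in enumerate(weights):
        if weight != 0:
            index_word |= 1 << position  # mark position as non-pruned
            non_pruned.append(weight)
    return [index_word] + non_pruned


encoded = encode_weights([0, 5, -3, 0])
```

	The storage saving comes from the pruned positions costing one index bit each instead of a full weight word.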

	Regarding claim 18, Modified Duong teaches the device of claim 17, Chen teaches wherein the pruned weight word is one of a most significant part or a least significant part of a respective encoded weight, (selecting M weights from the weights of the neural network through a sliding window, where M is an integer greater than 1 [0669]; and when the M weights satisfy a preset condition, setting all or part of the M weights to zero to obtain the pruned weights [0670]; A value of weight data indicated by the power weight data is expressed as a power exponential value of the weight data value. The power weight data includes a sign bit and a power bit. The sign bit uses one or more bits to indicate the sign of the weight data [0464]; The correspondence in the encoding table may be that the most significant bit of the power-bit data represents a zero setting bit [2623])
	wherein, when the pruned weight word is the most significant part, a sign of the least significant part is used to fill the most significant part. (when the floating-point number fi is a non-normalized floating-point number, the value of the hidden bit is 0; and k “0”s are added behind a least significant bit of the mantissa bit as significant bits [4128]; obtaining long-bit floating-point data of each layer of an artificial neural network, including weights, biases, and/or input and output values of each layer [1352])
	The same motivation to combine independent claim 11 applies here.

	Regarding claim 19, Duong teaches a method of decoding weights of a neural network, (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55. Examiner notes that the filter slice buffers are the weight memory) the method comprising:
	retrieving encoded weights from a weight memory via a processor data bus, (encoded weight data is read from memory and provided in blocks of data (as determined by the memory read controller, including the cache as shown in FIG. 22) to the filter slice buffer controller 3100, col 60, lines 26-29, Fig. 31)
	the encoded weights including an index weight word; (This decoder, in some embodiments, (i) aligns the stored additional weight data correctly in the filter slice buffer for the non-zero weights and (ii) fills in additional weight data for the zero-value weights. The non-zero weight map indicates the correct alignment of the stored additional weight data, and the bits of the non-zero weight map are also stored alongside this additional data within the filter slice buffer (e.g., such that 5 consecutive bits are stored for each weight). In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 33-44. Examiner notes that the weight map is the index weight word)
	decoding the encoded weights using the index weight word (In some embodiments, the decoder also uses the non-zero weight map to determine the location of the next filter slice block within the core memory (because the non-zero weight map specifies the amount of additional weight data for the current filter slice block), col 10, lines 40-44)
	 one or more non-pruned weight words, (The filter slice buffers of some embodiments store a full set of weight data (e.g., a non-zero map bit, a positive/negative bit, and multiplexer select bits) for each of the input multiplexers, irrespective of whether that input multiplexer receives an input with a non-zero corresponding weight or not (this still reduces the amount of weight data, from data for every weight in a filter to data for only approximately one-fourth of the weights in the filter). As such, decoding circuitry is used to expand the limited stored weight data into a full set of weight data for each input multiplexer, col 10, lines 22-31) 
	 the non-pruned weight words including non-zero-value weight words; (The filter slice buffers of some embodiments store a full set of weight data (e.g., a non-zero map bit, a positive/negative bit, and multiplexer select bits) for each of the input multiplexers, irrespective of whether that input multiplexer receives an input with a non-zero corresponding weight or not (this still reduces the amount of weight data, from data for every weight in a filter to data for only approximately one-fourth of the weights in the filter). As such, decoding circuitry is used to expand the limited stored weight data into a full set of weight data for each input multiplexer, col 10, lines 22-31) and
	providing the sequence of the non-pruned weight words to a plurality of input-weight multipliers. (The process 1800 then computes (at 1815) partial dot products in the cores. As described above, the activation values loaded into the activation window buffers in each of the active cores are multiplied by their corresponding weight values loaded into the filter slice buffers of these cores. In some embodiments, the size of the partial dot products is reduced using the wiring structure shown in FIGS. 16 and 17, and with ternary weight values of {0, 1, −1}, the multiplication is handled by the ternary MAC circuits shown in this figure, col 43, lines 3-12)
	Duong does not explicitly teach to obtain a sequence of one or more pruned weight words, the pruned weight words including zero-value weight words, and providing the sequence of the pruned weight words to a plurality of input-weight multipliers.
	Chen teaches to obtain a sequence of one or more pruned weight words and the pruned weight words including zero-value weight words, providing the sequence of the pruned weight word to a plurality of input-weight multipliers (selecting M weights from the weights of the neural network through a sliding window, where M is an integer greater than 1 [0669]; and when the M weights satisfy a preset condition, setting all or part of the M weights to zero to obtain the pruned weights [0670]; The multiplier 7 is configured to calculate k*x. [3891]; the domain conversion component 10 is a domain conversion component, and includes three inputs x, i, and j [3900];The slope array storage unit and the intercept array storage unit store the straight slope (i.e., K) [3904])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Duong to incorporate the teachings of Chen for the benefit of performing a coarse-grained pruning operation on the weights of the neural network to obtain the pruned weights [0613] so that the subsequent storage and access to values and the subsequent operation amount may be reduced, which may improve operating efficiency and reduce power consumption. (Chen [2937])

	Regarding claim 20, Modified Duong teaches the method of claim 19, Duong teaches wherein the index weight word includes a plurality of bits, each bit having a first bit value or a second bit value, the first bit value indicating a pruned weight word in the sequence, the second bit value indicating a non-pruned weight word in the sequence, (in FIG. 16, the non-zero weight map 3010 includes 36 bits. For a chip fabric with partial dot product computation circuits as shown in FIG. 17 (i.e., having redundant input multiplexers to better ensure that all of the inputs with non-zero weights are mapped to different input multiplexers), the non-zero weight map includes 40 bits. Each bit of the non-zero weight map 3010 specifies whether the corresponding input multiplexer receives an input with a non-zero corresponding weight value (with the bit set to 1 to indicate this case, and with the bit set to 0 to indicate that none of the inputs received by the multiplexer have non-zero corresponding weight values), col 59, lines 30-41)

	Regarding claim 21, Modified Duong teaches the method of claim 19, Duong teaches wherein the decoding includes controlling a plurality of shifters and a plurality of digital logic gates based on the index weight word. (The first AND gate in the decoder always receives the first set of additional weight data (i.e., the first 4 bits), and this is either gated off or passed through (depending on the value of the first non-zero map bit). Each subsequent AND gate is prefaced by a multiplexer that selects between two or more sets of encoded additional weight data based on the previous non-zero map bits. Specifically, the second AND gate can receive either the first or second (indices 0 or 1) set encoded additional weight data, the third AND gate can receive any of the first, second, or third (indices 0-2) sets of encoded additional weight data, and so on up to the twentieth AND gate that can receive any of the twenty sets of encoded additional weight data, col 61, lines 18-30)
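	For illustration only, and not as part of the cited disclosure or the grounds of rejection, the multiplexer-and-AND-gate decoding described in the quoted passage can be modeled in software as follows. The sketch assumes 4-bit additional weight fields and a 5-bit decoded output per position, as the passage states; placing the map bit in the most significant position of the 5-bit output, and the name `decode_weights`, are assumptions made here for the example.

```python
def decode_weights(map_bits, packed):
    """Expand packed 4-bit weight fields into one 5-bit field per position.

    map_bits: list of 0/1 non-zero map bits, one per output position.
    packed:   list of 4-bit ints, one per set bit in map_bits, in order.
    """
    decoded = []
    ones_seen = 0  # number of earlier set map bits; acts as the mux select
    for bit in map_bits:
        # Multiplexer: pick the candidate field indexed by the earlier bits.
        candidate = packed[ones_seen] if ones_seen < len(packed) else 0
        # AND gate: pass the 4-bit field only when the map bit is 1.
        gated = candidate & (0b1111 if bit else 0)
        # Combine the map bit with the gated field to form a 5-bit output.
        decoded.append((bit << 4) | gated)
        ones_seen += bit
    return decoded


# Hypothetical 5-position example with three non-zero weights.
out = decode_weights([1, 0, 1, 1, 0], [0b1010, 0b0011, 0b0110])
```

	Positions whose map bit is 0 decode to all zeros, which mirrors the gating-off behavior in the quoted passage.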
	
4.	Claims 9, 10, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Duong et al. (US11210586 filed 06/28/2019) in view of Chen et al (US20210182077 filed 09/13/2018) and further in view of Chen et al (US20190115933 hereinafter ‘Chen2019’)

	Regarding claim 9, Modified Duong teaches the neural network circuit of claim 1, Duong teaches wherein the decompression logic circuit includes: (The filter slice buffer controller 3100 includes a weight decoder circuit that decodes the weight data (i.e., expands the weight data to include weight information (indicating whether the weight is positive/negative/zero), col 60, lines 37-41)
	a first weight decoder configured to decode a first portion of the encoded weights; (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55)
	Modified Duong does not explicitly teach a second weight decoder configured to decode a second portion of the encoded weights; and a control logic circuit configured to control the first weight decoder and the second weight decoder.
	Chen2019 teaches a second weight decoder configured to decode a second portion of the encoded weights; (the address decoder 424 sequentially calculates two weight offsets (i.e., 0x2 and 0x6) of two NZ elements (wnz[1] and wnz[3]) and respectively adds the two weight offset and the base-address b-addr2 in SRAM 422 to output two weight addresses (i.e., 0x2+b-addr2 and 0x6+b-addr2) [0034]) and
	a control logic circuit configured to control the first weight decoder and the second weight decoder. (the address decoders 423 and 424 are configured to be enabled/disabled in response to the control signals CS1/CS2. At first, the compressor 411 and the parser 412 sends two bitmap headers 31 of node-1 for the hierarchical NZP-x 70 and the hierarchical NZP-w 70 to the AND gate array 425 (having 16 AND gates), and send the control signals CS1 and CS2 with a first voltage state to disable the address decoders 423 and 424, Fig. 4A[0045])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the circuit of Modified Duong to incorporate the teachings of Chen2019 for the benefit of reducing the memory size for the weight and the data matrices and the computational overhead, thereby allowing more neurons per unit area on the integrated circuit and processing at high speed and with low power consumption (Chen2019 [0006]).

	Regarding claim 10, Modified Duong teaches the neural network circuit of claim 9, Duong teaches indicating a number of available weight words stored in a weight word buffer, (This weight decoder circuitry 3200 assumes that each filter slice buffer holds weight data for 40 inputs, col 60, lines 61-63)
	the first weight decoder configured to determine a number of bits having a first bit value in the index weight word, (As shown, in a particular cycle the decoder circuitry 3200 receives 20 non-zero map bits (i.e., the first 20 bits in a first cycle and the latter 20 bits in the next cycle) as well as up to 20 sets of additional weight data, col 61, lines 7-10; For each 4-bit AND gate output, the decoder circuitry 3200 includes the corresponding non-zero weight map bit so that all 5 bits of weight data for each input multiplexer and ternary MAC are provided together to the filter slice buffer, col 61, lines 43-46) and 
	decode the first portion of the encoded weights in response to the number of available weight words stored in the weight word buffer being equal to or greater than the number of bits having the first bit value. (Each set of additional weight data is 4 bits wide, and the decoded output includes 20 sets of 5-bit wide weight data (e.g., a non-zero map bit combined with 4 bits of additional data). The non-zero map bit is used to gate this data such that if the non-zero map bit corresponding to a particular input multiplexer is zero, then the data sent to the filter slice buffer for this input is all zeros (zeroing out this data saves power), col 61, lines 10-17; The first AND gate in the decoder always receives the first set of additional weight data (i.e., the first 4 bits), and this is either gated off or passed through (depending on the value of the first non-zero map bit), col 61, lines 18-21; For each 4-bit AND gate output, the decoder circuitry 3200 includes the corresponding non-zero weight map bit so that all 5 bits of weight data for each input multiplexer and ternary MAC are provided together to the filter slice buffer, col 61, lines 47-53)
	Chen2019 teaches wherein the first weight decoder is configured to receive a signal, from the control logic circuit (it indicates this is a node and the compressor 411 and the parser 412 send the control signals CS1 and CS2 with the first voltage state to disable the address decoders 423 and 424 [0045])
	The same motivation to combine dependent claim 9 applies here.

	Regarding claim 12, Modified Duong teaches the device of claim 11, Duong teaches wherein the decompression logic circuit includes: a weight word buffer configured to temporarily store the encoded weights, the encoded weights (The filter slice buffer controller 3100 includes a weight decoder circuit that decodes the weight data (i.e., expands the weight data to include weight information (indicating whether the weight is positive/negative/zero), col 60, lines 37-41)
	including a first index weight word and a second index weight word; (Thus, for a chip fabric with partial dot product computation circuits as shown in FIG. 16, the non-zero weight map 3010 includes 36 bits. For a chip fabric with partial dot product computation circuits as shown in FIG. 17 (i.e., having redundant input multiplexers to better ensure that all of the inputs with non-zero weights are mapped to different input multiplexers), the non-zero weight map includes 40 bits. Each bit of the non-zero weight map 3010 specifies whether the corresponding input multiplexer receives an input with a non-zero corresponding weight value (with the bit set to 1 to indicate this case, and with the bit set to 0 to indicate that none of the inputs received by the multiplexer have non-zero corresponding weight values), col 59, lines 29-41. Examiner notes that the first index weight word is the 36-bit non-zero weight map 3010 and the second index weight word is the 40-bit non-zero weight map)
	a first weight decoder configured to receive the encoded weights from the weight word buffer and generate a first group of decoded weight words (Weight decoding circuitry in the core reads this weight data from memory, decodes the weight data, and stores the decoded weight data in the filter slice buffers, col 9, lines 52-55. Examiner notes that the weight decoding circuitry is the first weight decoder)
	using the first index weight word; (Thus, for a chip fabric with partial dot product computation circuits as shown in FIG. 16, the non-zero weight map 3010 includes 36 bits, col 59, lines 29-31) and
	Modified Duong does not explicitly teach a second weight decoder configured to receive a portion of the encoded weights from the first weight decoder and generate a second group of decoded weight words using the second index weight word.
	Chen2019 teaches a second weight decoder configured to receive a portion of the encoded weights from the first weight decoder and generate a second group of decoded weight words using the second index weight word (Prior to element-by-element multiplications of the two vectors X and W, the AND gate array 425 performs a bitwise logical AND operation between the bitmap headers 31 of the NZP-w 30 and the NZP-x 30 in parallel to generate the output bitmap (i.e., o-bm) with two non-zero bits (i.e., bit 5 and bit 10). Next, according to o-bm and the bitmap header 31 of the NZP-x 30, the address decoder 423 sequentially calculates two data offsets (i.e., 0x2 and 0x4) for two NZ elements (xnz[1] and xnz[2]) and respectively adds the two data offsets and the base-address b-addr1 in SRAM 421 to output two data addresses (i.e., 0x2+b-addr1 and 0x4+b-addr1); according to o-bm and the bitmap header 31 of the NZP-w 30, the address decoder 424 sequentially calculates two weight offsets (i.e., 0x2 and 0x6) of two NZ elements (wnz[1] and wnz[3]) and respectively adds the two weight offset and the base-address b-addr2 in SRAM 422 to output two weight addresses (i.e., 0x2+b-addr2 and 0x6+b-addr2). The SRAM 421 sequentially outputs their corresponding NZ elements (i.e., synapse values) to the MAC 450 based on the two data addresses (0x2+b-addr1 and 0x4+b-addr1) while the SRAM 422 sequentially outputs their corresponding NZ elements (i.e., weight values) to the MAC 450 based on the two weight addresses (0x2+b-addr2 and 0x6+b-addr2). According to the outputs of the SRAMs 421 and 422, the MAC 450 sequentially generates a product of the NZ weight value wnz[1] and the NZ synapse value xnz[1] for the first non-zero bit in o-bm and a product of the NZ weight value wnz[3] and the NZ synapse value xnz[2] for the second non-zero bit in o-bm, and sequentially adds the products to the accumulator 453 to produce an accumulate value, i.e., y=xnz[1]*wnz[1]+xnz[2]*wnz[3]. [0034])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Duong to incorporate the teachings of Chen2019 for the benefit of reducing the memory size for the weight and the data matrices and the computational overhead, thereby allowing more neurons per unit area on the integrated circuit and processing at high speed and with low power consumption (Chen2019 [0006]).
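	For illustration only, and not as part of the cited disclosure or the grounds of rejection, the bitmap-AND and offset computation quoted from Chen2019 [0034] can be sketched as follows. The sketch uses list indices in place of the byte offsets and SRAM base addresses of the reference, and numbers bits from the least significant bit as bit 0. Both conventions, the function name `sparse_dot`, and every bit position and value in the example other than bits 5 and 10 are assumptions made here.

```python
def sparse_dot(x_bitmap, x_nz, w_bitmap, w_nz):
    """Dot product of two bitmap-compressed vectors.

    x_bitmap / w_bitmap: ints whose bit i is 1 when element i is non-zero.
    x_nz / w_nz: the non-zero elements, in increasing element order.
    """
    out_bitmap = x_bitmap & w_bitmap  # positions non-zero in both vectors
    acc = 0
    pos = 0
    while out_bitmap >> pos:
        if (out_bitmap >> pos) & 1:
            below = (1 << pos) - 1
            # Index into each packed list = count of set bits below pos.
            xi = bin(x_bitmap & below).count("1")
            wi = bin(w_bitmap & below).count("1")
            acc += x_nz[xi] * w_nz[wi]
        pos += 1
    return acc


# Hypothetical bitmaps chosen so the AND has bits 5 and 10 set and the
# selected elements are x_nz[1], x_nz[2] and w_nz[1], w_nz[3].
x_bitmap = (1 << 2) | (1 << 5) | (1 << 10)
w_bitmap = (1 << 0) | (1 << 5) | (1 << 7) | (1 << 10)
x_nz = [4, 2, 3]
w_nz = [9, 5, 8, 6]
y = sparse_dot(x_bitmap, x_nz, w_bitmap, w_nz)
```

	With these assumed inputs the accumulated value has the same form as the quoted expression, y = xnz[1]*wnz[1] + xnz[2]*wnz[3].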

	Regarding claim 13, Modified Duong teaches the device of claim 12, further comprising: Duong teaches a plurality of weight registers coupled to the shifter, each weight register of the plurality of weight registers being configured to receive a separate decoded weight word from the shifter. (The filter slice buffer controller 3100 provides this decoded weight data, as well as the filter slice index and write enable data to the filter slice buffers 3120, which store the decoded weight data (e.g., in a set of registers), col 60, lines 47-50; The bit shifters 4135 in the post-processing units are configured to shift this incoming second dot product left by 4 bits in this second clock cycle. In addition, the stored first dot products are released from the registers 4145 and passed to the adder 4140, which combines the dot products from the first and second clock cycles, col 72, lines 20-25; In the latter case, the initial dot product would be bit shifted 4 bits to the left rather than this bit shift being applied to the latter dot product, and the bit shifted dot product stored in the register, col 72, lines 35-38)
	a shifter configured to receive the first group of decoded weight words from the first weight decoder (the initial dot product would be bit shifted 4 bits to the left rather than this bit shift being applied to the latter dot product, and the bit shifted dot product stored in the register, col 72, lines 35-38; The mode decoder 2305 receives configuration data specifying one of several modes for the shift registers, and outputs a one-hot decoded mode input for the shift registers 2315 and 2320, col 49, lines 45-48) and
	Modified Duong does not explicitly teach a control logic configured to control the first weight decoder and the second weight decoder; the second group of decoded weight words from the second weight decoder; 
	Chen2019 teaches a control logic configured to control the first weight decoder and the second weight decoder; (it indicates this is a node and the compressor 411 and the parser 412 send the control signals CS1 and CS2 with the first voltage state to disable the address decoders 423 and 424 [0045])
	the second group of decoded weight words from the second weight decoder; (the address decoder 424 sequentially calculates two weight offsets (i.e., 0x2 and 0x6) of two NZ elements (wnz[1] and wnz[3]) and respectively adds the two weight offset and the base-address b-addr2 in SRAM 422 to output two weight addresses (i.e., 0x2+b-addr2 and 0x6+b-addr2). [0034])
	The same motivation to combine dependent claim 12 applies here.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121